Get attribute of PCA object and custom predict function #77
Everything seems possible/doable. If the pipeline is simplified a bit, then it should be possible to implement custom PMML converters for it.

It's technically difficult to transfer a free-standing custom function into PMML. It would be easier if the distance calculation was encapsulated into a subclass of `PCA`:

```python
from sklearn.decomposition import PCA

class CalcDist(PCA):

    def transform(self, X):
        # Project into principal component space first
        X = super().transform(X)
        dist = X * X / self.singular_values_
        return dist
```

Or, if you don't want to subclass `PCA` directly, you could implement a standalone transformer class that keeps a fitted `PCA` object as an attribute. This class could then be renamed to something more descriptive.
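For a quick shape check (random data, just to illustrate the output):

```python
import numpy as np

X = np.random.rand(100, 5)

calc_dist = CalcDist(n_components=3)
calc_dist.fit(X)

# One row per instance, one squared, variance-weighted
# distance column per retained component
dist = calc_dist.transform(X)
print(dist.shape)  # (100, 3)
```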
Please excuse my ignorance, but how should the output of `CalcDist.transform(X)` be interpreted? I'm asking this because I'd like to better understand how to encode this calculation in PMML markup.

---

Hi Villu,

Thank you for your reply. Here I am trying to implement the algorithm proposed in this paper. Technically speaking, it computes, for every instance, distances from its principal component scores, and compares these distances against fixed threshold values. If connected with a downstream classifier, instances whose scores exceed the thresholds would be labeled as outliers. Does this make sense?
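In numpy terms, the two scores would be something like this (a rough sketch, with `scores` standing for the principal component scores and `lam` for the corresponding eigenvalues):

```python
import numpy as np

def outlier_scores(scores, lam, q, r):
    # Squared, variance-weighted distances, one column per component
    dist = scores ** 2 / lam
    major = np.sum(dist[:, :q], axis=1)  # first q (major) components
    minor = np.sum(dist[:, r:], axis=1)  # components from index r onwards (minor)
    return major, minor
```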

---

Thanks for the reference - now I can relate to your idea more closely.

In principle, "PCC" stands for "Principal Component Classifier". The first outlier category ("q") represents instances that are outliers with respect to one or more of the original variables. The second outlier category ("r") represents instances that are inconsistent with the correlation structure of the data, but are not outliers with respect to the original variables.

The PCC would be a regression-type model, because it outputs two numeric scores. Do you know the "q" and "r" threshold values at the time when training and exporting the model? If so, then we could turn PCC into a classification-type model, which would output two booleans instead (eg. "is_outlier(q)" and "is_outlier(r)").

Anyway, from the API perspective, all this logic could be captured into one Scikit-Learn class:

```python
import numpy as np

from sklearn.base import RegressorMixin
from sklearn.decomposition import PCA

class PCC(RegressorMixin):

    def __init__(self, n, q, r):
        self.pca_ = PCA(n_components=n)
        self.q = q
        self.r = r

    def fit(self, X, y=None):
        self.pca_.fit(X)
        return self

    def predict(self, X):
        # Project into principal component space first
        X = self.pca_.transform(X)
        dist = X * X / self.pca_.singular_values_
        # Sum over the first q (major) components
        major_comp = np.sum(dist[:, range(self.q)], axis=1)
        # Sum over the components from index r onwards (minor)
        minor_comp = np.sum(dist[:, range(self.r, X.shape[1])], axis=1)
        return np.dstack((major_comp, minor_comp))
```

The above code violates some of Scikit-Learn's API conventions, because the method `predict(X)` returns two score columns instead of a single column of predictions.

I want to encapsulate everything (PCA fitting, and distance calculation) into one Python class, because this way my PMML converter can see and analyze all the information together, and generate the most compact and efficient PMML representation possible. For example, I've got a feeling that the PCC prediction logic can be mapped directly to native PMML constructs.
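For a quick sanity check of the above class (random data, hypothetical parameter values):

```python
import numpy as np

X = np.random.rand(100, 5)

pcc = PCC(n=5, q=2, r=3)
pcc.fit(X)

scores = pcc.predict(X)
print(scores.shape)  # (1, 100, 2) - np.dstack introduces a leading axis
```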

---

Using the above "all-in-one" PCC class, the pipeline would be simplified to the following:

```python
from sklearn.preprocessing import StandardScaler
from sklearn2pmml.decoration import ContinuousDomain
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn_pandas import DataFrameMapper

pipeline = PMMLPipeline([
    ("mapper", DataFrameMapper([
        (df_X.columns.values, [ContinuousDomain(), StandardScaler()])
    ])),
    ("pcc", PCC(n, q, r))
])
```
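The pipeline could then be fitted and exported in the usual way (the file name is arbitrary):

```python
from sklearn2pmml import sklearn2pmml

pipeline.fit(df_X)
sklearn2pmml(pipeline, "PCC.pmml")
```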

---

Hi Villu,

I am building an anomaly detection classifier based on PCA. I need to get attributes of the fitted PCA object and define a custom predict function. I would like to use PCC as my classifier. My `PMMLPipeline` would be something like this (reusing the `PCC` class and the imports from above):
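```python
pipeline = PMMLPipeline([
    ("mapper", DataFrameMapper([
        (df_X.columns.values, [ContinuousDomain(), StandardScaler()])
    ])),
    ("pcc", PCC(n, q, r))
])
```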
Or PCC could be moved into the mapper and connected with a `DecisionTreeClassifier()`, along these lines:
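```python
from sklearn.tree import DecisionTreeClassifier

pipeline = PMMLPipeline([
    ("mapper", DataFrameMapper([
        # PCC would act as the final transformer step here
        (df_X.columns.values, [ContinuousDomain(), StandardScaler(), PCC(n, q, r)])
    ])),
    ("classifier", DecisionTreeClassifier())
])
```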
Would any of this be possible?
Thanks,
Bohan