Expression translator should support multi-dimensional array indexing syntax #15

Open
AbdealiLoKo opened this issue Nov 25, 2021 · 3 comments

Comments

@AbdealiLoKo

Hi, I have a scenario where I need to use an array as an input column to my pipeline.
I've reduced the issue I'm having to a minimal example:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.compose import ColumnTransformer
from sklearn2pmml.preprocessing import ExpressionTransformer

df = pd.DataFrame({'c1': [1, 2, 3], 'c2': [[1,2], [1,2], [3,1]]})

pipeline = make_pipeline(
    ColumnTransformer(
        transformers=[
            (f'get_item_0_from_c2_array', ExpressionTransformer('X["c2"][0]'), ['c2'])
        ]
    ),
    LogisticRegression(),
)
pipeline.fit(df, [0, 0, 1])
pipeline.predict(df)

The above pipeline works fine in my Jupyter notebook. But converting it to PMML fails:

import sklearn2pmml

pmml_pipeline = sklearn2pmml.PMMLPipeline(steps=[
    ('pipeline',pipeline)
])

sklearn2pmml.sklearn2pmml(pmml_pipeline, './pipeline.pmml', debug=True)

The error is:

java.lang.IllegalArgumentException: Python expression 'X["c2"][0]' is either invalid or not supported
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:36)
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:23)
	at sklearn2pmml.preprocessing.ExpressionTransformer.encodeFeatures(ExpressionTransformer.java:51)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.compose.ColumnTransformer.encodeFeatures(ColumnTransformer.java:63)
	at sklearn.Transformer.encode(Transformer.java:70)
	at sklearn.Composite.encodeFeatures(Composite.java:119)
	at sklearn.Composite.encodeModel(Composite.java:135)
	at sklearn.pipeline.PipelineClassifier.encodeModel(PipelineClassifier.java:86)
	at sklearn.Estimator.encode(Estimator.java:103)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:233)
	at org.jpmml.sklearn.Main.run(Main.java:217)
	at org.jpmml.sklearn.Main.main(Main.java:143)
Caused by: org.jpmml.python.ParseException: Encountered unexpected token: "]" "]"
    at line 1, column 10.

Was expecting one of:

    ":"

	at org.jpmml.python.ExpressionTranslator.generateParseException(ExpressionTranslator.java:2110)
	at org.jpmml.python.ExpressionTranslator.jj_consume_token(ExpressionTranslator.java:1973)
	at org.jpmml.python.ExpressionTranslator.StringSlicingExpression(ExpressionTranslator.java:956)
	at org.jpmml.python.ExpressionTranslator.PrimaryExpression(ExpressionTranslator.java:637)
	at org.jpmml.python.ExpressionTranslator.UnaryExpression(ExpressionTranslator.java:597)
	at org.jpmml.python.ExpressionTranslator.MultiplicativeExpression(ExpressionTranslator.java:538)
	at org.jpmml.python.ExpressionTranslator.AdditiveExpression(ExpressionTranslator.java:494)
	at org.jpmml.python.ExpressionTranslator.ComparisonExpression(ExpressionTranslator.java:434)
	at org.jpmml.python.ExpressionTranslator.NegationExpression(ExpressionTranslator.java:389)
	at org.jpmml.python.ExpressionTranslator.LogicalAndExpression(ExpressionTranslator.java:359)
	at org.jpmml.python.ExpressionTranslator.LogicalOrExpression(ExpressionTranslator.java:338)
	at org.jpmml.python.ExpressionTranslator.IfElseExpression(ExpressionTranslator.java:319)
	at org.jpmml.python.ExpressionTranslator.Expression(ExpressionTranslator.java:312)
	at org.jpmml.python.ExpressionTranslator.translateExpressionInternal(ExpressionTranslator.java:306)
	at org.jpmml.python.ExpressionTranslator.translate(ExpressionTranslator.java:34)
	... 12 more
@vruusmann vruusmann transferred this issue from jpmml/jpmml-sklearn Nov 26, 2021
@vruusmann
Member

vruusmann commented Nov 26, 2021

Moved this issue to its rightful project (the stack trace originates from the org.jpmml.python package).

In short, the org.jpmml.python.ExpressionTranslator component supports one-dimensional array indexing syntax (e.g. X[$first_dim]), but it does not support two- or higher-dimensional array indexing syntax (e.g. X[$first_dim][$second_dim]).
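
As a rough pre-check before conversion, the depth of chained indexing in an expression can be measured with Python's standard `ast` module. This is only a sketch: the helper name `max_subscript_depth` is made up for illustration, and it does not replicate the translator's actual grammar, so a depth of 1 is no guarantee that an expression will translate.

```python
import ast


def max_subscript_depth(expression):
    """Return the longest chain of consecutive [...] subscripts in the expression."""
    tree = ast.parse(expression, mode="eval")

    def depth(node):
        # Follow value[...][...] chains inwards, counting each subscript
        count = 0
        while isinstance(node, ast.Subscript):
            count += 1
            node = node.value
        return count

    return max(depth(node) for node in ast.walk(tree))


one_dim = max_subscript_depth("X[0] + X[1]")
two_dim = max_subscript_depth('X["c2"][0]')
```

Here `one_dim` is 1 and `two_dim` is 2, so the expression from the original report would be flagged as using two-dimensional indexing.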

This is pretty much "by design", because the PMML language deals with scalar-type values, not collection- or array-type values.

The one-dimensional array indexing syntax is supported, because JPMML converters keep track of data frame columns automatically.
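
Since PMML works with scalar values, one possible workaround is to expand the array column into scalar columns before the data reaches the pipeline. The sketch below uses plain Python on the column dictionary from the original report; the helper name `expand_array_column` and the `c2_0`, `c2_1` naming scheme are assumptions made for illustration, not part of any JPMML library.

```python
def expand_array_column(columns, name):
    """Split a column of equal-length lists into scalar columns name_0, name_1, ..."""
    values = columns[name]
    width = len(values[0])
    if any(len(value) != width for value in values):
        raise ValueError(f"Column {name!r} holds arrays of differing lengths")
    # Keep every other column unchanged
    expanded = {key: column for key, column in columns.items() if key != name}
    # Add one scalar column per array position
    for position in range(width):
        expanded[f"{name}_{position}"] = [value[position] for value in values]
    return expanded


data = {"c1": [1, 2, 3], "c2": [[1, 2], [1, 2], [3, 1]]}
flat = expand_array_column(data, "c2")
```

The resulting `flat` dictionary can be passed to `pd.DataFrame`, after which `c2_0` is an ordinary scalar column and no second index is needed in the expression.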

@vruusmann

vruusmann commented Nov 26, 2021

I'm not closing this feature request outright, because multi-dimensional array indexing support is foreseeable in the longer term (it is relevant to both the JPMML-SkLearn and JPMML-SparkML projects).

The main requirement is that JPMML converters need to be supplied information about "extra dimensions" first.

For example, in the case of SkLearn2PMML/JPMML-SkLearn this information could be conveyed in the form of a sklearn2pmml.decoration.ArrayDomain decorator class. When the JPMML-SkLearn converter sees this pipeline step, it updates the base feature definition accordingly. Alongside ArrayDomain (2D support) there could be a MatrixDomain for higher-dimensionality problems.

Something like this:

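from sklearn.pipeline import Pipeline
from sklearn2pmml.preprocessing import ExpressionTransformer

# ArrayDomain is the proposed decorator class; it does not exist yet
transformer = Pipeline([
  ("decorator", ArrayDomain(second_axis = [..])),
  ("row_extractor", ExpressionTransformer("X[:][1]"))
])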
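
To picture what information such a decorator would have to carry, here is a purely illustrative sketch in plain Python. ArrayDomain does not exist at the time of this comment, so the class name `ArrayDomainSketch`, its methods, and the `c2[0]` field naming are all hypothetical and say nothing about how a real implementation would look.

```python
class ArrayDomainSketch:
    """Hypothetical stand-in for the proposed ArrayDomain decorator."""

    def __init__(self, second_axis):
        # Labels (or positions) available along the extra dimension
        self.second_axis = list(second_axis)

    def scalar_field_names(self, column):
        # One scalar field per element of the extra dimension
        return [f"{column}[{label}]" for label in self.second_axis]

    def resolve(self, column, index):
        # Map a second-axis index to the scalar field it would select
        if not 0 <= index < len(self.second_axis):
            raise IndexError(f"Index {index} is outside the declared second axis")
        return self.scalar_field_names(column)[index]


domain = ArrayDomainSketch(second_axis=[0, 1])
fields = domain.scalar_field_names("c2")
selected = domain.resolve("c2", 1)
```

The point of the sketch is that once the second axis is declared up front, a two-dimensional reference can be reduced to a single scalar field, which is the kind of value PMML can represent.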

@vruusmann vruusmann changed the title from "Using an array as an input in a dataframe expression error" to "Expression translator should support multi-dimensional array indexing syntax" Nov 26, 2021
@vruusmann

vruusmann commented Nov 26, 2021

For starters, the JPMML-Converter project needs to define a specialized feature class (that the JPMML-Python expression translator component could use in this particular scenario).

Something like org.jpmml.converter.ArrayFeature.
