Add support for 'Normalizer' transformer #56

rodrigojimenezdiego · 2019-01-16T08:32:30Z

[More of an inquiry than a proper issue, but I searched for prior issues/comments about this and did not find any, so I am raising it so that the reply is available to others.]

According to the documentation, the 'org.apache.spark.ml.feature.Normalizer' transformer is not currently supported, and the API complains when trying to convert pipelines that contain it.

We'd like to know a bit more about whether there is any particular reason for this transformation not being supported, and if there are plans to support it in the future.

Keep up the great work! Yours is an invaluable contribution to the industry.

vruusmann · 2019-01-16T09:10:15Z

> We'd like to know a bit more about whether there is any particular reason for this transformation not being supported

There's a conceptual mismatch between the PMML representation and Apache Spark/Scikit-Learn representations:

PMML: High-level, treats features individually, features as scalars
Apache Spark/Scikit-Learn: Low-level, treats features collectively, collections of features as vectors

The Normalizer transformer is a prime example of a low-level transformation that operates on a collection of features. When mapped to a higher-level representation (such as PMML, or any other human-readable explanation), it needs to be broken down into elementary feature-oriented operations. Unfortunately, this often results in markup that is not very computationally efficient at deployment time.
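To make the breakdown concrete, here is a minimal Python sketch. It is not JPMML code, and the function and field names are hypothetical. It contrasts a p-norm normalizer applied to a whole vector with the same computation written as one standalone scalar expression per output feature. The handling of an all-zero vector is an assumption of the sketch.

```python
def normalize_vector(values, p=2.0):
    """Vector-style normalization: one operation over the whole collection."""
    norm = sum(abs(v) ** p for v in values) ** (1.0 / p)
    # Assumption for this sketch: an all-zero vector is returned unchanged.
    if norm == 0.0:
        return list(values)
    return [v / norm for v in values]


def scalar_expressions(names, p=2):
    """Feature-by-feature form: one standalone expression string per output."""
    # The norm has to be spelled out in terms of every input feature.
    norm = "(" + " + ".join(f"abs({n})**{p}" for n in names) + f")**(1/{p})"
    # Each output repeats the full norm expression.
    return {f"normalized_{n}": f"{n} / {norm}" for n in names}


if __name__ == "__main__":
    print(normalize_vector([3.0, 4.0]))
    for field, expr in scalar_expressions(["x1", "x2", "x3"]).items():
        print(field, "=", expr)
```

Each of the n output expressions references all n inputs, so the total size of the written-out form grows roughly quadratically with the number of features, which is where the deployment-time cost comes from.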

vruusmann · 2019-01-16T09:43:51Z

> There's a conceptual mismatch between the PMML representation and Apache Spark/Scikit-Learn representations

To elaborate some more:

PMML: Optimized for model interpretation and deployment. Long term.
Apache Spark/Scikit-Learn: Optimized for model training. Short term.

Also, consider the one-hot-encoding of categorical features for model training. Objectively, this is a stupid thing to do, but it is very much needed in the current state of Apache Spark/Scikit-Learn, because they can't handle categorical features directly. PMML can, and doesn't need one-hot-encoding.
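As a rough illustration of the one-hot-encoding point, the Python sketch below expresses the same rule twice: once over one-hot-encoded columns and once directly over the categorical value. The "color" categories and all function names are made up for the example and do not come from any library.

```python
def one_hot(value, categories):
    """Expand one categorical value into len(categories) binary columns."""
    return [1.0 if value == c else 0.0 for c in categories]


def rule_on_one_hot(encoded, categories):
    """Rule over encoded data: has to test binary columns by position."""
    red = encoded[categories.index("red")]
    green = encoded[categories.index("green")]
    return red == 1.0 or green == 1.0


def rule_on_category(value):
    """The same rule over the raw categorical value: one membership test."""
    return value in {"red", "green"}


if __name__ == "__main__":
    colors = ["red", "green", "blue"]
    for color in colors:
        encoded = one_hot(color, colors)
        print(color, encoded, rule_on_one_hot(encoded, colors), rule_on_category(color))
```

The direct form keeps the original feature and its category labels visible, whereas the encoded form only makes sense together with the column ordering used during training.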

rodrigojimenezdiego · 2019-01-17T09:25:55Z

Many thanks for your explanation.
