Support transformed labels #35
Comments
The JPMML-SparkML library assumes that the label column of classification models is a "native" categorical label (in PMML, corresponds to a [...]).
I've been taking it for granted, and forgot to actually implement this "native" vs "transformed" check around [...].
It's possible to make your example work by applying the Binarizer to the data before the pipeline, and starting the pipeline with StringIndexer instead:
from pyspark.ml import Pipeline
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.feature import Binarizer, StringIndexer, VectorAssembler
# sc and data2007 come from the original example
binarizer = Binarizer(threshold=15.0, inputCol="DepDelay_Double", outputCol="DepDelay_Bin")
data2007 = binarizer.transform(data2007) # THIS!
stringIndexer = StringIndexer(inputCol="DepDelay_Bin", outputCol="DepDelay_Bin_Label") # THIS!
featuresAssembler = VectorAssembler(inputCols=["Month", "CRSDepTime", "Distance"], outputCol="features")
rfc3 = RandomForestClassifier(labelCol="DepDelay_Bin_Label", featuresCol="features", numTrees=3, maxDepth=5, seed=10305)
pipelineRF3 = Pipeline(stages=[stringIndexer, featuresAssembler, rfc3]) # THIS: start the pipeline with StringIndexer not Binarizer
model3 = pipelineRF3.fit(data2007)
from jpmml_sparkml import toPMMLBytes
pmmlBytes = toPMMLBytes(sc, data2007, model3)
print(pmmlBytes.decode("UTF-8"))
Technically, it shouldn't be much work to make JPMML-SparkML work with "transformed" labels, so I'm keeping this issue open to track progress towards this functionality.
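For readers following the workaround, here is a minimal pure-Python sketch of what the two label steps do to a column of delay values. It is not Spark code: `binarize` and `string_index` are hypothetical helpers that only mimic the documented behaviour of Binarizer (values strictly above the threshold become 1.0) and of StringIndexer's default ordering (the most frequent label gets index 0.0). The sample values are made up, and the alphabetical tie-break is an assumption of this sketch.

```python
from collections import Counter


def binarize(values, threshold):
    # Mimics Binarizer: strictly greater than the threshold -> 1.0, otherwise 0.0
    return [1.0 if v > threshold else 0.0 for v in values]


def string_index(values):
    # Mimics StringIndexer's default ordering: most frequent label gets index 0.0
    counts = Counter(str(v) for v in values)
    # Tie-break alphabetically (an assumption of this sketch)
    labels = sorted(counts, key=lambda label: (-counts[label], label))
    mapping = {label: float(i) for i, label in enumerate(labels)}
    return labels, [mapping[str(v)] for v in values]


dep_delay = [3.0, 42.0, 15.0, 0.0, 60.0]  # made-up departure delays
binned = binarize(dep_delay, threshold=15.0)
labels, indexed = string_index(binned)
```

The point of the sketch is that the indexing step produces an explicit list of category labels next to the indices, while the binarized column on its own is just a list of numbers. That may be one way to read the "native" vs "transformed" distinction above, but it is an interpretation, not something the library documentation is quoted on here.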
Looks like this can be closed for the current version:
Nope, I'd like to be able to use [...]
Can someone help me with this error: AttributeError: 'Pipeline' object has no attribute '_transfer_param_map_to_java'? I get it when I try to execute PMMLBuilder().
I cannot find any fix for this. What am I doing wrong?
This is clearly a low-level PySpark error, which has got nothing to do with PySpark2PMML or JPMML-SparkML. Maybe your PySpark and Apache Spark versions are out of sync.
@vruusmann Thank you. My PySpark and Apache Spark versions are up to date. The problem was that you must pass the pipeline's best model; in my case, cvModel.bestModel did the work.
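The fix described above (hand over the fitted model, not the unfitted Pipeline) can be sketched as a small guard. This is only an illustration: `resolve_fitted_model` is a hypothetical helper, not part of PySpark or PySpark2PMML, and the `Fake*` classes are stand-ins so the snippet runs without Spark. It relies on the fact that tuning results such as CrossValidatorModel expose the winner as `bestModel`, and that an unfitted estimator has `fit()` but no `transform()`.

```python
def resolve_fitted_model(obj):
    """Return the fitted model to hand to a converter (hypothetical helper)."""
    # Tuning results such as CrossValidatorModel expose the winning fitted model as .bestModel
    best = getattr(obj, "bestModel", None)
    if best is not None:
        return best
    # An estimator has fit() but no transform(): it has not been trained yet
    if hasattr(obj, "fit") and not hasattr(obj, "transform"):
        raise TypeError(
            f"{type(obj).__name__} is not fitted; call fit() first and pass the result"
        )
    return obj


# Stand-in classes so the sketch runs without Spark; these are NOT the PySpark classes
class FakePipeline:
    def fit(self, df):
        return FakePipelineModel()


class FakePipelineModel:
    def transform(self, df):
        return df


class FakeCrossValidatorModel:
    def __init__(self, best_model):
        self.bestModel = best_model

    def transform(self, df):
        return df


fitted = FakePipelineModel()
cv_model = FakeCrossValidatorModel(fitted)
chosen = resolve_fitted_model(cv_model)  # unwraps to the fitted pipeline model
```

With real objects, the value returned here would be what is passed as the model argument to PMMLBuilder, so that an unfitted Pipeline fails early with a readable message rather than a low-level AttributeError.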
@vruusmann Sorry for the off-topic; I will delete the question. But now I run into another issue: when I try to call buildFile on the pmmlBuilder object, it says format(target_id, ".", name), value)
Running Spark 2.1.2, using jpmml-sparkml 1.2.7.
While attempting to run the following PySpark code in order to convert a simple pipeline with a RandomForestClassifier model with either toPMMLByteArray or toPMML, I'm receiving a NullPointerException.
Following #22, I attempted to use the different Indexers on features and label columns to try and hint that these are categorical, but this resulted in the same error. Further, when I print the final tree, I do not see categorical feature declarations.
Dataset used, and tree output attached.
2007_short.zip
rfc.txt