knn.fit(training) throws an exception #32
Comments
Hi,
If you look at the example:
https://github.com/saurfang/spark-knn/blob/master/spark-knn-examples/src/main/scala/com/github/saurfang/spark/ml/knn/examples/MNIST.scala
for the KNNClassifier object it sets the two column names, i.e. the features and prediction columns:
.setFeaturesCol("pcaFeatures")
.setPredictionCol("predicted")
These seem to be missing in your case.
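As a minimal illustrative sketch (not taken from the repository, and assuming the training DataFrame has an ml.linalg vector column named "features", which is what the libsvm sample data provides), the column wiring could look like this:

// Set the feature and prediction columns explicitly, as the MNIST example does.
import org.apache.spark.ml.classification.KNNClassifier

val knn = new KNNClassifier()
  .setFeaturesCol("features")     // "pcaFeatures" in the MNIST example, after PCA
  .setPredictionCol("predicted")
  .setK(10)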
@kaushikacharya thanks for the response. Actually I need k nearest neighbors (KNN), so for that do we need classification labels in the dataset (i.e. the first entry in each line as 0 or 1)?
@kaushikacharya I'm talking about KNN.scala.
Got another error in the command knn.fit(training):
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually org.apache.spark.mllib.linalg.VectorUDT@f71b0bce
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
at org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:51)
at org.apache.spark.ml.classification.Classifier.org$apache$spark$ml$classification$ClassifierParams$$super$validateAndTransformSchema(Classifier.scala:58)
Which Spark version are you using?
These might be helpful for resolving the ml vs mllib error:
https://stackoverflow.com/questions/38901123/how-convert-ml-vectorudt-features-from-mllib-to-ml-type
https://spark.apache.org/docs/2.1.0/ml-migration-guides.html
"While most pipeline components support backward compatibility for loading, some existing DataFrames and pipelines in Spark versions prior to 2.0, that contain vector or matrix columns, may need to be migrated to the new spark.ml vector and matrix types. Utilities for converting DataFrame columns from spark.mllib.linalg to spark.ml.linalg types (and vice versa) can be found in spark.mllib.util.MLUtils."
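For example, a minimal sketch of that conversion (assuming the mllib-style vectors sit in a column named "features", as produced by MLUtils.loadLibSVMFile(...).toDF()):

// Convert the spark.mllib vector column to the spark.ml vector type expected
// by the ml-based KNNClassifier.
import org.apache.spark.mllib.util.MLUtils

val trainingMl = MLUtils.convertVectorColumnsToML(training, "features")

// Alternatively, loading via the DataFrame reader yields ml vectors directly:
// val training = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")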
@kaushikacharya the Spark version is 2.2.0.
Have a look at the project's build definition: in build.sbt you can see commonSettings, which is defined in Common.scala. My understanding is that this repository is updated for Spark 2.1.0.
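A hedged build.sbt sketch of pinning an application to Spark 2.1.0 (the Scala version and dependency list are assumptions for illustration, not taken from the repository's build files):

// Illustrative version pins only; add the spark-knn dependency itself using
// whatever coordinates its README specifies.
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"   % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.1.0" % "provided"
)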
I changed my version and am now working with Spark 2.1.0, but I still got the same error.
Ok, I used the MLUtils function convertVectorColumnsFromML(training, "features"), and then got a new error for the sample data given in sample_libsvm_data.txt:
java.lang.IllegalArgumentException: requirement failed: Sampling fraction (1.01) must be on interval [0, 1]
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.util.random.BernoulliSampler.<init>(RandomSampler.scala:147)
at org.apache.spark.rdd.RDD$$anonfun$sample$2.apply(RDD.scala:496)
at org.apache.spark.rdd.RDD$$anonfun$sample$2.apply(RDD.scala:491)
You are facing the same issue as #21.
Your error says that the sampling fraction (1.01) must be on the interval [0, 1], i.e. the sampling fraction needs to be <= 1 (see the sketch below).
I would suggest first trying to run on the MNIST data (mnist.bz2) from
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/
Put this data in your data folder and run the MNIST Scala example.
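One way to keep that fraction in range on a tiny dataset like sample_libsvm_data.txt (100 rows) is to bound topTreeSize well below the row count. This is an illustrative guard only, assuming spark-knn samples roughly topTreeSize / count of the rows (plus a small safety margin) to build its top-level tree; it is not the library's prescribed fix:

// Hypothetical guard: keep topTreeSize between 2 and half the row count so the
// internal sampling fraction stays comfortably below 1.
val n = training.count().toInt
val topTreeSize = math.min(math.max(n / 500, 2), n / 2)

val knn = new KNNClassifier()
  .setTopTreeSize(topTreeSize)
  .setK(10)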
I followed whatever was there:
val training = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt").toDF()
val knn = new KNNClassifier()
  .setTopTreeSize(training.count().toInt / 500)
  .setK(10)
1st error: TopTreeSize is invalid (0), since the total count of training samples is 100.
Let's say we set TopTreeSize manually to 1; then it throws an exception while running knn.fit(training):
java.util.NoSuchElementException: Failed to find a default value for inputCols
at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:652)
at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:652)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:651)
at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
at org.apache.spark.ml.param.Params$class.$(params.scala:658)
at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
at org.apache.spark.ml.knn.KNN.fit(KNN.scala:383)
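For reference, a hedged end-to-end sketch combining the suggestions from this thread for Spark 2.1+ (all parameter values are illustrative, and the load path assumes the same sample_libsvm_data.txt file):

import org.apache.spark.ml.classification.KNNClassifier
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("knn-example").getOrCreate()

// The DataFrame reader produces spark.ml vectors directly, so no mllib-to-ml
// conversion is needed.
val training = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")

// Illustrative guard so topTreeSize is neither 0 nor close to the row count.
val n = training.count().toInt
val topTreeSize = math.min(math.max(n / 500, 2), n / 2)

val knn = new KNNClassifier()
  .setTopTreeSize(topTreeSize)
  .setFeaturesCol("features")
  .setPredictionCol("predicted")
  .setK(10)

val model = knn.fit(training)
model.transform(training).show(5)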