knn.fit(training) throws an exception #32

Open
akshaybhatt14495 opened this issue Jan 9, 2018 · 10 comments

Comments

@akshaybhatt14495

akshaybhatt14495 commented Jan 9, 2018

I followed the example as given:
val training = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt").toDF()
val knn = new KNNClassifier()
  .setTopTreeSize(training.count().toInt / 500)
  .setK(10)
First error: TopTreeSize is invalid 0 (the training sample has only 100 rows, so the integer division 100 / 500 gives 0).
Suppose we set TopTreeSize manually to 1.
Then knn.fit(training) throws this exception:

java.util.NoSuchElementException: Failed to find a default value for inputCols
at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:652)
at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:652)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:651)
at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
at org.apache.spark.ml.param.Params$class.$(params.scala:658)
at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
at org.apache.spark.ml.knn.KNN.fit(KNN.scala:383)
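
The first error above comes down to integer division: with 100 rows, `training.count().toInt / 500` truncates to 0. A minimal sketch of one way to guard against that, assuming a lower bound of 1 is acceptable (the thread uses 1 manually). The helper name `safeTopTreeSize` and its `rowsPerTopNode` parameter are hypothetical and not part of spark-knn.

```scala
// Hypothetical helper (not part of spark-knn): derive a top-tree size from the
// row count without ever returning less than 1.
def safeTopTreeSize(rowCount: Long, rowsPerTopNode: Int = 500): Int = {
  require(rowsPerTopNode > 0, "rowsPerTopNode must be positive")
  // Integer division truncates, so 100 / 500 == 0; clamp the result to at least 1.
  math.max(1L, rowCount / rowsPerTopNode).toInt
}

// The value computed in the original snippet for a 100-row dataset.
val naive = 100 / 500
// The guarded value for the same dataset.
val guarded = safeTopTreeSize(100L)
```

It could be used as `.setTopTreeSize(safeTopTreeSize(training.count()))`. Whether a top tree of size 1 is a sensible choice for such a small dataset is a separate question that this sketch does not answer.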

@kaushikacharya
Contributor

kaushikacharya commented Jan 9, 2018 via email

@akshaybhatt14495
Author

akshaybhatt14495 commented Jan 10, 2018

@kaushikacharya thanks for the response. Actually, I need k nearest neighbors (KNN). For that, do we need class labels in the dataset (i.e. the first entry in each row as 0 or 1)?

@akshaybhatt14495
Author

@kaushikacharya I'm talking about KNN.scala.

@akshaybhatt14495
Author

I got another error from knn.fit(training):

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually org.apache.spark.mllib.linalg.VectorUDT@f71b0bce.
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
at org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:51)
at org.apache.spark.ml.classification.Classifier.org$apache$spark$ml$classification$ClassifierParams$$super$validateAndTransformSchema(Classifier.scala:58)
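
The message above says the `features` column holds `org.apache.spark.mllib.linalg` vectors where `org.apache.spark.ml.linalg` vectors are expected. `MLUtils.loadLibSVMFile(...).toDF()` produces the older mllib type. A minimal sketch of converting such a column with `MLUtils.convertVectorColumnsToML`, assuming Spark 2.x on the classpath and a local session. The two in-memory rows are made up for illustration and stand in for the sample file.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.ml.linalg.SQLDataTypes

// Local session for illustration only; the app name is arbitrary.
val spark = SparkSession.builder()
  .appName("vector-type-conversion")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Made-up rows using the RDD-era (mllib) vector type, as loadLibSVMFile would yield.
val legacy = Seq(
  LabeledPoint(0.0, Vectors.dense(1.0, 2.0)),
  LabeledPoint(1.0, Vectors.dense(3.0, 4.0))
).toDF()

// Convert the named column from mllib vectors to ml vectors.
val converted = MLUtils.convertVectorColumnsToML(legacy, "features")
```

`MLUtils.convertVectorColumnsFromML` goes in the opposite direction (ml to mllib). Reading the file with `spark.read.format("libsvm").load(path)` is another option, since that reader returns ml vectors directly.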

@kaushikacharya
Contributor

kaushikacharya commented Jan 10, 2018 via email

@akshaybhatt14495
Author

@kaushikacharya The Spark version is 2.2.0.

@kaushikacharya
Contributor

kaushikacharya commented Jan 10, 2018

Have a look at
https://github.com/saurfang/spark-knn/blob/master/project/Dependencies.scala
val sparktest = "org.apache.spark" %% "spark-core" % "2.1.0" % "test" classifier "tests"

Also, in build.sbt you can see commonSettings, which is defined in Common.scala.
It sets: sparkVersion := "2.1.0",

My understanding is that this repository is built against Spark 2.1.0.
Your use of 2.2.0 could be the reason for the errors you are facing.
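
To rule out a version mismatch, the application's own Spark dependencies can be pinned to the version quoted above. A minimal build.sbt sketch, assuming an sbt project; the module list, the `provided` scope, and the Scala version are assumptions for illustration and are not taken from the repository.

```scala
// build.sbt (sketch; module list, scope, and scalaVersion are assumptions)
scalaVersion := "2.11.8"

// Matches the sparkVersion value quoted from Common.scala.
val sparkVersion = "2.1.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"   % sparkVersion % "provided",
  "org.apache.spark" %% "spark-mllib" % sparkVersion % "provided"
)
```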

@akshaybhatt14495
Author

I changed my version and am now working with Spark 2.1.0, but I still get the same error.

@akshaybhatt14495
Author

OK, I used the MLUtils function convertVectorColumnsFromML(training, "features"),
and then got a new error for the sample data in sample_libsvm_data.txt:

java.lang.IllegalArgumentException: requirement failed: Sampling fraction (1.01) must be on interval [0, 1]
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.util.random.BernoulliSampler.<init>(RandomSampler.scala:147)
at org.apache.spark.rdd.RDD$$anonfun$sample$2.apply(RDD.scala:496)
at org.apache.spark.rdd.RDD$$anonfun$sample$2.apply(RDD.scala:491)
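
The trace above shows `RDD.sample` rejecting a fraction of 1.01, because sampling without replacement requires a fraction in [0, 1]. The thread does not identify which computation produced 1.01. One way such a value can arise, shown purely as an illustration, is requesting 101 rows from a 100-row dataset. The helper `samplingFraction` below is hypothetical and is not part of Spark or spark-knn.

```scala
// Hypothetical helper (not from Spark or spark-knn): turn a desired sample size
// into a fraction that is always valid for sampling without replacement.
def samplingFraction(desiredSize: Long, total: Long): Double = {
  require(total > 0, "total must be positive")
  require(desiredSize >= 0, "desiredSize must be non-negative")
  // Cap at 1.0 so a request larger than the dataset cannot exceed the valid interval.
  math.min(1.0, desiredSize.toDouble / total)
}

// Illustrative arithmetic only: a request for 101 rows out of 100 exceeds 1.
val raw = 101.toDouble / 100
// The same request after capping.
val clamped = samplingFraction(101L, 100L)
```

A capped value could then be passed along the lines of `rdd.sample(withReplacement = false, fraction = samplingFraction(wanted, rdd.count()))`, where `wanted` is whatever sample size the caller has in mind.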

@kaushikacharya
Contributor

kaushikacharya commented Jan 11, 2018 via email
