I just ran into this issue. The cause was that the number of points to sample (defaults to 1000) was higher than the number of records in the training dataset. Perhaps obvious for ML practitioners, but I spent a few minutes debugging to nail it down.
It'd be nice to find this out before fitting a model, or at least to get a more user-friendly error message.
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Sampling fraction (333.3333333333333) must be on interval [0, 1]
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.util.random.BernoulliSampler.<init>(RandomSampler.scala:148)
at org.apache.spark.rdd.RDD$$anonfun$sample$2.apply(RDD.scala:495)
at org.apache.spark.rdd.RDD$$anonfun$sample$2.apply(RDD.scala:490)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.sample(RDD.scala:490)
at org.apache.spark.ml.knn.KNN.fit(KNN.scala:387)
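A minimal sketch of the kind of up-front check that would help, assuming the sampling fraction is computed as the number of points to sample divided by the record count (under that assumption, the 333.33 in the trace corresponds to 1000 points over 3 records). `SampleSizeCheck` and `samplingFraction` are hypothetical names for illustration, not part of the library:

```scala
// Hypothetical helper, not part of the library: validates the sample size
// before any sampling happens, so the failure names the actual cause.
object SampleSizeCheck {

  /** Returns the fraction of the dataset to sample to get about `numPoints` records.
    *
    * Assumption: fraction = numPoints / numRecords, so it only fits the
    * interval [0, 1] when numPoints <= numRecords.
    */
  def samplingFraction(numPoints: Int, numRecords: Long): Double = {
    require(numPoints > 0, s"Number of points to sample must be positive, got $numPoints")
    require(numRecords > 0, "Training dataset is empty")
    require(
      numPoints <= numRecords,
      s"Number of points to sample ($numPoints) exceeds the number of records " +
        s"in the training dataset ($numRecords); set it to at most $numRecords")
    numPoints.toDouble / numRecords
  }
}
```

With this in place, `SampleSizeCheck.samplingFraction(1000, 3L)` fails with an `IllegalArgumentException` that mentions both numbers, instead of the opaque "Sampling fraction must be on interval [0, 1]" message raised deep inside `RDD.sample`.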
jaceklaskowski changed the title from "Check if the number of points to sample for top-level tree is greater than the number of records in training dataset" to "Check if the number of points to sample for top-level tree is less than the number of records in training dataset" on Feb 27, 2017