As part of our CI/CD pipeline, we have a daily build that runs on a relatively small amount of data. It surfaced an interesting bug. The method estimateTau contains the following line:
val y = DenseVector(estimators.map { case (_, d) => math.log(d) })
In this case, d is the average distance between points. We are finding that on the small data used in our daily build, beta can exceed 0. When this happens, yMax, which is defined as:
val yMax = breeze.linalg.max(y)
is below -1 and is subsequently used as the bufferSize.
Specifically, the following appears in the log:
ERROR KNN: Unable to estimate Tau with positive beta: 0.1577160047542901. This maybe because data is too small.
Setting to -1.3153582722102333 which is the maximum average distance we found in the sample.
This may leads to poor accuracy. Consider manually set bufferSize instead.
You can also try setting balanceThreshold to zero so only metric trees are built.
(This message does not stop the code; execution continues.)
Exception in thread "main" java.lang.IllegalArgumentException: knn_2166a4d536d3 parameter bufferSize given invalid value -1.3153582722102333
This then causes an error and the pipeline stops.
From my understanding, very low average distances would always cause errors if beta exceeds 0.
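To make the arithmetic concrete, here is a minimal sketch in plain Scala (no Breeze or Spark dependency) showing how small average distances produce a yMax below -1. The distance values and the safeFallback helper are hypothetical and not part of the library. The helper assumes the fallback is meant to be a distance rather than a log-distance, which the log message suggests but which I have not confirmed.

```scala
object TauFallbackSketch {
  // Hypothetical average distances from a tiny sample; all are below 1/e.
  val avgDistances: Seq[Double] = Seq(0.05, 0.12, 0.2684)

  // Same transformation as the line in estimateTau: log of each distance.
  val y: Seq[Double] = avgDistances.map(d => math.log(d))

  // Maximum on the log scale; below -1 whenever every distance is below 1/e.
  val yMax: Double = y.max

  // Hypothetical guard (an assumption, not the library's code): map the
  // log-scale maximum back to a distance so the fallback is always positive.
  def safeFallback(logMax: Double): Double = math.exp(logMax)

  def main(args: Array[String]): Unit = {
    println(s"yMax = $yMax")
    println(s"fallback distance = ${safeFallback(yMax)}")
  }
}
```

With these inputs yMax is below -1, so using it directly as bufferSize would be rejected, while the exponentiated value is the largest average distance in the sample.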