The batch prediction mode should support row-level exception handling #26
Right, those are really user-friendly options; and these two options would be added on top of the current behavior, which is simply to throw an exception, right? Like you said, clear feedback is important, and so are these options. I was also thinking that the "replace with NaN" option may need a threshold, or a specified number of failed rows, at which evaluation stops. In scenarios where someone is simply feeding in the wrong data, it would be a little annoying to still evaluate everything. What do you think?
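The "replace with NaN plus threshold" idea above could be sketched roughly as follows. This is only an illustration, not the library's actual API: `evaluate_row` stands in for whatever per-row evaluation the library performs, and `max_errors` is the hypothetical threshold parameter.

```python
import math

def evaluate_batch_with_nan(rows, evaluate_row, max_errors=None):
    """Evaluate rows one by one, replacing failed rows with NaN.

    If more than `max_errors` rows fail, abort early instead of
    continuing to evaluate obviously wrong data.
    """
    results = []
    errors = 0
    for row in rows:
        try:
            results.append(evaluate_row(row))
        except Exception:
            errors += 1
            if max_errors is not None and errors > max_errors:
                raise RuntimeError(
                    "aborting: more than {} rows failed".format(max_errors))
            results.append(float("nan"))  # the "replace with NaN" option
    return results

# Example: row "bad" fails (TypeError), but stays within the threshold.
out = evaluate_batch_with_nan([1, 2, "bad", 4], lambda x: x + 1, max_errors=2)
print(out)  # [2, 3, nan, 5]
```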
There is a third option - "omit row", aka "drop". If there are evaluation errors, then the corresponding rows are simply omitted from the results batch. The "omit row" option assumes that the user has assigned custom identifiers to the rows of the arguments batch. So, if there are 156 argument rows, and only 144 result rows (meaning that 12 rows errored out), then the user can locally identify "successful" vs "failed" rows in her application code. See #23 about row identifiers.
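With user-assigned row identifiers, the "failed" rows under the "omit row" option fall out as a simple set difference. A minimal sketch of the application-side bookkeeping described above (the function name is hypothetical, not part of the library):

```python
def find_failed_ids(argument_ids, result_ids):
    """Return the row identifiers that are present in the arguments
    batch but missing from the results batch, i.e. the rows that
    errored out and were omitted under the "omit row" option.

    Order follows the arguments batch.
    """
    result_set = set(result_ids)
    return [rid for rid in argument_ids if rid not in result_set]

# Example: 156 argument rows, 144 result rows -> 12 failed rows.
argument_ids = list(range(156))
result_ids = [rid for rid in argument_ids if rid % 13 != 0]
failed = find_failed_ids(argument_ids, result_ids)
print(len(failed))  # 12
```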
As a general comment, my design assumption is that the data is being moved between Python and Java environments using the Pickle protocol. If the pickle payload gets really big (say, 1'000'000 cells instead of 10'000 cells), then the Java component responsible for loading/dumping might start hitting unexpected memory/processing limitations. If the dataset is much bigger than 10'000 cells, then it should be partitioned into multiple chunks in Python application code. And the chunking algorithm should be prepared to handle the "omit row" option gracefully.
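A rough sketch of such a chunking step, assuming the ~10'000-cell budget mentioned above (the function and parameter names are illustrative only):

```python
def iter_chunks(rows, n_columns, max_cells=10_000):
    """Partition a batch of rows so that each chunk stays within
    `max_cells` cells (rows * columns), keeping individual pickle
    payloads shipped to the Java side small.
    """
    rows_per_chunk = max(1, max_cells // n_columns)
    for start in range(0, len(rows), rows_per_chunk):
        yield rows[start:start + rows_per_chunk]

# Example: 25'000 rows of 4 columns -> 10 chunks of <= 2'500 rows each.
chunks = list(iter_chunks(list(range(25_000)), n_columns=4))
print(len(chunks))  # 10
```

Because each chunk is evaluated independently, the calling code can also tally omitted rows per chunk before concatenating the partial results batches.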
This way, the chunking logic would be nicely available at the JPMML-Evaluator-Python library level, leaving the actual Python application code clean.
See jpmml/jpmml-evaluator#271 (comment) and jpmml/jpmml-evaluator#271 (comment)