- switched to a supervisor + workers architecture, improving error handling and subprocess lifecycle control.
- Source code split into smaller, mostly isolated modules.
- added a third parallel process that handles post-processing and writing of responses; this should greatly improve performance.
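  A minimal sketch of the resulting three-stage pipeline (reader, network, writer). The module and function names here are illustrative only, not the script's actual internals:

  ```python
  # Illustrative sketch of a supervisor + workers pipeline; the real script's
  # process layout, names, and error handling differ.
  from multiprocessing import Process, Queue

  def reader(batches_q):
      # Read the input CSV and enqueue serialized request batches.
      for batch in ("batch-1", "batch-2"):   # stand-in for real CSV batches
          batches_q.put(batch)
      batches_q.put(None)                    # sentinel: no more work

  def network(batches_q, results_q):
      # Send each batch to the prediction API and forward the response.
      for batch in iter(batches_q.get, None):    # stop at the sentinel
          results_q.put("scored(%s)" % batch)    # stand-in for a real HTTP call
      results_q.put(None)

  def writer(results_q):
      # Post-process responses and write the output CSV.
      for result in iter(results_q.get, None):
          print(result)                      # stand-in for CSV writing

  if __name__ == "__main__":
      batches_q, results_q = Queue(), Queue()
      workers = [Process(target=reader, args=(batches_q,)),
                 Process(target=network, args=(batches_q, results_q)),
                 Process(target=writer, args=(results_q,))]
      for w in workers:   # the supervisor starts and joins each worker,
          w.start()       # so crashes surface instead of hanging the run
      for w in workers:
          w.join()
  ```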
- add ability to compress data in transit
- --output_delimiter flag to set delimiter for output CSV. "tab" can be used for tab-delimited output
- --skip_row_id flag to skip row_id column in output
- fixed hang of batch-scoring script on CSV parse errors
- added summary of run at the end of script output with full list of errors, warnings and total stats.
- fixed error when trying to report multiline CSV error in fast mode
- Run all tests against Windows
- --pred_name parameter is documented. Potentially backward-incompatible change: previously, the 1.0 class was used as the positive result for binary predictions; now the last class in lexical order is used.
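  For example, with labels 0.0/1.0 both rules agree, but with labels such as A/B the positive class is now B. A sketch of the new rule (illustrative, not the script's actual code):

  ```python
  def positive_class(class_labels):
      # New behavior: the last class in lexical (string sort) order is
      # treated as the positive class for binary predictions.
      return sorted(class_labels)[-1]

  assert positive_class(["0.0", "1.0"]) == "1.0"  # agrees with the old rule
  assert positive_class(["A", "B"]) == "B"        # old rule assumed class 1.0
  ```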
- Fixed memory leak and performance problem caused by an unrestricted batch generator
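  The shape of the fix, roughly: drain the generator through a bounded queue so the reader blocks instead of buffering every batch in memory (an illustrative sketch, not the script's actual code):

  ```python
  import queue
  import threading

  batches = queue.Queue(maxsize=10)  # bound: the reader blocks when full

  def read_batches():
      for i in range(1000):
          batches.put("batch-%d" % i)  # blocks once 10 batches are in flight
      batches.put(None)                # sentinel: no more work

  threading.Thread(target=read_batches, daemon=True).start()
  for item in iter(batches.get, None):
      pass  # stand-in for sending the batch over the network
  ```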
- added an internal check and error-avoidance logic for requests that are too large
- added docker and docker-compose files for dockerized runs of the tests and the script
- auto sampler target batch size increased to 2.5M
- improve URL parsing. You no longer need to include "/api" in the host argument.
- return more descriptive error messages when there is a problem
- include the version of the batch-scoring script in the user-agent header
- add option to define document encoding
- add option to skip csv dialect detection.
- adjust the sample size used by dialect and encoding detection
- use auto_sample as default unless "--n_samples" is defined
- allow the "tab" keyword as a command line argument, e.g. "--delimiter=tab"
- minor performance improvement for *nix users
- This release is compatible with Windows
- logs are now sent to two files within the directory where the script is run
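  A minimal sketch of such a two-file logging setup; the file names here are hypothetical, not the script's actual log names:

  ```python
  import logging

  root = logging.getLogger()
  root.setLevel(logging.DEBUG)
  # Hypothetical names: one concise run log plus one verbose debug log,
  # both written to the directory the script is run from.
  for path, level in (("batch_scoring.log", logging.INFO),
                      ("batch_scoring_debug.log", logging.DEBUG)):
      handler = logging.FileHandler(path)
      handler.setLevel(level)
      root.addHandler(handler)

  logging.info("visible in both files")
  logging.debug("visible only in the debug log")
  ```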
- added --auto_sample option to find the n_samples automatically.
- change how CSV dialects are passed around in an attempt to fix a bug on Windows.
- use the chardet module to attempt to detect character encoding
- use the standard library csv module to attempt to discover the CSV dialect
- use a stream decoder and encoder in Python 2 to transparently convert input to UTF-8
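  Taken together, a minimal sketch of the detection step using those two libraries (the sample size and file name are illustrative):

  ```python
  import csv
  import chardet  # third-party: pip install chardet

  SAMPLE_SIZE = 64 * 1024  # illustrative; see the sample-size entry above

  with open("input.csv", "rb") as f:
      raw = f.read(SAMPLE_SIZE)

  # chardet guesses the character encoding from raw bytes; it may return None.
  encoding = chardet.detect(raw)["encoding"] or "utf-8"
  # csv.Sniffer guesses the dialect (delimiter, quoting) from a decoded sample;
  # it raises csv.Error when the sample is too ambiguous.
  dialect = csv.Sniffer().sniff(raw.decode(encoding, errors="replace"))
  print(encoding, dialect.delimiter)
  ```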
- provide a mode for sending all user messages to stdout
- separate process for disk IO and request payload serialization
- avoid codecs.getreader due to IO bottleneck
- don't parse CSV (fail fatally on multiline CSV)
- multiline mode (to be renamed)
- keep_cols resolution
- Get rid of gevent/asyncio, use thread-based networking
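  A minimal sketch of the thread-based approach; the URL is a placeholder and the requests library is used here purely for illustration:

  ```python
  import concurrent.futures
  import requests  # illustrative HTTP client choice

  def score(batch):
      # One plain blocking request per batch; a small thread pool replaces
      # the old gevent/asyncio event loop.
      resp = requests.post("https://example.com/predict", data=batch, timeout=30)
      resp.raise_for_status()
      return resp.text

  batches = ["x,y\n1,2\n", "x,y\n3,4\n"]
  with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
      results = list(pool.map(score, batches))
  ```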
- Show path to logs on every unexpected error
- Convert cmdline argument parser from docopt to argparse
- Add configuration file support
- Refactor logging/ui
- Drop support of making predictions using 'v2' Modeling API
- Fix bug under Python 2 where gevent was fatally failing on timeouts.
- Added timeout argument.
- Both asyncio and gevent now retry within the request exception handler.
- Authorization now checks the schema too, so we fail much earlier if the input is not correct.
- Fix bug under Python 2 where gevent was silently dropping batches.
- Better checks that the run completed successfully.
- Fail fast on missing column or dtype mismatch.
- Add naming of prediction column for regression.
- Fix datarobot_key being ignored.
- Update requirements for Python 3 to minimum versions.
- Updated client-side error reporting to show the status message when the response is formatted as a JSON object, instead of just the error code
- Use UTF-8 encoding for CSV strings sent to the prediction API server
- Use CSV instead of JSON for better throughput and reduced memory footprint on the server-side.
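  The intuition: JSON repeats every column name in every row, while CSV names the columns once in the header (illustrative comparison):

  ```python
  import csv
  import io
  import json

  rows = [{"x": 1, "y": 2}, {"x": 3, "y": 4}]

  as_json = json.dumps(rows)  # column names repeated per row

  buf = io.StringIO()
  writer = csv.DictWriter(buf, fieldnames=["x", "y"])
  writer.writeheader()        # column names appear once
  writer.writerows(rows)
  as_csv = buf.getvalue()

  assert len(as_csv) < len(as_json)
  ```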
- Gevent dependency update to fix an SSL bug on Python 2.7.9.
- Setuptools support.
- Use python logging and maintain a debug log to help support engineers trace errors.
- More robust delimiter handling (whitelist).
- Don't segfault on a non-splittable delimiter.
- Set number of retries default to 3 instead of infinite.
- Fix: type -> task
- Initial release