Releases: awslabs/multi-model-server
v1.1.1 - Resource cleanup for terminated worker threads
This release contains minor fixes to make sure resources are cleaned up for terminated worker threads.
- Terminates the STDOUT and STDERR ReaderThreads for a Worker when it is scaled down
v1.0.8 - Synchronous resource cleaning and API changes
This release contains API changes and fixes to make sure resource cleaning is handled synchronously.
- Load model API sends a conflict response instead of a bad request response when trying to register an already registered model. #851
- Unregister model API is now synchronous and will wait until all resources are cleaned before sending a response back. A timeout feature was also added to config if users don't want to wait. #853
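A minimal client-side sketch of how these two changes might be handled. The management address, the `url` query parameter, and the `DELETE /models/{name}` route are assumptions not stated in these notes, and the function names are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumption: management API address; it is not stated in these notes.
MANAGEMENT_URL = "http://localhost:8081"


def describe_register_status(status):
    """Translate the HTTP status of a load-model call into a short outcome."""
    if 200 <= status < 300:
        return "registered"
    if status == 409:
        # v1.0.8: an already registered model yields a conflict, not a bad request.
        return "already-registered"
    return "failed"


def register_model(model_url, base=MANAGEMENT_URL):
    """Send POST /models and report the outcome (the `url` parameter is assumed)."""
    query = urllib.parse.urlencode({"url": model_url})
    request = urllib.request.Request(f"{base}/models?{query}", method="POST")
    try:
        with urllib.request.urlopen(request) as response:
            return describe_register_status(response.status)
    except urllib.error.HTTPError as err:
        return describe_register_status(err.code)


def unregister_model(model_name, base=MANAGEMENT_URL, client_timeout=300):
    """Send DELETE /models/{name} (route assumed); blocks until the server responds."""
    path = urllib.parse.quote(model_name, safe="")
    request = urllib.request.Request(f"{base}/models/{path}", method="DELETE")
    with urllib.request.urlopen(request, timeout=client_timeout) as response:
        return response.status
```

Because unregistering is now synchronous, a client that previously polled for completion can treat the response itself as the signal that cleanup has finished, subject to the server-side timeout mentioned above.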
v1.0.7 - Bug fixes to support Python 2 better
This release contains a minor bug fix for Python 2 support.
- Changed the Python protocol handler between frontend and backend to support Python 2 better.
v1.0.6 - Features to handle OOM errors and enhancements to configurability of MMS
- Load model API takes in JSON requests. #818
- Implementation of Ping API using the plugins SDK. #814
- Newer endpoint for predictions: POST /models/{model-id}/invoke. #823
- Handling OOM errors. MMS returns an HTTP 507 error code when there is an OOM error during runtime of MMS. #822
- Added changes to allow MMS to have the same Management and Inference addresses. #826
- Changes to MMS default behavior. MMS by default runs POST /models in a synchronous way, and if default_workers_per_model is set, this value will be used when loading models. #836
- MMS configuration values can take environment variables. #841
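As a rough illustration of the new prediction endpoint and the 507 behaviour, a client could build the path and classify responses as below. The helper names are hypothetical, and only the route and the 507 status come from these notes.

```python
import urllib.parse


def invoke_path(model_id):
    """Build the path of the prediction endpoint added in v1.0.6."""
    # Percent-encode the model id so it stays a single path segment.
    return "/models/{}/invoke".format(urllib.parse.quote(model_id, safe=""))


def classify_inference_status(status):
    """Classify an inference response status for a hypothetical client."""
    if status == 507:
        # v1.0.6: MMS reports a runtime out-of-memory error with HTTP 507.
        return "out-of-memory"
    if 200 <= status < 300:
        return "ok"
    return "error"
```

Separating the 507 case lets a caller react to memory pressure (for example by reducing payload size) differently from other failures.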
v1.0.5 - Model Server support for plugins
This release contains multiple model server changes.
Major features
- Plugins support
- SDK for plugins
- Reference plugins implementation
- MMS changes to support plugins
- Feature to support a configured default service file.
- Feature to support return of custom HTTP headers from the model.
Minor features
- Option to run MMS in the foreground
... And multiple bug fixes
v1.0.4 - Contains model-archiver features, integration test framework and bug fixes
This release contains multiple model-archiver features and bug fixes.
Features
- Added support for "no-archive"
- Added feature to support optional conversion of ONNX model to MXNet model
- Added integration test framework for model-archiver.
v1.0.3 - Base MMS containers available
Features and Bug Fixes
- Published base MMS containers for Python 2.7 and Python 3.6 with Ubuntu 16.04, and with nvidia/cuda 9.2 and cuDNN 7 on Ubuntu 16.04.
- model-archiver changes to handle multiple archive formats
- model-server configurable through environment variables
- Contains multiple bug fixes
v1.0.2 - Multiple features and bug fixes
In this release we have addressed all the reported bugs and added the enhancements listed below.
Features and Major Bug Fixes
- Frontend listening on Unix Domain Socket.
- Support Asynchronous logging.
- Added documentation for batching support.
- Added features to support:
  - Starting a default number of workers for models that are launched at MMS startup time.
  - Configurable response timeout for individual models. This is the amount of time MMS waits for the model to respond to a request.
  - Configurable maximum allowable request and response sizes.
- Changes for new Container images.
- Passing all HTTP headers to the backend worker.
- Adding shufflenet to the model server model-zoo.
- Adding example to bring sockeye model onto MMS.
... And bug fixes
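The worker, timeout, and size settings listed above are configuration values. A small sketch of rendering such settings into a properties-style file follows. Apart from default_workers_per_model, which appears in later release notes, the key names are assumptions, and all values are purely illustrative, so check the MMS configuration documentation before relying on them.

```python
def render_properties(settings):
    """Render a dict as key=value lines, sorted for stable output."""
    lines = [f"{key}={value}" for key, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"


# Illustrative values only; key names other than default_workers_per_model
# are assumed, not taken from these release notes.
example_settings = {
    "default_workers_per_model": 2,
    "default_response_timeout": 120,
    "max_request_size": 6553500,
    "max_response_size": 6553500,
}

if __name__ == "__main__":
    print(render_properties(example_settings), end="")
```

Generating the file from a dictionary keeps the settings in one place when the same values are needed across several container images.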
v1.0.1 - Apache Model Server for MXNet adds minor features and addresses bugs
In this release of MXNet Model Server, we have added the following features.
Features and Bug fixes
- Changes for batching support.
- CORS headers support added to responses.
- Handle content-type returned by the backend code and pass ContentType to the service code.
- Work around the import mxnet module timeout issue. MXNet startup time no longer causes a significant delay when MMS starts on compute-optimized hosts.
- Make sure that python prints are not buffered
- Refactor metrics emission logic
- Always use utf-8 to decode bytes.
- Avoid archiving a model archive file recursively.
- Fixes for PYTHONPATH issues in MMS
- Documentation updates
Apache Model Server for MXNet adds support for hot loading of models
In this release of MXNet Model Server, we have added the following major features.
Features
- Loading and unloading models at run-time (hot loading models). This is now available via the management REST API exposed by MMS. See the management API documentation for more.
- Independently scale number of model-worker instances serving inference requests. This is available through management REST API.
- Improved model archive representation. See the model-archiver documentation for more.
- Improved docker container images.
- Improved performance compared to MMS v0.4 and decreased dependencies. One of the major changes is replacing the monolithic architecture with a separate frontend and backend. Netty is used as the frontend web server instead of the Flask + Gunicorn combination, and Python is used for the backend.
- Improved logging and metrics collection, using log4j and its corresponding config to control metrics, including custom user metrics. See the logging configuration documentation for more.
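To make the worker-scaling feature concrete, here is a hedged sketch of building a scaling request for the management REST API. The `PUT /models/{name}` route, the `min_worker` and `max_worker` parameters, and the default address are assumptions not stated in these notes. The function only constructs the request and does not send it.

```python
import urllib.parse
import urllib.request


def scale_workers_request(model_name, min_worker, max_worker=None,
                          base="http://localhost:8081"):
    """Build (but do not send) a request that scales a model's workers."""
    params = {"min_worker": min_worker}
    if max_worker is not None:
        params["max_worker"] = max_worker
    # Percent-encode the model name so it stays a single path segment.
    path = urllib.parse.quote(model_name, safe="")
    url = f"{base}/models/{path}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(url, method="PUT")
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate makes the URL easy to inspect or log first.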
New and updated documents:
- Migration document to migrate from MMS 0.4 to MMS 1.0.
- New Management API.
- Updated model zoo.
- Updated Inference API.
For further documentation, please refer to the /docs folder.
Bug fixes:
This release fixes all the bugs logged on GitHub.