v0.9.1
Release 0.9.1
Fixed
- Fix the bug that failed to run pytorchjob with RDMA.
- Fix the bug that error dispaly gpu core resources on nodes.
- Fix the bug that add evaluator and tensorboard to pod group.
Changed
- Refact installtion.
- Modify restful-serving to http-serving of deployment services.
- Optimize the operators to omit the Completed jobs into the queue.
Added
- Support modeljob adapts helm3.
- Cron workload supports custom labels.
- Java SDK submits training job with --label.
- Add resource limits for tfjob.
- Add subpathexpr for job .
Please follow the Get started Guide to install.