
v0.9.1

Syulin7 released this on 30 Aug 02:51 (commit 06ee271; 175 commits to master since this release)

Release 0.9.1

Fixed

  • Fix a bug where pytorchjob failed to run with RDMA.
  • Fix a bug where GPU core resources on nodes were displayed incorrectly.
  • Fix a bug where the evaluator and TensorBoard were added to the pod group.

Changed

  • Refactor the installation.
  • Rename restful-serving to http-serving for deployment services.
  • Optimize the operators so that Completed jobs are no longer added to the queue.

Added

  • Adapt modeljob to support Helm 3.
  • Cron workloads support custom labels.
  • The Java SDK supports submitting training jobs with --label.
  • Add resource limits for tfjob.
  • Add subpathexpr support for jobs.

Please follow the Get Started guide to install.