v0.10.0: DeepSpeed integration revamp and TPU speedup
This release adds two major new features: the DeepSpeed integration has been revamped to match the one in Transformers Trainer, with multiple new options unlocked, and the TPU integration has been sped up.
This version also officially drops support for Python 3.6 and requires Python 3.7+.
DeepSpeed integration revamp
Users can now specify a DeepSpeed config file when they want to use DeepSpeed, which unlocks many new options. See the new documentation for details.
- Migrate HFDeepSpeedConfig from trfrs to accelerate by @pacman100 in #432
- DeepSpeed Revamp by @pacman100 in #405
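As a rough illustration of what such a config file can look like, the sketch below writes a small DeepSpeed JSON config from Python. The keys are standard DeepSpeed config options, but the values, the `ds_config.json` filename, and the `write_ds_config` helper are assumptions made for this example, not something shipped with this release.

```python
import json
from pathlib import Path

# Illustrative values only (assumptions); tune them for your model and hardware.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 1,
    "gradient_clipping": 1.0,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        # Keep optimizer states in CPU memory to reduce GPU memory use.
        "offload_optimizer": {"device": "cpu"},
    },
}


def write_ds_config(path="ds_config.json"):
    """Hypothetical helper: serialize the config to disk and return its path."""
    path = Path(path)
    path.write_text(json.dumps(ds_config, indent=2))
    return path


config_path = write_ds_config()
print(f"Wrote {config_path}")
```

The resulting file is the kind of config you would point Accelerate at when setting up DeepSpeed. The exact way to supply the path is described in the documentation mentioned above.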
TPU speedup
If you're using TPUs, the dataloaders and models are now noticeably faster, and a few bugs have been fixed.
- Revamp TPU internals to be more efficient + enable mixed precision types by @muellerzr in #441
What's new?
- Fix docstring by @muellerzr in #447
- Add psutil as dependency by @sgugger in #445
- fix fsdp torch version dependency by @pacman100 in #437
- Create Gradient Accumulation Example by @muellerzr in #431
- init by @muellerzr in #429
- Introduce no_sync context wrapper + clean up some more warnings for DDP by @muellerzr in #428
- updating tests to resolve runner failures wrt deepspeed revamp by @pacman100 in #427
- Fix secrets in Docker workflow by @muellerzr in #426
- Introduce a Dependency Checker to trigger new Docker Builds on main by @muellerzr in #424
- Enable slow tests nightly by @muellerzr in #421
- Push out python 3.6 + fix all tests related to the upgrade by @muellerzr in #420
- Speedup main CI by @muellerzr in #419
- Switch to evaluate for metrics by @sgugger in #417
- Create an issue template for Accelerate by @muellerzr in #415
- Introduce post-merge runners by @muellerzr in #416
- Fix debug_launcher issues by @muellerzr in #413
- Use main egg by @muellerzr in #414
- Introduce nightly runners by @muellerzr in #410
- Update requirements to pin tensorboard and include psutil by @muellerzr in #408
- Fix CUDA examples tests by @muellerzr in #407
- Move datasets and transformers to under func by @muellerzr in #411
- Fix CUDA Dockerfile by @muellerzr in #409
- Hotfix all failing GPU tests by @muellerzr in #401
- improve metrics logged in examples by @pacman100 in #399
- Refactor offload_state_dict and fix in offload_weight by @sgugger in #398
- Refactor version checking into a utility by @muellerzr in #395
- Include fastai in frameworks by @muellerzr in #396
- Add packaging to requirements by @muellerzr in #394
- Better dispatch for submodules by @sgugger in #392
- Build Docker Images nightly by @muellerzr in #391
- Small bugfix for the stalebot workflow by @muellerzr in #390
- Introduce stalebot by @muellerzr in #387
- Create Dockerfiles for Accelerate by @muellerzr in #377
- Mix precision -> Mixed precision by @muellerzr in #388
- Fix OneCycle step length when in multiprocess by @muellerzr in #385