Releases: huggingface/accelerate
v0.10.0: DeepSpeed integration revamp and TPU speedup
This release adds two major new features: the DeepSpeed integration has been revamped to match the one in Transformers Trainer, with multiple new options unlocked, and the TPU integration has been sped up.
This version also officially stops supporting Python 3.6 and requires Python 3.7+.
DeepSpeed integration revamp
Users can now specify a DeepSpeed config file when they want to use DeepSpeed, which unlocks many new options. More details in the new documentation.
- Migrate HFDeepSpeedConfig from trfrs to accelerate by @pacman100 in #432
- DeepSpeed Revamp by @pacman100 in #405
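As an illustration of what the revamp unlocks, here is a minimal, hedged sketch of pointing Accelerate at your own DeepSpeed config file. The `hf_ds_config` keyword is an assumption based on the migrated `HFDeepSpeedConfig` above, so check the new documentation for the exact API:

```python
from accelerate import Accelerator, DeepSpeedPlugin

# "ds_config.json" is a placeholder path to a hand-written DeepSpeed config
# file; the `hf_ds_config` keyword is assumed, not confirmed by this release note.
deepspeed_plugin = DeepSpeedPlugin(hf_ds_config="ds_config.json")
accelerator = Accelerator(deepspeed_plugin=deepspeed_plugin)
```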
TPU speedup
If you're using TPUs we have sped up the dataloaders and models quite a bit, on top of a few bug fixes.
- Revamp TPU internals to be more efficient + enable mixed precision types by @muellerzr in #441
What's new?
- Fix docstring by @muellerzr in #447
- Add psutil as dependency by @sgugger in #445
- fix fsdp torch version dependency by @pacman100 in #437
- Create Gradient Accumulation Example by @muellerzr in #431
- init by @muellerzr in #429
- Introduce `no_sync` context wrapper + clean up some more warnings for DDP by @muellerzr in #428
- updating tests to resolve runner failures wrt deepspeed revamp by @pacman100 in #427
- Fix secrets in Docker workflow by @muellerzr in #426
- Introduce a Dependency Checker to trigger new Docker Builds on main by @muellerzr in #424
- Enable slow tests nightly by @muellerzr in #421
- Push out python 3.6 + fix all tests related to the upgrade by @muellerzr in #420
- Speedup main CI by @muellerzr in #419
- Switch to evaluate for metrics by @sgugger in #417
- Create an issue template for Accelerate by @muellerzr in #415
- Introduce post-merge runners by @muellerzr in #416
- Fix debug_launcher issues by @muellerzr in #413
- Use main egg by @muellerzr in #414
- Introduce nightly runners by @muellerzr in #410
- Update requirements to pin tensorboard and include psutil by @muellerzr in #408
- Fix CUDA examples tests by @muellerzr in #407
- Move datasets and transformers to under func by @muellerzr in #411
- Fix CUDA Dockerfile by @muellerzr in #409
- Hotfix all failing GPU tests by @muellerzr in #401
- improve metrics logged in examples by @pacman100 in #399
- Refactor offload_state_dict and fix in offload_weight by @sgugger in #398
- Refactor version checking into a utility by @muellerzr in #395
- Include fastai in frameworks by @muellerzr in #396
- Add packaging to requirements by @muellerzr in #394
- Better dispatch for submodules by @sgugger in #392
- Build Docker Images nightly by @muellerzr in #391
- Small bugfix for the stalebot workflow by @muellerzr in #390
- Introduce stalebot by @muellerzr in #387
- Create Dockerfiles for Accelerate by @muellerzr in #377
- Mix precision -> Mixed precision by @muellerzr in #388
- Fix OneCycle step length when in multiprocess by @muellerzr in #385
v0.9.0: Refactor utils to use in Transformers
This release offers no significant new API; it is just needed to have access to some utils in Transformers.
- Handle deprecation errors in launch by @muellerzr in #360
- Update launchers.py by @tmabraham in #363
- fix tracking by @pacman100 in #361
- Remove tensor call by @muellerzr in #365
- Add a utility for writing a barebones config file by @muellerzr in #371
- fix deepspeed model saving by @pacman100 in #370
- deepspeed save model temp fix by @pacman100 in #374
- Refactor tests to use accelerate launch by @muellerzr in #373
- fix zero stage-1 by @pacman100 in #378
- fix shuffling for ShufflerIterDataPipe instances by @loubnabnl in #376
- Better check for deepspeed availability by @sgugger in #379
- Refactor some parts in utils by @sgugger in #380
v0.8.0: Big model inference
Big model inference
To handle very large models, new functionality has been added in Accelerate:
- a context manager to initialize empty models
- a function to load a sharded checkpoint directly on the right devices
- a set of custom hooks that allow execution of a model split on different devices, as well as CPU or disk offload
- a magic method that auto-determines a device map for a given model, filling the GPUs first, then the available RAM, before using disk offload as a last resort
- a function that wraps the last three blocks in one simple call (`load_checkpoint_and_dispatch`)
See more in the documentation.
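Putting the pieces together, a minimal sketch (the model class, config, and checkpoint path are hypothetical placeholders):

```python
from accelerate import init_empty_weights, load_checkpoint_and_dispatch

# Instantiate the model skeleton without allocating real weights.
with init_empty_weights():
    model = MyLargeModel(config)  # hypothetical model class and config

# Load a (possibly sharded) checkpoint directly onto the right devices,
# letting Accelerate infer a device map: GPUs first, then CPU RAM,
# then disk offload as a last resort.
model = load_checkpoint_and_dispatch(model, "path/to/checkpoint", device_map="auto")
```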
What's new
- Create peak_memory_uasge_tracker.py by @pacman100 in #336
- Fixed a typo to enable running accelerate correctly by @Idodox in #339
- Introduce multiprocess logger by @muellerzr in #337
- Refactor utils into its own module by @muellerzr in #340
- Improve num_processes question in CLI by @muellerzr in #343
- Handle Manual Wrapping in FSDP. Minor fix of fsdp example. by @pacman100 in #342
- Better prompt for number of training devices by @muellerzr in #344
- Fix prompt for num_processes by @pacman100 in #347
- Fix sample calculation in examples by @muellerzr in #352
- Fixing metric eval in distributed setup by @pacman100 in #355
- DeepSpeed and FSDP plugin support through script by @pacman100 in #356
v0.7.1: Patch release
v0.7.0: Logging API, FSDP, batch size finder and examples revamp
Logging API
Use any of your favorite logging libraries (TensorBoard, Wandb, CometML...) with just a few lines of code inside your training scripts with Accelerate. All details are in the documentation.
- Add logging capabilities by @muellerzr in #293
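A minimal sketch of the pattern (the backend name and the `training_step` helper are placeholders; see the documentation for the supported trackers):

```python
from accelerate import Accelerator

accelerator = Accelerator(log_with="tensorboard")  # or "wandb", "comet_ml", ...
accelerator.init_trackers("my_experiment")

for step, batch in enumerate(dataloader):
    loss = training_step(batch)  # hypothetical training step
    accelerator.log({"train_loss": loss.item()}, step=step)

accelerator.end_training()
```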
Support for FSDP (Fully Sharded Data Parallel)
PyTorch recently released a new model wrapper for sharded DDP training called FSDP. This release adds support for it (note that it doesn't work with mixed precision yet). See all caveats in the documentation.
- PyTorch FSDP Feature Incorporation by @pacman100 in #321
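A hedged sketch of enabling it programmatically; in practice the plugin is usually configured through `accelerate config`, and the plugin name and import path used here should be checked against the documentation:

```python
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# Default-constructed plugin; remember mixed precision is not supported yet.
fsdp_plugin = FullyShardedDataParallelPlugin()
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
model = accelerator.prepare(model)  # the model gets wrapped in FSDP here
```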
Batch size finder
Say goodbye to CUDA OOM errors with the new `find_executable_batch_size` decorator. Just decorate your training function and pick a starting batch size, then let Accelerate do the rest.
- Add a memory-aware decorator for CUDA OOM avoidance by @muellerzr in #324
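A minimal sketch of the decorator in use (`build_dataloader` is a hypothetical helper):

```python
from accelerate.utils import find_executable_batch_size

@find_executable_batch_size(starting_batch_size=128)
def training_loop(batch_size):
    # On a CUDA OOM, the decorator frees memory, halves `batch_size`
    # and calls this function again with the smaller value.
    dataloader = build_dataloader(batch_size)  # hypothetical helper
    for batch in dataloader:
        ...  # regular training step

# Call without arguments: the decorator injects the batch size.
training_loop()
```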
Examples revamp
The Accelerate examples are now split in two: in the base folder you will find simple NLP and computer vision examples, as well as complete versions incorporating all features. You can also browse the examples in the `by_feature` subfolder, which shows exactly what code to add for each given feature (checkpointing, tracking, cross-validation, etc.).
- Refactor Examples by Feature by @muellerzr in #312
What's Changed
- Document save/load state by @muellerzr in #290
- Refactor precisions to its own enum by @muellerzr in #292
- Load model and optimizer states on CPU to avoid OOMs by @sgugger in #299
- Fix example for datasets v2 by @sgugger in #298
- Leave default as None in `mixed_precision` for launch command by @sgugger in #300
- Pass `lr_scheduler` to `Accelerator.prepare` by @sgugger in #301
- Create new TestCase classes and clean up W&B tests by @muellerzr in #304
- Have custom trackers work with the API by @muellerzr in #305
- Write tests for comet_ml by @muellerzr in #306
- Fix training in DeepSpeed by @sgugger in #308
- Update example scripts by @muellerzr in #307
- Use --no_local_rank for DeepSpeed launch by @sgugger in #309
- Fix Accelerate CLI CPU option + small fix for W&B tests by @muellerzr in #311
- Fix DataLoader sharding for deepspeed in accelerate by @m3rlin45 in #315
- Create a testing framework for example scripts and fix current ones by @muellerzr in #313
- Refactor Tracker logic and write guards for logging_dir by @muellerzr in #316
- Create Cross-Validation example by @muellerzr in #317
- Create alias for Accelerator.free_memory by @muellerzr in #318
- fix typo in docs of accelerate tracking by @loubnabnl in #320
- Update examples to show how to deal with extra validation copies by @muellerzr in #319
- Fixup all checkpointing examples by @muellerzr in #323
- Introduce reduce operator by @muellerzr in #326
New Contributors
- @m3rlin45 made their first contribution in #315
- @loubnabnl made their first contribution in #320
- @pacman100 made their first contribution in #321
Full Changelog: v0.6.0...v0.7.0
v0.6.2: Fix launcher with mixed precision
Since v0.6.0, the launcher had been ignoring the mixed precision attribute of the config. This patch fixes that.
v0.6.1: Hot fix
Patches an issue with mixed precision (see #286)
v0.6.0: Checkpointing and bfloat16 support
This release adds support for bfloat16 mixed precision training (requires PyTorch >= 1.10) and a brand-new checkpoint utility to help with resuming interrupted training runs. We also get a completely revamped documentation frontend.
Checkpoints
Save the current state of all your objects (models, optimizers, RNG states) with `accelerator.save_state(path_to_checkpoint)` and reload everything by calling `accelerator.load_state(path_to_checkpoint)`.
- Add in checkpointing capability by @muellerzr in #255
- Implementation of saving and loading custom states by @muellerzr in #270
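A minimal sketch of the round trip (the prepared objects and checkpoint path are placeholders):

```python
from accelerate import Accelerator

accelerator = Accelerator()
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

# Save model, optimizer and RNG states in one call...
accelerator.save_state("checkpoints/step_500")
# ...and later reload everything to resume the interrupted run.
accelerator.load_state("checkpoints/step_500")
```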
BFloat16 support
Accelerate now supports bfloat16 mixed precision training. As a result, the old `--fp16` argument has been deprecated and replaced by the more generic `--mixed_precision`.
- Add bfloat16 support #243 by @ikergarcia1996 in #247
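The same choice can be made programmatically; a hedged sketch, assuming the `mixed_precision` keyword on `Accelerator` mirrors the CLI argument (requires PyTorch >= 1.10 and bfloat16-capable hardware):

```python
from accelerate import Accelerator

# Expected values for mixed_precision are "no", "fp16" and "bf16".
accelerator = Accelerator(mixed_precision="bf16")
```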
New env subcommand
You can now type `accelerate env` to get a copy-pastable summary of your environment and default configuration. Very convenient when opening a new issue!
New doc frontend
The documentation has been switched to the new Hugging Face frontend, like Transformers and Datasets.
What's Changed
- Fix send_to_device with non-tensor data by @sgugger in #177
- Handle UserDict in all utils by @sgugger in #179
- Use collections.abc.Mapping to handle both the dict and the UserDict types by @mariosasko in #180
- fix: use `store_true` on argparse in nlp example by @monologg in #183
- Update README.md by @TevenLeScao in #187
- Add signature check for `set_to_none` in Optimizer.zero_grad by @sgugger in #189
- fix typo in code snippet by @MrZilinXiao in #199
- Add high-level API reference to README by @Chris-hughes10 in #204
- fix rng_types in accelerator by @s-kumano in #206
- Pass along drop_last in DispatchDataLoader by @sgugger in #212
- Rename state to avoid name conflicts with pytorch's Optimizer class. by @yuxinyuan in #224
- Fix lr scheduler num samples by @sgugger in #227
- Add customization point for init_process_group kwargs by @sgugger in #228
- Fix typo in installation docs by @jaketae in #234
- make deepspeed optimizer match parameters of passed optimizer by @jmhessel in #246
- Upgrade black to version ~=22.0 by @LysandreJik in #250
- add support of gather_object by @ZhiyuanChen in #238
- Add launch flags --module and --no_python (#256) by @parameter-concern in #258
- Accelerate + Animus/Catalyst = 🚀 by @Scitator in #249
- Add `debug_launcher` by @sgugger in #259
- enhance compatibility of honor type by @ZhiyuanChen in #241
- Add a flag to use CPU only in the config by @sgugger in #263
- Basic fixes for DeepSpeed by @sgugger in #264
- Ability to set the seed with randomness from inside Accelerate by @muellerzr in #266
- Don't use dispatch_batches when torch is < 1.8.0 by @sgugger in #269
- Make accelerated model with AMP possible to pickle by @BenjaminBossan in #274
- Contributing guide by @LysandreJik in #254
- replace texts and link (master -> main) by @johnnv1 in #282
- Use workflow from doc-builder by @sgugger in #275
- Pass along execution info to the exit of autocast by @sgugger in #284
New Contributors
- @mariosasko made their first contribution in #180
- @monologg made their first contribution in #183
- @TevenLeScao made their first contribution in #187
- @MrZilinXiao made their first contribution in #199
- @Chris-hughes10 made their first contribution in #204
- @s-kumano made their first contribution in #206
- @yuxinyuan made their first contribution in #224
- @jaketae made their first contribution in #234
- @jmhessel made their first contribution in #246
- @ikergarcia1996 made their first contribution in #247
- @ZhiyuanChen made their first contribution in #238
- @parameter-concern made their first contribution in #258
- @Scitator made their first contribution in #249
- @muellerzr made their first contribution in #255
- @BenjaminBossan made their first contribution in #274
- @johnnv1 made their first contribution in #280
Full Changelog: v0.5.1...v0.6.0
v0.5.1: Patch release
v0.5.0: Dispatch batches from main DataLoader
This release introduces support for iterating through a `DataLoader` only on the main process, which then dispatches the batches to all processes.
Dispatch batches from main DataLoader
The motivation behind this comes from dataset streaming, which introduces two difficulties:
- there might be timeouts for some elements of the dataset, which may differ across the launched processes, so it's impossible to ensure the data is iterated through the same way on each process
- when using an `IterableDataset`, each process goes through the whole dataset and thus applies the preprocessing to all elements, which can slow down training
This new feature is activated by default for every `IterableDataset`.
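A hedged sketch of opting in explicitly for a map-style dataset, assuming the `dispatch_batches` flag on `Accelerator` controls this behavior:

```python
from accelerate import Accelerator

# dispatch_batches=True: only the main process iterates the DataLoader,
# then slices of each batch are sent to the other processes.
accelerator = Accelerator(dispatch_batches=True)
dataloader = accelerator.prepare(dataloader)
```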
Various fixes
- fix fp16 covert back to fp32 for issue: unsupported operand type(s) for /: 'dict' and 'int' #149 (@Doragd)
- [Docs] Machine config is yaml not json #151 (@patrickvonplaten)
- Fix gather for 0d tensor #152 (@sgugger)
- [DeepSpeed] allow untested optimizers deepspeed #150 (@patrickvonplaten)
- Raise errors instead of warnings with better tests #170 (@sgugger)