v0.28.0: DataLoaderConfig, XLA improvements, FSDP + QLORA foundations, Gradient Synchronization Tweaks, and Bug Fixes
Core
- Introduce a `DataLoaderConfiguration` and begin deprecation of arguments in the `Accelerator`
```diff
+from accelerate import DataLoaderConfiguration
+dl_config = DataLoaderConfiguration(split_batches=True, dispatch_batches=True)
-accelerator = Accelerator(split_batches=True, dispatch_batches=True)
+accelerator = Accelerator(dataloader_config=dl_config)
```
- Allow gradients to be synced each data batch while performing gradient accumulation, useful when training in FSDP by @fabianlim in #2531
```python
from accelerate import Accelerator, GradientAccumulationPlugin

plugin = GradientAccumulationPlugin(
    num_steps=2,
    sync_each_batch=True,  # sync gradients on every batch, not only at accumulation boundaries
)
accelerator = Accelerator(gradient_accumulation_plugin=plugin)
```
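In a training loop this behaves like regular gradient accumulation; here is a minimal usage sketch (the `model`, `optimizer`, and `dataloader` names are placeholders assumed to come from `accelerator.prepare(...)`):

```python
# Minimal sketch: model, optimizer, and dataloader are placeholders
# assumed to have been returned by accelerator.prepare(...).
for batch in dataloader:
    with accelerator.accumulate(model):
        outputs = model(**batch)
        loss = outputs.loss
        accelerator.backward(loss)  # with sync_each_batch=True, gradients are synced on every batch
        optimizer.step()
        optimizer.zero_grad()
```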
Torch XLA
FSDP
- Support downstream FSDP + QLORA by allowing configuration of buffer precision (see the sketch below) by @pacman100 in #2544
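A minimal configuration sketch, assuming `torch.distributed.fsdp.MixedPrecision` and Accelerate's `FullyShardedDataParallelPlugin`; the dtypes shown are illustrative, not the PR's defaults:

```python
import torch
from torch.distributed.fsdp import MixedPrecision
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# Illustrative only: buffer_dtype is the part that becomes configurable for FSDP + QLORA.
fsdp_plugin = FullyShardedDataParallelPlugin(
    mixed_precision_policy=MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    )
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```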
`launch` changes
What's Changed
- Fix model metadata issue check by @muellerzr in #2435
- Use py 3.9 by @muellerzr in #2436
- Fix seedable sampler logic and expound docs by @muellerzr in #2434
- Fix tied_pointers_to_remove type by @fxmarty in #2439
- Make test assertions more idiomatic by @akx in #2420
- Prefer `is_torch_tensor` over `hasattr` for torch.compile by @PhilJd in #2387
- Enable more Ruff lints & fix issues by @akx in #2419
- Fix warning when dispatching model by @SunMarc in #2442
- Make torch xla available on GPU by @anw90 in #2176
- Include pippy_file_path by @muellerzr in #2444
- [Big deprecation] Introduces a `DataLoaderConfiguration` by @muellerzr in #2441
- Check for None by @muellerzr in #2452
- Fix the pytest version to be less than 8.0.1 by @BenjaminBossan in #2461
- Fix wrong `is_namedtuple` implementation by @fxmarty in #2475
- Use grad-accum on TPU by @muellerzr in #2453
- Add pre-commit configuration by @akx in #2451
- Replace `os.path.sep.join` path manipulations with a helper by @akx in #2446
- DOC: Fixes to Accelerator docstring by @BenjaminBossan in #2443
- Context manager fixes by @akx in #2450
- Fix TPU with new `XLA` device type by @will-cromar in #2467
- Free mps memory by @SunMarc in #2483
- [FIX] allow `Accelerator` to detect distributed type from the "LOCAL_RANK" env variable for XPU by @faaany in #2473
- Fix CI tests due to pathlib issues by @muellerzr in #2491
- Remove all cases of torchrun in tests and centralize as `accelerate launch` by @muellerzr in #2498
- Fix link typo by @SunMarc in #2503
- [docs] Accelerator API by @stevhliu in #2465
- Docstring fixup by @muellerzr in #2504
- [docs] Divide training and inference by @stevhliu in #2466
- add custom dtype INT2 by @SunMarc in #2505
- quanto compatibility for cpu/disk offload by @SunMarc in #2481
- [docs] Quicktour by @stevhliu in #2456
- Check if hub down by @muellerzr in #2506
- Remove offline stuff by @muellerzr in #2509
- Fixed 0MiB bug in convert_file_size_to_int by @StoyanStAtanasov in #2507
- Fix edge case in infer_auto_device_map when dealing with buffers by @SunMarc in #2511
- [docs] Fix typos by @omahs in #2490
- fix typo in launch.py (`----main_process_port` to `--main_process_port`) by @DerrickWang005 in #2516
- Add copyright + some ruff lint things by @muellerzr in #2523
- Don't manage `PYTORCH_NVML_BASED_CUDA_CHECK` when calling `accelerate.utils.imports.is_cuda_available()` by @luiscape in #2524
- Quanto compatibility with QBitsTensor by @SunMarc in #2526
- Remove unnecessary `env=os.environ.copy()`s by @akx in #2449
- Launch mpirun from accelerate launch for multi-CPU training by @dmsuehir in #2493
- Enable using dash or underscore for CLI args by @muellerzr in #2527
- Update the default behavior of `zero_grad(set_to_none=None)` to align with PyTorch by @yongchanghao in #2472
- Update link to dynamo/compile doc by @WarmongeringBeaver in #2533
- Check if the buffers fit GPU memory after device map auto inferred by @notsyncing in #2412
- [Refactor] Refactor send_to_device to treat tensor-like first by @vmoens in #2438
- Overdue email change... by @muellerzr in #2534
- [docs] Troubleshoot by @stevhliu in #2538
- Remove extra double-dash in error message by @drscotthawley in #2541
- Allow Gradients to be Synced Each Data Batch While Performing Gradient Accumulation by @fabianlim in #2531
- Update FSDP mixed precision setter to enable fsdp+qlora by @pacman100 in #2544
- Use uv instead of pip install for github CI by @muellerzr in #2546
New Contributors
- @anw90 made their first contribution in #2176
- @StoyanStAtanasov made their first contribution in #2507
- @omahs made their first contribution in #2490
- @DerrickWang005 made their first contribution in #2516
- @luiscape made their first contribution in #2524
- @dmsuehir made their first contribution in #2493
- @yongchanghao made their first contribution in #2472
- @WarmongeringBeaver made their first contribution in #2533
- @vmoens made their first contribution in #2438
- @drscotthawley made their first contribution in #2541
- @fabianlim made their first contribution in #2531
Full Changelog: v0.27.2...v0.28.0