v0.28.0: DataLoaderConfig, XLA improvements, FSDP + QLORA foundations, Gradient Synchronization Tweaks, and Bug Fixes
Core
- Introduce a `DataLoaderConfiguration` and begin deprecation of arguments in the `Accelerator`
```diff
+from accelerate import DataLoaderConfiguration
+dl_config = DataLoaderConfiguration(split_batches=True, dispatch_batches=True)
-accelerator = Accelerator(split_batches=True, dispatch_batches=True)
+accelerator = Accelerator(dataloader_config=dl_config)
```
- Allow gradients to be synced each data batch while performing gradient accumulation, useful when training in FSDP by @fabianlim in #2531
```python
from accelerate import Accelerator, GradientAccumulationPlugin

plugin = GradientAccumulationPlugin(
    num_steps=2,
    sync_each_batch=True,  # sync gradients on every batch, not only at accumulation boundaries
)
accelerator = Accelerator(gradient_accumulation_plugin=plugin)
```
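In a training loop this behaves like regular gradient accumulation; here is a minimal usage sketch (the `model`, `optimizer`, and `dataloader` names are placeholders assumed to come from `accelerator.prepare(...)`):

```python
# Minimal sketch: model, optimizer, and dataloader are placeholders
# assumed to have been returned by accelerator.prepare(...).
for batch in dataloader:
    with accelerator.accumulate(model):
        outputs = model(**batch)
        loss = outputs.loss
        accelerator.backward(loss)  # with sync_each_batch=True, gradients are synced on every batch
        optimizer.step()
        optimizer.zero_grad()
```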
Torch XLA
FSDP
- Support downstream FSDP + QLORA by allowing configuration of buffer precision (see the sketch below) by @pacman100 in #2544
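A minimal configuration sketch, assuming `torch.distributed.fsdp.MixedPrecision` and Accelerate's `FullyShardedDataParallelPlugin`; the dtypes shown are illustrative, not the PR's defaults:

```python
import torch
from torch.distributed.fsdp import MixedPrecision
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# Illustrative only: buffer_dtype is the part that becomes configurable for FSDP + QLORA.
fsdp_plugin = FullyShardedDataParallelPlugin(
    mixed_precision_policy=MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    )
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```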
`launch` changes
What's Changed
- Fix model metadata issue check by @muellerzr in #2435
- Use py 3.9 by @muellerzr in #2436
- Fix seedable sampler logic and expound docs by @muellerzr in #2434
- Fix tied_pointers_to_remove type by @fxmarty in #2439
- Make test assertions more idiomatic by @akx in #2420
- Prefer `is_torch_tensor` over `hasattr` for torch.compile by @PhilJd in #2387
- Enable more Ruff lints & fix issues by @akx in #2419
- Fix warning when dispatching model by @SunMarc in #2442
- Make torch xla available on GPU by @anw90 in #2176
- Include pippy_file_path by @muellerzr in #2444
- [Big deprecation] Introduces a `DataLoaderConfiguration` by @muellerzr in #2441
- Check for None by @muellerzr in #2452
- Fix the pytest version to be less than 8.0.1 by @BenjaminBossan in #2461
- Fix wrong `is_namedtuple` implementation by @fxmarty in #2475
- Use grad-accum on TPU by @muellerzr in #2453
- Add pre-commit configuration by @akx in #2451
- Replace `os.path.sep.join` path manipulations with a helper by @akx in #2446
- DOC: Fixes to Accelerator docstring by @BenjaminBossan in #2443
- Context manager fixes by @akx in #2450
- Fix TPU with new `XLA` device type by @will-cromar in #2467
- Free mps memory by @SunMarc in #2483
- [FIX] allow `Accelerator` to detect distributed type from the "LOCAL_RANK" env variable for XPU by @faaany in #2473
- Fix CI tests due to pathlib issues by @muellerzr in #2491
- Remove all cases of torchrun in tests and centralize as `accelerate launch` by @muellerzr in #2498
- Fix link typo by @SunMarc in #2503
- [docs] Accelerator API by @stevhliu in #2465
- Docstring fixup by @muellerzr in #2504
- [docs] Divide training and inference by @stevhliu in #2466
- add custom dtype INT2 by @SunMarc in #2505
- quanto compatibility for cpu/disk offload by @SunMarc in #2481
- [docs] Quicktour by @stevhliu in #2456
- Check if hub down by @muellerzr in #2506
- Remove offline stuff by @muellerzr in #2509
- Fixed 0MiB bug in convert_file_size_to_int by @StoyanStAtanasov in #2507
- Fix edge case in infer_auto_device_map when dealing with buffers by @SunMarc in #2511
- [docs] Fix typos by @omahs in #2490
- fix typo in launch.py (`----main_process_port` to `--main_process_port`) by @DerrickWang005 in #2516
- Add copyright + some ruff lint things by @muellerzr in #2523
- Don't manage `PYTORCH_NVML_BASED_CUDA_CHECK` when calling `accelerate.utils.imports.is_cuda_available()` by @luiscape in #2524
- Quanto compatibility with QBitsTensor by @SunMarc in #2526
- Remove unnecessary `env=os.environ.copy()`s by @akx in #2449
- Launch mpirun from accelerate launch for multi-CPU training by @dmsuehir in #2493
- Enable using dash or underscore for CLI args by @muellerzr in #2527
- Update the default behavior of `zero_grad(set_to_none=None)` to align with PyTorch by @yongchanghao in #2472
- Update link to dynamo/compile doc by @WarmongeringBeaver in #2533
- Check if the buffers fit GPU memory after device map auto inferred by @notsyncing in #2412
- [Refactor] Refactor send_to_device to treat tensor-like first by @vmoens in #2438
- Overdue email change... by @muellerzr in #2534
- [docs] Troubleshoot by @stevhliu in #2538
- Remove extra double-dash in error message by @drscotthawley in #2541
- Allow Gradients to be Synced Each Data Batch While Performing Gradient Accumulation by @fabianlim in #2531
- Update FSDP mixed precision setter to enable fsdp+qlora by @pacman100 in #2544
- Use uv instead of pip install for github CI by @muellerzr in #2546
New Contributors
- @anw90 made their first contribution in #2176
- @StoyanStAtanasov made their first contribution in #2507
- @omahs made their first contribution in #2490
- @DerrickWang005 made their first contribution in #2516
- @luiscape made their first contribution in #2524
- @dmsuehir made their first contribution in #2493
- @yongchanghao made their first contribution in #2472
- @WarmongeringBeaver made their first contribution in #2533
- @vmoens made their first contribution in #2438
- @drscotthawley made their first contribution in #2541
- @fabianlim made their first contribution in #2531
Full Changelog: v0.27.2...v0.28.0