Issues: huggingface/accelerate

[BUG] Accelerator.__init__() got an unexpected keyword argument 'logging_dir'
#3257, opened Nov 25, 2024 by as12138

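A minimal sketch of the usual workaround, assuming a recent accelerate release in which logging_dir moved from Accelerator.__init__() to ProjectConfiguration (directory names are placeholders):

    from accelerate import Accelerator
    from accelerate.utils import ProjectConfiguration

    # logging_dir is no longer an Accelerator keyword argument; pass it via
    # ProjectConfiguration instead (paths here are illustrative).
    project_config = ProjectConfiguration(project_dir="outputs", logging_dir="outputs/logs")
    accelerator = Accelerator(project_config=project_config)
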
ModuleNotFoundError: No module named 'torchvision'
#3254, opened Nov 23, 2024 by Zerycii

Accelerate + FSDP plugin hangs after saving an intermediate model checkpoint
#3250, opened Nov 22, 2024 by leeruibin

examples/inference/pippy/llama.py: assertion error about graphs
#3249, opened Nov 22, 2024 by 685Degrees

🚀 Feature Request: Improve stateful_dataloader by passing snapshot_every_n_steps
#3243, opened Nov 18, 2024 by yzhangcs

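The request refers to the snapshot option of torchdata's StatefulDataLoader, which the issue asks accelerate to pass through. A minimal sketch of the underlying torchdata parameter, with a toy dataset and placeholder values:

    import torch
    from torch.utils.data import TensorDataset
    from torchdata.stateful_dataloader import StatefulDataLoader

    # Take a resumable snapshot of the dataloader state every N steps.
    dataset = TensorDataset(torch.arange(1000))
    loader = StatefulDataLoader(dataset, batch_size=8, snapshot_every_n_steps=100)
    state = loader.state_dict()  # can later be restored with loader.load_state_dict(state)
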
OOM error when training a Llama 7B model with the Accelerate FSDP setting
#3239, opened Nov 14, 2024 by JlPang863

Logic bug: init handler kwargs used for the grad scaler in FP8 training (accelerate/accelerator.py)
#3233, opened Nov 11, 2024 by immortalCO

FSDP checkpoint saving leads to NCCL WARN Cuda failure 2 'out of memory'
#3232, opened Nov 10, 2024 by edchengg

Error while fine-tuning with PEFT, LoRA, Accelerate, SFTConfig, and SFTTrainer
#3230, opened Nov 8, 2024 by Isdriai

"mat2 must be a matrix" error when finetuning Dreambooth flux with FSDP
#3224, opened Nov 5, 2024 by weixiong-ur

Incorrect type in output of utils.pad_across_processes when input is torch.bool
#3218, opened Nov 4, 2024 by mariusarvinte

PyPI-published Accelerate==1.1.0 is missing source distributions
#3216, opened Nov 4, 2024 by helloworld1

ConnectionError: Tried to launch distributed communication on port 29401, but another process is utilizing it. Please specify a different port (such as using the --main_process_port flag or specifying a different main_process_port in your config file) and rerun your script. To automatically use the next open port (on a single node), you can set this to 0.
#3214, opened Nov 4, 2024 by qinchangchang

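As the message itself suggests, the port can be overridden at launch time; a usage sketch with a placeholder script name:

    accelerate launch --main_process_port 29500 train.py   # pick an unused port explicitly
    accelerate launch --main_process_port 0 train.py       # use the next open port automatically (single node)
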
How can I convert ZeRO-0 DeepSpeed weights into an fp32 model checkpoint?
#3210, opened Nov 1, 2024 by liming-ai

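For ZeRO-2/3 checkpoints the usual route is DeepSpeed's zero_to_fp32 utility; whether it also handles stage 0 is the open question here. A minimal sketch with a placeholder checkpoint path:

    from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

    # Consolidate a saved DeepSpeed ZeRO checkpoint into a single fp32 state dict
    # (the checkpoint directory is a placeholder); applicability to ZeRO stage 0
    # is exactly what this issue asks about.
    state_dict = get_fp32_state_dict_from_zero_checkpoint("outputs/checkpoint")
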
The optimizer is not receiving the FSDP model parameters
#3209, opened Nov 1, 2024 by eljandoubi

Command-line arguments related to DeepSpeed for accelerate launch do not override those in default_config.yaml
#3203, opened Oct 29, 2024 by JdbermeoUZH