Issues: huggingface/accelerate

[BUG] Accelerator.__init__() got an unexpected keyword argument 'logging_dir'
#3257, opened Nov 25, 2024 by as12138

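A minimal sketch of the usual workaround, assuming a recent accelerate release in which logging_dir moved from Accelerator.__init__() to ProjectConfiguration (directory names are placeholders):

    from accelerate import Accelerator
    from accelerate.utils import ProjectConfiguration

    # logging_dir is no longer an Accelerator keyword argument; pass it via
    # ProjectConfiguration instead (paths here are illustrative).
    project_config = ProjectConfiguration(project_dir="outputs", logging_dir="outputs/logs")
    accelerator = Accelerator(project_config=project_config)
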
ModuleNotFoundError: No module named 'torchvision'
#3254, opened Nov 23, 2024 by Zerycii

Accelerate + FSDP plugin hangs after saving an intermediate model checkpoint
#3250, opened Nov 22, 2024 by leeruibin

examples/inference/pippy/llama.py: assertion error about graphs
#3249, opened Nov 22, 2024 by 685Degrees

🚀 Feature Request: Improve stateful_dataloader by passing snapshot_every_n_steps
#3243, opened Nov 18, 2024 by yzhangcs

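The request refers to the snapshot option of torchdata's StatefulDataLoader, which the issue asks accelerate to pass through. A minimal sketch of the underlying torchdata parameter, with a toy dataset and placeholder values:

    import torch
    from torch.utils.data import TensorDataset
    from torchdata.stateful_dataloader import StatefulDataLoader

    # Take a resumable snapshot of the dataloader state every N steps.
    dataset = TensorDataset(torch.arange(1000))
    loader = StatefulDataLoader(dataset, batch_size=8, snapshot_every_n_steps=100)
    state = loader.state_dict()  # can later be restored with loader.load_state_dict(state)
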
OOM error when training a Llama 7B model with the Accelerate FSDP setting
#3239, opened Nov 14, 2024 by JlPang863

Logic bug: init handler kwargs used for the grad scaler in FP8 training (accelerate/accelerator.py)
#3233, opened Nov 11, 2024 by immortalCO

FSDP checkpoint saving leads to NCCL WARN Cuda failure 2 'out of memory'
#3232, opened Nov 10, 2024 by edchengg

Error while fine-tuning with PEFT, LoRA, Accelerate, SFTConfig, and SFTTrainer
#3230, opened Nov 8, 2024 by Isdriai

"mat2 must be a matrix" error when finetuning Dreambooth flux with FSDP
#3224, opened Nov 5, 2024 by weixiong-ur

Incorrect type in output of utils.pad_across_processes when input is torch.bool
#3218, opened Nov 4, 2024 by mariusarvinte

PyPI-published Accelerate==1.1.0 is missing source distributions
#3216, opened Nov 4, 2024 by helloworld1

ConnectionError: Tried to launch distributed communication on port 29401, but another process is utilizing it. Please specify a different port (such as using the --main_process_port flag or specifying a different main_process_port in your config file) and rerun your script. To automatically use the next open port (on a single node), you can set this to 0.
#3214, opened Nov 4, 2024 by qinchangchang

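As the message itself suggests, the port can be overridden at launch time; a usage sketch with a placeholder script name:

    accelerate launch --main_process_port 29500 train.py   # pick an unused port explicitly
    accelerate launch --main_process_port 0 train.py       # use the next open port automatically (single node)
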
How can I convert ZeRO-0 DeepSpeed weights into an fp32 model checkpoint?
#3210, opened Nov 1, 2024 by liming-ai

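For ZeRO-2/3 checkpoints the usual route is DeepSpeed's zero_to_fp32 utility; whether it also handles stage 0 is the open question here. A minimal sketch with a placeholder checkpoint path:

    from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

    # Consolidate a saved DeepSpeed ZeRO checkpoint into a single fp32 state dict
    # (the checkpoint directory is a placeholder); applicability to ZeRO stage 0
    # is exactly what this issue asks about.
    state_dict = get_fp32_state_dict_from_zero_checkpoint("outputs/checkpoint")
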
The optimizer is not receiving the FSDP model parameters
#3209, opened Nov 1, 2024 by eljandoubi

Command-line arguments related to DeepSpeed for accelerate launch do not override those in default_config.yaml
#3203, opened Oct 29, 2024 by JdbermeoUZH