Commit: WIP
carmocca committed Apr 8, 2024
1 parent 3ece81a commit f5c5c34
Showing 2 changed files with 7 additions and 5 deletions.
extensions/thunder/README.md (8 changes: 4 additions & 4 deletions)
@@ -575,23 +575,23 @@ Commands:
```bash
python extensions/thunder/pretrain.py --config config.yaml --compiler null --train.global_batch_size 32
python extensions/thunder/pretrain.py --config config.yaml --compiler torch --train.global_batch_size 32
python extensions/thunder/pretrain.py --config config.yaml --executors '[torchcompile_complete]' --train.global_batch_size 32
python extensions/thunder/pretrain.py --config config.yaml --executors '[sdpa, torchcompile, nvfuser, torch]' --train.global_batch_size 32

python extensions/thunder/pretrain.py --config config.yaml --compiler null --strategy ddp
python extensions/thunder/pretrain.py --config config.yaml --compiler torch --strategy ddp
python extensions/thunder/pretrain.py --config config.yaml --executors '[torchcompile_complete]' --strategy ddp
python extensions/thunder/pretrain.py --config config.yaml --executors '[sdpa, torchcompile, nvfuser, torch]' --strategy ddp

python extensions/thunder/pretrain.py --config config.yaml --compiler null --devices 1
python extensions/thunder/pretrain.py --config config.yaml --compiler torch --devices 1
python extensions/thunder/pretrain.py --config config.yaml --executors '[torchcompile_complete]' --devices 1
python extensions/thunder/pretrain.py --config config.yaml --executors '[sdpa, torchcompile, nvfuser, torch]' --devices 1

python extensions/thunder/pretrain.py --config config.yaml --executors '[sdpa, unsloth, torchcompile, nvfuser, torch]' --devices 1
```

Gradient accumulation is disabled in the FSDP setting because Thunder does not support skipping the backward synchronization yet.
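
For context, gradient accumulation with DDP/FSDP in Fabric relies on skipping the backward gradient synchronization on all but the last micro-batch. A minimal sketch of that pattern, assuming a Fabric setup like the one in `pretrain.py` (the `model`, `optimizer`, and `dataloader` here are hypothetical stand-ins):

```python
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cuda", devices=2, strategy="fsdp")
fabric.launch()
# Assumed setup, as in pretrain.py:
# model, optimizer = fabric.setup(model, optimizer)
# dataloader = fabric.setup_dataloaders(dataloader)

accumulate_grad_batches = 4
for i, batch in enumerate(dataloader):
    is_accumulating = (i + 1) % accumulate_grad_batches != 0
    # Skip the gradient all-reduce on all but the last micro-batch;
    # this is the synchronization-skipping step Thunder cannot handle yet.
    with fabric.no_backward_sync(model, enabled=is_accumulating):
        loss = model(batch).sum()
        fabric.backward(loss)
    if not is_accumulating:
        optimizer.step()
        optimizer.zero_grad()
```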

- `torch.compile` does not support compiling the `_FabricModule` due to this issue: https://github.com/pytorch/pytorch/issues/112787#issuecomment-1986827601
+ `--compiler torch` (`torch.compile` without `thunder`) is not included because it does not support compiling the `_FabricModule` due to this issue: https://github.com/pytorch/pytorch/issues/112787#issuecomment-1986827601
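
A common workaround (sketched here as an assumption, not what these benchmarks do) is to call `torch.compile` on the plain `nn.Module` and only then pass it to `fabric.setup`, so the compiler never sees the `_FabricModule` wrapper:

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(devices=1)
fabric.launch()

# Stand-in module; in pretrain.py this would be the GPT model.
model = torch.nn.Linear(128, 128)

# Compile first, then hand the compiled module to Fabric so that
# torch.compile traces a plain nn.Module instead of a _FabricModule.
model = torch.compile(model)
model = fabric.setup(model)
```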

The CUDA devices are all NVIDIA A100-SXM4-40GB.

extensions/thunder/pretrain.py (4 changes: 3 additions & 1 deletion)
@@ -254,7 +254,9 @@ def fit(
    model = state["model"]
    optimizer = state["optimizer"]

+   t0 = time.perf_counter()
    validate(fabric, model, val_dataloader, max_iters=2)  # sanity check
+   fabric.print(f"{timedelta(seconds=int(time.perf_counter()-t0))!s}")
    throughput = ThroughputMonitor(fabric, window_size=5)

    with torch.device("meta"):
@@ -283,7 +285,7 @@ def fit(
    warmup_iters = train.warmup_iters(devices, max_iters, train_dataloader)

    for train_data in train_iterator:
-       if state["iter_num"] >= max_iters:
+       if state["iter_num"] >= max_iters or state["step_count"] >= 10:
            break

        # determine and set the learning rate for this iteration
