
Smarter thunder.jit decisions #1204

Merged
carmocca merged 5 commits into main from carmocca/customizable-jit on Mar 27, 2024

Conversation

@carmocca (Contributor) commented on Mar 27, 2024

Adds support for:

1. Have the user call thunder.jit but still use the strategy:

fabric = Fabric(strategy=ThunderFSDPStrategy())
model = MyModel()
model = thunder.jit(model)
model = fabric.setup(model)  # this is now smart enough to know that model was already jitted
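
A minimal sketch of how the setup call could detect this (the helper name is hypothetical; the check relies on the same internal _lc_cd attribute that the PoC below pokes at):

def is_thunder_jitted(module) -> bool:
    # Assumption: thunder.jit stores its compile data on the wrapper's
    # `_lc_cd` attribute (see the PoC below), so its presence marks a
    # module that was already jitted.
    return hasattr(module, "_lc_cd")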

PoC:

import os
import thunder
import torch
import torch.distributed as torch_dist

world_size = int(os.environ.get("WORLD_SIZE", 1))
local_rank = int(os.environ.get("LOCAL_RANK", 0))
global_rank = int(os.environ.get("RANK", 0))
if world_size > 1:
    torch_dist.init_process_group(backend="nccl")
    pg = torch_dist.distributed_c10d._get_default_group()
device = torch.device("cuda", local_rank)
torch.cuda.set_device(device)

model = torch.nn.Linear(5, 10, bias=False, device=device)
x = torch.randn(2, 5, device=device)

def fwd_loss(m, x):
    return m(x).sum()

model = thunder.jit(model)
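# Apply the FSDP transform to the already-jitted module by swapping the
# wrapped function on thunder's internal `_lc_cd` object: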
model._lc_cd.fn = thunder.distributed.fsdp(model._lc_cd.fn)

out = fwd_loss(model, x)

print(out)
if local_rank == 0:
    print("FN", thunder.last_traces(model)[-1].python())

2. Have the user compile an arbitrary function that includes the model:

def fwd_loss(m, x):
    return m(x).sum()

fabric = Fabric(strategy=ThunderFSDPStrategy(jit=False))
model = MyModel()
fwd_loss = thunder.jit(fwd_loss)
model = fabric.setup(model)
fwd_loss(model, ...)

Thunder doesn't support jitting twice here, so the user needs to disable the strategy's jit call, since Fabric doesn't know anything about fwd_loss.
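
A minimal sketch of what the jit flag could gate inside the strategy (illustrative only: the method name and the default value are assumptions, not the strategy's actual implementation):

class ThunderFSDPStrategy:
    def __init__(self, jit: bool = True):
        self.jit = jit  # jit=False means "the user jits manually"

    def setup_module(self, module):
        # Always apply the FSDP transform; only jit when requested, so a
        # user-jitted function like fwd_loss above isn't jitted twice.
        module = thunder.distributed.fsdp(module)
        if self.jit:
            module = thunder.jit(module)
        return module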

PoC:

import os
import thunder
import torch
import torch.distributed as torch_dist

world_size = int(os.environ.get("WORLD_SIZE", 1))
local_rank = int(os.environ.get("LOCAL_RANK", 0))
global_rank = int(os.environ.get("RANK", 0))
if world_size > 1:
    torch_dist.init_process_group(backend="nccl")
    pg = torch_dist.distributed_c10d._get_default_group()
device = torch.device("cuda", local_rank)
torch.cuda.set_device(device)

model = torch.nn.Linear(5, 10, bias=False, device=device)
x = torch.randn(2, 5, device=device)

def fwd_loss(m, x):
    return m(x).sum()

fwd_loss = thunder.jit(fwd_loss)
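# Here FSDP is applied to the eager module; the jit boundary is the whole
# fwd_loss function instead of the module itself: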
model = thunder.distributed.fsdp(model)

out = fwd_loss(model, x)

print(out)
if local_rank == 0:
    print("FN", thunder.last_traces(fwd_loss)[-1].python())

carmocca self-assigned this on Mar 27, 2024
@carmocca (Contributor, Author) commented:

DDP is blocked until Lightning-AI/lightning-thunder#94 is resolved.

carmocca marked this pull request as ready for review on March 27, 2024 at 23:26
carmocca requested a review from lantiga as a code owner on March 27, 2024 at 23:26
carmocca merged commit a67dd5c into main on March 27, 2024
8 checks passed
carmocca deleted the carmocca/customizable-jit branch on March 27, 2024 at 23:42
rasbt pushed a commit that referenced this pull request on April 3, 2024