feat(fp8): [Work In Progress] enable FP8 training #369

Open · wants to merge 6 commits into base: develop
Conversation

@zigzagcai (Collaborator) commented Nov 6, 2024

Motivation

Enable FP8 to speed up training on the Hopper platform, via torchao.
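
For reference, here is a minimal sketch of the kind of torchao usage this PR targets; it is not the actual code in internlm/core or internlm/quantization. It assumes a torchao build that exposes the float8 training API (`convert_to_float8_training`), a Hopper (H100-class) GPU, and a placeholder model and filter function; API names may differ across torchao versions.

```python
# Hedged sketch: enabling torchao FP8 training (dynamic scaling by default)
# on a toy model, then compiling it. Placeholder model standing in for the
# InternLM transformer blocks; requires a Hopper GPU.
import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.GELU(),
    nn.Linear(4096, 4096),
).to(device="cuda", dtype=torch.bfloat16)

# Swap eligible nn.Linear layers for FP8 linear layers; the filter below is
# only an example of skipping layers (e.g. an output head) kept in bf16.
convert_to_float8_training(
    model,
    module_filter_fn=lambda mod, fqn: "head" not in fqn,
)

# torch.compile fuses the extra casting/scaling work so FP8 pays off.
model = torch.compile(model)

x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)
model(x).sum().backward()
```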

Modification

internlm/core
internlm/quantization

BC-breaking (Optional)

None

Use cases (Optional)

None

Checklist

Before PR:

  • Pre-commit or other linting tools are used to fix potential lint issues.
  • Bug fixes are fully covered by unit tests; the case that caused the bug should be added to the unit tests.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • The documentation has been modified accordingly, such as docstrings or example tutorials.

After PR:

  • If the modification has potential influence on downstream or other related projects, this PR should be tested with those projects.
  • CLA has been signed and all committers have signed the CLA in this PR.

@zigzagcai changed the title from "feat(fp8): enable integration with FP8 library" to "feat(fp8): [Work In Progress] enable FP8 training" on Nov 6, 2024
@zigzagcai (Collaborator, Author) commented Nov 7, 2024

Current Status:

  • Works with FP8 dynamic scaling + torch.compile, delivering a 10%~15% throughput speedup in a test with the 7B InternLM1 model (layer count reduced from 32 to 16 so it fits on one node with 2x GPUs) on 2x H100 GPUs.
  • Works with FP8 delayed scaling (input, weight) + torch.compile, also delivering a 10%~15% throughput speedup.
  • Hits a compilation error with FP8 delayed scaling (input, weight, grad) + torch.compile; it appears to be an issue in the torch compiler during the backward pass that all-reduces amax values. Currently waiting for a response from the torchao developers, or trying to solve it on my own. A sketch of the configuration being exercised is shown below.
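
The following is a rough, illustrative sketch of the delayed-scaling setup described above, assuming the torchao.float8 delayed-scaling API available around the time of this PR (`Float8LinearConfig`, `CastConfig`, `ScalingType.DELAYED`, `sync_float8_amax_and_scale_history`); these names have changed in later torchao releases, and the toy model, optimizer, and loop are placeholders rather than the PR's actual code.

```python
# Hedged sketch of FP8 delayed scaling on (input, weight, grad_output) with
# torch.compile; the grad_output cast is the combination that currently fails
# to compile (amax all_reduce in the backward pass).
import torch
import torch.nn as nn
from torchao.float8 import (
    CastConfig,
    Float8LinearConfig,
    ScalingType,
    convert_to_float8_training,
    sync_float8_amax_and_scale_history,
)

# Toy stand-in for the InternLM blocks; FP8 matmuls require a Hopper GPU.
model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.GELU(),
    nn.Linear(4096, 4096),
).to(device="cuda", dtype=torch.bfloat16)

config = Float8LinearConfig(
    cast_config_input=CastConfig(scaling_type=ScalingType.DELAYED),
    cast_config_weight=CastConfig(scaling_type=ScalingType.DELAYED),
    cast_config_grad_output=CastConfig(scaling_type=ScalingType.DELAYED),
)
convert_to_float8_training(model, config=config)
model = torch.compile(model)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)

for _ in range(3):
    loss = model(x).float().pow(2).mean()
    loss.backward()
    # Delayed scaling keeps amax/scale histories that must be synced
    # (all-reduced across ranks) once per training step.
    sync_float8_amax_and_scale_history(model)
    optimizer.step()
    optimizer.zero_grad()
```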
