Half precision fixes #606
Conversation
@@ -158,15 +157,15 @@ def forward(
         h = self.attn(n_1, cos, sin, mask, input_pos)
         if self.config.parallel_residual:
             n_2 = n_1 if self.config.shared_attention_norm else self.norm_2(x)
-            x = x + h + self.mlp(n_2)
+            x = self.mlp(n_2) + h + x
Floating-point addition is not associative, so in fp16 the order of the summands changes the result, and this is the order GPT-NeoX uses.
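For illustration, a standalone sketch (the tensor names are stand-ins, not taken from the diff) showing that regrouping a three-way sum can change the low-order bits in fp16:

```python
import torch

torch.manual_seed(0)
# stand-ins for the residual stream, the attention output and the MLP output
x = torch.randn(1024, dtype=torch.float16)
h = torch.randn(1024, dtype=torch.float16)
m = torch.randn(1024, dtype=torch.float16)

a = x + h + m  # evaluated as (x + h) + m
b = m + h + x  # evaluated as (m + h) + x

print(torch.equal(a, b))    # typically False in fp16
print((a - b).abs().max())  # small but nonzero rounding difference
```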
],
torch.device("cuda"), torch.float16, marks=[
    # the reference does softmax upscaled to fp32 during attention. additionally, the final layernorm input
    # is slightly different
I wasn't able to find out why the final layernorm input is different. If you print `x.sum()`, the value matches, but some positions are not the same.
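To make that observation concrete, a small helper (the names `ours` and `reference` are hypothetical, standing for the final layernorm input from the two implementations):

```python
import torch

def compare(ours: torch.Tensor, reference: torch.Tensor) -> None:
    # the aggregate statistic can agree even when individual elements do not
    print(ours.sum().item(), reference.sum().item())
    diff = (ours - reference).abs()
    print("mismatched positions:", (diff > 0).sum().item(), "max diff:", diff.max().item())
```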
Probably for the same reason that the float16 tests are xfailed
        pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA"),
    ],
    torch.device("cuda"), torch.float16, marks=[
        # the reference does softmax upscaled to fp32 during attention. additionally, the final layernorm input
Doing this upscaling would require giving up `scaled_dot_product_attention`.
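Roughly what the trade-off looks like (a sketch, not the repo's code; the function names and the boolean-mask convention are assumptions): computing the softmax in fp32 means spelling the attention out manually instead of calling the fused kernel.

```python
import math
import torch
import torch.nn.functional as F

def attention_manual_fp32_softmax(q, k, v, mask=None):
    # q, k, v: (B, n_head, T, head_dim) in fp16; softmax upcast to fp32 like the reference
    scale = 1.0 / math.sqrt(q.size(-1))
    scores = (q @ k.transpose(-2, -1)) * scale
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))  # mask convention: True = attend
    probs = torch.softmax(scores.float(), dim=-1).to(q.dtype)
    return probs @ v

def attention_fused(q, k, v, mask=None):
    # the fused kernel does not expose control over the internal softmax dtype
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```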
8afa6e7 to d35d5e0
    # this is to mimic the behaviour of complex32, else we will get different results
    if dtype in (torch.float16, torch.bfloat16, torch.int8):
        return cos.half(), sin.half()
    return cos, sin
What are the implications? These changes were necessary to have parity with the original LLaMA RoPE cache implementation.
Not doing this matches the HF RoPE implementations, which keep these in float32.
This piece of code comes from lit-llama, where it was ported from the original facebookresearch repo. Since this repo uses the HF implementations as references instead, I think it's fine to remove it.
The original Mistral release still uses complex numbers: https://github.com/mistralai/mistral-src/blob/main/mistral/rope.py
I don't think there's a solution that is numerically precise and performant across implementations; we just need to choose which reference implementation to compare against.
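To make the two conventions concrete, a simplified sketch (not the repo's actual `build_rope_cache`): build the cos/sin cache in float32 as HF does, then optionally downcast to half the way the lit-llama port did to mimic complex32.

```python
import torch

def build_rope_cache(seq_len: int, n_elem: int, base: int = 10000):
    # simplified HF-style cache: computed and kept in float32
    theta = 1.0 / (base ** (torch.arange(0, n_elem, 2).float() / n_elem))
    idx_theta = torch.outer(torch.arange(seq_len).float(), theta)
    return torch.cos(idx_theta), torch.sin(idx_theta)

cos32, sin32 = build_rope_cache(2048, 64)
# lit-llama-style downcast, mimicking the complex32 path of the original repo
cos16, sin16 = cos32.half(), sin32.half()
print((cos32 - cos16.float()).abs().max())  # rounding gap between the two conventions
```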
Fixes #602