jamba liger fused linear+xentropy #102
base: main
Conversation
Awesome! Please make sure you add both convergence tests (with logits and without logits) and unit tests. We are very focused on testing.
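For reference, a minimal sketch of what the monkey-patch unit test could look like is below. This is only an illustration: apply_liger_kernel_to_jamba is assumed here as the entry point this PR would add to liger_kernel.transformers.monkey_patch, by analogy with the existing apply_liger_kernel_to_llama; the convergence tests with and without logits would follow the existing mini-model pattern under test/convergence.

```python
# Sketch only: apply_liger_kernel_to_jamba is an assumed entry point modeled on the
# existing apply_liger_kernel_to_llama; it is not something the repo provides yet.
from transformers.models.jamba import modeling_jamba

from liger_kernel.transformers.rms_norm import LigerRMSNorm
from liger_kernel.transformers.swiglu import LigerSwiGLUMLP
from liger_kernel.transformers.monkey_patch import apply_liger_kernel_to_jamba  # assumed


def test_apply_liger_kernel_to_jamba_patches_modules():
    apply_liger_kernel_to_jamba(rope=True, rms_norm=True, swiglu=True, cross_entropy=True)

    # After patching, the module-level classes should point at the Liger implementations.
    assert modeling_jamba.JambaRMSNorm is LigerRMSNorm
    assert modeling_jamba.JambaMLP is LigerSwiGLUMLP
```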
I added the following additional monkey patch for Jamba:

    from transformers.models.jamba import modeling_jamba

    if rms_norm:
        # https://github.com/huggingface/transformers/blob/v4.44.2/src/transformers/models/gemma/modeling_gemma.py#L109
        modeling_jamba.JambaRMSNorm = LigerRMSNorm
    if cross_entropy:
        modeling_jamba.CrossEntropyLoss = LigerCrossEntropyLoss
    if swiglu:
        modeling_jamba.JambaMLP = LigerSwiGLUMLP

However, the convergence test seems to be failing for some values in the tensor. I tracked this down to LigerRMSNorm, but I need more time to investigate why there is a difference.
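Since the divergence was traced to LigerRMSNorm, one way to narrow it down is to compare the Liger kernel directly against the stock JambaRMSNorm on the same weights, input, and dtype. A minimal sketch, assuming the usual import paths and constructor signatures; the shapes, dtype, and tolerances are placeholders:

```python
# Standalone comparison sketch, not part of the PR. Requires a CUDA device because
# LigerRMSNorm is backed by a Triton kernel.
import torch
from transformers.models.jamba.modeling_jamba import JambaRMSNorm
from liger_kernel.transformers.rms_norm import LigerRMSNorm

hidden_size, eps = 4096, 1e-6
x = torch.randn(2, 128, hidden_size, dtype=torch.bfloat16, device="cuda")

ref = JambaRMSNorm(hidden_size, eps=eps).to(device="cuda", dtype=torch.bfloat16)
liger = LigerRMSNorm(hidden_size, eps=eps).to(device="cuda", dtype=torch.bfloat16)
torch.nn.init.normal_(ref.weight)      # non-trivial weights make casting differences visible
liger.load_state_dict(ref.state_dict())  # identical weights in both modules

out_ref = ref(x)
out_liger = liger(x)
# Loose bf16 tolerances; tighten them to see where the two implementations diverge.
print(torch.allclose(out_ref, out_liger, atol=1e-2, rtol=1e-2))
print((out_ref - out_liger).abs().max())
```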
    if rope:
        modeling_jamba.apply_rotary_pos_emb = liger_rotary_pos_emb
    if rms_norm:
        # https://github.com/huggingface/transformers/blob/v4.44.2/src/transformers/models/gemma/modeling_gemma.py#L109
nit: the comment is wrong; it links to Gemma's modeling_gemma.py rather than Jamba's RMSNorm.
    if cross_entropy:
        modeling_jamba.CrossEntropyLoss = LigerCrossEntropyLoss
    if swiglu:
        modeling_jamba.JambaMLP = LigerSwiGLUMLP
where is lce_forward?
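For context, lce_forward in the other Liger patches is a replacement causal-LM forward that feeds the last hidden states and the lm_head weight into LigerFusedLinearCrossEntropyLoss, so the full vocab-size logits tensor is never materialized. A rough sketch of what a Jamba equivalent could look like, modeled on the existing Llama patch rather than on this PR; jamba_lce_forward, the simplified return value, and the fused_linear_cross_entropy flag are assumptions:

```python
# Sketch modeled on the Llama lce_forward in Liger-Kernel; not this PR's implementation.
from transformers.models.jamba import modeling_jamba
from liger_kernel.transformers.fused_linear_cross_entropy import LigerFusedLinearCrossEntropyLoss


def jamba_lce_forward(self, input_ids=None, labels=None, **kwargs):
    # Run the backbone only and skip lm_head, so vocab-size logits are never materialized.
    outputs = self.model(input_ids=input_ids, **kwargs)
    hidden_states = outputs[0]

    loss = None
    if labels is not None:
        # Shift so that tokens < n predict token n, mirroring the stock causal-LM forward.
        shift_hidden = hidden_states[..., :-1, :].contiguous().view(-1, self.config.hidden_size)
        shift_labels = labels[..., 1:].contiguous().view(-1).to(shift_hidden.device)
        lce = LigerFusedLinearCrossEntropyLoss()
        loss = lce(self.lm_head.weight, shift_hidden, shift_labels)

    # Simplified return; the real forward would build a CausalLMOutputWithPast.
    return (loss, hidden_states)


# Wired up like the other patches, behind its own flag:
# if fused_linear_cross_entropy:
#     modeling_jamba.JambaForCausalLM.forward = jamba_lce_forward
```

The point of the fused path is that the lm_head matmul and the cross entropy are computed together, which is where most of the memory savings promised by the PR title should come from.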
Hi @winglian, I created a PR against the main branch of your fork. Do you want to merge it first and then update this PR to be based on that? winglian#1 Or I can create a separate PR to linkedin:main: #214. @ByronHsu, thoughts?
@yubofredwang if your PR captures all the changes, I'm happy to have your PR supersede mine. Thanks!
@yubofredwang there are a few conflicts.
Summary

Testing Done
- make test to ensure correctness
- make checkstyle to ensure code style
- make test-convergence to ensure convergence