
adding support for GPT-J #1

Open
wants to merge 7 commits into main

Conversation

@ImanHosseini commented Dec 6, 2022

Adding a LinearGELU kernel. This first commit is a small change; this kernel will be used in GPT-J.

@Guangxuan-Xiao (Owner)

Could you add a test file in tests to verify the correctness? Thank you!

@ImanHosseini changed the title from "adding LinearGELU kernel" to "adding support for GPT-J" on Dec 6, 2022
@ImanHosseini (Author) commented Dec 6, 2022

The GELU unit is working (it passes the test), but the GPT-J attention is failing (the test shows a high r2 error). Any idea what's going wrong there?
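For context, here is a minimal sketch of the kind of relative-error check that "r2" refers to here; the exact metric used by the repo's tests may differ, so treat this as an assumption:

    import torch

    @torch.no_grad()
    def relative_error(y_ref: torch.Tensor, y: torch.Tensor) -> float:
        # Squared error normalized by the power of the reference output.
        return ((y_ref.float() - y.float()).pow(2).mean()
                / y_ref.float().pow(2).mean()).item()

    # Example (module names are placeholders):
    # y_fp16 = fp16_attention(x)                      # ground-truth output
    # y_int8 = int8_attention(x)                      # quantized module under test
    # assert relative_error(y_fp16, y_int8) < 1e-3    # threshold is illustrative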

@Guangxuan-Xiao (Owner)

Hi Iman, sorry, I don't have the bandwidth at the moment to take a detailed look at your implementation, but I can point out some common pitfalls and offer implementation advice.

  • First, be careful with the memory ordering of matrices, especially in the bmm functions. CUTLASS uses column-major matrices, and a t in the function name means a transposed column-major matrix, i.e., a row-major matrix.
  • Second, be aware of the contiguity of tensor elements. The CUTLASS kernels only receive a raw pointer from the PyTorch tensor, so if you don't call contiguous() after operations like transpose() and permute(), the underlying memory remains unchanged and the kernel will read the elements in the old order (see the sketch after this list).
  • Third, you can cache each module's input and output activations and compare the ground truth with your modules' outputs. That way you will be able to spot the bug and fix it :).
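A minimal, standalone illustration of the contiguity pitfall from the second point (plain PyTorch, arbitrary shapes):

    import torch

    x = torch.arange(6, dtype=torch.int8).reshape(2, 3)
    xt = x.transpose(0, 1)                  # a view: shape becomes (3, 2), memory is untouched

    print(xt.is_contiguous())               # False
    print(x.data_ptr() == xt.data_ptr())    # True: both views share the same buffer

    # A kernel that only receives xt.data_ptr() would still see the elements in the
    # original (2, 3) layout. Materialize the transposed layout before handing it off:
    xt = xt.contiguous()
    print(xt.is_contiguous())               # True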

@ImanHosseini (Author) commented Dec 7, 2022

Thanks for the advice! It helped me narrow down and fix many of the problems.
I have added tests for each submodule, and every check passes; only 'test_gptj' (the end-to-end test) is left. I have a question: in test_opt here we have:

    int8_model_path = '/dataset/opt/opt-13b-smoothquant'

What am I supposed to do here? How do I generate this (for my GPT-J case) for this test?
I added the new changes. The test for the Int8GPTJBlock module has a low R2, and it incorporates the attention module. The next step is GPTJModel, which is basically a stack of GPTJBlocks, and that one has a high R2, so something is going wrong there.

@Guangxuan-Xiao (Owner)

> What am I supposed to do here? How do I generate this (for my GPT-J case) for this test?

You can refer to this script to generate the GPT-J INT8 model.

> The next step is GPTJModel, which is basically a stack of GPTJBlocks, and that one has a high R2, so something is going wrong there.

I am not sure why this happens. Judging from the differences in the cached activation values, at which layer does a significant error first appear? You can continue using this method to locate the bug (a sketch follows below).
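A minimal sketch of that activation-caching approach using plain PyTorch forward hooks (the model and tensor names in the usage comments are placeholders, not the actual ones in this PR):

    import torch

    def cache_activations(model, store):
        # Register hooks that record each submodule's output under its qualified name.
        handles = []
        for name, module in model.named_modules():
            def hook(mod, inputs, output, name=name):
                store[name] = output.detach().cpu() if torch.is_tensor(output) else output
            handles.append(module.register_forward_hook(hook))
        return handles

    # Usage (illustrative):
    # fp16_acts, int8_acts = {}, {}
    # handles = cache_activations(fp16_model, fp16_acts) + cache_activations(int8_model, int8_acts)
    # fp16_model(x); int8_model(x)
    # for name, y_ref in fp16_acts.items():
    #     if name in int8_acts and torch.is_tensor(y_ref):
    #         print(name, relative_error(y_ref, int8_acts[name]))  # first layer that diverges
    # for h in handles: h.remove()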

@ImanHosseini (Author) commented Dec 11, 2022

It seems to be an issue with how the model is reconstructed. To try it out, I started with OPT (no changes to the repo, just trying to recreate the results for OPT-350M).
First, I generate act_scales and then export the model (OPT-350M):

    python3 examples/generate_act_scales.py --model-name facebook/opt-350m --output-path act_scales/opt-350m.pt
    python3 examples/export_int8_model.py --model-name facebook/opt-350m --act-scales act_scales/opt-350m.pt --output-path i8m/opt-350m-sq

Then I ran 'test_opt':

    python3 tests/test_opt.py

I ran it once with the original model (which gives an accuracy of ~64%, as expected) and then with "int8" (using what I got from step 1: 'i8m/opt-350m-sq'), and the int8 run showed an accuracy of 0!
[I did not change the dataset or anything else.]
Update:
This was getting flipped (not sure why): https://github.com/huggingface/transformers/blob/799cea64ac1029d66e9e58f18bc6f47892270723/src/transformers/modeling_utils.py#L2293 (with no error, no warning, nothing), which seems bad?
I commented out those lines in my transformers installation, and now I get many errors like this:

    size mismatch for model.decoder.layers.0.self_attn.k_proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([1, 1024]).
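A minimal sketch for diagnosing this kind of shape mismatch before calling load_state_dict (the checkpoint path and model variable in the usage comment are placeholders):

    import torch
    from torch import nn

    def report_shape_mismatches(model: nn.Module, checkpoint: dict) -> None:
        # Print every checkpoint entry whose shape differs from the model's parameter.
        model_state = model.state_dict()
        for key, ckpt_tensor in checkpoint.items():
            if key not in model_state:
                print(f"unexpected key in checkpoint: {key}")
            elif model_state[key].shape != ckpt_tensor.shape:
                print(f"{key}: checkpoint {tuple(ckpt_tensor.shape)} "
                      f"vs model {tuple(model_state[key].shape)}")

    # Usage (illustrative):
    # checkpoint = torch.load("i8m/opt-350m-sq/pytorch_model.bin", map_location="cpu")
    # report_shape_mismatches(int8_model, checkpoint)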

@Guangxuan-Xiao (Owner)

Currently, we don't support the OPT-350M model since it uses post-layernorm. Please try a 125M or 1.3B model.

@Guangxuan-Xiao (Owner)

Hi Iman, could you check the box Allow edits from maintainers so I can work on this PR?

@ImanHosseini (Author) commented Dec 15, 2022

Sure (it seems like it was checked already; let me know if it's not), but could you please hold off on looking into this until next week? I have found all the issues and can now recreate the results for GPT-J with high accuracy. I will clean up the changes and commit by the end of next week (make any changes you want on top of that commit). Is that OK?

@Guangxuan-Xiao (Owner)

Great! If you have any further questions or need assistance, feel free to ask. 😄

@ImanHosseini (Author) commented Dec 23, 2022

To generate that "model_dec_scales.json" I used a tweaked version of smoothquant: https://gist.github.com/ImanHosseini/ad922cc9c01e05bc16c59926b6f35fd9
It generates the scales based on the codegen models.
I added the "layers_to_keep" parameter to test how quantization affects the model: I start by keeping all the original layers intact and then quantize progressively more layers.
One thing I am considering for higher accuracy is to keep the MLP module as it is (no quantization), since that seems to be where most of the accuracy is lost. This might end up being better than, say, only quantizing half the layers: for the same memory savings, it could give better accuracy (see the sketch below).
Thoughts?
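A minimal sketch of the selective-quantization idea, assuming GPT-J blocks live in model.transformer.h (as in the Hugging Face implementation) and using a placeholder quantize_block callable rather than the actual conversion code in this PR:

    def selectively_quantize(model, layers_to_keep, quantize_block):
        # Replace transformer blocks with their INT8 counterparts, except the
        # indices listed in layers_to_keep, which stay in their original FP16 form.
        for idx, block in enumerate(model.transformer.h):
            if idx in layers_to_keep:
                continue                         # keep this layer unquantized
            model.transformer.h[idx] = quantize_block(block)
        return model

    # Example: keep the first two blocks intact; the MLP-only variant discussed
    # above would instead be handled inside quantize_block.
    # model = selectively_quantize(model, layers_to_keep={0, 1}, quantize_block=...)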

@LeoYoo247

@Guangxuan-Xiao Did you have the chance to look into this PR?
