This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

Gemma-7b&&Gemma-2b #171

Merged · 37 commits merged into main on Mar 22, 2024

Conversation

intellinjun
Contributor

Type of Change

feature or bug fix or documentation or others
API changed or not

Description

detail description
Issues: xxx

Expected Behavior & Potential Risk

the expected behavior triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

@intellinjun
Contributor Author

[screenshot: f32 inference result]

@intellinjun
Contributor Author

[screenshot: gemma_2b inference result]

@intellinjun
Contributor Author


| model_name | model_type | piqa |
| --- | --- | --- |
| gemma_2b | weight_dtype f32 | 77.09 |
| gemma_2b | w4cint8g32 | 75.68 |
| gemma_2b | pytorch | 77.3 |
| gemma_7b | weight_dtype f32 | 80.25 |
| gemma_7b | w4cint8g32 | 78.56 |
| gemma_7b | pytorch | 81.2 |

Here is the gemma-7b & gemma-2b accuracy. The inference result of gemma-7b is bad when the quantization config is weight_dtype int4, compute_dtype int8, group_size 128, but gemma-2b's result looks good.
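For reference, a minimal sketch of the quantization setting under discussion, using the intel_extension_for_transformers front-end for Neural Speed. The `WeightOnlyQuantConfig` class and its argument names are assumptions on my part and may differ between releases:

```python
# Hedged sketch: weight-only quantization config for Gemma through Neural Speed.
# WeightOnlyQuantConfig and its argument names are assumptions; verify against
# the installed intel_extension_for_transformers version.
from intel_extension_for_transformers.transformers import (
    AutoModelForCausalLM,
    WeightOnlyQuantConfig,
)

# group_size=128 is the setting that degrades gemma-7b accuracy;
# group_size=32 corresponds to the w4cint8g32 rows in the table above.
quant_config = WeightOnlyQuantConfig(
    weight_dtype="int4",
    compute_dtype="int8",
    group_size=128,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    quantization_config=quant_config,
    trust_remote_code=True,
)
```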

@intellinjun marked this pull request as ready for review, March 21, 2024 01:51
@intellinjun requested a review from Zhenzhong1, March 21, 2024 01:51
@intellinjun changed the title from Gemma-7b to Gemma-7b&&Gemma-2b, Mar 21, 2024
intellinjun and others added 2 commits March 21, 2024 10:17
@intellinjun
Contributor Author

https://huggingface.co/google/gemma-7b-it/discussions/38
Here are discussions about the gemma-7b GGUF model's bad output.

Contributor

@a32543254 left a comment


LGTM

docs/supported_models.md (review thread resolved)
neural_speed/application/CMakeLists.txt (outdated, resolved)
docs/supported_models.md (review thread resolved)
@intellinjun
Contributor Author

intellinjun commented Mar 21, 2024

https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/103/artifact/report.html

The gemma-2b performance looks quite bad.
@luoyu-intel, I profiled gemma-2b and found that the main cost is in the inner product. gemma-2b has two large inner products, lm_head.weight [2048, 256000] and token_embd.weight [256000, 2048] (kept in fp32, because quantizing this weight with q4_0 gives gemma-7b a bad result). Do you have a solution to fix this?
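As a rough back-of-the-envelope estimate (my own arithmetic, not taken from the profile), keeping a [2048, 256000] vocabulary projection in f32 dominates per-token memory traffic during decode, which is consistent with the inner product showing up as the main cost:

```python
# Rough per-token memory-traffic estimate, assuming decode is memory-bound so
# bytes read per token roughly track latency.
vocab, hidden = 256000, 2048

lm_head_f32 = vocab * hidden * 4   # ~2.10 GB read for one f32 [2048, 256000] matmul
lm_head_q4 = vocab * hidden // 2   # ~0.26 GB if the same weight were stored as 4-bit
rest_q4 = 2_000_000_000 // 2       # ~1.00 GB for the remaining ~2B gemma-2b weights at 4-bit

print(f"f32 vocab projection: {lm_head_f32 / 1e9:.2f} GB per token")
print(f"same weight at 4-bit: {lm_head_q4 / 1e9:.2f} GB per token")
print(f"rest of gemma-2b 4-bit weights: {rest_q4 / 1e9:.2f} GB per token")
```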


@VincyZhang merged commit e4c5f71 into main Mar 22, 2024
11 checks passed
6 participants