This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

Gemma-7b&&Gemma-2b #171

Merged · 37 commits merged into main on Mar 22, 2024

Conversation

intellinjun
Contributor

Type of Change

feature or bug fix or documentation or others
API changed or not

Description

detail description
Issues: xxx

Expected Behavior & Potential Risk

the expected behavior triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

any library dependency introduced or removed

@intellinjun
Contributor Author

[screenshot: f32 inference result]

@intellinjun
Contributor Author

[screenshot: gemma_2b inference result]

@intellinjun
Contributor Author


| model_name | model_type | piqa |
| --- | --- | --- |
| gemma_2b | weight_dtype f32 | 77.09 |
| gemma_2b | w4cint8g32 | 75.68 |
| gemma_2b | pytorch | 77.3 |
| gemma_7b | weight_dtype f32 | 80.25 |
| gemma_7b | w4cint8g32 | 78.56 |
| gemma_7b | pytorch | 81.2 |

Here is the gemma-7b & gemma-2b accuracy. The inference result of gemma-7b is bad when the quantization config is weight_dtype int4, compute_dtype int8, group_size 128, but gemma-2b's result looks good.
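For reference, a minimal sketch of the quantization setting under discussion, using the intel_extension_for_transformers front-end for Neural Speed. The `WeightOnlyQuantConfig` class and its argument names are assumptions on my part and may differ between releases:

```python
# Hedged sketch: weight-only quantization config for Gemma through Neural Speed.
# WeightOnlyQuantConfig and its argument names are assumptions; verify against
# the installed intel_extension_for_transformers version.
from intel_extension_for_transformers.transformers import (
    AutoModelForCausalLM,
    WeightOnlyQuantConfig,
)

# group_size=128 is the setting that degrades gemma-7b accuracy;
# group_size=32 corresponds to the w4cint8g32 rows in the table above.
quant_config = WeightOnlyQuantConfig(
    weight_dtype="int4",
    compute_dtype="int8",
    group_size=128,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    quantization_config=quant_config,
    trust_remote_code=True,
)
```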

@intellinjun marked this pull request as ready for review, March 21, 2024 01:51
@intellinjun requested a review from Zhenzhong1, March 21, 2024 01:51
@intellinjun changed the title from Gemma-7b to Gemma-7b&&Gemma-2b, Mar 21, 2024
intellinjun and others added 2 commits March 21, 2024 10:17
@intellinjun
Contributor Author

https://huggingface.co/google/gemma-7b-it/discussions/38
Here are discussions about the gemma-7b GGUF model's bad output.

Contributor

@a32543254 left a comment


LGTM

docs/supported_models.md (review thread resolved)
neural_speed/application/CMakeLists.txt (outdated, resolved)
docs/supported_models.md (review thread resolved)
@intellinjun
Contributor Author

intellinjun commented Mar 21, 2024

https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/103/artifact/report.html

The gemma-2b performance looks quite bad.
@luoyu-intel, I profiled gemma-2b and found that the main cost is in the inner product. gemma-2b has two large inner products, lm_head.weight [2048, 256000] and token_embd.weight [256000, 2048] (kept in fp32, because quantizing this weight with q4_0 gives gemma-7b a bad result). Do you have a solution to fix this?
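As a rough back-of-the-envelope estimate (my own arithmetic, not taken from the profile), keeping a [2048, 256000] vocabulary projection in f32 dominates per-token memory traffic during decode, which is consistent with the inner product showing up as the main cost:

```python
# Rough per-token memory-traffic estimate, assuming decode is memory-bound so
# bytes read per token roughly track latency.
vocab, hidden = 256000, 2048

lm_head_f32 = vocab * hidden * 4   # ~2.10 GB read for one f32 [2048, 256000] matmul
lm_head_q4 = vocab * hidden // 2   # ~0.26 GB if the same weight were stored as 4-bit
rest_q4 = 2_000_000_000 // 2       # ~1.00 GB for the remaining ~2B gemma-2b weights at 4-bit

print(f"f32 vocab projection: {lm_head_f32 / 1e9:.2f} GB per token")
print(f"same weight at 4-bit: {lm_head_q4 / 1e9:.2f} GB per token")
print(f"rest of gemma-2b 4-bit weights: {rest_q4 / 1e9:.2f} GB per token")
```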


@VincyZhang merged commit e4c5f71 into main Mar 22, 2024
11 checks passed
6 participants