Fix P-tuning for Llama based models #9297

apanteleev · 2024-05-23T17:40:33Z

What does this PR do?

Fixes inference with P-tuning.

Collection: [Note which collection this PR will affect]

Changelog

Added the BOS token for Llama, Mistral and Mixtral.
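As a rough illustration of the changelog entry above, the sketch below prepends a BOS token for the model families named in this PR. It is not the code from this PR: the function name `maybe_add_bos`, the `BOS_MODEL_TYPES` set, the model-type strings, and the check against adding BOS twice are all assumptions made for the example. Where the BOS token sits relative to the P-tuning virtual tokens is not stated here, so the sketch leaves that out.

```python
from typing import List

# Hypothetical model-type names. The PR names Llama, Mistral and Mixtral,
# but the exact strings used in the code base are an assumption.
BOS_MODEL_TYPES = {"llama", "mistral", "mixtral"}


def maybe_add_bos(token_ids: List[int], model_type: str, bos_id: int) -> List[int]:
    """Return a copy of token_ids with bos_id prepended when the model family expects it."""
    needs_bos = model_type.lower() in BOS_MODEL_TYPES
    already_has_bos = bool(token_ids) and token_ids[0] == bos_id
    if needs_bos and not already_has_bos:
        return [bos_id] + list(token_ids)
    return list(token_ids)


llama_ids = maybe_add_bos([5, 6, 7], "llama", bos_id=1)
other_ids = maybe_add_bos([5, 6, 7], "gpt", bos_id=1)
```

With these inputs, `llama_ids` starts with the BOS id and `other_ids` is returned unchanged.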

PR Type:

New Feature
Bugfix
Documentation


oyilmaz-nvidia

LGTM, thanks!

* Added the BOS token for Llama, Mistral and Mixtral.
  Signed-off-by: Alexey Panteleev <[email protected]>
* Don't load an existing TRT-LLM model before export to speed up the export process and avoid possible contamination from previous runs.
  Signed-off-by: Alexey Panteleev <[email protected]>
* Apply isort and black reformatting
  Signed-off-by: apanteleev <[email protected]>
---------
Signed-off-by: Alexey Panteleev <[email protected]>
Signed-off-by: apanteleev <[email protected]>
Co-authored-by: apanteleev <[email protected]>
Co-authored-by: Onur Yilmaz <[email protected]>

* Fix P-tuning for Llama based models (#9297)
  * Added the BOS token for Llama, Mistral and Mixtral.
    Signed-off-by: Alexey Panteleev <[email protected]>
  * Don't load an existing TRT-LLM model before export to speed up the export process and avoid possible contamination from previous runs.
    Signed-off-by: Alexey Panteleev <[email protected]>
  * Apply isort and black reformatting
    Signed-off-by: apanteleev <[email protected]>
  ---------
  Signed-off-by: Alexey Panteleev <[email protected]>
  Signed-off-by: apanteleev <[email protected]>
  Co-authored-by: apanteleev <[email protected]>
  Co-authored-by: Onur Yilmaz <[email protected]>
* Fix the export test
---------
Signed-off-by: Alexey Panteleev <[email protected]>
Signed-off-by: apanteleev <[email protected]>
Signed-off-by: Onur Yilmaz <[email protected]>
Co-authored-by: Alexey Panteleev <[email protected]>
Co-authored-by: apanteleev <[email protected]>
Co-authored-by: Onur Yilmaz <[email protected]>

* Fix P-tuning for Llama based models (NVIDIA#9297)
  * Added the BOS token for Llama, Mistral and Mixtral.
    Signed-off-by: Alexey Panteleev <[email protected]>
  * Don't load an existing TRT-LLM model before export to speed up the export process and avoid possible contamination from previous runs.
    Signed-off-by: Alexey Panteleev <[email protected]>
  * Apply isort and black reformatting
    Signed-off-by: apanteleev <[email protected]>
  ---------
  Signed-off-by: Alexey Panteleev <[email protected]>
  Signed-off-by: apanteleev <[email protected]>
  Co-authored-by: apanteleev <[email protected]>
  Co-authored-by: Onur Yilmaz <[email protected]>
* Fix the export test
---------
Signed-off-by: Alexey Panteleev <[email protected]>
Signed-off-by: apanteleev <[email protected]>
Signed-off-by: Onur Yilmaz <[email protected]>
Co-authored-by: Alexey Panteleev <[email protected]>
Co-authored-by: apanteleev <[email protected]>
Co-authored-by: Onur Yilmaz <[email protected]>
Signed-off-by: Boxiang Wang <[email protected]>

* Fix P-tuning for Llama based models (#9297)
  * Added the BOS token for Llama, Mistral and Mixtral.
    Signed-off-by: Alexey Panteleev <[email protected]>
  * Don't load an existing TRT-LLM model before export to speed up the export process and avoid possible contamination from previous runs.
    Signed-off-by: Alexey Panteleev <[email protected]>
  * Apply isort and black reformatting
    Signed-off-by: apanteleev <[email protected]>
  ---------
  Signed-off-by: Alexey Panteleev <[email protected]>
  Signed-off-by: apanteleev <[email protected]>
  Co-authored-by: apanteleev <[email protected]>
  Co-authored-by: Onur Yilmaz <[email protected]>
* Fix the export test
---------
Signed-off-by: Alexey Panteleev <[email protected]>
Signed-off-by: apanteleev <[email protected]>
Signed-off-by: Onur Yilmaz <[email protected]>
Co-authored-by: Alexey Panteleev <[email protected]>
Co-authored-by: apanteleev <[email protected]>
Co-authored-by: Onur Yilmaz <[email protected]>
Signed-off-by: Jan Lasek <[email protected]>


apanteleev and others added 3 commits May 23, 2024 10:26

Added the BOS token for Llama, Mistral and Mixtral.

13da431

Signed-off-by: Alexey Panteleev <[email protected]>

Don't load an existing TRT-LLM model before export to speed up the export process and avoid possible contamination from previous runs.

bf115cd

Signed-off-by: Alexey Panteleev <[email protected]>

Apply isort and black reformatting

dd5e5cd

Signed-off-by: apanteleev <[email protected]>

oyilmaz-nvidia added NLP Run CICD labels May 23, 2024

oyilmaz-nvidia approved these changes May 23, 2024


Merge branch 'r2.0.0rc0' into fix-ptuning-again

359cd26

oyilmaz-nvidia removed the Run CICD label May 23, 2024

github-actions bot removed the NLP label May 23, 2024

oyilmaz-nvidia added NLP Run CICD labels May 23, 2024

oyilmaz-nvidia merged commit f073ed9 into NVIDIA:r2.0.0rc0 May 23, 2024
10 checks passed
