This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

[Runtime Enhance] Extend long input tokens length #157

Merged
merged 22 commits into from
Mar 13, 2024

Conversation

Zhenzhong1
Contributor

@Zhenzhong1 Zhenzhong1 commented Mar 7, 2024

Type of Change

New feature
Bug fixed
Doc refined

Description

  • Limit the maximum input token length for all models.
  • Set the scratch_size_ratio.
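The two changes above can be sketched as follows. This is an illustrative Python sketch, not the actual neural_speed implementation: the per-model caps, the function names, and the ratio heuristic are all assumptions made for the example.

```python
# Hypothetical sketch of the PR's two changes: cap the input token count per
# model, and scale the scratch buffer ratio with the input length.
# All names and numbers here are illustrative, not the neural_speed API.

MAX_INPUT_TOKENS = {"llama2": 4096, "mistral": 32768}  # assumed per-model caps


def clamp_input_tokens(model: str, n_tokens: int, default_cap: int = 2048) -> int:
    """Return the number of input tokens actually passed to the runtime."""
    cap = MAX_INPUT_TOKENS.get(model, default_cap)
    return min(n_tokens, cap)


def scratch_size_ratio(n_tokens: int) -> float:
    """Grow the scratch buffer proportionally once the input exceeds a baseline."""
    baseline = 2048
    if n_tokens <= baseline:
        return 1.0
    return n_tokens / baseline
```

For example, a 10,000-token prompt to a model capped at 4096 tokens would be clamped to 4096, and a 4096-token input would request twice the baseline scratch buffer.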

Expected Behavior & Potential Risk

N/A

How has this PR been tested?

https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/84/
https://inteltf-jenk.sh.intel.com/job/neural_speed_extension/85/

Dependency Change?

N/A

@Zhenzhong1
Contributor Author

Zhenzhong1 commented Mar 8, 2024

@Zhenzhong1 Zhenzhong1 changed the title from Zhenzhong/extend tokens length to [Runtime Enhance] Extend more input tokens length Mar 11, 2024
@Zhenzhong1 Zhenzhong1 changed the title from [Runtime Enhance] Extend more input tokens length to [Runtime Enhance] Extend long input tokens length Mar 11, 2024
@Zhenzhong1 Zhenzhong1 marked this pull request as ready for review March 11, 2024 05:52
@Zhenzhong1 Zhenzhong1 requested a review from a32543254 March 11, 2024 05:53
Contributor

@a32543254 a32543254 left a comment


LGTM

@Zhenzhong1
Contributor Author

Zhenzhong1 commented Mar 11, 2024

@VincyZhang VincyZhang merged commit eb41b91 into main Mar 13, 2024
11 checks passed

5 participants