Add LongLora for both full and lora fine-tuning #1350
I wonder what happens here if the model already has a longer context. A good test case could be LongChat (supported in LitGPT).

I also wonder whether this should be a fixed factor (2x the original context length) or `None` by default, inferring 2x the original context length in that case.
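A minimal sketch of the `None`-default idea; the `longlora_context_length` argument name and the `config` object exposing the original `block_size` are assumptions, not necessarily the PR's actual interface:

```python
def resolve_longlora_context_length(longlora_context_length, config):
    # With no explicit target, fall back to doubling the model's
    # original context length (the 2x default discussed above).
    if longlora_context_length is None:
        return 2 * config.block_size
    return longlora_context_length
```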
I've put a double check in place: one in `validate_longlora_args`, where if LongLoRA is used and `longlora_context_length <= model.block_size`, a warning is raised and LongLoRA is disabled; the other one before the model creation, where I increase the model block size and RoPE condense ratio only if LongLoRA is enabled and `longlora_context_length > model.block_size`. I can remove the second check and fall back to `None` as the default, inferring the 2x in that case.
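A rough sketch of the two checks described above; the argument names (`use_longlora`, `longlora_context_length`) and config attributes are assumptions rather than the PR's actual code:

```python
import warnings

def validate_longlora_args(use_longlora: bool, longlora_context_length: int, block_size: int) -> bool:
    # First check: if the requested context does not exceed the model's
    # existing one, warn and disable LongLoRA.
    if use_longlora and longlora_context_length <= block_size:
        warnings.warn(
            f"longlora_context_length ({longlora_context_length}) is not larger "
            f"than the model's block_size ({block_size}); disabling LongLoRA."
        )
        return False
    return use_longlora

def maybe_extend_context(config, use_longlora: bool, longlora_context_length: int):
    # Second check, run before model creation: condense RoPE and stretch
    # the block size only when LongLoRA actually extends the context.
    if use_longlora and longlora_context_length > config.block_size:
        config.rope_condense_ratio = longlora_context_length / config.block_size
        config.block_size = longlora_context_length
    return config
```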
What are the other options? Are `"wte,norm,ln"` the only allowed ones, or are there more?
Sorry, I didn't get it. I was looking at the model and, if I'm not missing something, I think those are the only layers left other than the LoRA layers (which are controlled by the arguments in the `finetune/lora.py` script). I can add a check to prevent the user from passing anything other than a combination of those three layers?
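A possible sketch of that check, assuming the layer names arrive as a comma-separated string such as `"wte,norm,ln"`:

```python
ALLOWED_LONGLORA_LAYERS = {"wte", "norm", "ln"}

def parse_trainable_layers(spec: str) -> set[str]:
    # Split the comma-separated spec and reject anything outside the
    # three supported layer groups.
    layers = {name.strip() for name in spec.split(",") if name.strip()}
    unknown = layers - ALLOWED_LONGLORA_LAYERS
    if unknown:
        raise ValueError(
            f"Unknown LongLoRA trainable layers {sorted(unknown)}; "
            f"allowed values are {sorted(ALLOWED_LONGLORA_LAYERS)}."
        )
    return layers
```

For example, `parse_trainable_layers("wte,ln")` would return `{"wte", "ln"}`, while an unsupported name would raise immediately instead of silently training nothing.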