Support for gradient_checkpointing #9
Hi, thanks for the request. In the recent commit, I added initial support for gradient checkpointing (it simply skips the memory layers). At the time of writing, it is not yet present in the Hugging Face repository, so to use it you can download the code from the src directory of this repository and write something like this:

import torch
from transformers import LlamaTokenizer
from modeling_longllama import LongLlamaForCausalLM

MODEL_PATH = "syzymon/long_llama_3b"
tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)
model = LongLlamaForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float32)
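As a minimal sketch (not taken from this thread), gradient checkpointing could then be switched on through the standard transformers API and exercised on a short dummy batch; the example input text is arbitrary:

# Hedged sketch: enable gradient checkpointing on the loaded model.
# gradient_checkpointing_enable() is the standard Hugging Face method;
# per the comment above, the initial LongLLaMA support skips the memory layers.
model.gradient_checkpointing_enable()
model.train()

# Quick sanity check that the backward pass runs with checkpointing enabled.
inputs = tokenizer("Hello, LongLLaMA!", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()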
Thanks for your commit! I would now like to fine-tune LongLLaMA, but the sequence is too long and training runs out of CUDA memory (on 4x80GB GPUs). I wonder whether I could fine-tune LongLLaMA with a regular framework that does not support long context (e.g. the training framework of Alpaca or Vicuna). If not, could you please release the fine-tuning code of LongLLaMA?
I apologize for the late response. We have recently published code that allows fine-tuning the model on a single A100 80GB GPU, using a total context size of 2048. You can try the instruction+chat fine-tuned model in the Colab. For the Colab model, we provide the fine-tuning config and a log of the train loss.
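For readers looking for a starting point, here is a rough sketch of a Trainer-based fine-tuning setup with gradient checkpointing and a 2048-token context; the dataset, hyperparameters, and output directory below are illustrative placeholders, not the authors' configuration:

# Hedged sketch of fine-tuning with the Hugging Face Trainer.
# Dataset and hyperparameters are placeholders chosen for illustration only.
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, DataCollatorForLanguageModeling,
                          LlamaTokenizer, Trainer, TrainingArguments)

MODEL_PATH = "syzymon/long_llama_3b"
tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# Total context size of 2048 tokens, as mentioned above.
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="longllama-ft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,  # trade extra compute for lower activation memory
    learning_rate=2e-5,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()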
Thanks for your awesome work! There is a small problem: when I fine-tune long_llama with gradient_checkpointing, it raises an error.
Could you please update the code in transformers so that long_llama supports gradient_checkpointing? I think that would make it easier for the community to use long_llama.
@CStanKonrad