When I run

```bash
litgpt finetune lora --data Alpaca
```

I get the following error:
```
{'checkpoint_dir': PosixPath('checkpoints/TinyLlama/TinyLlama-1.1B-Chat-v1.0'), 'data': Alpaca(mask_prompt=False, val_split_fraction=0.03865, prompt_style=<litgpt.prompts.Alpaca object at 0x7f1976ff0d00>, ignore_index=-100, seed=42, num_workers=4, download_dir=PosixPath('data/alpaca')), 'devices': 3, 'eval': EvalArgs(interval=100, max_new_tokens=100, max_iters=100, initial_validation=False), 'logger_name': 'csv', 'lora_alpha': 16, 'lora_dropout': 0.05, 'lora_head': False, 'lora_key': False, 'lora_mlp': False, 'lora_projection': False, 'lora_query': True, 'lora_r': 8, 'lora_value': True, 'out_dir': PosixPath('out/finetune/lora'), 'precision': None, 'quantize': None, 'seed': 1337, 'train': TrainArgs(save_interval=1000, log_interval=1, global_batch_size=16, micro_batch_size=1, lr_warmup_steps=100, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=None, tie_embeddings=None, learning_rate=0.0003, weight_decay=0.02, beta1=0.9, beta2=0.95, max_norm=None, min_lr=6e-05)}
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/3
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/3
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/3
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 3 processes
----------------------------------------------------------------------------------------------------
[rank: 0] Seed set to 1337
[rank: 2] Seed set to 1337
[rank: 1] Seed set to 1337
Number of trainable parameters: 1,126,400
Number of non-trainable parameters: 1,100,048,384
The longest sequence length in the train data is 1305, the model's maximum sequence length is 1305 and context length is 2048
Validating ...
Traceback (most recent call last):
  File "/home/jwan3704/litgpt-venv/bin/litgpt", line 8, in <module>
    sys.exit(main())
  File "/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/__main__.py", line 143, in main
    fn(**kwargs)
  File "/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/finetune/lora.py", line 144, in setup
    fabric.launch(main, devices, seed, config, data, checkpoint_dir, out_dir, train, eval)
  File "/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/lightning/fabric/fabric.py", line 845, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
  File "/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/lightning/fabric/fabric.py", line 931, in _wrap_and_launch
    return to_run(*args, **kwargs)
  File "/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/lightning/fabric/fabric.py", line 936, in _wrap_with_setup
    return to_run(*args, **kwargs)
  File "/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/finetune/lora.py", line 197, in main
    fit(
  File "/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/finetune/lora.py", line 259, in fit
    validate(fabric, model, val_dataloader, dataclasses.replace(eval, max_iters=2))  # sanity check
  File "/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/finetune/lora.py", line 354, in validate
    logits = model(input_ids)
  File "/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  ...
    lora = self.zero_pad(after_B) * self.scaling  # (64, 64, 256) after zero_pad (64, 64, 384)
  File "/share/home/jwan3704/litgpt-venv/lib/python3.9/site-packages/litgpt/lora.py", line 345, in zero_pad
    self._lora_ind_cache[result.device] = lora_ind = self._lora_ind.to(result.device)
NotImplementedError: Cannot copy out of meta tensor; no data!
```
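For what it's worth, the final `NotImplementedError` is PyTorch's generic behaviour whenever something tries to copy a meta tensor to a real device, which suggests `self._lora_ind` was never materialized off the meta device. A minimal standalone sketch (not litgpt code) that reproduces the same message:

```python
import torch

# Meta tensors carry only shape/dtype metadata and have no backing storage,
# so copying one to a real device cannot work.
lora_ind = torch.arange(8, device="meta")

try:
    lora_ind.to("cpu")  # mirrors self._lora_ind.to(result.device) in litgpt/lora.py
except NotImplementedError as err:
    print(err)  # e.g. "Cannot copy out of meta tensor; no data!"
```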
I haven't had a chance to test this yet, but it looks familiar. @robieta, re #1374:
```
self._lora_ind_cache[result.device] = lora_ind = self._lora_ind.to(result.device)
NotImplementedError: Cannot copy out of meta tensor; no data!
```
It may or may not be related, but I'm curious: when you implemented #1374, did you test it on multi-GPU?
`assign=True`
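(For context: `assign=True` here presumably refers to the `assign` flag of `torch.nn.Module.load_state_dict`, which swaps in the checkpoint tensors instead of copying into the existing parameters, so modules instantiated on the meta device end up with real storage. A minimal standalone sketch under that assumption, not litgpt's actual loading code:)

```python
import torch
import torch.nn as nn

# Instantiate the module on the meta device: parameters have shape/dtype but no storage.
with torch.device("meta"):
    model = nn.Linear(4, 4)

state_dict = {"weight": torch.randn(4, 4), "bias": torch.zeros(4)}

# assign=True replaces the meta parameters with the state-dict tensors,
# materializing them on a real device.
model.load_state_dict(state_dict, assign=True)
print(model.weight.device)  # cpu

# Anything that is not part of the state dict (e.g. a non-persistent buffer
# created on the meta device) stays a meta tensor and will still fail on .to(device).
```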
Should be fixed by #770