Error when using finetuning command #91

Open
linyuan13 opened this issue Jul 25, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@linyuan13

Describe the bug
Error executing job with overrides: ['run_name=first_run', 'model=moirai_1.0_R_small', 'data=etth1', 'val_data=etth1']
Error in call to target 'huggingface_hub.hub_mixin.ModelHubMixin.from_pretrained':
TypeError("MoiraiModule.__init__() missing 7 required positional arguments: 'distr_output', 'd_model', 'num_layers', 'patch_sizes', 'max_seq_len', 'attn_dropout_p', and 'dropout_p'")
full_key: model.module

I followed the documented process exactly, but this error occurred when I ran the fine-tuning command in the last step.
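One way to narrow this down is to check whether the directory the weights are loaded from carries a config.json with the seven arguments named in the TypeError. The sketch below assumes the checkpoint follows the huggingface_hub ModelHubMixin convention of storing constructor arguments in config.json; the thread does not confirm that this is the cause, and the helper name `missing_init_keys` and the directory you pass to it are hypothetical.

```python
import json
from pathlib import Path

# Argument names copied from the TypeError in the report above.
REQUIRED_KEYS = [
    "distr_output", "d_model", "num_layers", "patch_sizes",
    "max_seq_len", "attn_dropout_p", "dropout_p",
]


def missing_init_keys(model_dir):
    """Return the required keys that config.json in model_dir does not provide.

    Assumption: constructor arguments live in <model_dir>/config.json.
    If the file is absent, every required key is reported as missing.
    """
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        return list(REQUIRED_KEYS)
    config = json.loads(config_path.read_text())
    return [key for key in REQUIRED_KEYS if key not in config]
```

If the helper reports all seven keys as missing, the constructor is probably being called without any stored configuration, which would be consistent with the error message. An empty list would point elsewhere, for example at a version mismatch between the installed libraries.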

linyuan13 added the bug label Jul 25, 2024
@gorold
Contributor

gorold commented Aug 2, 2024

Hi, could you edit the bug report according to the template? It's quite hard to understand what the error is from just the above.

@linyuan13
Author

Thank you very much for your reply. I have solved this problem, but there is a new one: fine-tuning stops early, and the MSE and other metrics are poor during validation. The early-stopping log follows:
Loading weights from local directory
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
[2024-08-02 08:25:59,964][datasets][INFO] - PyTorch version 2.3.1 available.
[2024-08-02 08:25:59,964][datasets][INFO] - JAX version 0.4.30 available.
Seed set to 1
[rank: 0] Seed set to 1
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/2
Loading weights from local directory
[2024-08-02 08:26:04,015][datasets][INFO] - PyTorch version 2.3.1 available.
[2024-08-02 08:26:04,015][datasets][INFO] - JAX version 0.4.30 available.
[rank: 1] Seed set to 1
[rank: 1] Seed set to 1
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/2
------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 2 processes
------------------------------------------------------------------------------------------------
/anaconda3/envs/uni/lib/python3.11/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:652: Checkpoint directory uni2ts-main/outputs/finetune/moirai_1.0_R_small/etth1/finetune1/checkpoints exists and is not empty.
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
  | Name   | Type         | Params | Mode
--------------------------------------------------
0 | module | MoiraiModule | 13.8 M | train
--------------------------------------------------
13.8 M    Trainable params
0         Non-trainable params
13.8 M    Total params
55.310    Total estimated model params size (MB)
Epoch 0: | val/PackedMSELoss=11.40, val/Pack
[rank: 0] Metric val/PackedNLLLoss improved. New best score: 2.069
[rank: 1] Metric val/PackedNLLLoss improved. New best score: 2.158
Epoch 3: | val/PackedNLLLoss=3.900, val/PackedMSELoss=11.90, val/Pack
[rank: 0] Monitored metric val/PackedNLLLoss did not improve in the last 3 records. Best score: 2.069. Signaling Trainer to stop.
[rank: 1] Monitored metric val/PackedNLLLoss did not improve in the last 3 records. Best score: 2.158. Signaling Trainer to stop.
Epoch 3: | .py:254: UserWarning: resource_tracker: There appear to be 22 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
anaconda3/envs/uni/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 22 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
My parameters match the ones you provided, and the GPU is an A100.
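For readers trying to interpret the log, the stop at epoch 3 is what a patience-based rule would produce when the monitored loss never beats its epoch-0 value. The following is a minimal sketch of such a rule, assuming a "lower is better" metric and a patience of 3 (inferred from "did not improve in the last 3 records"). It is not the library's implementation, the function name `stop_epoch` is hypothetical, and the epoch 1 and 2 values are placeholders because the log does not show them.

```python
def stop_epoch(scores, patience, min_delta=0.0):
    """Return the epoch index at which a patience-based rule signals a stop.

    A score counts as an improvement only if it is lower than the best
    score so far by more than min_delta. Returns None if training would
    run through every epoch in `scores` without stopping.
    """
    best = float("inf")
    wait = 0  # consecutive validation checks without improvement
    for epoch, score in enumerate(scores):
        if score < best - min_delta:
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Epoch 0 (2.069) and epoch 3 (3.900) are rank 0's values from the log.
# Epochs 1 and 2 are hypothetical placeholders, not taken from the log.
val_nll = [2.069, 2.5, 3.0, 3.900]
print(stop_epoch(val_nll, patience=3))
```

Under these assumptions the rule fires at epoch 3, matching the log. A validation loss that worsens from the very first epoch usually deserves a look at the learning rate and data configuration before raising the patience.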

@Sunwang-git

Describe the bug
Error executing job with overrides: ['run_name=first_run', 'model=moirai_1.0_R_small', 'data=etth1', 'val_data=etth1']
Error in call to target 'huggingface_hub.hub_mixin.ModelHubMixin.from_pretrained':
TypeError("MoiraiModule.__init__() missing 7 required positional arguments: 'distr_output', 'd_model', 'num_layers', 'patch_sizes', 'max_seq_len', 'attn_dropout_p', and 'dropout_p'")
full_key: model.module

I followed the documented process exactly, but this error occurred when I ran the fine-tuning command in the last step.

I am hitting the same bug now. Could you please share how you solved it? Thanks a lot!
