Too few parameters for <class 'reasoners.algorithm.mcts.MCTSNode'>; actual 2, expected 3 #61

nico1995lee opened this issue Apr 12, 2024 · 5 comments


@nico1995lee

TypeError: Too few parameters for <class 'reasoners.algorithm.mcts.MCTSNode'>; actual 2, expected 3
[2024-04-12 07:26:48,924] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 1263) of binary: /mlsteam/data/LLM/llama/venv/bin/python
Traceback (most recent call last):
  File "/mlsteam/data/LLM/llama/venv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
examples/rap_gsm8k/inference.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-12_07:26:48
  host      : 8dede9e2fb55
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 1263)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
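For context, Python's typing module raises this exact error when a Generic class is subscripted with fewer type parameters than it declares. A minimal sketch of the failure mode, using a hypothetical stand-in for reasoners.algorithm.mcts.MCTSNode (assuming the real class is generic over three type parameters):

```python
from typing import Generic, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")
Example = TypeVar("Example")

# Hypothetical stand-in for reasoners.algorithm.mcts.MCTSNode,
# assuming the real class is declared generic over three type parameters.
class MCTSNode(Generic[State, Action, Example]):
    pass

try:
    Bad = MCTSNode[int, str]  # only two type parameters supplied
except TypeError as e:
    # Too few parameters for <class '__main__.MCTSNode'>; actual 2, expected 3
    print(e)

Ok = MCTSNode[int, str, dict]  # all three parameters supplied: no error
```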
@Ber666 (Collaborator) commented Apr 14, 2024

Hi, I tried fixing this error. Could you try again? Thanks.

@nico1995lee (Author)

Hi, thanks for your reply.
That error has been resolved, but there is a new error:

  File "/mlsteam/data/LLM/llm-reasoners/reasoners/lm/llama_2_model.py", line 146, in generate
    assert max_prompt_size <= params.max_seq_len, f"prompt length exceeds limit: {max_prompt_size} > {params.max_seq_len}"
AssertionError: prompt length exceeds limit: 2054 > 2048
[2024-04-15 04:49:16,875] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 2659) of binary: /mlsteam/data/LLM/llama/venv/bin/python
Traceback (most recent call last):
  File "/mlsteam/data/LLM/llama/venv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/mlsteam/data/LLM/llama/venv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
examples/rap_gsm8k/inference.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-04-15_04:49:16
  host      : 8dede9e2fb55
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 2659)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

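One way to avoid tripping this assertion is a pre-flight length check before calling generate(). A minimal sketch, not the library's actual API: `model`, `tokenizer`, and `tokenizer.encode()` are illustrative placeholders, and MAX_SEQ_LEN mirrors the limit named in the assertion message.

```python
# Pre-flight length check so an over-long prompt fails with a clear error
# instead of tripping the in-library assertion during generation.
MAX_SEQ_LEN = 2048  # the limit named in the assertion message

def safe_generate(model, tokenizer, prompt: str, **kwargs):
    # `model` and `tokenizer` are placeholders for whatever the script builds;
    # tokenizer.encode() is assumed to return a list of token ids.
    n_tokens = len(tokenizer.encode(prompt))
    if n_tokens > MAX_SEQ_LEN:
        raise ValueError(
            f"prompt has {n_tokens} tokens, exceeding max_seq_len={MAX_SEQ_LEN}; "
            "shorten the prompt or rebuild the model with a larger max_seq_len"
        )
    return model.generate(prompt, **kwargs)
```

Alternatively, if the model wrapper accepts a larger max_seq_len at construction time, raising it above 2054 would also clear this particular failure.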
@Ber666 (Collaborator) commented Apr 15, 2024

Could you specify the script you were running, so that I can reproduce the error?

@nico1995lee (Author)

I'm trying to run RAP on the GSM8K dataset, so I executed the following command:

torchrun --nproc-per-node 1 --master-port 6676 examples/rap_gsm8k/inference.py --base_lm llama-2 --llama_2_ckpts /mlsteam/data/LLM/llama/ --llama_size 7B

@Ber666 (Collaborator) commented Apr 22, 2024

Hi, I tried running this command but couldn't reproduce the error. It seems to stem from improper handling of an edge case; it might be easier to debug by printing out the model's inputs and outputs (see the sketch below).

Besides, I noticed that you are using Llama-2 7B, which is a relatively weak model and may not follow the demonstration format. This could also cause unexpected errors. We now support Llama-3, so you may try whether a stronger model solves the problem.

Thanks!
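For the print-the-inputs-and-outputs suggestion above, here is a minimal sketch of a logging wrapper around a generate function; the names are illustrative and not part of llm-reasoners' API.

```python
import functools

def log_io(generate_fn):
    """Wrap a text-generation function so every call prints its prompt
    and output, making edge cases like over-long prompts easy to spot."""
    @functools.wraps(generate_fn)
    def wrapped(prompt, *args, **kwargs):
        print(f"[generate] prompt ({len(prompt)} chars):\n{prompt!r}")
        output = generate_fn(prompt, *args, **kwargs)
        print(f"[generate] output:\n{output!r}")
        return output
    return wrapped

# Hypothetical usage: model.generate = log_io(model.generate)
```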
