RuntimeError while running RAP for gsm8k dataset using exllama #95

Zeyuan-Liu opened this issue Aug 15, 2024 · 1 comment

Zeyuan-Liu commented Aug 15, 2024

Following the readme.md, I tried to run RAP for gsm8k using exllama with the recommended command:

```
CUDA_VISIBLE_DEVICES=0,1 python examples/RAP/gsm8k/inference.py --base_lm exllama --exllama_model_dir my/path/to/Llama-2-7B-Chat-GPTQ --exllama_lora_dir None --exllama_mem_map '[16,22]' --n_action 1 --n_confidence 1 --n_iters 1 --temperature 0.0
```

but encountered the following RuntimeError:

```
Using the latest cached version of the dataset since gsm8k couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'main' at /home/lzy/.cache/huggingface/datasets/gsm8k/main/0.0.0/1505e1f9da07dd20 (last modified on Sun Aug 4 12:00:13 2024).
gsm8k: 0%| | 0/1319 [00:00<?, ?it/s]
Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?

/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/lm/exllama_model.py:119: UserWarning: max_new_tokens is not set, we will use the default value: 200
warnings.warn(f"max_new_tokens is not set, we will use the default value: {self.max_new_tokens}")
/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/lm/exllama_model.py:122: UserWarning: do_sample is False while the temperature is non-positive. We will use greedy decoding for Exllama
warnings.warn(
/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/lm/exllama_model.py:144: UserWarning: the eos_token '\n' is encoded into tensor([29871, 13]) with length != 1, using 13 as the eos_token_id
warnings.warn(f'the eos_token {repr(token)} is encoded into {tokenized} with length != 1, '

MCTSAggregation: no answer retrieved.
Case #1: correct=False, output=None, answer='18';accuracy=0.000 (0/1)
gsm8k: 0%| | 1/1319 [00:09<3:35:53, 9.83s/it]
A robe takes 2 bolts of blue fiber and half that much white fiber. How many bolts in total does it take?
gsm8k: 0%| | 1/1319 [00:34<12:35:08, 34.38s/it]
Traceback (most recent call last):
File "/home/lzy/Desktop/Interact/llm-reasoners-main/examples/RAP/gsm8k/inference.py", line 155, in
fire.Fire(main)
File "/home/lzy/anaconda3/envs/agent/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/lzy/anaconda3/envs/agent/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/lzy/anaconda3/envs/agent/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/lzy/Desktop/Interact/llm-reasoners-main/examples/RAP/gsm8k/inference.py", line 146, in main
rap_gsm8k(base_model=base_model,
File "/home/lzy/Desktop/Interact/llm-reasoners-main/examples/RAP/gsm8k/inference.py", line 69, in rap_gsm8k
accuracy = evaluator.evaluate(reasoner, num_shot=4, resume=resume, log_dir=log_dir)
File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/base.py", line 232, in evaluate
algo_output = reasoner(self.input_processor(example),
File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/base.py", line 183, in call
return self.search_algo(self.world_model, self.search_config, **kwargs)
File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/algorithm/mcts.py", line 314, in call
self.search()
File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/algorithm/mcts.py", line 284, in search
path = self.iterate(self.root)
File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/algorithm/mcts.py", line 188, in iterate
self._simulate(path)
File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/algorithm/mcts.py", line 249, in _simulate
self._expand(node)
File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/algorithm/mcts.py", line 224, in _expand
node.state, aux = self.world_model.step(node.parent.state, node.action)
File "/home/lzy/Desktop/Interact/llm-reasoners-main/examples/RAP/gsm8k/world_model.py", line 96, in step
outputs = self.base_model.generate([model_input] * num,
File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/lm/exllama_model.py", line 163, in generate
decoded = self.generate_simple(self.generator, inputs[start:end], max_new_tokens=max_new_tokens,
File "/home/lzy/Desktop/Interact/llm-reasoners-main/reasoners/lm/exllama_model.py", line 200, in generate_simple
generator.gen_begin(ids, mask=mask)
File "/home/lzy/Desktop/Interact/llm-reasoners-main/exllama/generator.py", line 186, in gen_begin
self.model.forward(self.sequence[:, :-1], self.cache, preprocess_only = True, lora = self.lora, input_mask = mask)
File "/home/lzy/Desktop/Interact/llm-reasoners-main/exllama/model.py", line 972, in forward
r = self._forward(input_ids[:, chunk_begin : chunk_end],
File "/home/lzy/Desktop/Interact/llm-reasoners-main/exllama/model.py", line 1058, in _forward
hidden_states = decoder_layer.forward(hidden_states, cache, buffers[device], lora)
File "/home/lzy/Desktop/Interact/llm-reasoners-main/exllama/model.py", line 536, in forward
hidden_states = self.self_attn.forward(hidden_states, cache, buffer, lora)
File "/home/lzy/Desktop/Interact/llm-reasoners-main/exllama/model.py", line 440, in forward
new_keys = cache.key_states[self.index].narrow(2, past_len, q_len).narrow(0, 0, bsz)

RuntimeError: start (3072) + length (6) exceeds dimension size (3072).
```
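
For context on the error itself: it is raised by exllama's KV cache when the sequence held in the cache has already reached its configured maximum (3072 tokens here) and 6 more tokens are appended. In RAP the prompt grows at every expansion as earlier sub-questions and answers are appended, so a long few-shot prompt plus rambling model output can fill the window. Below is a minimal sketch of a pre-flight check one could add before calling generate; the attribute names (`tokenizer.encode`, `cache.max_seq_len`) follow typical exllama usage and are assumptions, not the exact llm-reasoners wrapper.

```python
# A minimal sketch of a pre-flight length check, assuming the exllama wrapper
# exposes its tokenizer and KV cache; attribute names (tokenizer.encode,
# cache.max_seq_len) follow typical exllama usage and may differ locally.
def check_context_fits(tokenizer, cache, prompts, max_new_tokens):
    """Fail with a readable message before the KV cache overflows mid-generation."""
    for prompt in prompts:
        # exllama's tokenizer.encode() returns a (1, seq_len) tensor of token ids,
        # so shape[-1] is the prompt length in tokens (an assumption).
        prompt_len = tokenizer.encode(prompt).shape[-1]
        if prompt_len + max_new_tokens > cache.max_seq_len:
            raise ValueError(
                f"prompt ({prompt_len} tokens) + max_new_tokens ({max_new_tokens}) "
                f"exceeds the cache size ({cache.max_seq_len}); shorten the prompt, "
                f"lower max_new_tokens, or configure a larger max_seq_len."
            )
```

Alternatively, configuring the model with a larger max_seq_len (if the checkpoint supports it) or truncating the accumulated RAP context keeps generation inside the window.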

Ber666 (Collaborator) commented Sep 3, 2024

Hi,

Our prompt is designed for base models (without instruction tuning or RLHF). Using a chat model may produce unexpected outputs that the program fails to parse correctly. To debug, you can print the raw outputs of the LLM.
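
One hypothetical way to follow that suggestion without editing the library is to wrap the base model in a thin proxy that prints every prompt/output pair. The sketch assumes generate() returns an object with a .text list, as in llm-reasoners' GenerateOutput; the PrintingModel class itself is illustrative, not part of the repo.

```python
# A hypothetical debugging proxy, assuming generate() returns an object with a
# .text list (as in llm-reasoners' GenerateOutput); not part of the repo.
class PrintingModel:
    def __init__(self, base_model):
        self.base_model = base_model

    def __getattr__(self, name):
        # Delegate everything else (tokenizer, eos handling, ...) to the wrapped model.
        return getattr(self.base_model, name)

    def generate(self, inputs, **kwargs):
        outputs = self.base_model.generate(inputs, **kwargs)
        for prompt, text in zip(inputs, outputs.text):
            print("=== PROMPT ===")
            print(prompt)
            print("=== RAW OUTPUT ===")
            print(text)
        return outputs
```

Wrapping the model right after it is constructed in examples/RAP/gsm8k/inference.py (e.g. `base_model = PrintingModel(base_model)`) shows whether the chat-tuned checkpoint is emitting text the RAP prompt parser does not expect.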
