Use whole history in case of undetermined tokenization of sequence #1254
base: master
Conversation
49a4c9b to 05b3302
m_tokenized_chat_history.clear();
std::copy(new_chat_tokens.input_ids.data<int64_t>(), new_chat_tokens.input_ids.data<int64_t>() + new_chat_tokens.input_ids.get_size(),
          std::back_inserter(m_tokenized_chat_history));
Once you override m_tokenized_chat_history with the newly tokenized history, you can trust it again by marking m_trust_encoded_history as true in generate(EncodedInputs).
Looks like generate with encoded inputs can always reset m_trust_encoded_history to true.
Added the reset of m_trust_encoded_history to true.
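For illustration, a minimal sketch of the idea in this thread, assuming the members shown in the diff; the surrounding struct and the helper name refresh_from are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical excerpt: after the chat history has been re-tokenized from the
// templated prompt, the cached token history matches the text history again,
// so the "trusted" flag can be raised back in generate(EncodedInputs).
struct ChatHistoryState {
    std::vector<int64_t> m_tokenized_chat_history;
    bool m_trust_encoded_history = false;

    void refresh_from(const int64_t* tokens, std::size_t count) {
        m_tokenized_chat_history.clear();
        std::copy(tokens, tokens + count, std::back_inserter(m_tokenized_chat_history));
        // The vector now reflects the freshly tokenized chat, so it can be
        // trusted again until the next ambiguous tokenization is detected.
        m_trust_encoded_history = true;
    }
};
```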
             new_atten_mask.data<int64_t>() + kv_cache_len);
concatenated_attention_mask = new_atten_mask;
if (is_chat_conversation && m_history_available) {
    if (!m_trust_encoded_history) {
Flip the if to check the non-negated m_trust_encoded_history. You have an else branch anyway, and removing the extra ! simplifies reading.
done
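A small sketch of the suggested flip; the two helper functions are placeholders for the real branch bodies:

```cpp
// Placeholder helpers standing in for the actual pipeline logic.
static void use_history_diff() {}
static void use_whole_history() {}

void select_inputs(bool m_trust_encoded_history) {
    // Checking the positive condition first drops the extra '!';
    // the else branch still covers the untrusted case.
    if (m_trust_encoded_history) {
        use_history_diff();
    } else {
        use_whole_history();
    }
}
```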
        break;
    i++;
}

return i == tokenized_history.size() - 1 || i == tokenized_history.size() - 2;
Why is the i == tokenized_history.size() - 2 condition treated as the same history? The last tokens are different in that case.
encoded_history = previously templated encoded prompts & answers + the last answer, which is not templated but decoded and re-encoded.
tokenized_history = previously templated encoded prompts & answers + pure model output.
In the decode step an eos token (or some stop token) is removed, or nothing is removed. So we will either get the same result, or encoded_history will be one token shorter.
Comment added.
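To make that explanation concrete, a sketch of the comparison under the assumption that both histories are plain token vectors; the helper name and the loop are illustrative, not the PR's exact code:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// The re-encoded history may lose a trailing eos/stop token that is present in
// the raw model output, so the histories are treated as matching when they
// diverge only in the last position or two.
bool histories_match(const std::vector<int64_t>& tokenized_history,
                     const std::vector<int64_t>& encoded_history) {
    std::size_t i = 0;
    const std::size_t limit = std::min(tokenized_history.size(), encoded_history.size());
    while (i < limit && tokenized_history[i] == encoded_history[i])
        i++;
    // Full match, or a mismatch confined to the final one or two tokens.
    return tokenized_history.size() - i <= 2;
}
```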
The urgency is about reaching 24.6, but this PR targets master. If it's possible to retarget this PR, do that. If it's not possible, let's finish the review here and then port to 24.6.
moved #1268
m_history.clear();
m_templated_chat_history.clear();
m_tokenized_chat_history.clear();
Keep the order of variable modification aligned between start_chat() and finish_chat(). Currently start_chat() clears m_tokenized_chat_history after m_trust_encoded_history, but m_tokenized_chat_history is cleared at the end in finish_chat().
updated
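A sketch of the symmetry being asked for; the member set follows the diff, while the shared reset helper is just one way to keep the order aligned:

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct ChatReset {
    bool m_trust_encoded_history = true;
    std::vector<std::string> m_history;            // illustrative history type
    std::string m_templated_chat_history;
    std::vector<int64_t> m_tokenized_chat_history;

    // Both start_chat() and finish_chat() touch the members in the same order,
    // so a diff between the two functions stays easy to review.
    void reset_state() {
        m_trust_encoded_history = true;
        m_history.clear();
        m_templated_chat_history.clear();
        m_tokenized_chat_history.clear();
    }

    void start_chat()  { reset_state(); /* plus chat-specific setup */ }
    void finish_chat() { reset_state(); /* plus KV-cache reset, etc. */ }
};
```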
    m_templated_chat_history
);
size_t history_size = m_language.get_tensor("attention_mask").get_shape().at(1);
size_t inputs_embeds_size = inputs_embeds.get_shape().at(1);
// inputs_embeds contains whole history
if (inputs_embeds.get_shape().at(1) == tokenized_chat_history.size()) {
Suggested change:
- if (inputs_embeds.get_shape().at(1) == tokenized_chat_history.size()) {
+ if (inputs_embeds_size == tokenized_chat_history.size()) {
updated
Not for this PR, but we need tests for that change. I remember you said random miniCPM diverges. This PR probably fixes that, so your case can be used as a test for VLM. LLM needs its test as well; maybe you can provoke a random-weights model to generate such tokens.
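Not settled in this thread, but a rough sketch of what such a test could look like through the public chat API; the model path and prompts are placeholders, and checking for the actual divergence would need the specific tokens that trigger ambiguous tokenization:

```cpp
#include <initializer_list>
#include <iostream>
#include <string>

#include "openvino/genai/llm_pipeline.hpp"

int main() {
    // Placeholder path: e.g. a tiny random-weights model exported for tests.
    ov::genai::LLMPipeline pipe("models/tiny-random-llm", "CPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 32;

    pipe.start_chat();
    // Several turns: if tokenization of the accumulated history is ambiguous,
    // the pipeline is expected to fall back to re-processing the whole history
    // instead of producing a diverging continuation.
    for (const std::string& prompt : {"Hello", "And what was my first message?"}) {
        std::string answer = pipe.generate(prompt, config);
        std::cout << answer << '\n';
    }
    pipe.finish_chat();
    return 0;
}
```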
@@ -191,6 +224,9 @@ class StatefulLLMPipeline final : public LLMPipelineImplBase {
        attention_mask = data->attention_mask;
    }

    if (is_chat_conversation && m_chat_input_type == ov::genai::utils::GenerationChatInputsType::ENCODED_INPUTS)
        std::copy(input_ids.data<int64_t>(), input_ids.data<int64_t>() + input_ids.get_size(), std::back_inserter(m_tokenized_chat_history));
Should it be added unconditionally on m_chat_input_type? Because in the current method we only have EncodedInputs& inputs.
This function will also be called in STRING mode (if it is ENCODED_INPUTS it is called directly by the user; if it is STRING mode it is called during generation with StringInputs, here: https://github.com/openvinotoolkit/openvino.genai/blob/releases/2024/5/src/cpp/src/llm_pipeline.cpp#L136), so no, we should check which type of generation it is.
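A simplified sketch of the two call paths described in that reply; the names mirror the diff, but the dispatch itself is illustrative rather than the actual llm_pipeline.cpp code:

```cpp
#include <cstdint>
#include <string>
#include <vector>

enum class GenerationChatInputsType { UNDEF, STRING, ENCODED_INPUTS };

struct PipelineSketch {
    GenerationChatInputsType m_chat_input_type = GenerationChatInputsType::UNDEF;
    std::vector<int64_t> m_tokenized_chat_history;

    // Called directly by the user with pre-tokenized inputs, or internally
    // after the string overload has applied the chat template and tokenized it.
    void generate_encoded(const std::vector<int64_t>& input_ids) {
        if (m_chat_input_type == GenerationChatInputsType::ENCODED_INPUTS) {
            // Only the encoded-inputs chat mode grows the token history here;
            // in STRING mode the history was already rebuilt from the template.
            m_tokenized_chat_history.insert(m_tokenized_chat_history.end(),
                                            input_ids.begin(), input_ids.end());
        }
        // ... run the model ...
    }

    void generate_string(const std::string& /*prompt*/) {
        if (m_chat_input_type == GenerationChatInputsType::UNDEF)
            m_chat_input_type = GenerationChatInputsType::STRING;
        std::vector<int64_t> input_ids; // = tokenizer.encode(templated prompt)
        generate_encoded(input_ids);    // same entry point, different mode
    }
};
```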
// if we hit a sequence with such a combination of symbols, we cannot correctly subtract the new history from the old history
// and find the difference as a prompt,
// so let's check for it and use the whole history in this case
if (m_history_available && m_trust_encoded_history)
Looks like m_history_available is redundant and we can use !m_tokenized_chat_history.empty().
done
}
m_templated_chat_history = new_templated_chat_history;
m_tokenized_chat_history = new_chat_tokens;

m_tokenized_chat_history.clear();
Suggested change:
- m_tokenized_chat_history.clear();
+ m_tokenized_chat_history.reserve(new_chat_tokens.input_ids.get_size());
With reserve and without clear, the copy will append elements rather than overwrite them. reserve has been added, but I'm not sure I understood the comment correctly.
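To illustrate the point in that reply: with std::back_inserter the copy appends, so clear() is still needed even when reserve() is added. A minimal sketch using the names from the diff (the free function itself is hypothetical):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

void rebuild_history(std::vector<int64_t>& m_tokenized_chat_history,
                     const int64_t* new_tokens, std::size_t count) {
    // clear() drops the stale tokens; reserve() avoids reallocations while
    // back_inserter pushes the new ones. Without clear() the copy would append
    // after the old history instead of replacing it.
    m_tokenized_chat_history.clear();
    m_tokenized_chat_history.reserve(count);
    std::copy(new_tokens, new_tokens + count,
              std::back_inserter(m_tokenized_chat_history));
}
```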
m_tokenized_chat_history = new_chat_tokens;

m_tokenized_chat_history.clear();
std::copy(new_chat_tokens.input_ids.data<int64_t>(), new_chat_tokens.input_ids.data<int64_t>() + new_chat_tokens.input_ids.get_size(),
Suggested change:
- std::copy(new_chat_tokens.input_ids.data<int64_t>(), new_chat_tokens.input_ids.data<int64_t>() + new_chat_tokens.input_ids.get_size(),
+ std::copy_n(new_chat_tokens.input_ids.data<int64_t>(), new_chat_tokens.input_ids.get_size(),
changed
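For reference, the difference the suggestion is about: std::copy_n takes the element count directly instead of a computed end pointer (the free function and buffer names are placeholders):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

void append_tokens(const int64_t* src, std::size_t count, std::vector<int64_t>& dst) {
    // Equivalent to: std::copy(src, src + count, std::back_inserter(dst));
    // copy_n avoids repeating the source pointer in the end-iterator expression.
    std::copy_n(src, count, std::back_inserter(dst));
}
```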
}

if (m_adapter_controller) {
    m_adapter_controller->apply(m_model_runner, config.adapters);
}

auto input_tokens = input_ids;
if (is_chat_conversation && !m_trust_encoded_history) {
    input_tokens = tokenized_chat_history;
looks like here we can also set m_trust_encoded_history = true;
comments have been addressed - #1268
Task: CVS-157295