add tokenizer and inference configs in docstring
Signed-off-by: HuiyingLi <[email protected]>
HuiyingLi committed Aug 8, 2024
1 parent 49a263f commit 41de6ef
Showing 1 changed file with 7 additions and 0 deletions.
7 changes: 7 additions & 0 deletions nemo/collections/multimodal/data/neva/neva_dataset.py
@@ -609,6 +609,13 @@ def preprocess_yi_34b(
The function applies prompt templates and tokenizes the conversations according to the Yi-1.5 34b model specifications.
It involves special handling of tokens, masking of labels, and adjustments based on configuration settings.
This template works with the following tokenizer configs:
- model.tokenizer.library='huggingface'
- model.tokenizer.type='01-ai/Yi-1.5-34B'
- model.tokenizer.additional_special_tokens='{additional_special_tokens: ["<extra_id_0>", "<extra_id_1>", "<extra_id_2>", "<extra_id_3>", "<extra_id_4>", "<extra_id_5>"]}'
At inference time, add the end string to stop sampling:
- inference.end_strings='["<|im_end|>"]'
Parameters:
- sources (dict): A dictionary of sources containing conversations to be processed.
- tokenizer: The tokenizer to be used for processing the text.
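A minimal sketch, not part of the commit, of how the tokenizer and inference settings listed in the new docstring lines could be assembled into a single config. The nesting and the use of OmegaConf here are assumptions made for illustration; the docstring passes additional_special_tokens as a quoted YAML string on the command line, whereas the sketch spells the tokens out as a plain list for readability.

from omegaconf import OmegaConf

# Sketch only: key names and values are taken from the docstring above;
# the surrounding structure and the choice of OmegaConf are assumptions.
cfg = OmegaConf.create(
    {
        "model": {
            "tokenizer": {
                "library": "huggingface",
                "type": "01-ai/Yi-1.5-34B",
                # Additional special tokens named in the docstring.
                "additional_special_tokens": [
                    "<extra_id_0>",
                    "<extra_id_1>",
                    "<extra_id_2>",
                    "<extra_id_3>",
                    "<extra_id_4>",
                    "<extra_id_5>",
                ],
            }
        },
        "inference": {
            # Stop sampling once the end-of-turn string is generated.
            "end_strings": ["<|im_end|>"],
        },
    }
)

print(OmegaConf.to_yaml(cfg))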
