Yes, cl100k_base is what we should use here, not the gpt2 tokeniser; the two are indeed different. Better still, if we can parameterise this (e.g. take the model name as input and pass it to the tiktoken library directly), that's the best option. That way, we can support all the tokenisers that OpenAI has.
Check and confirm whether the GPT-4 tokeniser is the same as GPT-2's. From what I recall, this is wrong: the tokeniser depends on the LLM.
Originally posted by @NirantK in #8 (comment)