Commit

Add support: o200k_base tokenizer.
CaffeeLake committed May 14, 2024
1 parent eb9c1de commit 8733592
Showing 10 changed files with 200,109 additions and 5 deletions.
1 change: 1 addition & 0 deletions scripts/download_assets.sh
@@ -9,6 +9,7 @@ https://openaipublic.blob.core.windows.net/gpt-2/encodings/main/encoder.json
https://openaipublic.blob.core.windows.net/encodings/r50k_base.tiktoken
https://openaipublic.blob.core.windows.net/encodings/p50k_base.tiktoken
https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken
https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken
EOF
)

1 change: 1 addition & 0 deletions tiktoken-rs/README.md
@@ -105,6 +105,7 @@ println!("max_tokens: {}", max_tokens);

| Encoding name | OpenAI models |
| ----------------------- | ------------------------------------------------------------------------- |
| `o200k_base` | GPT-4o models. |
| `cl100k_base` | ChatGPT models, `text-embedding-ada-002` |
| `p50k_base` | Code models, `text-davinci-002`, `text-davinci-003` |
| `p50k_edit` | Use for edit models like `text-davinci-edit-001`, `code-davinci-edit-001` |
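
The rows shown in this hunk can be read as a lookup from model name to encoding name. The sketch below is one possible rendering of that lookup in plain Rust. `encoding_for_model` is a hypothetical helper written for illustration and is not part of tiktoken-rs. The `gpt-4o` and `gpt-3.5-turbo` prefixes are assumptions standing in for "GPT-4o models" and "ChatGPT models"; the remaining model names are copied from the table.

```rust
/// Hypothetical helper (not a tiktoken-rs API): returns the encoding name
/// that the README table associates with a model name.
fn encoding_for_model(model: &str) -> Option<&'static str> {
    match model {
        // Assumption: "GPT-4o models" means names starting with "gpt-4o".
        m if m.starts_with("gpt-4o") => Some("o200k_base"),
        // Assumption: "ChatGPT models" is represented here by "gpt-3.5-turbo".
        m if m.starts_with("gpt-3.5-turbo") => Some("cl100k_base"),
        "text-embedding-ada-002" => Some("cl100k_base"),
        "text-davinci-002" | "text-davinci-003" => Some("p50k_base"),
        "text-davinci-edit-001" | "code-davinci-edit-001" => Some("p50k_edit"),
        // Anything not listed in the visible rows is left unresolved.
        _ => None,
    }
}

fn main() {
    for model in ["gpt-4o", "text-embedding-ada-002", "text-davinci-003", "unknown-model"] {
        match encoding_for_model(model) {
            Some(encoding) => println!("{model}: {encoding}"),
            None => println!("{model}: no encoding listed"),
        }
    }
}
```

Returning `Option` rather than a default encoding keeps an unlisted model from being silently tokenized with the wrong vocabulary.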