Expose Encoder and Decoder in TiktokenTokenizer #7314

Draft: wants to merge 2 commits into base: main

razshare commented

Fixes #7313

We are excited to review your PR.

So we can do the best job, please check:

  • There's a descriptive title that will make sense to other developers some time from now.
  • There are associated issues. All PRs should have associated issue(s), unless the change is trivial and self-evident, such as fixing a typo. You can use the format Fixes #nnnn in your description to cause GitHub to automatically close the issue(s) when your PR is merged.
  • Your change description explains what the change does, why you chose your approach, and anything else that reviewers should know.
  • You have included any necessary tests in the same PR.

codecov bot commented Nov 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.88%. Comparing base (5090327) to head (140aa42).
Report is 11 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7314      +/-   ##
==========================================
+ Coverage   68.87%   68.88%   +0.01%     
==========================================
  Files        1470     1470              
  Lines      274005   274005              
  Branches    28403    28401       -2     
==========================================
+ Hits       188717   188754      +37     
+ Misses      77970    77936      -34     
+ Partials     7318     7315       -3     
Flag        Coverage Δ
Debug       68.88% <100.00%> (+0.01%) ⬆️
production  63.30% <100.00%> (+0.01%) ⬆️
test        89.41% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines Coverage Δ
...Microsoft.ML.Tokenizers/Model/TiktokenTokenizer.cs 78.28% <100.00%> (ø)
...est/Microsoft.ML.Tokenizers.Tests/TiktokenTests.cs 99.39% <100.00%> (+0.40%) ⬆️

... and 8 files with indirect coverage changes

tarekgh (Member) commented Nov 18, 2024

@razshare I replied on the issue. Let's discuss it there first before we continue here. Thanks a lot for your submission. I've converted this PR to a draft for now, until we finish the discussion.

tarekgh self-assigned this Nov 18, 2024
tarekgh marked this pull request as draft November 18, 2024 20:46

razshare (Author) commented

@dotnet-policy-service agree

tarekgh reopened this Dec 19, 2024

Successfully merging this pull request may close these issues.

Expose Encoder in TiktokenTokenizer
2 participants