Is your feature request related to a problem? Please describe.
In the context of embeddings and LLMs, it would be great to be able to measure string length in tokens instead of characters.
Describe the solution you'd like
The options could be extended with a length_function option, similar to the LangChain implementation in Python.
Context
It would be great to be able to use a library like https://github.com/connorjacobsen/tiktoken-elixir to measure chunk length. I think the best approach would be to allow passing in a function that measures chunk length, so users can choose how to handle it. The default could be String.length/1.
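A minimal sketch of how such an option could behave. The module name ChunkSketch, the split/2 function, and the :chunk_size and :length_function option names are hypothetical and not the library's actual API. The word-counting function below is only a stand-in for a real tokenizer such as tiktoken-elixir, whose API is not shown here.

```elixir
defmodule ChunkSketch do
  @moduledoc """
  Hypothetical sketch: greedy, word-based chunking where the caller
  decides how chunk length is measured via the :length_function option.
  """

  # Splits `text` on whitespace and packs words into chunks whose measured
  # length stays <= :chunk_size. A single word longer than :chunk_size
  # becomes its own chunk (it is not split further in this sketch).
  def split(text, opts \\ []) do
    chunk_size = Keyword.get(opts, :chunk_size, 20)
    # Default measurement: String.length/1 (counts Unicode graphemes).
    length_fun = Keyword.get(opts, :length_function, &String.length/1)

    text
    |> String.split()
    |> Enum.reduce([], fn word, acc ->
      case acc do
        [] ->
          [word]

        [current | rest] ->
          candidate = current <> " " <> word

          if length_fun.(candidate) <= chunk_size do
            # Word fits: extend the current chunk.
            [candidate | rest]
          else
            # Word does not fit: start a new chunk.
            [word, current | rest]
          end
      end
    end)
    |> Enum.reverse()
  end
end

text = "the quick brown fox jumps over the lazy dog"

# Default: length measured with String.length/1.
IO.inspect(ChunkSketch.split(text, chunk_size: 10))
# => ["the quick", "brown fox", "jumps over", "the lazy", "dog"]

# Stand-in "token" counter (counts words); a real tokenizer would go here.
word_count = fn s -> s |> String.split() |> length() end
IO.inspect(ChunkSketch.split(text, chunk_size: 4, length_function: word_count))
# => ["the quick brown fox", "jumps over the lazy", "dog"]
```

Passing a plain one-arity function keeps the splitter independent of any particular tokenizer: users who need token-accurate chunks can wrap whichever tokenizer they use, while everyone else keeps the String.length/1 default.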
Let me know if this interests you. I can raise a PR and take care of it.