Add support for a length function #35

Open
ospaarmann opened this issue Dec 12, 2024 · 0 comments
Comments

ospaarmann commented Dec 12, 2024

Is your feature request related to a problem? Please describe.
In the context of embeddings and LLMs, it would be great to be able to measure string length in tokens instead of code points.

Describe the solution you'd like
The options could be extended with a length_function, similar to the LangChain implementation in Python.

Context
It would be great to be able to use a library like https://github.com/connorjacobsen/tiktoken-elixir to define the chunk length. I think the best approach would be to accept a function that measures the chunk length, so users can choose how length is calculated. The default could be String.length/1.
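To make the idea concrete, here is a minimal sketch of a chunker with a pluggable length function. It is written in Python because that is where the LangChain `length_function` option lives; the names `chunk_text` and `word_count` are hypothetical and are not part of this library or of LangChain, and the greedy word-packing strategy is only a stand-in for whatever splitting logic the library actually uses. A real setup would pass a tokenizer-based counter instead of `word_count`.

```python
from typing import Callable, List


def chunk_text(
    text: str,
    chunk_size: int,
    length_function: Callable[[str], int] = len,
) -> List[str]:
    """Greedily pack whitespace-separated words into chunks.

    A chunk is closed as soon as adding the next word would make
    length_function(chunk) exceed chunk_size. The default measure is
    len, i.e. plain character count.
    """
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and length_function(candidate) > chunk_size:
            # The candidate is too long under the chosen measure:
            # close the current chunk and start a new one with this word.
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


def word_count(s: str) -> int:
    # Hypothetical stand-in for a tokenizer-based counter.
    return len(s.split())


by_chars = chunk_text("aa bb cc", chunk_size=5)
by_words = chunk_text("one two three four five", chunk_size=2,
                      length_function=word_count)
```

The caller decides what "length" means, and the chunker only compares the measured value against the limit. In Elixir the same shape would presumably be an option holding a one-arity function that defaults to `&String.length/1`.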

Let me know if this interests you. I can raise a PR and take care of it.
