feat: Add INSTRUCTOR Embedder (v2) #5836
@anakin87 FYI
Hello @awinml and thanks for your contribution! Some tests in the CI are failing.
@anakin87 The code uses the I think we need to add
Hey, @awinml! Sorry for the delay... Today I will try to understand how to manage Instructor dependency and get back to you...
Hello @awinml and @vrunm and thanks for this PR. After an internal discussion, we prefer to not include this new dependency ( In any case, the integration with INSTRUCTOR is interesting and we appreciate your efforts. For this reason, I am creating a PR in haystack-extras to accommodate a new package called instructor-embedders-haystack with the functionality you are working on. I hope this solution sounds good to you...
This is the subproject I prepared for your component: https://github.com/deepset-ai/haystack-extras/tree/main/components/instructor-embedders Feel free to fork https://github.com/deepset-ai/haystack-extras and create a PR there.
@anakin87 Thanks! This solution sounds good. I will create a PR with this work in https://github.com/deepset-ai/haystack-extras.
@anakin87 I have moved the code to haystack-extras/components/instructor-embedders and opened a PR - deepset-ai/haystack-core-integrations#32.
Great! 🙌 Let's continue in deepset-ai/haystack-core-integrations#32
Related Issues
Fixes #5242
Proposed Changes:
Add support for the INSTRUCTOR family of Embedding Models. They tailor the embeddings for different tasks and domains using a prompt (instruction) for each embedding.
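As a rough illustration of the instruction-per-text idea, the sketch below pairs each text with an instruction before encoding. The helper names `build_instruction_pairs` and `embed_pairs` are hypothetical and not part of this PR; the `INSTRUCTOR(...).encode(...)` call follows the usage shown in the InstructorEmbedding package README, and the model name is only an example.

```python
from typing import List


def build_instruction_pairs(instruction: str, texts: List[str]) -> List[List[str]]:
    """Pair every text with the same task instruction (hypothetical helper)."""
    return [[instruction, text] for text in texts]


def embed_pairs(pairs: List[List[str]], model_name: str = "hkunlp/instructor-base"):
    """Encode [instruction, text] pairs with an INSTRUCTOR model.

    The import is lazy so this module loads even when the optional
    InstructorEmbedding dependency is not installed.
    """
    from InstructorEmbedding import INSTRUCTOR  # optional dependency

    model = INSTRUCTOR(model_name)
    return model.encode(pairs)


# Build the model input without loading any model.
pairs = build_instruction_pairs(
    "Represent the science title for retrieval:",
    ["Deep learning for protein folding", "Graph neural networks in chemistry"],
)
```

Calling `embed_pairs(pairs)` would download the model and return one vector per pair; changing only the instruction string is what would tailor the embedding to a different task or domain.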
The implementation is based on the Embedders Proposal and Embedder Design (Implement Embedders components (2.0) #5312).
- Adds InstructorEmbeddingBackend, responsible for performing the actual embedding computation, implemented as a singleton class in order to reuse instances.
- Adds InstructorTextEmbedder, a component that embeds a list of strings into a list of vectors.
- Adds InstructorDocumentEmbedder, a component that embeds a list of Documents. The embedding of each Document is stored in the embedding field of the Document.
Since the INSTRUCTOR Embedding Models are a modification of the Sentence Transformers models, the implementation is similar to the Sentence Transformers implementation (#5567).
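The backend reuse and the Document embedding step described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code from this PR: `Doc`, `FakeBackend`, `BackendFactory`, and `embed_documents` are hypothetical stand-ins, and the "embedding" is a toy computation so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Doc:
    """Stand-in for a Haystack Document (hypothetical, for illustration only)."""
    content: str
    embedding: Optional[List[float]] = None


class FakeBackend:
    """Stand-in for a loaded model; a real backend would wrap INSTRUCTOR."""

    def __init__(self, model_name: str, device: str):
        self.model_name = model_name
        self.device = device

    def embed(self, pairs: List[List[str]]) -> List[List[float]]:
        # Toy "embedding": the lengths of the instruction and of the text.
        return [[float(len(instruction)), float(len(text))] for instruction, text in pairs]


class BackendFactory:
    """Caches one backend per (model_name, device) so instances are reused."""

    _instances: Dict[Tuple[str, str], FakeBackend] = {}

    @classmethod
    def get(cls, model_name: str, device: str = "cpu") -> FakeBackend:
        key = (model_name, device)
        if key not in cls._instances:
            cls._instances[key] = FakeBackend(model_name, device)
        return cls._instances[key]


def embed_documents(docs: List[Doc], instruction: str, backend: FakeBackend) -> List[Doc]:
    """Embed each document's content and store the vector on the document."""
    pairs = [[instruction, doc.content] for doc in docs]
    for doc, vector in zip(docs, backend.embed(pairs)):
        doc.embedding = vector
    return docs


backend_a = BackendFactory.get("example-model")
backend_b = BackendFactory.get("example-model")  # same cached instance as backend_a
docs = embed_documents([Doc("hello")], "Represent the document:", backend_a)
```

Keying the cache on model name and device means two embedder components configured identically would share one loaded model instead of loading it twice.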
This code was written collaboratively with @vrunm.
How did you test it?
Notes for the reviewer
InstructorEmbedding and sentence-transformers would need to be added as additional dependencies. (sentence-transformers has already been added in Add Sentence Transformers to dependencies haystack-preview-package#3.)
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.