feat: Add INSTRUCTOR Embedder (v2) #32
@anakin87 note that CI is not running, so let's at least run the tests locally before approving. We'll take care of the CI separately. Maybe that's because this file is missing the .yaml extension 👀 https://github.com/deepset-ai/haystack-extras/blob/main/.github/workflows/components_instructor_embedders
(@dfokina can you have a quick look at the docstrings?)
I took the liberty of making some minor changes, but
kudos again for a great job --> @awinml and @vrunm
It would also be nice if you open a PR in https://github.com/deepset-ai/haystack-integrations,
to showcase these new INSTRUCTOR Embedders, so that they can also appear on this web page.
Feel free to ask for help (if needed). 💙
@anakin87 Thanks! The changes look good. I will open a PR in https://github.com/deepset-ai/haystack-integrations with examples to showcase these new INSTRUCTOR Embedders.
Zero embeddings and examples fixes
Code migrated from deepset-ai/haystack#5836.
Related Issues
Fixes #5242
Proposed Changes:
Add support for the INSTRUCTOR family of Embedding Models. They tailor the embeddings for different tasks and domains using a prompt (instruction) for each embedding.
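To illustrate the "instruction per embedding" idea, here is a minimal sketch of how each text could be paired with its instruction before being handed to the model. The helper name `build_instruction_pairs` and the example instruction string are assumptions made for illustration; they are not taken from this PR.

```python
from typing import List


def build_instruction_pairs(instruction: str, texts: List[str]) -> List[List[str]]:
    """Pair one instruction with every text to embed (hypothetical helper)."""
    return [[instruction, text] for text in texts]


pairs = build_instruction_pairs(
    "Represent the science sentence for retrieval:",
    [
        "Photosynthesis converts light into chemical energy.",
        "Mitochondria produce ATP.",
    ],
)

# With the InstructorEmbedding package installed, pairs in this
# [instruction, text] shape are what gets passed to the model, e.g.:
#   from InstructorEmbedding import INSTRUCTOR
#   embeddings = INSTRUCTOR("hkunlp/instructor-base").encode(pairs)
# That call is left commented out here so the sketch runs without a model download.
```

Changing only the instruction string while keeping the texts fixed is what lets the same model produce task-specific or domain-specific embeddings.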
The implementation is based on the Embedders Proposal and Embedder Design (#5312).
- Adds InstructorEmbeddingBackend, responsible for performing the actual embedding computation, implemented as a singleton class in order to reuse instances.
- Adds InstructorTextEmbedder, a component that embeds a list of strings into a list of vectors.
- Adds InstructorDocumentEmbedder, a component that embeds a list of Documents. The embedding of each Document is stored in the embedding field of the Document.
Since the INSTRUCTOR Embedding Models are a modification of the Sentence Transformers models, the implementation is similar to the Sentence Transformers implementation (#5567).
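The "singleton in order to reuse instances" idea for the backend can be sketched roughly as follows. This is not the code from this PR: `StubModel`, `BackendFactory`, and `get_backend` are hypothetical names, and the stub stands in for a real INSTRUCTOR model so the example runs without any download.

```python
from typing import ClassVar, Dict, List, Optional, Tuple


class StubModel:
    """Stand-in for a real embedding model (hypothetical, illustration only)."""

    def __init__(self, model_name: str, device: Optional[str] = None) -> None:
        self.model_name = model_name
        self.device = device

    def encode(self, pairs: List[List[str]]) -> List[List[float]]:
        # Fake "embedding": the lengths of the instruction and of the text.
        return [[float(len(instruction)), float(len(text))] for instruction, text in pairs]


class BackendFactory:
    """Caches one backend per (model name, device) so repeated requests reuse it."""

    _instances: ClassVar[Dict[Tuple[str, Optional[str]], StubModel]] = {}

    @classmethod
    def get_backend(cls, model_name: str, device: Optional[str] = None) -> StubModel:
        key = (model_name, device)
        if key not in cls._instances:
            # The (expensive) model construction happens only on the first request.
            cls._instances[key] = StubModel(model_name, device)
        return cls._instances[key]


first = BackendFactory.get_backend("some-instructor-model", "cpu")
second = BackendFactory.get_backend("some-instructor-model", "cpu")
other_device = BackendFactory.get_backend("some-instructor-model", "cuda")
```

With this shape, a text embedder and a document embedder configured with the same model and device would share one loaded model instead of each loading their own copy.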
This code was written collaboratively with @vrunm.
How did you test it?
Unit tests have been added for each component.