feat: Add Optimum Embedders #379
Conversation
Thanks for the comprehensive PR! I've added a few comments.
I took a look into this to see if there was a way we could overcome pinning the dependencies. Thankfully, it looks like huggingface/optimum#1675 has been resolved with version 1.16.2, but the other issue still remains. Another error that I found while running the tests with the unpinned versions:

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Invalid input name: token_type_ids

This can be solved with the following code block, added right before calling the model:

```python
# Only pass required inputs otherwise onnxruntime can raise an error
inputs_to_remove = set(encoded_input.keys()).difference(self.embedding_model.inputs_names)
for key in inputs_to_remove:
    encoded_input.pop(key)
```

Even though this isn't needed with the current pinned dependencies (at least for the tested models), I think this would be good to add to avoid needing to solve this error in the future.
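For context, a rough, self-contained sketch of where such a filter could sit; the model loading and variable names below are illustrative and not taken from the PR's backend code:

```python
from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

# Illustrative setup; the PR wires the tokenizer and model up inside its backend.
model_id = "BAAI/bge-small-en-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
embedding_model = ORTModelForFeatureExtraction.from_pretrained(model_id, export=True)

encoded_input = tokenizer(
    ["The food was delicious"], padding=True, truncation=True, return_tensors="pt"
)

# Only pass required inputs otherwise onnxruntime can raise an error
inputs_to_remove = set(encoded_input.keys()).difference(embedding_model.inputs_names)
for key in inputs_to_remove:
    encoded_input.pop(key)

# Forward pass through the exported ONNX model
model_output = embedding_model(**encoded_input)
```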
@awinml Just wanted to give you an update that we'll move this PR on to our plate to expedite its merging.
Thanks! @sjrl I added this code block and unpinned the dependencies:

```python
# Only pass required inputs otherwise onnxruntime can raise an error
inputs_to_remove = set(encoded_input.keys()).difference(self.embedding_model.inputs_names)
for key in inputs_to_remove:
    encoded_input.pop(key)
```

All the tests pass successfully!
@shadeMe Thanks for the detailed review! I think this PR is nearly there. I have pushed most of the changes requested in the review. I will be adding the other pooling methods (as mentioned in #379 (comment)) very shortly.
@awinml Thanks for adding the Pooling submodule! Would it also be possible to add tests for each of the pooling methods to check their implementations?
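As an example of the kind of check that could be added, here is a minimal, self-contained sketch for mean pooling; it is written against an inline computation rather than the PR's actual Pooling API, which isn't shown here:

```python
import torch

def test_mean_pooling_ignores_padding():
    # One sequence: two real tokens followed by one padded position
    token_embeddings = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
    attention_mask = torch.tensor([[1, 1, 0]])

    # Masked mean over the sequence dimension
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    pooled = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

    # The padded position must not influence the result
    assert torch.allclose(pooled, torch.tensor([[2.0, 3.0]]))
```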
Thanks again for the contribution!
Related Issues
Proposed Changes:
Add support for running inference with embedding models using the Hugging Face Optimum library. These components are designed to seamlessly run models using the high-speed ONNX Runtime.
Introduces two components:
- OptimumTextEmbedder, a component for embedding strings.
- OptimumDocumentEmbedder, a component for computing Document embeddings.

Additional optimizations implemented to bring down inference time:
- padding=True for the AutoTokenizer. Please see the Transformers documentation for reference; a short sketch of what this looks like follows the TensorRT note below.

For the TensorRT ONNX runtime, it is recommended to cache the TensorRT engine since it takes time to build. Instructions to pass the necessary parameters for caching have been added to the docstrings.
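For illustration, a minimal sketch of what dynamic padding with the AutoTokenizer looks like; the texts and model name here are only examples:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en-v1.5")
texts = ["A short text", "A considerably longer text that will need many more tokens"]

# padding=True pads every sequence to the longest one in the batch, so the
# whole batch can be embedded in a single forward pass.
encoded = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
print(encoded["input_ids"].shape)  # (batch_size, longest_sequence_in_batch)
```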
The conversion to ONNX is cached by default, similar to how Transformers caches the models. If the user wishes to modify the caching parameters or tweak them for the other runtimes, the necessary parameters can be passed using the model_kwargs parameter.

The implementation of the different pooling methods is based on the Sentence Transformers implementation.
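For reference, a minimal sketch of mean pooling in the Sentence Transformers style; this is a standalone helper, not the PR's actual pooling module:

```python
import torch

def mean_pooling(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average the token embeddings, ignoring padded positions.

    token_embeddings: (batch_size, seq_len, hidden_dim)
    attention_mask:   (batch_size, seq_len), 1 for real tokens, 0 for padding
    """
    # Broadcast the mask over the hidden dimension
    mask = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    summed = torch.sum(token_embeddings * mask, dim=1)
    counts = torch.clamp(mask.sum(dim=1), min=1e-9)  # guard against division by zero
    return summed / counts
```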
How did you test it?
Tests were added in optimum_document_embedder.py and optimum_text_embedder.py.

Notes for the reviewer
For using optimum with GPU-based runtimes, the optimum[onnxruntime-gpu] package is required. To install this package, optimum has to be fully uninstalled first. To overcome this limitation, two optional dependencies have been set:
- pip install optimum-haystack[cpu], which installs optimum[onnxruntime].
- pip install optimum-haystack[gpu], which installs optimum[onnxruntime-gpu].

This approach is not very user friendly, since the component is not usable with just pip install optimum-haystack. I am not very happy with this approach; please suggest a better one.

Currently, only optimum[onnxruntime] has been added to the dependencies, without support for the GPU package.

Usage Examples:
These examples demonstrate how the embedders can be used to run inference with the BAAI/bge-small-en-v1.5 embedding model using different ONNX runtimes.

On CPU:
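A rough sketch of what usage on CPU could look like; the import path is assumed from the file layout above, and the constructor and run signature shown here are assumptions rather than the component's confirmed API:

```python
from haystack_integrations.components.embedders.optimum_text_embedder import OptimumTextEmbedder

# Hypothetical parameters; the component's actual signature may differ.
text_embedder = OptimumTextEmbedder(model="BAAI/bge-small-en-v1.5")
text_embedder.warm_up()

result = text_embedder.run(text="The food was delicious")
print(len(result["embedding"]))
```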
On GPU, using the CUDAExecutionProvider:
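Along the same lines, a hedged sketch for GPU; how the execution provider is selected (here via model_kwargs) is an assumption, not the confirmed parameter name:

```python
from haystack_integrations.components.embedders.optimum_text_embedder import OptimumTextEmbedder

# Hypothetical way of requesting the CUDAExecutionProvider; the actual
# parameter exposed by the component may be named differently.
text_embedder = OptimumTextEmbedder(
    model="BAAI/bge-small-en-v1.5",
    model_kwargs={"provider": "CUDAExecutionProvider"},
)
text_embedder.warm_up()

result = text_embedder.run(text="The food was delicious")
print(len(result["embedding"]))
```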