
feat: Add Optimum Embedders #379

Merged 10 commits into deepset-ai:main on Feb 21, 2024
Conversation

awinml (Contributor) commented Feb 8, 2024

Related Issues

Proposed Changes:

Adds support for running inference with embedding models using the Hugging Face Optimum library. These components are designed to seamlessly run models on the high-speed ONNX Runtime.

Introduces two components:

  • OptimumTextEmbedder, a component for embedding strings.
  • OptimumDocumentEmbedder, a component for computing Document embeddings.

Additional optimizations implemented to bring down inference time:

  • Sorting by sequence length: The text sequences are sorted in descending order of length before the embeddings are computed. See the Sentence Transformers implementation for reference.
  • Dynamic padding: The text sequences are padded only to the longest sequence in the batch, by setting padding=True on the AutoTokenizer. See the Transformers documentation for reference.
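The two optimizations above can be sketched in plain Python. This is a simplified stand-in for the component internals; the function names and the toy token lists are illustrative, not the actual implementation:

```python
def sort_by_length(texts):
    """Sort texts by length (descending), remembering each text's original position."""
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]), reverse=True)
    return [texts[i] for i in order], order

def pad_batch(token_batches, pad_id=0):
    """Dynamic padding: pad only to the longest sequence in this batch
    (what padding=True achieves via the AutoTokenizer)."""
    longest = max(len(tokens) for tokens in token_batches)
    return [tokens + [pad_id] * (longest - len(tokens)) for tokens in token_batches]

def restore_order(embeddings, order):
    """Put embeddings back into the caller's original text order."""
    restored = [None] * len(embeddings)
    for position, original_index in enumerate(order):
        restored[original_index] = embeddings[position]
    return restored
```

Sorting keeps similarly sized sequences in the same batch, so dynamic padding wastes fewer padding tokens; restoring the original order afterwards keeps the optimization invisible to the caller.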

For the TensorRT ONNX runtime, it is recommended to cache the TensorRT engine since it takes time to build. Instructions to pass the necessary parameters for caching have been added to the docstrings.

The conversion to ONNX is cached by default, similar to how Transformers caches models. If the user wishes to modify the caching parameters or tweak them for the other runtimes, the necessary parameters can be passed via the model_kwargs parameter.
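As an illustration, TensorRT engine caching might be configured roughly as follows. The option names come from the ONNX Runtime TensorRT execution provider; that they are forwarded through model_kwargs under a "provider_options" key is an assumption based on the description above, not a confirmed API:

```python
# Hypothetical configuration sketch; the "provider_options" plumbing is an assumption.
model_kwargs = {
    "provider_options": {
        "trt_engine_cache_enable": True,           # cache the built TensorRT engine
        "trt_engine_cache_path": "tmp/trt_cache",  # directory to store the engine in
    }
}
```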

The implementation for the different Pooling Methods is based on the Sentence Transformers implementation.
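As a rough illustration of one of those methods, mean pooling averages the token embeddings while masking out padding tokens, as in Sentence Transformers. The sketch below uses plain Python lists for clarity; the real implementation operates on tensors:

```python
def mean_pooling(token_embeddings, attention_mask):
    """Average the token embeddings of one sequence, counting only real tokens.

    token_embeddings: list of per-token vectors.
    attention_mask:   1 for real tokens, 0 for padding.
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    n_real = 0
    for vector, mask in zip(token_embeddings, attention_mask):
        if mask:
            n_real += 1
            for j in range(dim):
                summed[j] += vector[j]
    # Guard against an all-padding sequence to avoid division by zero.
    return [s / max(n_real, 1) for s in summed]
```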

How did you test it?

Tests were added in optimum_document_embedder.py and optimum_text_embedder.py.

Notes for the reviewer

Using optimum with GPU-based runtimes requires the optimum[onnxruntime-gpu] package, and installing that package requires fully uninstalling optimum first. To work around this limitation, two optional dependencies have been set:

  • For the CPU version: pip install optimum-haystack[cpu], which installs optimum[onnxruntime].
  • For the GPU version: pip install optimum-haystack[gpu], which installs optimum[onnxruntime-gpu].

This approach is not very user friendly, since the component is not usable with a plain pip install optimum-haystack. I am not happy with this approach; please suggest a better one.

Currently, only optimum[onnxruntime] has been added to the dependencies, without support for the GPU package.

Usage Examples:

These examples demonstrate how the embedders can be used to run the sentence-transformers/all-mpnet-base-v2 embedding model using different ONNX runtimes.

On CPU:

from haystack_integrations.components.embedders import OptimumTextEmbedder

text_to_embed = "I love pizza!"

text_embedder = OptimumTextEmbedder(model="sentence-transformers/all-mpnet-base-v2")
text_embedder.warm_up()

print(text_embedder.run(text_to_embed))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}

On GPU using the CUDAExecutionProvider:

from haystack_integrations.components.embedders import OptimumTextEmbedder

text_to_embed = "I love pizza!"

text_embedder = OptimumTextEmbedder(
    model="sentence-transformers/all-mpnet-base-v2", 
    onnx_execution_provider="CUDAExecutionProvider"
)
text_embedder.warm_up()

print(text_embedder.run(text_to_embed))

# {'embedding': [0.017020374536514282, -0.023255806416273117, ...]}

@awinml awinml requested a review from a team as a code owner February 8, 2024 14:47
@awinml awinml requested review from davidsbatista and removed request for a team February 8, 2024 14:47
@github-actions github-actions bot added the type:documentation label Feb 8, 2024
@shadeMe shadeMe self-requested a review February 8, 2024 15:17
shadeMe (Contributor) left a comment

Thanks for the comprehensive PR! I've added a few comments.

sjrl (Contributor) commented Feb 15, 2024

I took a look into this

The versions of optimum and transformers have been pinned to optimum==1.15.0 and transformers==4.36.2, due to bugs in the latest optimum release. Please refer to the following issues for more information: huggingface/optimum#1673 and huggingface/optimum#1675.

to see if there was a way we could avoid pinning the dependencies.

Thankfully, it looks like huggingface/optimum#1675 has been resolved with version 1.16.2, but the other issue still remains.

Another error that I found while running the tests with versions optimum>=1.16.2 and transformers>=4.37.2 was:

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Invalid input name: token_type_ids

which can be solved with the following code block added right before calling the model

# Only pass required inputs otherwise onnxruntime can raise an error
inputs_to_remove = set(encoded_input.keys()).difference(self.embedding_model.inputs_names)
for key in inputs_to_remove:
    encoded_input.pop(key)

Even though this isn't needed with the current pinned dependencies (at least for the tested models) I think this would be good to add to avoid needing to solve this error in the future.
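The fix can be illustrated stand-alone with plain dicts. The function name is illustrative and the inputs here are toy lists; in the component, encoded_input holds tokenizer tensors and the model's accepted input names come from the ORT model object:

```python
def drop_unexpected_inputs(encoded_input, model_input_names):
    """Remove tokenizer outputs (e.g. token_type_ids) that the ONNX model
    does not declare as inputs, so onnxruntime does not raise INVALID_ARGUMENT."""
    unexpected = set(encoded_input) - set(model_input_names)
    for key in unexpected:
        encoded_input.pop(key)
    return encoded_input
```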

shadeMe (Contributor) commented Feb 19, 2024

@awinml Just wanted to give you an update that we'll move this PR on to our plate to expedite its merging.

awinml (Contributor, Author) commented Feb 19, 2024

Thanks! @sjrl

I added this code block and unpinned the optimum and transformers versions.

# Only pass required inputs otherwise onnxruntime can raise an error
inputs_to_remove = set(encoded_input.keys()).difference(self.embedding_model.inputs_names)
for key in inputs_to_remove:
    encoded_input.pop(key)

All the tests pass successfully!

awinml (Contributor, Author) commented Feb 19, 2024

@shadeMe Thanks for the detailed review! I think this PR is nearly there.

I have pushed most of the changes requested in the review. I will be adding the other pooling methods (as mentioned in #379 (comment)) very shortly.

@awinml awinml requested review from shadeMe and sjrl February 20, 2024 07:36
sjrl (Contributor) commented Feb 20, 2024

@awinml Thanks for adding the Pooling sub module! Would it also be possible to add tests for each of the pooling methods to check their implementations?

@shadeMe shadeMe removed the request for review from davidsbatista February 20, 2024 12:18
@awinml awinml requested a review from shadeMe February 21, 2024 10:06
shadeMe (Contributor) left a comment

Thanks again for the contribution!

@shadeMe shadeMe merged commit ad5a290 into deepset-ai:main Feb 21, 2024
10 checks passed
Labels: integration:optimum, topic:CI, type:documentation

Successfully merging this pull request may close these issues: Add OptimumEmbedder

4 participants