feat: Add CohereTextEmbedder CohereDocumentEmbedder #5831

Closed
wants to merge 21 commits into from

vrunm
Contributor

@vrunm vrunm commented Sep 16, 2023

Related Issues

Proposed Changes:

Add CohereTextEmbedder, a component that uses Cohere embedding models to embed strings into vectors.
Add CohereDocumentEmbedder, a component that embeds a list of Documents.

  • The implementation uses the Cohere SDK instead of the POST-request-based implementation in V1.
  • We do not need a CohereEmbeddingBackend in this case, since that is already handled by the Cohere SDK (library).
  • The Cohere client supports timeouts and retries (https://github.com/cohere-ai/cohere-python/blob/main/cohere/client.py). These have been included as optional arguments.
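
A minimal sketch of the approach described above, not the code from this PR: the embedder delegates to an SDK-style client instead of issuing POST requests itself. The names `EmbedClient`, `FakeClient`, and `SketchTextEmbedder` are hypothetical. The sketch assumes the real client exposes an `embed(texts=..., model=...)` call whose response carries an `embeddings` list, and that timeout and retry settings are passed when the client is constructed.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class EmbedResponse(Protocol):
    """Assumed response shape: one vector per input text."""

    embeddings: List[List[float]]


class EmbedClient(Protocol):
    """Assumed client interface; a real SDK client would be passed in here."""

    def embed(self, texts: List[str], model: str) -> EmbedResponse: ...


@dataclass
class _FakeResponse:
    embeddings: List[List[float]]


class FakeClient:
    """Hypothetical stand-in for the SDK client so the sketch runs offline."""

    def __init__(self, timeout: int = 120, max_retries: int = 3) -> None:
        # Timeout and retries are optional and handled by the client,
        # so the embedder needs no separate backend class.
        self.timeout = timeout
        self.max_retries = max_retries

    def embed(self, texts: List[str], model: str) -> _FakeResponse:
        # Toy 2-dimensional vectors derived from the text length.
        return _FakeResponse(embeddings=[[float(len(t)), 0.0] for t in texts])


class SketchTextEmbedder:
    """Embeds a single string into a vector by delegating to the client."""

    def __init__(self, client: EmbedClient, model_name: str = "embed-english-v2.0") -> None:
        self.client = client
        self.model_name = model_name

    def run(self, text: str) -> Dict[str, List[float]]:
        if not isinstance(text, str):
            raise TypeError("SketchTextEmbedder expects a single string as input.")
        response = self.client.embed(texts=[text], model=self.model_name)
        return {"embedding": response.embeddings[0]}


embedder = SketchTextEmbedder(FakeClient(timeout=30, max_retries=2))
result = embedder.run("hello")
```

Injecting the client keeps the embedder testable without network access; a document embedder could follow the same pattern with a list of texts instead of a single string.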

How did you test it?

Added unit tests.

Notes for the reviewer

  • "cohere" needs to be added as a dependency in the haystack-preview-package.

  • Supported Cohere Embedding Models and their embedding dimensions:

    • "embed-english-v2.0"/ "large" (default) - 4096
    • "embed-english-light-v2.0"/ "small" - 1024
    • "embed-multilingual-v2.0"/ "multilingual-22-12" - 768
  • Cohere Embedding Models supported in the 1.0 API that no longer work:

    • "finance-sentiment"
    • "medium"
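
The model list above could be captured as a small lookup, sketched here for illustration only. The names `MODEL_DIMENSIONS`, `MODEL_ALIASES`, `UNSUPPORTED_MODELS`, and `embedding_dimension` are hypothetical, and reading each slash-separated pair as a canonical name plus an alias is an assumption based on the list.

```python
from typing import Dict, Set

# Dimensions as listed in the reviewer notes.
MODEL_DIMENSIONS: Dict[str, int] = {
    "embed-english-v2.0": 4096,
    "embed-english-light-v2.0": 1024,
    "embed-multilingual-v2.0": 768,
}

# Assumption: the second name in each pair is an alias for the first.
MODEL_ALIASES: Dict[str, str] = {
    "large": "embed-english-v2.0",
    "small": "embed-english-light-v2.0",
    "multilingual-22-12": "embed-multilingual-v2.0",
}

# Models from the 1.0 API that are reported as no longer working.
UNSUPPORTED_MODELS: Set[str] = {"finance-sentiment", "medium"}


def embedding_dimension(model_name: str) -> int:
    """Return the embedding dimension for a model name or alias."""
    if model_name in UNSUPPORTED_MODELS:
        raise ValueError(f"Model '{model_name}' is no longer supported.")
    canonical = MODEL_ALIASES.get(model_name, model_name)
    if canonical not in MODEL_DIMENSIONS:
        raise ValueError(f"Unknown model '{model_name}'.")
    return MODEL_DIMENSIONS[canonical]
```

A check like this lets a document store be configured with the right vector size before any embedding call is made.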

Checklist

@vrunm vrunm requested a review from a team as a code owner September 16, 2023 11:37
@vrunm vrunm requested review from MichelBartels and removed request for a team September 16, 2023 11:37
@CLAassistant

CLAassistant commented Sep 16, 2023

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added topic:tests type:documentation Improvements on the docs labels Sep 16, 2023
@vrunm vrunm requested a review from a team as a code owner September 16, 2023 11:52
@vrunm vrunm requested review from dfokina and removed request for a team September 16, 2023 11:52
@vrunm vrunm force-pushed the add_cohere_text_embedder branch from 893fb61 to ca47506 Compare September 18, 2023 14:41
@vrunm vrunm changed the title feat: Add CohereTextEmbedder feat: Add CohereTextEmbedder Sep 19, 2023
@vrunm vrunm changed the title feat: Add CohereTextEmbedder feat: Add CohereTextEmbedder CohereDocumentEmbedder Sep 21, 2023
@masci masci added the 2.x Related to Haystack v2.0 label Sep 21, 2023
@masci masci requested review from masci and removed request for MichelBartels September 21, 2023 07:01
Contributor

@masci masci left a comment

Apologies for the latency here!

I took a first pass and left a couple of comments, but I have a general question: why do the two embedders look so different? For example, one uses the async client and the other doesn't. Any additional info that helps me better understand the code would be much appreciated!

Contributor

@dfokina dfokina left a comment

Thank you for the detailed docstrings! Left a couple of suggestions to clean them up.

@anakin87 anakin87 mentioned this pull request Oct 23, 2023
@masci masci self-assigned this Nov 23, 2023
@vrunm vrunm requested a review from masci November 23, 2023 11:40
@github-actions github-actions bot removed the topic:CI label Nov 23, 2023
@masci
Contributor

masci commented Dec 5, 2023

Superseded by deepset-ai/haystack-core-integrations#80

@masci
Contributor

masci commented Dec 6, 2023

deepset-ai/haystack-core-integrations#80 is ready for review, closing this one

@masci masci closed this Dec 6, 2023
Labels
2.x Related to Haystack v2.0 topic:tests type:documentation Improvements on the docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cohere Embedders
4 participants