Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

partners[chroma]: add retrieval of embedding vectors #28290

Merged
merged 4 commits into from
Nov 27, 2024

Conversation

mspronesti
Copy link
Contributor

@mspronesti mspronesti commented Nov 22, 2024

This PR adds an additional method to Chroma to retrieve the embedding vectors, besides the most relevant Documents. This is sometimes of use when you need to run a postprocessing algorithm on the retrieved results based on the vectors, which has been the case for me lately.

Example issue (discussion) requesting this change: #20383

Copy link

vercel bot commented Nov 22, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Nov 27, 2024 4:32pm

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. Ɑ: vector store Related to vector store module labels Nov 22, 2024
@mspronesti mspronesti changed the title feat(chroma): add retrieval of embedding vectors partners[chroma]: add retrieval of embedding vectors Nov 22, 2024
@mspronesti
Copy link
Contributor Author

@erikinfo @ccurme

Copy link
Collaborator

@ccurme ccurme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM aside for question on integration test.

"""Test end to end construction and scored search."""
texts = ["foo", "bar", "baz"]
metadatas = [{"page": str(i)} for i in range(len(texts))]
embeddings = FakeEmbeddings()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test pass? Do we need ConsistentFakeEmbeddings instead of FakeEmbeddings?

Copy link
Contributor Author

@mspronesti mspronesti Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be passing in my local env. Did it fail in yours?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a look at this. The FakeEmbeddings in Chroma is returning a fixed vector for each query. So the test isn't very stringent. Updated it to use ConsistentFakeEmbeddings, which is different for each query.

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Nov 27, 2024
@ccurme ccurme enabled auto-merge (squash) November 27, 2024 16:32
@ccurme ccurme merged commit 8358666 into langchain-ai:master Nov 27, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm PR looks good. Use to confirm that a PR is ready for merging. size:M This PR changes 30-99 lines, ignoring generated files. Ɑ: vector store Related to vector store module
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants