feat: Add SentenceTransformersDiversityRanker
#7095
Pull Request Test Coverage Report for Build 8232016781 (Coveralls)
Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Thanks for the work on this! I took a quick look and had two comments I wanted to bring up.
@sjrl can you please take over? I'm overloaded with some other unrelated items 🙏
Yup, I can review this more later this week.
test/components/rankers/test_sentence_transformers_diversity.py
DiversityRanker → SentenceTransformersDiversityRanker
@sjrl All the tests, except those running the component on real data, have been converted to unit tests. Mocks have been used for the …
Thanks a lot @awinml, this looks great!
Related Issues
DiversityRanker #7094
Proposed Changes:
Adds SentenceTransformersDiversityRanker. The Diversity Ranker orders documents to maximize the overall diversity of the given documents. The ranker uses sentence-transformers models to compute semantic embeddings for each document and the query.
After computing these embeddings, the ranker starts by selecting the document that is semantically closest to the query. Then, from the remaining documents, it repeatedly selects the one that is, on average, least similar to the documents already selected. This process continues until all documents are selected, resulting in a list where each subsequent document contributes the most to the overall diversity of the selected set.
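The greedy ordering described above can be sketched in plain Python. This is a minimal illustration, not the component's actual implementation: it assumes the query and document embeddings are already computed (for example by a sentence-transformers model), assumes cosine similarity as the similarity measure, and the helper names cosine_similarity and greedy_diversity_order are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: embeddings are non-zero, so the norms are never 0.
    return dot / (norm_a * norm_b)


def greedy_diversity_order(
    query_embedding: Sequence[float],
    doc_embeddings: List[Sequence[float]],
) -> List[int]:
    """Hypothetical helper: return document indices in diversity order."""
    if not doc_embeddings:
        return []

    remaining = list(range(len(doc_embeddings)))

    # Step 1: the document semantically closest to the query goes first.
    first = max(
        remaining,
        key=lambda i: cosine_similarity(query_embedding, doc_embeddings[i]),
    )
    selected = [first]
    remaining.remove(first)

    def avg_similarity_to_selected(i: int) -> float:
        # Mean similarity between candidate i and every already selected document.
        total = sum(
            cosine_similarity(doc_embeddings[i], doc_embeddings[j]) for j in selected
        )
        return total / len(selected)

    # Step 2: repeatedly append the candidate least similar, on average,
    # to the documents selected so far, until none remain.
    while remaining:
        next_idx = min(remaining, key=avg_similarity_to_selected)
        selected.append(next_idx)
        remaining.remove(next_idx)

    return selected
```

For example, with the query [1.0, 0.0] and documents [1.0, 0.0], [0.9, 0.1] and [0.0, 1.0], ranking by query similarity alone would give the indices 0, 1, 2, whereas greedy_diversity_order returns [0, 2, 1]: the orthogonal document is promoted ahead of the near-duplicate of the first pick. Recomputing the average over all selected documents at every step is quadratic in the number of documents; a vectorized version (for example keeping a running sum of the selected embeddings) would be the natural optimization.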
How did you test it?
Tests have been added in test_sentence_transformers_diversity.py.
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.