feat: Add `SentenceTransformersDiversityRanker` #7095

awinml · 2024-02-26T11:57:28Z

Related Issues

fixes DiversityRanker #7094

Proposed Changes:

Adds `SentenceTransformersDiversityRanker`.

The Diversity Ranker orders documents to maximize the overall diversity of the ranked list. It uses a sentence-transformers model to compute semantic embeddings for the query and for each document.

After computing the embeddings, the ranker first selects the document that is semantically closest to the query. Then, at each step, it picks from the remaining documents the one with the lowest average similarity to the documents already selected. This repeats until all documents are selected, producing a list in which each subsequent document contributes the most to the overall diversity of the selected set.
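The greedy ordering described above can be sketched in plain Python. This is a minimal sketch, not the component's code: it assumes the embeddings are already computed (plain lists of floats stand in for sentence-transformer output) and uses cosine similarity. The function name `greedy_diversity_order` is hypothetical.

```python
import math
from typing import List, Sequence


def _normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def greedy_diversity_order(
    query_emb: Sequence[float], doc_embs: Sequence[Sequence[float]]
) -> List[int]:
    """Return document indices in greedy diversity order (hypothetical helper).

    1. Start with the document most similar to the query.
    2. Repeatedly add the remaining document whose average cosine similarity
       to the already-selected documents is lowest.
    """
    if not doc_embs:
        return []
    query = _normalize(query_emb)
    docs = [_normalize(d) for d in doc_embs]

    # Step 1: the document closest to the query goes first.
    first = max(range(len(docs)), key=lambda i: _dot(query, docs[i]))
    selected = [first]
    remaining = [i for i in range(len(docs)) if i != first]

    # Step 2: greedily append the least similar (on average) remaining document.
    while remaining:
        nxt = min(
            remaining,
            key=lambda i: sum(_dot(docs[i], docs[j]) for j in selected) / len(selected),
        )
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Docs 0 and 1 are near-duplicates; doc 2 points in a different direction.
print(greedy_diversity_order([1.0, 0.0], [[0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]))
# prints [0, 2, 1]
```

A pure relevance ordering would return `[0, 1, 2]`; the greedy step moves the near-duplicate to the end. Because the dot product is linear, averaging the similarities to the selected documents is the same as taking the similarity to their mean embedding, so an implementation can compute one mean vector per step instead of many pairwise scores.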

How did you test it?

Tests have been added in `test/components/rankers/test_sentence_transformers_diversity.py`.

Checklist

I have read the contributors' guidelines and the code of conduct
I have updated the related issue with new insights and changes
I added unit tests and updated the docstrings
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
I documented my code
I ran pre-commit hooks and fixed any issues

coveralls · 2024-02-26T12:30:51Z

Pull Request Test Coverage Report for Build 8232016781

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.1%) to 90.176%

Totals
Change from base Build 8190601051:	0.1%
Covered Lines:	5370
Relevant Lines:	5955

💛 - Coveralls

sjrl

Thanks for the work on this! I took a quick look and had two comments I wanted to bring up.

haystack/components/rankers/diversity.py

vblagoje · 2024-02-27T15:48:25Z

@sjrl can you please take over, I'm overloaded with some other unrelated items 🙏

sjrl · 2024-02-27T15:53:43Z

Yup, I can review this more later this week.

haystack/components/rankers/diversity.py

haystack/components/rankers/sentence_transformers_diversity.py

test/components/rankers/test_sentence_transformers_diversity.py

awinml · 2024-03-11T11:45:08Z

@sjrl All the tests, except those running the component on real data, have been converted to unit tests that use mocks for the sentence-transformer model. An additional test for `warm_up()` has also been added. This brings the test coverage for this component to 100%.
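Mocking the embedding model in unit tests can be sketched with `unittest.mock`. This is a minimal sketch under stated assumptions: `ToyRanker` and its `model_loader` argument are hypothetical and are not Haystack's API. The only thing borrowed from sentence-transformers is the shape of the `encode` call (a list of texts in, one embedding per text out). To stay short, the toy orders documents by similarity to the query rather than by diversity.

```python
from typing import Any, Callable, List, Optional
from unittest.mock import MagicMock


class ToyRanker:
    """Hypothetical ranker used only to illustrate mocking the embedding model."""

    def __init__(self, model_loader: Callable[[], Any]) -> None:
        self._model_loader = model_loader
        self.model: Optional[Any] = None

    def warm_up(self) -> None:
        # Load the (expensive) model lazily and only once.
        if self.model is None:
            self.model = self._model_loader()

    def run(self, query: str, documents: List[str]) -> List[str]:
        if self.model is None:
            raise RuntimeError("Call warm_up() before run().")
        # Embed the query and the documents in one batch, mirroring the shape
        # of SentenceTransformer.encode(list_of_texts).
        embeddings = self.model.encode([query] + documents)
        query_emb, doc_embs = embeddings[0], embeddings[1:]
        scores = [sum(q * d for q, d in zip(query_emb, emb)) for emb in doc_embs]
        # Toy ordering: highest similarity to the query first (not diversity).
        order = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
        return [documents[i] for i in order]


def test_warm_up_loads_model_once() -> None:
    loader = MagicMock(return_value=MagicMock())
    ranker = ToyRanker(model_loader=loader)
    ranker.warm_up()
    ranker.warm_up()
    loader.assert_called_once()


def test_run_uses_mocked_encoder() -> None:
    fake_model = MagicMock()
    # One embedding for the query, then one per document, in input order.
    fake_model.encode.return_value = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    ranker = ToyRanker(model_loader=lambda: fake_model)
    ranker.warm_up()
    assert ranker.run("q", ["far", "near"]) == ["near", "far"]
    fake_model.encode.assert_called_once_with(["q", "far", "near"])


def test_run_requires_warm_up() -> None:
    ranker = ToyRanker(model_loader=MagicMock())
    try:
        ranker.run("q", ["a"])
    except RuntimeError:
        return
    raise AssertionError("expected RuntimeError before warm_up()")
```

With pytest, the `test_*` functions are collected automatically. Injecting the loader keeps the real model out of unit tests, so they run fast and offline while still checking that `warm_up()` loads the model once and that `run()` passes the right texts to `encode`.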

sjrl

Thanks a lot, @awinml, this looks great!

Add Diversity Ranker

ed58436

awinml requested review from a team as code owners February 26, 2024 11:57

awinml requested review from dfokina and ZanSara and removed request for a team February 26, 2024 11:57

github-actions bot added topic:tests, 2.x, and type:documentation labels Feb 26, 2024

awinml added 2 commits February 26, 2024 17:31

Merge branch 'main' into add_diversity_ranker

bf5f022

Update tests

0aed48c

sjrl reviewed Feb 26, 2024

haystack/components/rankers/diversity.py (two outdated, resolved review threads)

ZanSara requested review from a team and vblagoje and removed request for ZanSara and a team February 26, 2024 14:54

awinml added 2 commits February 26, 2024 21:18

Add separate suffix, prefix params for query and documents; allow empty query

9955ada

Update docstrings

5ab2a32

awinml requested a review from sjrl February 27, 2024 15:00

vblagoje removed their request for review February 27, 2024 15:48

Merge branch 'main' into add_diversity_ranker

f2f3ce4