feat: Add DocumentNDCGEvaluator component #8419
Conversation
Pull Request Test Coverage Report for Build 11124601375
Warning: This coverage report may be inaccurate. This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
💛 - Coveralls
`ground_truth_documents` and `retrieved_documents` must have the same length.

:param ground_truth_documents:
    A list of expected documents for each question with relevance scores or sorted by relevance.
In light of the above comments, maybe this can also be refined. Currently, it sounds like we are expecting a list of documents.
I updated the docstring. Please let me know whether it reads better now or whether it was better before.
```python
relevant_id_to_score = {doc.id: doc.score for doc in gt_docs}
for i, doc in enumerate(ret_docs):
    if doc.id in relevant_id_to_score:  # TODO Related to https://github.com/deepset-ai/haystack/issues/8412
        # If the gt document has a float score, use it; otherwise, use the inverse of the rank
```
Why are we using the inverse of the rank as a fallback? Effectively this would double the "rank discount" of the retrieved document: once by dividing by `(i + 1)` in line 85 and again by dividing by `log2(i + 2)` in line 86.
I guess a better fallback would be to just use the value 1, which would translate into a simple binary relevance schema according to https://en.wikipedia.org/wiki/Discounted_cumulative_gain
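To make the double discount concrete, here is a minimal standalone sketch (the document ids and the retrieval order are invented for illustration; this is not the component's actual code):

```python
import math

# Hypothetical example: three relevant ground-truth document ids, all retrieved.
gt_ids = {"a", "b", "c"}
retrieved_ids = ["a", "b", "c", "x"]

# Inverse-of-rank fallback: relevance 1 / (i + 1) is discounted again by log2(i + 2),
# which is the "double rank-discount" described above.
dcg_inverse_rank = sum(
    (1 / (i + 1)) / math.log2(i + 2)
    for i, doc_id in enumerate(retrieved_ids)
    if doc_id in gt_ids
)

# Binary-relevance fallback: each relevant document contributes 1,
# discounted only once by its retrieval rank.
dcg_binary = sum(
    1 / math.log2(i + 2)
    for i, doc_id in enumerate(retrieved_ids)
    if doc_id in gt_ids
)

print(round(dcg_inverse_rank, 3), round(dcg_binary, 3))  # ~1.482 vs ~2.131
```

With the inverse-of-rank fallback, even a perfect retrieval of all relevant documents yields a smaller DCG than the binary variant, because each hit is penalized twice for its position.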
My idea was that the user can provide the relevant documents as a sorted list without scores. With the current fallback, the retrieved documents get the highest NDCG score only if all relevant documents are retrieved in this particular order.
With a fallback to 1, the order of the relevant documents wouldn't matter anymore. I agree that's then simple binary relevance. Happy to change the fallback to that if users benefit more from it.
@bilgeyucel You wanted to pass a sorted list of documents without scores, right?
@julian-risch yes, I'm using the HotPot QA dataset from Hugging Face and it doesn't provide scores.
If we change the fallback to binary relevance, what you could do is calculate scores yourself before passing the documents to the DocumentNDCGEvaluator. For example:

```python
for i, doc in enumerate(docs, 1):
    doc.score = 1 / i
```

That would work for you too, right?
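Spelled out end to end, that workaround could look roughly like the following (the import path and the exact `run()` signature are assumptions inferred from the docstring excerpt above, and the documents are made-up examples):

```python
from haystack import Document
from haystack.components.evaluators import DocumentNDCGEvaluator  # assumed import path

# Ground-truth documents sorted by relevance but without scores (e.g. from HotPot QA).
gt_docs = [
    Document(content="Paris is the capital of France."),
    Document(content="France is in Europe."),
    Document(content="The Eiffel Tower is in Paris."),
]

# Assign graded scores from the position yourself, so the order still matters
# once the component falls back to binary relevance.
for i, doc in enumerate(gt_docs, 1):
    doc.score = 1 / i

retrieved_docs = [
    Document(content="France is in Europe."),
    Document(content="Paris is the capital of France."),
]

evaluator = DocumentNDCGEvaluator()
result = evaluator.run(ground_truth_documents=[gt_docs], retrieved_documents=[retrieved_docs])
print(result)
```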
@julian-risch I can understand the intuition now. Still, I'd probably not make this the default behavior: when supplying ground-truth docs, I wouldn't expect their order to make a difference, tbh.
And if you really need that, you could simply pass scores as you showed in the preceding comment.
Anyway, I think there is an error in the implementation of the intuition. If I understand it correctly, document relevance should be based on the order of the passed ground-truth docs. In the current implementation it's instead based on the order of the retrieved documents: `relevance = 1 / (i + 1)`, where `i` is the index of the retrieved doc, not the ground-truth doc.
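For illustration, a minimal standalone sketch of what the order-based intuition would require, with hypothetical stand-in types rather than the PR's code:

```python
import math
from dataclasses import dataclass

@dataclass
class Doc:
    # Stand-in for Haystack's Document; only the id field is needed here.
    id: str

gt_docs = [Doc("a"), Doc("b"), Doc("c")]   # ground truth, sorted by relevance
ret_docs = [Doc("b"), Doc("c"), Doc("a")]  # order in which the retriever returned them

# Relevance is derived from the document's position in the ground-truth list
# (looked up by id), not from its position in the retrieved list.
gt_rank = {doc.id: rank for rank, doc in enumerate(gt_docs, 1)}

dcg = 0.0
for i, doc in enumerate(ret_docs):
    if doc.id in gt_rank:
        relevance = 1 / gt_rank[doc.id]      # uses the ground-truth rank ...
        dcg += relevance / math.log2(i + 2)  # ... discounted once by the retrieval rank
print(round(dcg, 3))
```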
Yes, true. I will change the fallback to binary relevance. 👍
Looking pretty good. I like the tests as well. Two things:
- We should ensure that the code doesn't break when both inputs are empty lists, or at least it should fail in a friendly way (see the sketch after this list).
- I wouldn't create our own flavour of nDCG by using the "inverse of document ranks" score fallback. Instead I'd prefer to go with a simple binary relevance assumption. That is at least what I would expect to get if I haven't specified graded relevance values.
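On the first point, a small sketch of what failing in a friendly way could look like (the function name echoes the validate_inputs mentioned in the commit list further down, but the exact checks here are illustrative, not the PR's code):

```python
from typing import List

def validate_inputs(ground_truth_documents: List[list], retrieved_documents: List[list]) -> None:
    # Fail with a clear message instead of a confusing error (or a silent 0.0) later on.
    if not ground_truth_documents or not retrieved_documents:
        raise ValueError("ground_truth_documents and retrieved_documents must not be empty.")
    if len(ground_truth_documents) != len(retrieved_documents):
        raise ValueError("ground_truth_documents and retrieved_documents must have the same length.")
```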
@Amnah199 @tstadel Thank you for your reviews!
Thanks for the changes! 💯
👍
* draft new component and tests
* draft new component and tests
* fix tests, replace usage of get_attr
* improve docstrings, refactor tests
* add test for mixed documents w/wo scores
* add test with multiple lists and update docstring
* validate inputs, add tests, make methods static
* change fallback to binary relevance
* rename validate_init_parameters to validate_inputs
Related Issues
Proposed Changes:
How did you test it?
Notes for the reviewer
Checklist
- I used one of the conventional commit types for my PR title: `fix:`, `feat:`, `build:`, `chore:`, `ci:`, `docs:`, `style:`, `refactor:`, `perf:`, `test:`.