feat: Add DocumentNDCGEvaluator component #8419
In light of the above comments, maybe this can also be refined. Currently, it sounds like we are expecting a list of documents.
I updated the docstring. Please let me know whether it reads better now or was better before.
Why are we using the inverse of the rank as the fallback? Effectively this would double the "rank discount" of the retrieved document: once by dividing by (i + 1) in line 85 and again by dividing by log2(i + 2) in line 86.

I guess a better fallback would be to just use the value `1`, which would translate into a simple binary relevance schema according to https://en.wikipedia.org/wiki/Discounted_cumulative_gain
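A toy comparison of the per-document gain under the two fallbacks (just an illustration of the point, not the component code), assuming an unscored relevant document retrieved at index `i`:

```python
import math

def gain_inverse_rank_fallback(i):
    # current fallback: relevance 1 / (i + 1), then the usual DCG discount
    return (1 / (i + 1)) / math.log2(i + 2)  # rank discount applied twice

def gain_binary_fallback(i):
    # binary relevance: relevance 1, only the DCG discount remains
    return 1 / math.log2(i + 2)

for i in range(3):
    print(i, round(gain_inverse_rank_fallback(i), 3), round(gain_binary_fallback(i), 3))
# 0 1.0 1.0
# 1 0.315 0.631
# 2 0.167 0.5
```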
My idea was that the user can provide the relevant documents as a sorted list without scores. With the current fallback, the retrieved documents get the highest NDCG score only if all relevant documents are retrieved in this particular order (see the toy example below).

With a fallback to `1`, the order of the relevant documents wouldn't matter anymore. I agree that's then simple binary relevance. Happy to change the fallback to that if users benefit more from it.

@bilgeyucel You wanted to pass a sorted list of documents without scores, right?
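Toy example of the intended behavior (made-up doc names, not the actual implementation): relevance is derived from the position in the sorted ground-truth list, so retrieving the documents in that order gives the highest DCG.

```python
import math

ground_truth = ["doc_a", "doc_b"]  # doc_a assumed more relevant than doc_b
relevance = {d: 1 / (rank + 1) for rank, d in enumerate(ground_truth)}  # doc_a: 1.0, doc_b: 0.5

def dcg(retrieved):
    return sum(relevance.get(d, 0) / math.log2(i + 2) for i, d in enumerate(retrieved))

print(dcg(["doc_a", "doc_b"]))  # ~1.315, highest only for this order
print(dcg(["doc_b", "doc_a"]))  # ~1.131
```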
@julian-risch yes, I'm using the HotpotQA dataset from Hugging Face and it doesn't provide scores.
If we change the fallback to binary relevance, what you could do is calculate scores yourself before passing the documents to the DocumentNDCGEvaluator. For example:
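Something along these lines (untested sketch; assuming the ground-truth docs are already sorted from most to least relevant):

```python
from haystack import Document

sorted_ground_truth_docs = [
    Document(content="most relevant passage"),
    Document(content="second most relevant passage"),
]

# Assign descending scores based on the position in the sorted list
for rank, doc in enumerate(sorted_ground_truth_docs):
    doc.score = 1 / (rank + 1)
```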
That would work for you too, right?
@julian-risch I can understand the intuition now. Still, I'd probably not make this the default behavior: when supplying ground-truth docs, I wouldn't expect their order to make a difference, tbh. And if you really need that, you could simply pass scores as you showed in the preceding comment.

Anyway, I think there is an error in the implementation of the intuition. If I got it correctly, document relevance should be based on the order of the passed ground-truth docs. In the current implementation it's instead based on the order of the retrieved documents: `relevance = 1 / (i + 1)`, where `i` is the index of the retrieved doc, not the ground-truth doc.
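A rough sketch of what I mean, with plain strings instead of Document objects (placeholder names, not the PR code):

```python
import math

ground_truth = ["doc_a", "doc_b"]  # sorted by relevance, no scores given
retrieved = ["doc_b", "doc_a"]

gt_rank = {d: rank for rank, d in enumerate(ground_truth)}

dcg = 0.0
for i, d in enumerate(retrieved):
    if d in gt_rank:
        relevance = 1 / (gt_rank[d] + 1)     # fallback based on the ground-truth rank, not i
        dcg += relevance / math.log2(i + 2)  # the position discount still uses the retrieved index i
```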
Yes, true. I will change the fallback to binary relevance. 👍