
pre-processed training data #9

Open
minamonaa opened this issue Oct 22, 2024 · 2 comments

Comments

@minamonaa

I have two questions about the pre-processed NQ training data.

1. How is it possible for 'has_gold_answer' to be False when 'em' is 1 and 'f1' is 1.0?
2. What criteria were used to select 'positive_ctxs'? For the QA tasks, it is mentioned that the context with the highest EM score was chosen, but how were 'positive_ctxs' set when there were multiple sentences with an EM of 1?

@carriex
Owner

carriex commented Oct 22, 2024

Hi there,

has_gold_answer denotes whether the retrieved documents contain the gold answer, while EM and F1 measure whether the model outputs the correct answer. It is possible for the model to output the correct answer even when the answer does not appear in the retrieved documents.
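To make the distinction concrete, here is a simplified sketch using standard SQuAD-style normalization (the exact metric code in this repo may differ):

```python
import re
import string

def normalize(text):
    """SQuAD-style normalization: lowercase, drop punctuation, articles, extra whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    """EM = 1 if the prediction matches any gold answer after normalization."""
    return int(any(normalize(prediction) == normalize(g) for g in gold_answers))

def has_gold_answer(retrieved_docs, gold_answers):
    """True if any gold answer string appears in any retrieved document."""
    return any(normalize(g) in normalize(d) for d in retrieved_docs for g in gold_answers)

# Retrieval can miss the answer while the model still produces it from
# parametric knowledge, so em == 1 with has_gold_answer == False is possible:
docs = ["The Eiffel Tower was completed in 1889."]
gold = ["Paris"]
print(exact_match("Paris", gold))   # 1
print(has_gold_answer(docs, gold))  # False
```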

When there are multiple sentences with EM equal to 1, we select the one with the highest P(gold_answer | sentence), i.e. the sentence that leads to the highest probability of the gold answer when prepended.
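Roughly, the scoring looks like the following (a simplified sketch; the model name, prompt format, and function names here are placeholders, not the exact code we use):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def answer_log_prob(sentence, question, gold_answer):
    """Return log P(gold_answer | sentence prepended to the question) under the LM."""
    prompt = f"{sentence}\nQuestion: {question}\nAnswer:"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    answer_ids = tokenizer(" " + gold_answer, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Log-probability of each answer token, conditioned on all preceding tokens.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    answer_positions = range(prompt_ids.shape[1] - 1, input_ids.shape[1] - 1)
    return sum(
        log_probs[0, pos, input_ids[0, pos + 1]].item() for pos in answer_positions
    )

def best_positive_ctx(sentences, question, gold_answer):
    """Among candidates with EM = 1, pick the sentence maximizing P(gold_answer | sentence)."""
    return max(sentences, key=lambda s: answer_log_prob(s, question, gold_answer))
```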

Hope that helps!

@khalidrizki01

Hello, I'm also interested in this topic.

You mention that when there are multiple sentences with EM equal to 1, you select the sentence that results in the highest probability of generating the gold answer. How is this probability measured exactly in your pipeline? Also, could you share the code used to create the pre-processed training data? That would be really helpful.

Thank you in advance!
