
Clarification on Dataset Construction and Cover Formula in the Paper #5

geuk-hub opened this issue Dec 2, 2024 · 2 comments
geuk-hub commented Dec 2, 2024

Dear authors,

Thank you for sharing this valuable research and the corresponding codebase. While going through the paper, I had a couple of questions and wanted to clarify some points:

1. Regarding Section 3.1 (Dataset Construction) - Removing Existing Objects:

The paper mentions selecting 6 negative samples in this section. However, this appears to assume that exactly three objects are present among the segmented objects. When using the SEEM model for segmentation, does it consistently produce exactly three segmented objects? If not, how are cases with fewer or more than three objects handled?

2. Regarding the Cover Formula:

The paper defines the Cover metric as (captions w/ hallucinated objects) / (ground truth objects). However, if the numerator represents the number of captions containing hallucinated objects, wouldn't a lower Cover value be better? Yet the experimental results suggest that a higher Cover value is better. Could you clarify whether there is a discrepancy in the formula or in its interpretation?
In the paper "AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation", the Cover metric is calculated as (model-predicted objects ∩ ground truth objects) / (ground truth objects). Am I misunderstanding the definition or the intended interpretation in this work?

Thank you for taking the time to address my questions. I truly appreciate the effort you’ve put into creating such an insightful and impactful piece of research.

Yufang-Liu (Owner) commented
Thank you for your attention to our work.
Regarding the first question, we excluded the cases where the SEEM model extracted fewer than three segmented objects. For cases with more than three objects, only the first three were selected.
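The filtering described above can be sketched as follows. This is a hypothetical helper (the function name and return convention are assumptions, not code from the repository), illustrating the rule of excluding images with fewer than three SEEM segments and keeping only the first three otherwise:

```python
def select_segments(segments):
    """Filter SEEM segmentation output as described in the reply above.

    Returns None when the image is excluded (fewer than three
    segmented objects); otherwise returns the first three segments.
    """
    if len(segments) < 3:
        return None  # image excluded from dataset construction
    return segments[:3]


# Example usage (illustrative object names only):
print(select_segments(["dog", "tree"]))                  # excluded
print(select_segments(["dog", "tree", "car", "bench"]))  # first three kept
```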
As for the second question, upon careful review, it seems there was an error in our paper. We intended to write (captions w/o hallucinated objects) / (ground truth objects). The script used for the calculations is taken directly from AMBER's repository: https://github.com/junyangwang0410/AMBER/blob/master/inference.py, and thus aligns with the definitions used in the AMBER paper.
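For reference, the AMBER-style Cover definition quoted in the question ((model-predicted objects ∩ ground truth objects) / (ground truth objects)) can be sketched as below. This is a minimal illustrative implementation, not the actual script from AMBER's inference.py:

```python
def cover(pred_objects, gt_objects):
    """Cover = |predicted ∩ ground-truth| / |ground-truth|.

    Higher is better: it measures how many of the annotated
    ground-truth objects the model's caption actually mentions.
    """
    pred, gt = set(pred_objects), set(gt_objects)
    return len(pred & gt) / len(gt)


# Example: two of four ground-truth objects are covered -> 0.5
score = cover(["dog", "tree", "car"], ["dog", "tree", "bench", "sky"])
print(score)  # 0.5
```

Under this definition a higher Cover is better, which is consistent with the experimental results in the paper.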


geuk-hub commented Dec 3, 2024

Thank you for your clear and detailed response. Your clarifications have resolved my doubts entirely.

Thank you again for your valuable research!
