Dear authors,

Thank you for sharing this valuable research and the corresponding codebase. While going through the paper, I had a couple of questions and wanted to clarify some points:

1. Regarding Section 3.1 (Dataset Construction) - Removing Existing Objects:
The paper mentions selecting 6 negative samples in this section. However, this appears to assume that exactly 3 objects are present in the segmentation output. When using the SEEM model for segmentation, does it consistently produce only 3 segmented objects? If not, how are cases with fewer or more than 3 objects handled?
2. Regarding the Cover Formula:
The paper defines the Cover metric as (captions w/ hallucinated objects) / (ground truth objects). However, if the numerator counts captions containing hallucinated objects, wouldn't a lower Cover value be better? Yet the experimental results treat a higher Cover value as better. Could you clarify whether there is a discrepancy in the formula or its interpretation?
In the paper "AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation", the Cover metric is calculated as (model-predicted objects ∩ ground truth objects) / (ground truth objects). Am I misunderstanding the definition or the intended interpretation in this work?
Thank you for taking the time to address my questions. I truly appreciate the effort you’ve put into creating such an insightful and impactful piece of research.
Thank you for your attention to our work.
Regarding the first question, we excluded the cases where the SEEM model extracted fewer than three segmented objects. For cases with more than three objects, only the first three were selected.
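A minimal sketch of this selection rule (the function name and the structure of the SEEM output are illustrative, not the actual pipeline code):

```python
# Illustrative sketch of the rule described above: images with fewer than
# three segmented objects are excluded, and only the first three are kept
# otherwise. Names and data layout are assumptions, not the real pipeline.

def select_objects(segmented_objects, k=3):
    """Keep the first k segmented objects; return None to exclude the image."""
    if len(segmented_objects) < k:
        return None  # image is dropped from dataset construction
    return segmented_objects[:k]
```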
As for the second question, upon careful review, it seems there was an error in our paper. We intended to write (captions w/o hallucinated objects) / (ground truth objects). The script used for the calculations is directly from AMBER's repository: https://github.com/junyangwang0410/AMBER/blob/master/inference.py, and thus aligns with the definitions used in the AMBER paper.
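For reference, Cover as defined in the AMBER paper is the fraction of ground-truth objects that also appear among the model's predicted objects, so a higher value is indeed better. A minimal set-based sketch (not the actual code from the linked AMBER script):

```python
# Sketch of the Cover metric per the AMBER paper:
# Cover = |predicted objects ∩ ground-truth objects| / |ground-truth objects|

def cover(predicted_objects, ground_truth_objects):
    gt = set(ground_truth_objects)
    if not gt:
        return 0.0
    return len(set(predicted_objects) & gt) / len(gt)

# Example: cover(["dog", "frisbee"], ["dog", "frisbee", "grass"]) -> 2/3
# More ground-truth objects covered by the caption means a higher (better) score.
```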