
List of appropriate metrics to test #1

Open
klh5 opened this issue Nov 8, 2024 · 3 comments

klh5 (Collaborator) commented Nov 8, 2024

No description provided.

jack89roberts (Collaborator) commented Nov 13, 2024

Starting pitch:

  • BLEU (baseline)
  • ROUGE-S / any other skip-gram or conventional metric that may improve on BLEU (e.g. METEOR, CHRF)
  • BLASER 2.0
  • CometKiwi (or the reference-based COMET variant) / another model-based, translation-oriented metric of choice (e.g. critical error detection?)

and

  • the above with different variants of pre-processing etc.
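To make the baseline concrete, here is a minimal, stdlib-only sketch of sentence-level BLEU. It is illustrative only: it assumes whitespace tokenization, a single reference, and no smoothing, all of which an established implementation such as sacrebleu handles properly.

```python
# Minimal sentence-level BLEU sketch (illustrative; use sacrebleu in practice).
# Assumes whitespace tokenization, one reference, and no smoothing.
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
        # Clipped n-gram overlap between candidate and reference.
        overlap = sum((cand_ngrams & ref_ngrams).values())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # unsmoothed: any zero precision zeroes the score
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # identical → 1.0
```

An identical candidate/reference pair scores 1.0, and a candidate sharing no unigrams scores 0.0, which is why smoothing matters for short sentences in real implementations.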

klh5 (Collaborator, Author) commented Nov 13, 2024

It doesn't include BLASER, as it's from 2023, but this paper also has a nice taxonomy of metrics which we could re-use.

jack89roberts (Collaborator) commented

Also sentence similarity metrics e.g. these models: https://huggingface.co/models?pipeline_tag=sentence-similarity&sort=trending

SPICE is using one of these as part of: https://arxiv.org/pdf/2405.13845
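As a rough illustration of the sentence-similarity idea, a hedged stdlib-only sketch: cosine similarity over bag-of-words counts. This is a crude stand-in for the embedding-based models linked above (which compare dense sentence vectors rather than word counts), but the scoring mechanics are the same.

```python
# Hedged sketch: cosine similarity over bag-of-words counts, a crude stand-in
# for embedding-based sentence similarity (real use: sentence-transformers models).
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

print(cosine_similarity("the cat sat on the mat", "a dog lay on the rug"))
```

An embedding model would replace the `Counter` vectors with learned dense vectors, letting paraphrases with no word overlap still score highly, which is exactly what bag-of-words similarity cannot do.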
