Common Problem: KeyError: 0 for Metrics during "Run ragas metrics for evaluating RAG" #1784

Closed
kirdreamer opened this issue Dec 22, 2024 · 5 comments
Labels: bug, module-metrics

Comments


kirdreamer commented Dec 22, 2024

[x] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
I have the same problem as trish11953 in issue #1770 when I use only the FactualCorrectness metric, or (about 1 out of 5 times) the LLMContextPrecisionWithReference metric. Maybe it is also somehow related to the issue with Faithfulness?

I also checked the LLMContextRecall and SemanticSimilarity metrics; they worked perfectly every time.
UPD: I checked the LLMContextRecall metric again and received the same error that is shown in the error trace below. I think this is a common problem for all metrics.

Ragas version: 0.2.8
Python version: 3.12.2

Code to Reproduce
...

# Imports assumed for ragas 0.2.x (setup of df and generator_llm is elided above)
from ragas import EvaluationDataset, evaluate
from ragas.metrics import (
    LLMContextRecall,
    FactualCorrectness,
    Faithfulness,
    LLMContextPrecisionWithReference,
    SemanticSimilarity,
)

eval_dataset = EvaluationDataset.from_pandas(df)
metrics = [
    #LLMContextRecall(llm=generator_llm), 
    FactualCorrectness(llm=generator_llm), 
    #Faithfulness(llm=generator_llm),
    #LLMContextPrecisionWithReference(llm=generator_llm),
    #SemanticSimilarity(embeddings=generator_embeddings)
]

results = evaluate(dataset=eval_dataset, metrics=metrics)

Error trace

Prompt fix_output_format failed to parse output: The output parser failed to parse the output including retries.
Prompt fix_output_format failed to parse output: The output parser failed to parse the output including retries.
Prompt fix_output_format failed to parse output: The output parser failed to parse the output including retries.
Prompt claim_decomposition_prompt failed to parse output: The output parser failed to parse the output including retries.
Exception raised in Job[1]: RagasOutputParserException(The output parser failed to parse the output including retries.)
Evaluating: 100%|██████████| 2/2 [01:12<00:00, 36.36s/it]
Traceback (most recent call last):
  File "my-project\backend\ragas\ragas_evaluation.py", line 66, in <module>
    results = evaluate(dataset=eval_dataset, metrics=metrics)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "my-project\venv\Lib\site-packages\ragas\_analytics.py", line 205, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "my-project\venv\Lib\site-packages\ragas\evaluation.py", line 333, in evaluate
    result = EvaluationResult(
             ^^^^^^^^^^^^^^^^^
  File "<string>", line 10, in __init__
  File "my-project\venv\Lib\site-packages\ragas\dataset_schema.py", line 410, in __post_init__
    self.traces = parse_run_traces(self.ragas_traces, run_id)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "my-project\venv\Lib\site-packages\ragas\callbacks.py", line 167, in parse_run_traces
    "output": prompt_trace.outputs.get("output", {})[0],
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
KeyError: 0

Expected behavior
The evaluation results are output without errors.
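
For context on the traceback above: parse_run_traces indexes into the default value returned by .get("output", {}), so when a prompt's output is missing (here, after the RagasOutputParserException), indexing the empty dict with 0 raises KeyError: 0. Below is a minimal sketch of that failure mode, using a plain dict standing in for prompt_trace.outputs; it is an illustration only, not the library's actual fix.

# Minimal sketch of the failure mode seen in ragas/callbacks.py, using a plain dict
# in place of prompt_trace.outputs (a trace whose parser failed has no "output" entry).
outputs = {}

try:
    value = outputs.get("output", {})[0]   # the default {} has no key 0 -> KeyError: 0
except KeyError as err:
    print(f"KeyError: {err}")

# One defensive variant (illustration only):
maybe_output = outputs.get("output") or [None]
print(maybe_output[0])                     # None when the parser produced no output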

kirdreamer added the bug label on Dec 22, 2024
dosubot bot added the module-metrics label on Dec 22, 2024
@Austin-QW

Hi there~ I ran into the same problem. "Faithfulness" failed for me, so I commented out that metric and used the others, and then it succeeded! Here is the code snippet:
metrics = [
    LLMContextRecall(llm=evaluator_llm),
    FactualCorrectness(llm=evaluator_llm),
    # Faithfulness(llm=evaluator_llm),
    SemanticSimilarity(embeddings=evaluator_embeddings)
]
results = evaluate(dataset=dataset, metrics=metrics)

@kirdreamer (Author)

Hi @Austin-QW, thanks for your response!
I also ran into the same problem with the "Faithfulness" metric, but even with it commented out, the "FactualCorrectness" metric doesn't work at all, and the "LLMContextPrecisionWithReference" metric works in only about 80% of attempts.

@neverlatetolearn0

I got NaN for "Faithfulness". Is anyone else experiencing the same problem?

kirdreamer changed the title from "KeyError: 0 for Metrics FactualCorrectness and LLMContextPrecisionWithReference during 'Run ragas metrics for evaluating RAG'" to "Common Problem: KeyError: 0 for Metrics during 'Run ragas metrics for evaluating RAG'" on Dec 24, 2024
@kirdreamer (Author)

@neverlatetolearn0 Following shahules786's explanation in #1773, NaN means the score is undetermined.
I would suggest simply re-evaluating your RAG on the questions for which NaN was returned, or, if necessary, splitting the test set into smaller pieces and evaluating them independently; see the sketch below.
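
A minimal sketch of that retry approach, reusing df, metrics, EvaluationDataset, and evaluate from the reproduction code above. The score column name ("faithfulness") and the variable names are assumptions for illustration, not part of the original report:

# Re-run evaluation only on the samples that came back as NaN.
# Assumes `results` from a previous evaluate() call and a metric column named "faithfulness".
scores_df = results.to_pandas()                       # samples plus one column per metric
nan_idx = scores_df[scores_df["faithfulness"].isna()].index
retry_df = df.loc[nan_idx]                            # original rows that still need a score

retry_dataset = EvaluationDataset.from_pandas(retry_df)
retry_results = evaluate(dataset=retry_dataset, metrics=metrics)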

@kirdreamer (Author)

The problem has been fixed as of ragas 0.2.9.
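
For anyone landing here later, a quick sanity check that the installed version includes the fix (assuming ragas exposes __version__ as usual):

# Check the installed ragas version; this KeyError was fixed in 0.2.9.
import ragas
print(ragas.__version__)  # should print 0.2.9 or newer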
