Common Problem: KeyError: 0 for Metrics during "Run ragas metrics for evaluating RAG" #1784

Closed
kirdreamer opened this issue Dec 22, 2024 · 5 comments
Labels: bug, module-metrics

Comments


kirdreamer commented Dec 22, 2024

[x] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
I have the same problem as trish11953 in issue #1770 when I use only the FactualCorrectness metric, or (about 1 out of 5 times) the LLMContextPrecisionWithReference metric. Maybe it is also somehow related to the issue with Faithfulness?

I also checked the LLMContextRecall and SemanticSimilarity metrics; they worked perfectly every time.
UPD: I checked the LLMContextRecall metric again and received the same error that is shown in the error trace below. I think this is a common problem for all metrics.

Ragas version: 0.2.8
Python version: 3.12.2

Code to Reproduce
...

# Imports assumed for ragas 0.2.x (setup of df and generator_llm is elided above)
from ragas import EvaluationDataset, evaluate
from ragas.metrics import (
    LLMContextRecall,
    FactualCorrectness,
    Faithfulness,
    LLMContextPrecisionWithReference,
    SemanticSimilarity,
)

eval_dataset = EvaluationDataset.from_pandas(df)
metrics = [
    #LLMContextRecall(llm=generator_llm), 
    FactualCorrectness(llm=generator_llm), 
    #Faithfulness(llm=generator_llm),
    #LLMContextPrecisionWithReference(llm=generator_llm),
    #SemanticSimilarity(embeddings=generator_embeddings)
]

results = evaluate(dataset=eval_dataset, metrics=metrics)

Error trace

Prompt fix_output_format failed to parse output: The output parser failed to parse the output including retries.
Prompt fix_output_format failed to parse output: The output parser failed to parse the output including retries.
Prompt fix_output_format failed to parse output: The output parser failed to parse the output including retries.
Prompt claim_decomposition_prompt failed to parse output: The output parser failed to parse the output including retries.
Exception raised in Job[1]: RagasOutputParserException(The output parser failed to parse the output including retries.)
Evaluating: 100%|██████████| 2/2 [01:12<00:00, 36.36s/it]
Traceback (most recent call last):
  File "my-project\backend\ragas\ragas_evaluation.py", line 66, in <module>
    results = evaluate(dataset=eval_dataset, metrics=metrics)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "my-project\venv\Lib\site-packages\ragas\_analytics.py", line 205, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "my-project\venv\Lib\site-packages\ragas\evaluation.py", line 333, in evaluate
    result = EvaluationResult(
             ^^^^^^^^^^^^^^^^^
  File "<string>", line 10, in __init__
  File "my-project\venv\Lib\site-packages\ragas\dataset_schema.py", line 410, in __post_init__
    self.traces = parse_run_traces(self.ragas_traces, run_id)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "my-project\venv\Lib\site-packages\ragas\callbacks.py", line 167, in parse_run_traces
    "output": prompt_trace.outputs.get("output", {})[0],
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
KeyError: 0

Expected behavior
The evaluation results are output without errors.
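
For context on the traceback above: parse_run_traces indexes into the default value returned by .get("output", {}), so when a prompt's output is missing (here, after the RagasOutputParserException), indexing the empty dict with 0 raises KeyError: 0. Below is a minimal sketch of that failure mode, using a plain dict standing in for prompt_trace.outputs; it is an illustration only, not the library's actual fix.

# Minimal sketch of the failure mode seen in ragas/callbacks.py, using a plain dict
# in place of prompt_trace.outputs (a trace whose parser failed has no "output" entry).
outputs = {}

try:
    value = outputs.get("output", {})[0]   # the default {} has no key 0 -> KeyError: 0
except KeyError as err:
    print(f"KeyError: {err}")

# One defensive variant (illustration only):
maybe_output = outputs.get("output") or [None]
print(maybe_output[0])                     # None when the parser produced no output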

kirdreamer added the bug label on Dec 22, 2024
dosubot bot added the module-metrics label on Dec 22, 2024
@Austin-QW

Hi there~ I ran into the same problem. "Faithfulness" failed for me, so I commented out that metric and used the others, and then it succeeded! Here is the code snippet:
metrics = [
    LLMContextRecall(llm=evaluator_llm),
    FactualCorrectness(llm=evaluator_llm),
    # Faithfulness(llm=evaluator_llm),
    SemanticSimilarity(embeddings=evaluator_embeddings)
]
results = evaluate(dataset=dataset, metrics=metrics)

@kirdreamer (Author)

Hi @Austin-QW, thanks for your response!
I also ran into the same problem with the "Faithfulness" metric, but even with it commented out, the "FactualCorrectness" metric doesn't work at all, and the "LLMContextPrecisionWithReference" metric works in only about 80% of attempts.

@neverlatetolearn0

I got NaN for "Faithfulness". Is anyone else experiencing the same problem?

kirdreamer changed the title from "KeyError: 0 for Metrics FactualCorrectness and LLMContextPrecisionWithReference during 'Run ragas metrics for evaluating RAG'" to "Common Problem: KeyError: 0 for Metrics during 'Run ragas metrics for evaluating RAG'" on Dec 24, 2024
@kirdreamer (Author)

@neverlatetolearn0 Following shahules786's explanation in #1773, NaN means the score is undetermined.
I would suggest simply re-evaluating your RAG on the questions for which NaN was returned, or, if necessary, splitting the test set into smaller pieces and evaluating them independently; see the sketch below.
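
A minimal sketch of that retry approach, reusing df, metrics, EvaluationDataset, and evaluate from the reproduction code above. The score column name ("faithfulness") and the variable names are assumptions for illustration, not part of the original report:

# Re-run evaluation only on the samples that came back as NaN.
# Assumes `results` from a previous evaluate() call and a metric column named "faithfulness".
scores_df = results.to_pandas()                       # samples plus one column per metric
nan_idx = scores_df[scores_df["faithfulness"].isna()].index
retry_df = df.loc[nan_idx]                            # original rows that still need a score

retry_dataset = EvaluationDataset.from_pandas(retry_df)
retry_results = evaluate(dataset=retry_dataset, metrics=metrics)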

@kirdreamer (Author)

The problem has been fixed as of ragas 0.2.9.
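
For anyone landing here later, a quick sanity check that the installed version includes the fix (assuming ragas exposes __version__ as usual):

# Check the installed ragas version; this KeyError was fixed in 0.2.9.
import ragas
print(ragas.__version__)  # should print 0.2.9 or newer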
