Error while using Gemini models with RAGAS #1632

Closed
a-s-poorna opened this issue Nov 6, 2024 · 5 comments
Labels: bug (Something isn't working), question (Further information is requested)

Comments


a-s-poorna commented Nov 6, 2024

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question
How can I integrate Gemini models with RAGAS without running into errors?

Code Examples

# imports added for completeness; get_llm and EMBEDDING_FUNCTION are helpers from our own codebase
from ragas.dataset_schema import SingleTurnSample
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.metrics import (
    ContextEntityRecall,
    FactualCorrectness,
    RougeScore,
    SemanticSimilarity,
)

async def get_additional_metrics(question: str, contexts: list, answers: list, reference: str, model_name):
    """Calculates multiple metrics for given question, answers, contexts, and reference."""
    if ("diffbot" in model_name) or ("ollama" in model_name):
        raise ValueError(f"Unsupported model for evaluation: {model_name}")
    else:
        llm, model_name = get_llm(model=model_name)
    ragas_llm = LangchainLLMWrapper(llm)
    embeddings = EMBEDDING_FUNCTION
    embedding_model = LangchainEmbeddingsWrapper(embeddings=embeddings)
    # bleu_scorer = BleuScore()
    rouge_scorer = RougeScore()
    factual_scorer = FactualCorrectness()
    semantic_scorer = SemanticSimilarity()
    entity_recall_scorer = ContextEntityRecall()
    factual_scorer.llm = ragas_llm
    entity_recall_scorer.llm = ragas_llm
    semantic_scorer.embeddings = embedding_model
    metrics = []
    for response, context in zip(answers, contexts):
        sample = SingleTurnSample(response=response, reference=reference)
        # bleu_score = await bleu_scorer.single_turn_ascore(sample)
        rouge_score = await rouge_scorer.single_turn_ascore(sample)
        fact_score = await factual_scorer.single_turn_ascore(sample)
        semantic_score = await semantic_scorer.single_turn_ascore(sample)
        entity_sample = SingleTurnSample(reference=reference, retrieved_contexts=[context])
        entity_recall_score = await entity_recall_scorer.single_turn_ascore(entity_sample)
        metrics.append({
            # "bleu_score": bleu_score,
            "rouge_score": rouge_score,
            "fact_score": fact_score,
            "semantic_score": semantic_score,
            "context_entity_recall_score": entity_recall_score
        })
    print("Metrics:", metrics)
    return metrics
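
For reference, this is roughly how we invoke the function above; the question, contexts, answers, reference, and model_name values below are placeholders rather than our actual data:

metrics = await get_additional_metrics(
    question="Where is the Eiffel Tower?",
    contexts=["The Eiffel Tower is a landmark located in Paris, France."],
    answers=["The Eiffel Tower is located in Paris."],
    reference="The Eiffel Tower is located in Paris.",
    model_name="gemini-1.5-pro",
)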

Additional context
We are following the official RAGAS documentation and built this as-is. For the LLM we are passing Gemini 1.5 Pro and Gemini 1.5 Flash, but we get the error "The LLM generation was not completed. Please increase try increasing the max_tokens and try again." even for the smallest inputs.

@a-s-poorna added the question (Further information is requested) label on Nov 6, 2024
@dosubot (bot) added the bug (Something isn't working) label on Nov 6, 2024
jjmachan (Member) commented Nov 7, 2024

@a-s-poorna thanks for reporting this. This does mean the LLM failed during generation, but it could also be an error on our end. Do you use any tracing tools?

@a-s-poorna (Author)

Hi @jjmachan,
Can you try this simple snippet and get back to me on it?

Code

from ragas.dataset_schema import SingleTurnSample
from ragas.metrics._factual_correctness import FactualCorrectness
from ragas.llms import LangchainLLMWrapper
import google.auth
from langchain_google_vertexai import ChatVertexAI

model_name = "gemini-1.5-pro-002"
credentials, project_id = google.auth.default()

llm = ChatVertexAI(
    model_name=model_name,
    credentials=credentials,
    project=project_id,
    temperature=0,
)

sample = SingleTurnSample(
    response="The Eiffel Tower is located in Paris.",
    reference="The Eiffel Tower is located in Paris. I has a height of 1000ft."
)

scorer = FactualCorrectness()
scorer.llm = LangchainLLMWrapper(llm)
# run inside an async context (e.g. a notebook)
score = await scorer.single_turn_ascore(sample)
print(score)

The error I am facing is:

File /opt/conda/envs/myenv310/lib/python3.10/site-packages/ragas/metrics/base.py:280, in SingleTurnMetric.single_turn_ascore(self, sample, callbacks, timeout)
    273 rm, group_cm = new_group(
    274     self.name,
    275     inputs=sample.to_dict(),
    276     callbacks=callbacks,
    277     metadata={"type": ChainType.METRIC},
    278 )
    279 try:
--> 280     score = await asyncio.wait_for(
    281         self._single_turn_ascore(sample=sample, callbacks=group_cm),
...
    109 if not self.is_finished(result):
--> 110     raise LLMDidNotFinishException()
    111 return result

LLMDidNotFinishException: The LLM generation was not completed. Please increase try increasing the max_tokens and try again.

This is on the updated ragas==0.2.2 version. When we were using an earlier version of ragas we didn't face this issue with the Gemini models.

jjmachan (Member) commented Nov 8, 2024

hey @a-s-poorna I will check it out, but I don't have access to a Gemini model off the bat, so I will have to set aside some time to configure everything.

What would help is having access to the metadata for the response, like so:
[screenshot: response metadata showing the finish_reason field]

Any tracing tool will have this; the screenshot is from LangSmith, but you can also use Arize, which runs locally.
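
If you don't have a tracing tool set up, a quick sketch along these lines (reusing the llm ChatVertexAI instance from your snippet; the prompt is just a placeholder) should print the same metadata directly:

from langchain_core.messages import HumanMessage

result = llm.generate([[HumanMessage(content="Say hello.")]])
generation = result.generations[0][0]
print(generation.generation_info)            # e.g. {'finish_reason': 'STOP', ...}
print(generation.message.response_metadata)  # same info when generation_info is empty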


cymarechal-devoteam commented Dec 1, 2024

Hey @jjmachan ,

I investigated using Arize and Gemini's finish_reason is in uppercase (STOP).
You can find the corresponding documentation here:
https://cloud.google.com/vertex-ai/generative-ai/docs/reference/python/latest/vertexai.generative_models.FinishReason

The is_finished method of LangchainLLMWrapper currently only checks the hardcoded values "stop" and "end_turn" (cc: issue #1548).

As a workaround, one can bypass the issue by providing the following is_finished_parser parameter to LangchainLLMWrapper:

import typing as t

from langchain_core.messages import BaseMessage
from langchain_core.outputs import ChatGeneration, LLMResult


def custom_is_finished_parser(response: LLMResult):
    is_finished_list = []
    for g in response.flatten():
        resp = g.generations[0][0]
        if resp.generation_info is not None:
            # generation_info is provided - so we parse that

            # Gemini uses "STOP" to indicate that the generation is finished
            # and is stored in 'finish_reason' key in generation_info
            if resp.generation_info.get("finish_reason") is not None:
                is_finished_list.append(
                    resp.generation_info.get("finish_reason") == "STOP"
                )

        # if generation_info is empty, we parse the response_metadata
        # this is less reliable
        elif (
            isinstance(resp, ChatGeneration)
            and t.cast(ChatGeneration, resp).message is not None
        ):
            resp_message: BaseMessage = t.cast(ChatGeneration, resp).message
            if resp_message.response_metadata.get("finish_reason") is not None:
                is_finished_list.append(
                    resp_message.response_metadata.get("finish_reason") == "STOP"
                )
        # default to True
        else:
            is_finished_list.append(True)
    return all(is_finished_list)


...

ragas_llm = LangchainLLMWrapper(
    llm,
    is_finished_parser=custom_is_finished_parser,
)
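
Putting it together with the earlier Gemini snippet, the wiring would look roughly like this (same ChatVertexAI llm, sample, and FactualCorrectness scorer as above):

ragas_llm = LangchainLLMWrapper(
    llm,  # the ChatVertexAI instance from the earlier snippet
    is_finished_parser=custom_is_finished_parser,
)

scorer = FactualCorrectness()
scorer.llm = ragas_llm
score = await scorer.single_turn_ascore(sample)  # no longer raises LLMDidNotFinishException
print(score)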

Best,

@a-s-poorna (Author)

Thank you for the help @cymarechal-devoteam.
The evaluation is working with this code.
