
Improving Answer Relevancy Template #1137

Open
dipanjanS opened this issue Nov 3, 2024 · 4 comments

Comments
@dipanjanS

This is just a request: could we make the answer relevancy metric a bit more strict, or have a way to control it, especially when 'idk' verdicts are generated and fed into the reasoning prompt (based on the source code I checked out earlier)?

Example RAG generation:
[screenshot: RAG response]

Eval Result:
[screenshot: evaluation result]

@penguine-ip
Contributor

Hey @dipanjanS, the example you showed is actually what we intended to happen. A few months ago, when the metric was being built, this example, with the verdict confined to 'yes' or 'no', would actually output 'no'. I don't know about you, but I think the verdict should be 'yes'. So we added 'idk' alongside 'yes' and 'no' to make statements like these acceptable. Is this something you don't want to happen?
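
To make the behavior concrete, here is a minimal sketch of the scoring rule described above, assuming the score is simply the fraction of statements whose verdict is not 'no' (a simplification, not the exact deepeval source):

```python
# Simplified sketch: statements with a 'yes' OR 'idk' verdict both count
# toward relevancy; only 'no' does not.
from typing import List

def answer_relevancy_score(verdicts: List[str]) -> float:
    """Fraction of statements judged relevant, with 'idk' treated as relevant."""
    if not verdicts:
        return 0.0
    relevant = sum(1 for v in verdicts if v.lower() != "no")
    return relevant / len(verdicts)

# Example: one definitely relevant statement, one ambiguous one, one irrelevant one.
print(answer_relevancy_score(["yes", "idk", "no"]))  # 0.666...
```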

@dipanjanS
Author

dipanjanS commented Nov 3, 2024

Hey @penguine-ip, thanks for getting back. Yes, I didn't think it was a bug, which is why I framed it as a feature request. The problem is that, since much of this comes down to relevancy, different people may have different opinions.

In general, my feeling is that if the response is 'I don't know', I'm not sure we can really treat the answer as relevant. For example, in the following case it is easy to check for relevancy, because we have definite statements to compare against the input query.

[screenshot: RAG response with definite statements]

However, when the RAG pipeline fails to answer, whether from a lack of context or from the LLM being unable to generate a response, and says it doesn't know, maybe we should treat that as a statement that is not relevant to the question, since it stems from a shortcoming of either the retriever or the generator. (Note: this is the LLM saying 'I don't know' as the answer, not the 'idk' you generate as part of your verdicts, which is fine.)

In my initial example earlier, the reasoning also says the response is completely relevant and addresses the question, which is misleading, because it doesn't really address the question. So rather than getting rid of 'idk', my thought was: could we control it with a setting that treats it as a 'no', a 'yes', or somewhere in between? Then people can take a call and set it up as they wish.

I can do this manually since I understand your source code (kudos on the excellent code, by the way; it's really easy to follow), but this kind of function argument or setting might be useful for other metrics too. Happy to hear your thoughts.
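
A hypothetical sketch of such a setting, assuming an `idk_weight` knob (not part of deepeval's API, just the idea) that scales how much each 'idk' verdict contributes to the score:

```python
# Hypothetical: idk_weight=0.0 treats 'idk' as 'no', 1.0 treats it as 'yes',
# and values in between give partial credit.
from typing import List

def answer_relevancy_score(verdicts: List[str], idk_weight: float = 1.0) -> float:
    if not verdicts:
        return 0.0
    total = 0.0
    for v in verdicts:
        v = v.lower()
        if v == "yes":
            total += 1.0
        elif v == "idk":
            total += idk_weight  # configurable contribution for ambiguous verdicts
    return total / len(verdicts)

print(answer_relevancy_score(["yes", "idk"], idk_weight=0.0))  # strict: 0.5
print(answer_relevancy_score(["yes", "idk"], idk_weight=1.0))  # lenient: 1.0
```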

@penguine-ip
Contributor

penguine-ip commented Nov 6, 2024

Hey @dipanjanS, I think it would be a good idea to let users set it themselves. I think the easiest way is to let users inject each metric with their own examples for in-context learning. For example, instead of using our default ones here: https://github.com/confident-ai/deepeval/blob/main/deepeval/metrics/answer_relevancy/template.py#L35, when a user initializes a metric with their own list of verdict examples, we'll replace our examples with theirs. So if you wish to have LLM responses that say "idk" marked with a different verdict, you could do so, and so could other users who want something different.

The rules would still be the same: 'yes' and 'idk' verdicts will be counted as relevant, but with the improved metric you could define what counts as a 'yes' and what counts as an 'idk'.

How would that sound?
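
A hypothetical sketch of what that initialization could look like, assuming a `verdict_examples` argument (the proposal in this thread, not an existing deepeval parameter):

```python
# AnswerRelevancyMetric is deepeval's real class; the `verdict_examples`
# keyword below is hypothetical, illustrating the proposed in-context
# learning injection.
from deepeval.metrics import AnswerRelevancyMetric

custom_verdict_examples = [
    # Teach the judge that a plain "I don't know" answer is NOT relevant.
    {
        "input": "What are the main causes of climate change?",
        "statement": "I don't know.",
        "verdict": "no",
        "reason": "The response does not address the question at all.",
    },
]

metric = AnswerRelevancyMetric(
    threshold=0.7,
    verdict_examples=custom_verdict_examples,  # hypothetical argument
)
```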

@dipanjanS
Author

dipanjanS commented Nov 7, 2024 via email
