
Improving Answer Relevancy Template #1137

Open
dipanjanS opened this issue Nov 3, 2024 · 4 comments

Comments
@dipanjanS

This is just a request: could we make the answer relevancy metric a bit more strict, or have a way to control it, especially when 'idk' verdicts are generated and fed into the reasoning prompt (based on the source code I checked out earlier)?

Example RAG generation:
[screenshot: RAG response]

Eval Result:
[screenshot: evaluation result]

@penguine-ip
Contributor

Hey @dipanjanS, the example you showed is actually what we intended to happen. A few months ago, when the metric was being built, this example, with the verdict confined to 'yes' or 'no', would actually output 'no'. I don't know about you, but I think the verdict should be 'yes'. So we added 'idk' alongside 'yes' and 'no' to make statements like these acceptable. Is this something you don't want to happen?
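
To make the behavior concrete, here is a minimal sketch of the scoring rule described above, assuming the score is simply the fraction of statements whose verdict is not 'no' (a simplification, not the exact deepeval source):

```python
# Simplified sketch: statements with a 'yes' OR 'idk' verdict both count
# toward relevancy; only 'no' does not.
from typing import List

def answer_relevancy_score(verdicts: List[str]) -> float:
    """Fraction of statements judged relevant, with 'idk' treated as relevant."""
    if not verdicts:
        return 0.0
    relevant = sum(1 for v in verdicts if v.lower() != "no")
    return relevant / len(verdicts)

# Example: one definitely relevant statement, one ambiguous one, one irrelevant one.
print(answer_relevancy_score(["yes", "idk", "no"]))  # 0.666...
```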

@dipanjanS
Author

dipanjanS commented Nov 3, 2024

Hey @penguine-ip, thanks for getting back. Yes, I didn't think it was a bug, which is why I framed it as a feature request. The problem is that, since much of this comes down to relevancy, different people may have different opinions.

In general, my feeling is that if the response is 'I don't know', I'm not sure we can really treat the answer as relevant. For example, in the following case it is easy to check for relevancy, because we have definite statements to compare against the input query.

[screenshot: RAG response with definite statements]

However, when the RAG pipeline fails to answer, whether from a lack of context or from the LLM being unable to generate a response, and says it doesn't know, maybe we should treat that as a statement that is not relevant to the question, since it stems from a shortcoming of either the retriever or the generator. (Note: this is the LLM saying 'I don't know' as the answer, not the 'idk' you generate as part of your verdicts, which is fine.)

In my initial example earlier, the reasoning also says the response is completely relevant and addresses the question, which is misleading, because it doesn't really address the question. So rather than getting rid of 'idk', my thought was: could we control it with a setting that treats it as a 'no', a 'yes', or somewhere in between? Then people can take a call and set it up as they wish.

I can do this manually since I understand your source code (kudos on the excellent code, by the way; it's really easy to follow), but this kind of function argument or setting might be useful for other metrics too. Happy to hear your thoughts.
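
A hypothetical sketch of such a setting, assuming an `idk_weight` knob (not part of deepeval's API, just the idea) that scales how much each 'idk' verdict contributes to the score:

```python
# Hypothetical: idk_weight=0.0 treats 'idk' as 'no', 1.0 treats it as 'yes',
# and values in between give partial credit.
from typing import List

def answer_relevancy_score(verdicts: List[str], idk_weight: float = 1.0) -> float:
    if not verdicts:
        return 0.0
    total = 0.0
    for v in verdicts:
        v = v.lower()
        if v == "yes":
            total += 1.0
        elif v == "idk":
            total += idk_weight  # configurable contribution for ambiguous verdicts
    return total / len(verdicts)

print(answer_relevancy_score(["yes", "idk"], idk_weight=0.0))  # strict: 0.5
print(answer_relevancy_score(["yes", "idk"], idk_weight=1.0))  # lenient: 1.0
```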

@penguine-ip
Contributor

penguine-ip commented Nov 6, 2024

Hey @dipanjanS, I think it would be a good idea to let users set it themselves. I think the easiest way is to let users inject each metric with their own examples for in-context learning. For example, instead of using our default ones here: https://github.com/confident-ai/deepeval/blob/main/deepeval/metrics/answer_relevancy/template.py#L35, when a user initializes a metric with their own list of verdict examples, we'll replace our examples with theirs. So if you wish to have LLM responses that say "idk" marked with a different verdict, you could do so, and so could other users who want something different.

The rules would still be the same: 'yes' and 'idk' verdicts will be counted as relevant, but with the improved metric you could define what counts as a 'yes' and what counts as an 'idk'.

How would that sound?
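
A hypothetical sketch of what that initialization could look like, assuming a `verdict_examples` argument (the proposal in this thread, not an existing deepeval parameter):

```python
# AnswerRelevancyMetric is deepeval's real class; the `verdict_examples`
# keyword below is hypothetical, illustrating the proposed in-context
# learning injection.
from deepeval.metrics import AnswerRelevancyMetric

custom_verdict_examples = [
    # Teach the judge that a plain "I don't know" answer is NOT relevant.
    {
        "input": "What are the main causes of climate change?",
        "statement": "I don't know.",
        "verdict": "no",
        "reason": "The response does not address the question at all.",
    },
]

metric = AnswerRelevancyMetric(
    threshold=0.7,
    verdict_examples=custom_verdict_examples,  # hypothetical argument
)
```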

@dipanjanS
Author

dipanjanS commented Nov 7, 2024 via email
