Improving Answer Relevancy Template #1137
Hey @dipanjanS, the example you showed is actually what we intended to happen. A few months ago, when the metric was being built, this example would output 'no' when verdicts were confined to yes or no. I don't know about you, but I think the verdict should be 'yes'. So we introduced 'idk' alongside 'yes' (versus 'no') to make these statements acceptable. Is this something you don't want to happen?
Hey @penguine-ip, thanks for getting back. Yes, I didn't think it was a bug, hence I framed it more as a feature request. The problem is that, since a lot of this is based on relevancy, different folks might have different opinions. In general, I felt that if the response is 'I don't know', we can't really treat the answer as relevant. In a case like the following one, relevancy is easy to check because we have definite statements to compare with the input query. However, when the RAG pipeline fails to answer, whether due to lack of context or the LLM not being able to generate a response, and simply says it doesn't know, maybe we should treat that as not relevant w.r.t. the question, since it stems from a shortcoming of either the retriever or the generator. (Note: this is the LLM answering "I don't know", not the 'idk' you generate as part of your verdicts, which is fine.)

In my initial example earlier, the reasoning says the response is completely relevant and addresses the question, which is misleading because it doesn't actually address the question. So my thought was: rather than getting rid of 'idk', is there a way we could control it with a setting, to treat it as a 'no', a 'yes', or somewhere in between? Then people can make that call and configure it as they wish. I can do this manually since I understand your source code (kudos on the excellent code, by the way; it's really easy to understand), but this kind of function argument or setting might be useful for other metrics too. Happy to hear your thoughts.
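To make the behavior under discussion concrete, here is a minimal sketch of how the metric is invoked today (assuming a configured LLM judge, e.g. an OpenAI API key), plus the kind of knob being suggested. The `idk_verdict_weight` argument is purely hypothetical and not part of deepeval's actual API:

```python
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

# Current behavior: an "I don't know" answer can still be judged relevant,
# because 'idk' verdicts count toward the relevancy score.
metric = AnswerRelevancyMetric(threshold=0.7)
test_case = LLMTestCase(
    input="What is the most popular cookie in the world?",
    actual_output="I don't know.",
)
metric.measure(test_case)
print(metric.score, metric.reason)

# The setting suggested above might look something like this
# (hypothetical parameter, shown only as a sketch of the request):
# metric = AnswerRelevancyMetric(threshold=0.7, idk_verdict_weight=0.0)
```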
Hey @dipanjanS, I think it would be a good idea to let users set it themselves. The easiest way is to let users inject each metric with their own examples for in-context learning. For example, instead of using our defaults here: https://github.com/confident-ai/deepeval/blob/main/deepeval/metrics/answer_relevancy/template.py#L35, when a user initializes a metric with their own list of verdict examples, we'll replace our examples with theirs. So if you wish LLM responses that say "idk" to be marked with a 'no' verdict, you could do so, and other users who want something different could do the same. The rules would stay the same ('yes' and 'idk' verdicts are counted as relevant), but in the improved metric you could define what counts as 'yes' and what counts as 'idk'. How would that sound?
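A sketch of what this proposed injection could look like; the `verdict_examples` parameter and the example format are hypothetical, since the feature did not exist at the time of this thread:

```python
from deepeval.metrics import AnswerRelevancyMetric

# Hypothetical shape of the proposed API: user-supplied in-context verdict
# examples replace the defaults hard-coded in
# deepeval/metrics/answer_relevancy/template.py, steering the LLM judge.
custom_examples = [
    {
        "statement": "I don't know.",
        "verdict": "no",
        "reason": "A non-answer does not address the input question.",
    },
]

metric = AnswerRelevancyMetric(
    threshold=0.7,
    verdict_examples=custom_examples,  # hypothetical parameter
)
```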
Yes, actually that sounds pretty good to me: have a way to easily replace those examples with your own list if necessary, making the decision-making process a bit more flexible based on user requirements or conditions. The overall metric would still remain the same that way.

Regards,
DJ
This is just a request: could we make answer relevancy a bit more strict, or have a way to control it, especially when 'idk' verdicts are generated and fed into the reasoning prompt (based on the source code I checked out earlier)?
Example RAG generation: (attachment not captured)
Eval Result: (attachment not captured)
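For context on how 'idk' verdicts flow into the score, a simplified standalone sketch of the relevancy calculation described in this thread ('yes' and 'idk' both count as relevant), alongside the stricter option this issue asks for; the function and parameter names are illustrative, not deepeval internals:

```python
def answer_relevancy_score(verdicts: list[str], treat_idk_as_relevant: bool = True) -> float:
    """Fraction of statements judged relevant to the input question."""
    relevant = {"yes", "idk"} if treat_idk_as_relevant else {"yes"}
    return sum(v in relevant for v in verdicts) / len(verdicts)

# Current behavior: an all-'idk' answer is scored as fully relevant.
print(answer_relevancy_score(["idk"]))                               # 1.0
# With the stricter setting proposed in this issue, it would not be.
print(answer_relevancy_score(["idk"], treat_idk_as_relevant=False))  # 0.0
```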