Commit
updated docs
penguine-ip committed Nov 27, 2024
1 parent 91f7f2d commit 878dd0d
Showing 4 changed files with 143 additions and 2 deletions.
77 changes: 77 additions & 0 deletions docs/docs/metrics-json-correctness.mdx
@@ -0,0 +1,77 @@
---
id: metrics-json-correctness
title: Json Correctness
sidebar_label: Json Correctness
---

import Equation from "@site/src/components/equation";

The JSON correctness metric measures whether your LLM application is able to generate `actual_output`s with the correct **JSON schema**.

:::note

The `JsonCorrectnessMetric`, like the `ToolCorrectnessMetric`, is not an LLM-Eval, and you'll have to supply your expected JSON schema when creating a `JsonCorrectnessMetric`.

:::

## Required Arguments

To use the `JsonCorrectnessMetric`, you'll have to provide the following arguments when creating an `LLMTestCase`:

- `input`
- `actual_output`

## Example

```python
from pydantic import BaseModel

from deepeval import evaluate
from deepeval.metrics import JsonCorrectnessMetric
from deepeval.test_case import LLMTestCase

class ExampleSchema(BaseModel):
    name: str

metric = JsonCorrectnessMetric(
    expected_schema=ExampleSchema,
    model="gpt-4",
    include_reason=True
)
test_case = LLMTestCase(
    input="Output me a random Json with the 'name' key",
    # Replace this with the actual output from your LLM application
    actual_output="{'name': null}"
)

metric.measure(test_case)
print(metric.score)
print(metric.reason)
```
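
Since `evaluate` is already imported above, you can also run the metric through deepeval's bulk evaluation instead of calling `measure()` directly. The line below is a minimal sketch that assumes `evaluate` accepts a list of test cases and a list of metrics, as in the other metric docs:

```python
# Or evaluate test cases in bulk (sketch; assumes evaluate(test_cases, metrics))
evaluate([test_case], [metric])
```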

There is one mandatory parameter and six optional parameters when creating a `JsonCorrectnessMetric`:

- `expected_schema`: a `pydantic` `BaseModel` specifying the schema of the Json that is expected from your LLM.
- [Optional] `threshold`: a float representing the minimum passing threshold, defaulted to 0.5.
- [Optional] `model`: a string specifying which of OpenAI's GPT models to use to generate reasons, **OR** [any custom LLM model](metrics-introduction#using-a-custom-llm) of type `DeepEvalBaseLLM`. Defaulted to 'gpt-4o'.
- [Optional] `include_reason`: a boolean which when set to `True`, will include a reason for its evaluation score. Defaulted to `True`.
- [Optional] `strict_mode`: a boolean which when set to `True`, enforces a binary metric score: 1 for perfection, 0 otherwise. It also overrides the current threshold and sets it to 1. Defaulted to `False`.
- [Optional] `async_mode`: a boolean which when set to `True`, enables [concurrent execution within the `measure()` method.](metrics-introduction#measuring-a-metric-in-async) Defaulted to `True`.
- [Optional] `verbose_mode`: a boolean which when set to `True`, prints the intermediate steps used to calculate said metric to the console, as outlined in the [How Is It Calculated](#how-is-it-calculated) section. Defaulted to `False`.

:::info
Unlike other metrics, the `model` is used to generate the reason instead of for evaluation. It is only used if the `actual_output` has the wrong schema, **AND** if `include_reason` is set to `True`.
:::

## How Is It Calculated?

The `JsonCorrectnessMetric` score is calculated according to the following equation:

<Equation
  formula="\text{Json Correctness} = \begin{cases}
  1 & \text{If the actual output fits the expected schema}, \\
  0 & \text{Otherwise}
  \end{cases}"
/>

The `JsonCorrectnessMetric` does not use an LLM for evaluation and instead uses the provided `expected_schema` to determine whether the `actual_output` can be loaded into the schema.
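
Conceptually, the check amounts to attempting to parse the `actual_output` into the `expected_schema` and scoring 1 only if parsing succeeds. The snippet below is a rough sketch of that idea using plain `pydantic` (v2 API); it illustrates the behavior described above, not the library's internal implementation:

```python
from pydantic import BaseModel, ValidationError

class ExampleSchema(BaseModel):
    name: str

def json_correctness_score(actual_output: str, schema: type[BaseModel]) -> int:
    # 1 if the output parses into the expected schema, 0 otherwise
    try:
        schema.model_validate_json(actual_output)
        return 1
    except ValidationError:
        return 0

print(json_correctness_score('{"name": "Jane"}', ExampleSchema))  # 1
print(json_correctness_score("{'name': null}", ExampleSchema))    # 0 (single quotes make this invalid JSON)
```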
63 changes: 63 additions & 0 deletions docs/docs/metrics-prompt-alignment.mdx
@@ -0,0 +1,63 @@
---
id: metrics-prompt-alignment
title: Prompt Alignment
sidebar_label: Prompt Alignment
---

import Equation from "@site/src/components/equation";

The prompt alignment metric measures whether your LLM application is able to generate `actual_output`s that align with any **instructions** specified in your prompt template. `deepeval`'s prompt alignment metric is a self-explaining LLM-Eval, meaning it outputs a reason for its metric score.

## Required Arguments

To use the `PromptAlignmentMetric`, you'll have to provide the following arguments when creating an `LLMTestCase`:

- `input`
- `actual_output`

## Example

```python
from deepeval import evaluate
from deepeval.metrics import PromptAlignmentMetric
from deepeval.test_case import LLMTestCase

metric = PromptAlignmentMetric(
    prompt_instructions=["Reply in all uppercase"],
    model="gpt-4",
    include_reason=True
)
test_case = LLMTestCase(
    input="What if these shoes don't fit?",
    # Replace this with the actual output from your LLM application
    actual_output="We offer a 30-day full refund at no extra cost."
)

metric.measure(test_case)
print(metric.score)
print(metric.reason)
```

There is one mandatory parameter and six optional parameters when creating a `PromptAlignmentMetric` (see the configuration sketch after this list):

- `prompt_instructions`: a list of strings specifying the instructions you want followed in your prompt template.
- [Optional] `threshold`: a float representing the minimum passing threshold, defaulted to 0.5.
- [Optional] `model`: a string specifying which of OpenAI's GPT models to use, **OR** [any custom LLM model](metrics-introduction#using-a-custom-llm) of type `DeepEvalBaseLLM`. Defaulted to 'gpt-4o'.
- [Optional] `include_reason`: a boolean which when set to `True`, will include a reason for its evaluation score. Defaulted to `True`.
- [Optional] `strict_mode`: a boolean which when set to `True`, enforces a binary metric score: 1 for perfection, 0 otherwise. It also overrides the current threshold and sets it to 1. Defaulted to `False`.
- [Optional] `async_mode`: a boolean which when set to `True`, enables [concurrent execution within the `measure()` method.](metrics-introduction#measuring-a-metric-in-async) Defaulted to `True`.
- [Optional] `verbose_mode`: a boolean which when set to `True`, prints the intermediate steps used to calculate said metric to the console, as outlined in the [How Is It Calculated](#how-is-it-calculated) section. Defaulted to `False`.
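
For instance, a stricter configuration might look like the sketch below; the parameter names are the ones listed above, and the values are illustrative only:

```python
from deepeval.metrics import PromptAlignmentMetric

# Illustrative configuration of the optional parameters listed above
metric = PromptAlignmentMetric(
    prompt_instructions=["Reply in all uppercase", "Keep the reply under 50 words"],
    threshold=0.7,      # require at least 70% of instructions to be followed
    model="gpt-4o",
    include_reason=True,
    strict_mode=False,
    async_mode=True,
    verbose_mode=True   # print intermediate steps to the console
)
```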

## How Is It Calculated?

The `PromptAlignmentMetric` score is calculated according to the following equation:

<Equation formula="\text{Prompt Alignment} = \frac{\text{Number of Instructions Followed}}{\text{Total Number of Instructions}}" />

The `PromptAlignmentMetric` uses an LLM to classify whether each prompt instruction is followed in the `actual_output` using additional context from the `input`.
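
In other words, the score is the fraction of `prompt_instructions` that the LLM judge marks as followed. The sketch below mirrors that arithmetic with a hypothetical list of per-instruction verdicts (the verdict structure is an illustration, not deepeval's internal representation):

```python
# Hypothetical per-instruction verdicts from the LLM judge
verdicts = [
    {"instruction": "Reply in all uppercase", "followed": False},
    {"instruction": "Keep the reply under 50 words", "followed": True},
]

# Prompt Alignment = instructions followed / total instructions
score = sum(v["followed"] for v in verdicts) / len(verdicts)
print(score)  # 0.5
```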

:::tip

By providing an initial list of `prompt_instructions` instead of the entire prompt template, the `PromptAlignmentMetric` is able to more accurately determine whether the core instructions laid out in your prompt template are followed.

:::
4 changes: 3 additions & 1 deletion docs/sidebars.js
@@ -38,13 +38,15 @@ module.exports = {
items: [
"metrics-introduction",
"metrics-llm-evals",
"metrics-summarization",
"metrics-prompt-alignment",
"metrics-answer-relevancy",
"metrics-faithfulness",
"metrics-contextual-precision",
"metrics-contextual-recall",
"metrics-contextual-relevancy",
"metrics-json-correctness",
"metrics-tool-correctness",
"metrics-summarization",
"metrics-hallucination",
"metrics-bias",
"metrics-toxicity",
1 change: 0 additions & 1 deletion docs/src/components/equation.jsx
@@ -1,5 +1,4 @@
import React from 'react';
import styles from './index.module.css';
import katex from 'katex';

function Equation(props) {
