Replies: 2 comments
-
Great idea! Should be feasible with the current state of pipelines; I'd love to see a PR for this!
-
I'm trying to come up with a proper chat structure. My idea is that the user is continually presented with new text snippets. For example, when the user sends the text "/next" (or another command), the pipeline replies with the next snippet together with the LLM's classification. The user can then validate the output with 👍 or 👎. At any point, the user can export the chat (or inspect the open-webui dataset) and parse it, expecting a message history structure along these lines:
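Roughly something like this (a sketch; the content strings are placeholders, and representing the 👍/👎 feedback as a plain user message is an assumption):

```json
[
  {"role": "user", "content": "/next"},
  {"role": "assistant", "content": "Snippet: <text snippet>\nClassification: positive"},
  {"role": "user", "content": "👍"}
]
```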
I think this implementation is not very good, but I'm wondering if there is a way to:
I know that I'm overengineering this simple process, but I'd like it to be scalable to other types of validation and compatible with the OpenAI chat history specification.
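For reference, a minimal parsing sketch under those assumptions (the `Snippet:`/`Classification:` message format and the 👍/👎 convention follow the sketch above; they are not a fixed open-webui export format):

```python
import json


def parse_history(path: str) -> list[tuple[str, str, bool]]:
    """Turn an exported chat into (snippet, label, human_approved) triples."""
    with open(path) as f:
        messages = json.load(f)
    examples = []
    # Pair each assistant message with the user reaction that follows it.
    for answer, reaction in zip(messages, messages[1:]):
        if answer["role"] != "assistant" or reaction["role"] != "user":
            continue
        # Assumes the assistant format "Snippet: ...\nClassification: ...".
        snippet_line, label_line = answer["content"].split("\n", 1)
        examples.append((
            snippet_line.removeprefix("Snippet: "),
            label_line.removeprefix("Classification: "),
            reaction["content"] == "👍",
        ))
    return examples
```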
-
I am interested in using the open-webui interface to validate the outputs of a large language model (LLM).
I have a collection of text snippets that I would like to classify using an LLM based on a system prompt (e.g., "Is the following text a positive or negative review?"). The goal is to build a small dataset to evaluate classification capabilities. This dataset can later be used for testing classification with new system prompts or for training a text classifier.
I aim to automate the process as much as possible, feeding new text to the LLM automatically so that the human evaluator only needs to press 👍 or 👎 in the interface.
I understand that this method of building a dataset for binary text classification is somewhat convoluted (the simpler way is to manually open text snippets in a text editor and annotate them in a table-like structure). However, in the future, I plan to extend the LLM output evaluation beyond binary classification (e.g., recording reasons for each annotation) and to have multiple contributors for the annotations.
Is this a use case for pipelines?
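For concreteness, here is a minimal sketch of what this could look like, based on the `Pipeline` class skeleton from the pipelines examples; the snippet source and the `classify` helper are hypothetical placeholders:

```python
from typing import Generator, Iterator, List, Union

# Hypothetical snippet source; in practice, load from a file or database.
SNIPPETS = iter([
    "The food was amazing and the staff were friendly.",
    "Terrible service, I will never come back.",
])

SYSTEM_PROMPT = "Is the following text a positive or negative review?"


def classify(snippet: str) -> str:
    # Placeholder: replace with a real LLM call using SYSTEM_PROMPT.
    return "positive"


class Pipeline:
    def __init__(self):
        self.name = "Snippet Annotation Pipeline"

    async def on_startup(self):
        # Called when the pipelines server starts.
        pass

    async def on_shutdown(self):
        pass

    def pipe(
        self, user_message: str, model_id: str, messages: List[dict], body: dict
    ) -> Union[str, Generator, Iterator]:
        # On "/next", serve the next snippet together with the LLM's label;
        # the human evaluator then reacts with 👍 or 👎 in the interface.
        if user_message.strip() == "/next":
            snippet = next(SNIPPETS, None)
            if snippet is None:
                return "No more snippets to classify."
            return f"Snippet: {snippet}\nClassification: {classify(snippet)}"
        return "Send /next to receive the next snippet."
```

The 👍/👎 feedback would then be collected afterwards by exporting the chat (or inspecting the open-webui dataset) and parsing the message history.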