Anthropic batch mode not available #225

Open
RyanMarten opened this issue Dec 7, 2024 · 11 comments
@RyanMarten RyanMarten self-assigned this Dec 7, 2024
@RyanMarten (Contributor, Author)

Example (from the batch console welcome page)

import anthropic

client = anthropic.Anthropic()

message_batch = client.beta.messages.batches.create(
    requests=[
        {
            "custom_id": "first-prompt-in-my-batch",
            "params": {
                "model": "claude-3-5-haiku-20241022",
                "max_tokens": 100,
                "messages": [
                    {
                        "role": "user",
                        "content": "Hey Claude, tell me a short fun fact about video games!",
                    }
                ],
            },
        },
        {
            "custom_id": "second-prompt-in-my-batch",
            "params": {
                "model": "claude-3-5-sonnet-20241022",
                "max_tokens": 100,
                "messages": [
                    {
                        "role": "user",
                        "content": "Hey Claude, tell me a short fun fact about bees!",
                    }
                ],
            },
        },
    ]
)
print(message_batch)

Stdout

BetaMessageBatch(id='msgbatch_01XWYEcAqybHAWXqyinUyp8K', archived_at=None, cancel_initiated_at=None, created_at=datetime.datetime(2024, 12, 10, 21, 30, 23, 225753, tzinfo=datetime.timezone.utc), ended_at=None, expires_at=datetime.datetime(2024, 12, 11, 21, 30, 23, 225753, tzinfo=datetime.timezone.utc), processing_status='in_progress', request_counts=BetaMessageBatchRequestCounts(canceled=0, errored=0, expired=0, processing=2, succeeded=0), results_url=None, type='message_batch')

Batch Output

{"custom_id":"first-prompt-in-my-batch","result":{"type":"succeeded","message":{"id":"msg_014KfxurNm3n65CGkqUNTkCk","type":"message","role":"assistant","model":"claude-3-5-haiku-20241022","content":[{"type":"text","text":"Here's a fun video game fact: The first video game Easter egg was hidden in the Atari 2600 game Adventure in 1979. Created by programmer Warren Robinett, it was a hidden room with his name that players could only access through a secret sequence of actions."}],"stop_reason":"end_turn","stop_sequence":null,"usage":{"input_tokens":20,"output_tokens":64}}}}
{"custom_id":"second-prompt-in-my-batch","result":{"type":"succeeded","message":{"id":"msg_01DLmwptRqXuVMJsdzgR4Ntp","type":"message","role":"assistant","model":"claude-3-5-sonnet-20241022","content":[{"type":"text","text":"Here's a fun fact: Bees can recognize human faces! Scientists have discovered that honey bees can be trained to remember and distinguish between different human facial features, despite having a brain about the size of a grass seed. They do this using a technique called \"configural processing\" - the same way humans process faces!"}],"stop_reason":"end_turn","stop_sequence":null,"usage":{"input_tokens":20,"output_tokens":69}}}}


RyanMarten commented Dec 10, 2024

Details / meaningful differences from OpenAI

https://docs.anthropic.com/en/docs/build-with-claude/message-batches

Different limits

A Message Batch is limited to either 10,000 Message requests or 32 MB in size, whichever is reached first.

List instead of file content

A unique custom_id for identifying the Messages request
A params object with the standard Messages API parameters
You can create a batch by passing this list into the requests parameter:
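The request shape above can be sketched as a small helper (the custom_id values and prompts here are illustrative):

```python
def make_batch_request(custom_id: str, model: str, prompt: str, max_tokens: int = 100) -> dict:
    # One entry in the batch "requests" list: a unique custom_id plus
    # standard Messages API parameters under "params".
    return {
        "custom_id": custom_id,
        "params": {
            "model": model,
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

requests = [
    make_batch_request("req-0", "claude-3-5-haiku-20241022", "Fun fact about video games?"),
    make_batch_request("req-1", "claude-3-5-sonnet-20241022", "Fun fact about bees?"),
]
```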

Different batch statuses

When a batch is first created, the response will have a processing_status of in_progress; it is updated to ended once all the requests in the batch have finished processing and results are ready.
in_progress, canceling, ended

Different request statuses

Once batch processing has ended, each Messages request in the batch will have a result. There are 4 result types: succeeded, errored, canceled, expired. The batch's request_counts field shows how many requests reached each of these four states.
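Once the results JSONL is in hand, tallying the four result types is straightforward (a sketch; the sample lines are abbreviated):

```python
import json
from collections import Counter

def tally_results(jsonl_lines):
    # Count how many requests landed in each of the four result types:
    # succeeded, errored, canceled, expired.
    return Counter(json.loads(line)["result"]["type"] for line in jsonl_lines if line.strip())

sample_lines = [
    '{"custom_id":"a","result":{"type":"succeeded","message":{}}}',
    '{"custom_id":"b","result":{"type":"errored","error":{}}}',
]
counts = tally_results(sample_lines)
```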

Recommend streaming finished requests instead of downloading all of them

Results of the batch are available for download both in the Console and at the results_url on the Message Batch. Because of the potentially large size of the results, it’s recommended to stream results back rather than download them all at once.

Different errors

If your result has an error, its result.error will be set to our standard error shape.


RyanMarten commented Dec 10, 2024

Examples: https://docs.anthropic.com/en/api/messages-batch-examples

Polling example shows interval of 60 seconds
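A minimal polling sketch along those lines (the retrieve call needs a real client and batch ID, so it is left commented; helper name is ours):

```python
import time

def batch_finished(processing_status: str) -> bool:
    # Anthropic batches report in_progress, canceling, or ended;
    # only ended means all results are ready.
    return processing_status == "ended"

# Hypothetical polling loop (requires an anthropic client and a real batch ID):
# while not batch_finished(client.messages.batches.retrieve(batch_id).processing_status):
#     time.sleep(60)  # the docs' example polls every 60 seconds
```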

When retrieving results
not sure if we want to use streaming or not...

Cancelling a batch
Cancelled batches also have partial results

Immediately after cancellation, a batch’s processing_status will be canceling. You can use the same polling for batch completion technique to poll for when cancellation is finalized as canceled batches also end up ended and may contain results.

@RyanMarten (Contributor, Author)

API Reference, notable differences from OpenAI
https://docs.anthropic.com/en/api/creating-message-batches

System prompt is a parameter not a message

Note that if you want to include a system prompt, you can use the top-level system parameter — there is no "system" role for input messages in the Messages API.
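A sketch of normalizing OpenAI-style message lists for Anthropic, hoisting "system" messages into the top-level system parameter (the helper name is ours, not from any SDK):

```python
def to_anthropic_params(messages, **params):
    # The Messages API has no "system" role, so move system messages
    # into the top-level system parameter.
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    out = dict(params)
    out["messages"] = [m for m in messages if m["role"] != "system"]
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    return out

params = to_anthropic_params(
    [{"role": "system", "content": "Be terse."}, {"role": "user", "content": "Hi"}],
    model="claude-3-5-sonnet-20241022",
    max_tokens=100,
)
```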

How is structured output done? Through tool use? @CharlieJCJ will provide the details based on the litellm work

@RyanMarten (Contributor, Author)

We can't store metadata in the batch, so we will need to store a map of request_file to batch_id

@RyanMarten (Contributor, Author)

They just increased the limits significantly for batch:

100,000 Message requests or 256 MB

https://docs.anthropic.com/en/docs/build-with-claude/message-batches#batch-limitations

@RyanMarten (Contributor, Author)

- instead use a single batch_objects_file (not submitted / downloaded) for each
- store api_key_suffix for each batch
- everything under "submitted" (including canceling, validating, finalizing) is tracked in the same file
- add logic that checks these and, once they are finished, properly processes and resubmits any remaining requests (e.g. in canceled or expired batches)
- read requests from metadata files instead of the batch object


RyanMarten commented Dec 19, 2024

when doing

import anthropic

client = anthropic.Anthropic()

# Stream results file in memory-efficient chunks, processing one at a time
for result in client.messages.batches.results(
    MESSAGE_BATCH_ID,
):
    print(result)

I get

MessageBatchIndividualResponse(custom_id='first-prompt-in-my-batch', result=MessageBatchSucceededResult(message=Message(id='msg_014KfxurNm3n65CGkqUNTkCk', content=[TextBlock(text="Here's a fun video game fact: The first video game Easter egg was hidden in the Atari 2600 game Adventure in 1979. Created by programmer Warren Robinett, it was a hidden room with his name that players could only access through a secret sequence of actions.", type='text')], model='claude-3-5-haiku-20241022', role='assistant', stop_reason='end_turn', stop_sequence=None, type='message', usage=Usage(cache_creation_input_tokens=None, cache_read_input_tokens=None, input_tokens=20, output_tokens=64)), type='succeeded'))
MessageBatchIndividualResponse(custom_id='second-prompt-in-my-batch', result=MessageBatchSucceededResult(message=Message(id='msg_01DLmwptRqXuVMJsdzgR4Ntp', content=[TextBlock(text='Here\'s a fun fact: Bees can recognize human faces! Scientists have discovered that honey bees can be trained to remember and distinguish between different human facial features, despite having a brain about the size of a grass seed. They do this using a technique called "configural processing" - the same way humans process faces!', type='text')], model='claude-3-5-sonnet-20241022', role='assistant', stop_reason='end_turn', stop_sequence=None, type='message', usage=Usage(cache_creation_input_tokens=None, cache_read_input_tokens=None, input_tokens=20, output_tokens=69)), type='succeeded'))

do

result.model_dump()

to get

{'custom_id': 'second-prompt-in-my-batch', 'result': {'message': {'id': 'msg_01DLmwptRqXuVMJsdzgR4Ntp', 'content': [{'text': 'Here\'s a fun fact: Bees can recognize human faces! Scientists have discovered that honey bees can be trained to remember and distinguish between different human facial features, despite having a brain about the size of a grass seed. They do this using a technique called "configural processing" - the same way humans process faces!', 'type': 'text'}], 'model': 'claude-3-5-sonnet-20241022', 'role': 'assistant', 'stop_reason': 'end_turn', 'stop_sequence': None, 'type': 'message', 'usage': {'cache_creation_input_tokens': None, 'cache_read_input_tokens': None, 'input_tokens': 20, 'output_tokens': 69}}, 'type': 'succeeded'}}

@RyanMarten (Contributor, Author)

This is what a failed request looks like in the output

{
  "custom_id": "2",
  "result": {
    "error": {
      "error": {
        "message": "max_tokens: Field required",
        "type": "invalid_request_error",
        "details": null
      },
      "type": "error"
    },
    "type": "errored"
  }
}

This is what a successful request looks like

{
  "custom_id": "my-second-request",
  "result": {
    "type": "succeeded",
    "message": {
      "id": "msg_014VwiXbi91y3JMjcpyGBHX5",
      "type": "message",
      "role": "assistant",
      "model": "claude-3-5-sonnet-20241022",
      "content": [
        {
          "type": "text",
          "text": "Hello again! It's nice to see you. How can I assist you today? Is there anything specific you'd like to chat about or any questions you have?"
        }
      ],
      "stop_reason": "end_turn",
      "stop_sequence": null,
      "usage": {
        "input_tokens": 11,
        "output_tokens": 36
      }
    }
  }
}
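Given those two shapes, result lines can be dispatched on result.type (a sketch; only succeeded results carry a message):

```python
import json

def extract_text(result_line: str):
    # Only "succeeded" results carry a message; errored, canceled, and
    # expired results do not, so return None for those.
    data = json.loads(result_line)
    result = data["result"]
    if result["type"] == "succeeded":
        return data["custom_id"], result["message"]["content"][0]["text"]
    return data["custom_id"], None

ok = '{"custom_id":"my-second-request","result":{"type":"succeeded","message":{"content":[{"type":"text","text":"Hello again!"}]}}}'
bad = '{"custom_id":"2","result":{"type":"errored","error":{"type":"error"}}}'
```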


RyanMarten commented Dec 28, 2024

Now I want to add structured output support for batch mode

While poking around I see that instructor supports batch mode

We are currently handling the JSON payload manually for both online and batch for OpenAI, doing structured output via

        if generic_request.response_format:
            request["response_format"] = {
                "type": "json_schema",
                "json_schema": {
                    "name": "output_schema",  # NOTE: not using 'strict': True
                    "schema": generic_request.response_format,
                },
            }

In the instructor library, here is how they define the OpenAI vs Anthropic request JSON
https://github.com/instructor-ai/instructor/blob/main/instructor/batch.py#L127-L164

        if use_anthropic:
            _, kwargs = handle_response_model(
                response_model=response_model, mode=instructor.Mode.ANTHROPIC_JSON
            )
            with open(file_path, "w") as file:
                for messages in messages_batch:
                    # Format specifically for Anthropic batch API
                    request = {
                        "custom_id": str(uuid.uuid4()),
                        "params": {
                            "model": model,
                            "max_tokens": max_tokens,
                            "temperature": temperature,
                            "messages": messages,
                            **kwargs,
                        },
                    }
                    file.write(json.dumps(request) + "\n")
        else:
            # Existing OpenAI format
            _, kwargs = handle_response_model(
                response_model=response_model, mode=instructor.Mode.TOOLS
            )
            with open(file_path, "w") as file:
                for messages in messages_batch:
                    batch_model = BatchModel(
                        custom_id=str(uuid.uuid4()),
                        body=RequestBody(
                            model=model,
                            messages=messages,
                            max_tokens=max_tokens,
                            temperature=temperature,
                            **kwargs,
                        ),
                        method="POST",
                        url="/v1/chat/completions",
                    )
                    file.write(batch_model.model_dump_json() + "\n")

and response json
https://github.com/instructor-ai/instructor/blob/main/instructor/batch.py#L48-L72

                if "tool_calls" in data["response"]["body"]["choices"][0]["message"]:
                    # OpenAI format
                    res.append(
                        response_model(
                            **json.loads(
                                data["response"]["body"]["choices"][0]["message"][
                                    "tool_calls"
                                ][0]["function"]["arguments"]
                            )
                        )
                    )
                else:
                    # Anthropic format
                    res.append(
                        response_model(
                            **json.loads(
                                data["result"]["message"]["content"][0]["text"]
                            )
                        )
                    )

So we can use the handle_response_model function, or do it ourselves:

# Anthropic
_, kwargs = handle_response_model(
    response_model=response_model, mode=instructor.Mode.ANTHROPIC_JSON
)

# OpenAI
_, kwargs = handle_response_model(
    response_model=response_model, mode=instructor.Mode.TOOLS
)

https://github.com/instructor-ai/instructor/blob/main/instructor/process_response.py#L735
Uses handle_anthropic_json
https://github.com/instructor-ai/instructor/blob/main/instructor/process_response.py#L360

Which actually does this
https://github.com/instructor-ai/instructor/blob/main/instructor/process_response.py#L374-L389

def handle_anthropic_json(
    response_model: type[T], new_kwargs: dict[str, Any]
) -> tuple[type[T], dict[str, Any]]:
    system_messages = extract_system_messages(new_kwargs.get("messages", []))

    if system_messages:
        new_kwargs["system"] = combine_system_messages(
            new_kwargs.get("system"), system_messages
        )

    new_kwargs["messages"] = [
        m for m in new_kwargs.get("messages", []) if m["role"] != "system"
    ]

    json_schema_message = dedent(
        f"""
        As a genius expert, your task is to understand the content and provide
        the parsed objects in json that match the following json_schema:\n

        {json.dumps(response_model.model_json_schema(), indent=2, ensure_ascii=False)}

        Make sure to return an instance of the JSON, not the schema itself
        """
    )

    new_kwargs["system"] = combine_system_messages(
        new_kwargs.get("system"), [{"type": "text", "text": json_schema_message}]
    )

    return response_model, new_kwargs

So it is just prompting in the system message with the schema.
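The effect can be sketched without instructor: build a system message embedding the model's JSON schema (wording here is illustrative, not instructor's exact prompt):

```python
import json

def json_schema_system_message(schema: dict) -> str:
    # Embed the JSON schema in the system prompt, ANTHROPIC_JSON-style.
    return (
        "Provide the parsed objects in json that match the following json_schema:\n\n"
        + json.dumps(schema, indent=2, ensure_ascii=False)
        + "\n\nMake sure to return an instance of the JSON, not the schema itself."
    )

schema = {"type": "object", "properties": {"title": {"type": "string"}}, "required": ["title"]}
system_prompt = json_schema_system_message(schema)
```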

There is also a tool use mode, Mode.ANTHROPIC_TOOLS, but for some reason the instructor batch CLI uses Mode.ANTHROPIC_JSON

https://github.com/instructor-ai/instructor/blob/main/instructor/process_response.py#L336

def handle_anthropic_tools(
    response_model: type[T], new_kwargs: dict[str, Any]
) -> tuple[type[T], dict[str, Any]]:
    tool_descriptions = response_model.anthropic_schema
    new_kwargs["tools"] = [tool_descriptions]
    new_kwargs["tool_choice"] = {
        "type": "tool",
        "name": response_model.__name__,
    }

    system_messages = extract_system_messages(new_kwargs.get("messages", []))

    if system_messages:
        new_kwargs["system"] = combine_system_messages(
            new_kwargs.get("system"), system_messages
        )

    new_kwargs["messages"] = [
        m for m in new_kwargs.get("messages", []) if m["role"] != "system"
    ]

    return response_model, new_kwargs

Anthropic docs on json output
https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency
https://docs.anthropic.com/en/docs/build-with-claude/tool-use#json-mode

The documentation on tool use describes how the system prompt is constructed:

When you call the Anthropic API with the tools parameter, we construct a special system prompt from the tool definitions, tool configuration, and any user-specified system prompt. The constructed prompt is designed to instruct the model to use the specified tool(s) and provide the necessary context for the tool to operate properly:

In this environment you have access to a set of tools you can use to answer the user's question.
{{ FORMATTING INSTRUCTIONS }}
String and scalar parameters should be specified as is, while lists and objects should use JSON format. Note that spaces for string values are not stripped. The output is not expected to be valid XML and is parsed with regular expressions.
Here are the functions available in JSONSchema format:
{{ TOOL DEFINITIONS IN JSON SCHEMA }}
{{ USER SYSTEM PROMPT }}
{{ TOOL CONFIGURATION }}

For structured output, we would want to force tool use. They suggest using tool use whenever JSON output is needed, regardless of whether an actual tool is involved.

This is exactly what Mode.ANTHROPIC_TOOLS does
https://github.com/instructor-ai/instructor/blob/main/instructor/process_response.py#L340-L344

    tool_descriptions = response_model.anthropic_schema
    new_kwargs["tools"] = [tool_descriptions]
    new_kwargs["tool_choice"] = {
        "type": "tool",
        "name": response_model.__name__,
    }
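Without instructor, the same forced-tool-use kwargs can be sketched directly (the tool schema below is illustrative; response_model.anthropic_schema would normally supply it):

```python
def structured_output_kwargs(name: str, input_schema: dict) -> dict:
    # Force the model to call a single "tool" whose input schema is the
    # desired output structure, so the response is guaranteed JSON.
    return {
        "tools": [{"name": name, "description": f"Extract a {name}.", "input_schema": input_schema}],
        "tool_choice": {"type": "tool", "name": name},
    }

kwargs = structured_output_kwargs(
    "Recipe",
    {"type": "object", "properties": {"title": {"type": "string"}}, "required": ["title"]},
)
```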

@RyanMarten (Contributor, Author)

https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency#prefill-claudes-response

We might want to prefill the response with the start of the model json schema (as suggested in the doc above)
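A sketch of that prefill trick: append an assistant turn that starts with "{", then prepend the prefill back when parsing the response (helper names are ours):

```python
import json

PREFILL = "{"

def with_prefill(messages):
    # Add an assistant turn that already opens the JSON object, so the
    # model continues from "{" instead of writing a preamble.
    return messages + [{"role": "assistant", "content": PREFILL}]

def parse_prefilled(response_text: str) -> dict:
    # The API response continues from the prefill, so prepend it back
    # before parsing.
    return json.loads(PREFILL + response_text)

messages = with_prefill([{"role": "user", "content": "Give me a dessert recipe as JSON."}])
parsed = parse_prefilled('"title": "Classic Chocolate Chip Cookies"}')
```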

Running pytest -s tests/test_batch.py::test_anthropic_batch_structured_output

1/5 responses failed because the output included Claude's friendly preamble before the JSON:

WARNING  bespokelabs.curator.llm.prompt_formatter:prompt_formatter.py:152 Failed to parse response as JSON: Here's a recipe for a classic Chocolate Chip Cookies dessert:

{
    "title": "Classic Chocolate Chip Cookies",
    "ingredients": [
        "2 1/4 cups all-purpose flour",
        "1 teaspoon baking soda",
        "1 teaspoon salt",
        "1 cup butter, softened",
        "3/4 cup white sugar",
        "3/4 cup brown sugar",
        "2 large eggs",
        "2 teaspoons vanilla extract",
        "2 cups chocolate chips"
    ],
    "cook_time": 12
}, skipping this response.
WARNING  bespokelabs.curator.request_processor.base_request_processor:base_request_processor.py:392 1 requests failed.
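A lenient workaround sketch (not curator's actual parsing): slice from the first "{" to the last "}" before parsing, dropping any preamble:

```python
import json

def extract_json_object(text: str) -> dict:
    # Slice from the first "{" to the last "}" and parse that span,
    # ignoring any conversational preamble around it.
    start = text.index("{")
    end = text.rindex("}") + 1
    return json.loads(text[start:end])

sample = (
    "Here's a recipe for a classic Chocolate Chip Cookies dessert:\n\n"
    '{"title": "Classic Chocolate Chip Cookies", "cook_time": 12}'
)
recipe = extract_json_object(sample)
```

Prefilling avoids the preamble at the source, while this kind of salvage parsing only mitigates it after the fact.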
