Anthropic batch mode not available #225

Open
RyanMarten opened this issue Dec 7, 2024 · 11 comments
@RyanMarten RyanMarten self-assigned this Dec 7, 2024
@RyanMarten (Contributor, Author)

Example (from the batch console welcome page)

import anthropic

client = anthropic.Anthropic()

message_batch = client.beta.messages.batches.create(
    requests=[
        {
            "custom_id": "first-prompt-in-my-batch",
            "params": {
                "model": "claude-3-5-haiku-20241022",
                "max_tokens": 100,
                "messages": [
                    {
                        "role": "user",
                        "content": "Hey Claude, tell me a short fun fact about video games!",
                    }
                ],
            },
        },
        {
            "custom_id": "second-prompt-in-my-batch",
            "params": {
                "model": "claude-3-5-sonnet-20241022",
                "max_tokens": 100,
                "messages": [
                    {
                        "role": "user",
                        "content": "Hey Claude, tell me a short fun fact about bees!",
                    }
                ],
            },
        },
    ]
)
print(message_batch)

Stdout

BetaMessageBatch(id='msgbatch_01XWYEcAqybHAWXqyinUyp8K', archived_at=None, cancel_initiated_at=None, created_at=datetime.datetime(2024, 12, 10, 21, 30, 23, 225753, tzinfo=datetime.timezone.utc), ended_at=None, expires_at=datetime.datetime(2024, 12, 11, 21, 30, 23, 225753, tzinfo=datetime.timezone.utc), processing_status='in_progress', request_counts=BetaMessageBatchRequestCounts(canceled=0, errored=0, expired=0, processing=2, succeeded=0), results_url=None, type='message_batch')

Batch Output

{"custom_id":"first-prompt-in-my-batch","result":{"type":"succeeded","message":{"id":"msg_014KfxurNm3n65CGkqUNTkCk","type":"message","role":"assistant","model":"claude-3-5-haiku-20241022","content":[{"type":"text","text":"Here's a fun video game fact: The first video game Easter egg was hidden in the Atari 2600 game Adventure in 1979. Created by programmer Warren Robinett, it was a hidden room with his name that players could only access through a secret sequence of actions."}],"stop_reason":"end_turn","stop_sequence":null,"usage":{"input_tokens":20,"output_tokens":64}}}}
{"custom_id":"second-prompt-in-my-batch","result":{"type":"succeeded","message":{"id":"msg_01DLmwptRqXuVMJsdzgR4Ntp","type":"message","role":"assistant","model":"claude-3-5-sonnet-20241022","content":[{"type":"text","text":"Here's a fun fact: Bees can recognize human faces! Scientists have discovered that honey bees can be trained to remember and distinguish between different human facial features, despite having a brain about the size of a grass seed. They do this using a technique called \"configural processing\" - the same way humans process faces!"}],"stop_reason":"end_turn","stop_sequence":null,"usage":{"input_tokens":20,"output_tokens":69}}}}


RyanMarten commented Dec 10, 2024

Details / meaningful differences from OpenAI

https://docs.anthropic.com/en/docs/build-with-claude/message-batches

Different limits

A Message Batch is limited to either 10,000 Message requests or 32 MB in size, whichever is reached first.

List instead of file content

A unique custom_id for identifying the Messages request
A params object with the standard Messages API parameters
You can create a batch by passing this list into the requests parameter:
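The request shape above can be sketched as a small helper (the custom_id values and prompts here are illustrative):

```python
def make_batch_request(custom_id: str, model: str, prompt: str, max_tokens: int = 100) -> dict:
    # One entry in the batch "requests" list: a unique custom_id plus
    # standard Messages API parameters under "params".
    return {
        "custom_id": custom_id,
        "params": {
            "model": model,
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

requests = [
    make_batch_request("req-0", "claude-3-5-haiku-20241022", "Fun fact about video games?"),
    make_batch_request("req-1", "claude-3-5-sonnet-20241022", "Fun fact about bees?"),
]
```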

Different batch statuses

When a batch is first created, the response will have a processing_status of in_progress; it is updated to ended once all the requests in the batch have finished processing and results are ready.
in_progress, canceling, ended

Different request statuses

Once batch processing has ended, each Messages request in the batch will have a result. There are 4 result types: succeeded, errored, canceled, expired. The batch's request_counts field shows how many requests reached each of these four states.
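Once the results JSONL is in hand, tallying the four result types is straightforward (a sketch; the sample lines are abbreviated):

```python
import json
from collections import Counter

def tally_results(jsonl_lines):
    # Count how many requests landed in each of the four result types:
    # succeeded, errored, canceled, expired.
    return Counter(json.loads(line)["result"]["type"] for line in jsonl_lines if line.strip())

sample_lines = [
    '{"custom_id":"a","result":{"type":"succeeded","message":{}}}',
    '{"custom_id":"b","result":{"type":"errored","error":{}}}',
]
counts = tally_results(sample_lines)
```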

Recommend streaming finished requests instead of downloading all of them

Results of the batch are available for download both in the Console and at the results_url on the Message Batch. Because of the potentially large size of the results, it’s recommended to stream results back rather than download them all at once.

Different errors

If your result has an error, its result.error will be set to our standard error shape.


RyanMarten commented Dec 10, 2024

Examples: https://docs.anthropic.com/en/api/messages-batch-examples

Polling example shows interval of 60 seconds
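A minimal polling sketch along those lines (the retrieve call needs a real client and batch ID, so it is left commented; helper name is ours):

```python
import time

def batch_finished(processing_status: str) -> bool:
    # Anthropic batches report in_progress, canceling, or ended;
    # only ended means all results are ready.
    return processing_status == "ended"

# Hypothetical polling loop (requires an anthropic client and a real batch ID):
# while not batch_finished(client.messages.batches.retrieve(batch_id).processing_status):
#     time.sleep(60)  # the docs' example polls every 60 seconds
```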

When retrieving results
not sure if we want to use streaming or not...

Cancelling a batch
Cancelled batches also have partial results

Immediately after cancellation, a batch’s processing_status will be canceling. You can use the same polling for batch completion technique to poll for when cancellation is finalized as canceled batches also end up ended and may contain results.

@RyanMarten (Contributor, Author)

API Reference, notable differences from OpenAI
https://docs.anthropic.com/en/api/creating-message-batches

System prompt is a parameter not a message

Note that if you want to include a system prompt, you can use the top-level system parameter — there is no "system" role for input messages in the Messages API.
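A sketch of normalizing OpenAI-style message lists for Anthropic, hoisting "system" messages into the top-level system parameter (the helper name is ours, not from any SDK):

```python
def to_anthropic_params(messages, **params):
    # The Messages API has no "system" role, so move system messages
    # into the top-level system parameter.
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    out = dict(params)
    out["messages"] = [m for m in messages if m["role"] != "system"]
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    return out

params = to_anthropic_params(
    [{"role": "system", "content": "Be terse."}, {"role": "user", "content": "Hi"}],
    model="claude-3-5-sonnet-20241022",
    max_tokens=100,
)
```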

How is structured output done? Through tool use? @CharlieJCJ will provide the details based on the litellm work

@RyanMarten (Contributor, Author)

We can't store metadata in the batch, so we will need to store a map of request_file to batch_id

@RyanMarten (Contributor, Author)

They just increased the limits significantly for batch:

100,000 Message requests or 256 MB

https://docs.anthropic.com/en/docs/build-with-claude/message-batches#batch-limitations

@RyanMarten (Contributor, Author)

- instead use a single batch_objects_file (not submitted / downloaded) for each
- store api_key_suffix for each batch
- everything under "submitted" (including canceling, validating, finalizing) is tracked in the same file
- add logic that checks these and, once they are finished, properly processes and resubmits any remaining requests (e.g. in canceled or expired batches)
- read requests from metadata files instead of the batch object


RyanMarten commented Dec 19, 2024

when doing

import anthropic

client = anthropic.Anthropic()

# Stream results file in memory-efficient chunks, processing one at a time
for result in client.messages.batches.results(
    MESSAGE_BATCH_ID,
):
    print(result)

I get

MessageBatchIndividualResponse(custom_id='first-prompt-in-my-batch', result=MessageBatchSucceededResult(message=Message(id='msg_014KfxurNm3n65CGkqUNTkCk', content=[TextBlock(text="Here's a fun video game fact: The first video game Easter egg was hidden in the Atari 2600 game Adventure in 1979. Created by programmer Warren Robinett, it was a hidden room with his name that players could only access through a secret sequence of actions.", type='text')], model='claude-3-5-haiku-20241022', role='assistant', stop_reason='end_turn', stop_sequence=None, type='message', usage=Usage(cache_creation_input_tokens=None, cache_read_input_tokens=None, input_tokens=20, output_tokens=64)), type='succeeded'))
MessageBatchIndividualResponse(custom_id='second-prompt-in-my-batch', result=MessageBatchSucceededResult(message=Message(id='msg_01DLmwptRqXuVMJsdzgR4Ntp', content=[TextBlock(text='Here\'s a fun fact: Bees can recognize human faces! Scientists have discovered that honey bees can be trained to remember and distinguish between different human facial features, despite having a brain about the size of a grass seed. They do this using a technique called "configural processing" - the same way humans process faces!', type='text')], model='claude-3-5-sonnet-20241022', role='assistant', stop_reason='end_turn', stop_sequence=None, type='message', usage=Usage(cache_creation_input_tokens=None, cache_read_input_tokens=None, input_tokens=20, output_tokens=69)), type='succeeded'))

do

result.model_dump()

to get

{'custom_id': 'second-prompt-in-my-batch', 'result': {'message': {'id': 'msg_01DLmwptRqXuVMJsdzgR4Ntp', 'content': [{'text': 'Here\'s a fun fact: Bees can recognize human faces! Scientists have discovered that honey bees can be trained to remember and distinguish between different human facial features, despite having a brain about the size of a grass seed. They do this using a technique called "configural processing" - the same way humans process faces!', 'type': 'text'}], 'model': 'claude-3-5-sonnet-20241022', 'role': 'assistant', 'stop_reason': 'end_turn', 'stop_sequence': None, 'type': 'message', 'usage': {'cache_creation_input_tokens': None, 'cache_read_input_tokens': None, 'input_tokens': 20, 'output_tokens': 69}}, 'type': 'succeeded'}}

@RyanMarten (Contributor, Author)

This is what a failed request looks like in the output

{
  "custom_id": "2",
  "result": {
    "error": {
      "error": {
        "message": "max_tokens: Field required",
        "type": "invalid_request_error",
        "details": null
      },
      "type": "error"
    },
    "type": "errored"
  }
}

This is what a successful request looks like

{
  "custom_id": "my-second-request",
  "result": {
    "type": "succeeded",
    "message": {
      "id": "msg_014VwiXbi91y3JMjcpyGBHX5",
      "type": "message",
      "role": "assistant",
      "model": "claude-3-5-sonnet-20241022",
      "content": [
        {
          "type": "text",
          "text": "Hello again! It's nice to see you. How can I assist you today? Is there anything specific you'd like to chat about or any questions you have?"
        }
      ],
      "stop_reason": "end_turn",
      "stop_sequence": null,
      "usage": {
        "input_tokens": 11,
        "output_tokens": 36
      }
    }
  }
}
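Given those two shapes, result lines can be dispatched on result.type (a sketch; only succeeded results carry a message):

```python
import json

def extract_text(result_line: str):
    # Only "succeeded" results carry a message; errored, canceled, and
    # expired results do not, so return None for those.
    data = json.loads(result_line)
    result = data["result"]
    if result["type"] == "succeeded":
        return data["custom_id"], result["message"]["content"][0]["text"]
    return data["custom_id"], None

ok = '{"custom_id":"my-second-request","result":{"type":"succeeded","message":{"content":[{"type":"text","text":"Hello again!"}]}}}'
bad = '{"custom_id":"2","result":{"type":"errored","error":{"type":"error"}}}'
```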


RyanMarten commented Dec 28, 2024

Now I want to add structured output support for batch mode

While poking around I see that instructor supports batch mode

We are currently handling the JSON payload manually for both online and batch for OpenAI, doing structured output via

        if generic_request.response_format:
            request["response_format"] = {
                "type": "json_schema",
                "json_schema": {
                    "name": "output_schema",  # NOTE: not using 'strict': True
                    "schema": generic_request.response_format,
                },
            }

In the instructor library, here is how they define the OpenAI vs Anthropic request JSON
https://github.com/instructor-ai/instructor/blob/main/instructor/batch.py#L127-L164

        if use_anthropic:
            _, kwargs = handle_response_model(
                response_model=response_model, mode=instructor.Mode.ANTHROPIC_JSON
            )
            with open(file_path, "w") as file:
                for messages in messages_batch:
                    # Format specifically for Anthropic batch API
                    request = {
                        "custom_id": str(uuid.uuid4()),
                        "params": {
                            "model": model,
                            "max_tokens": max_tokens,
                            "temperature": temperature,
                            "messages": messages,
                            **kwargs,
                        },
                    }
                    file.write(json.dumps(request) + "\n")
        else:
            # Existing OpenAI format
            _, kwargs = handle_response_model(
                response_model=response_model, mode=instructor.Mode.TOOLS
            )
            with open(file_path, "w") as file:
                for messages in messages_batch:
                    batch_model = BatchModel(
                        custom_id=str(uuid.uuid4()),
                        body=RequestBody(
                            model=model,
                            messages=messages,
                            max_tokens=max_tokens,
                            temperature=temperature,
                            **kwargs,
                        ),
                        method="POST",
                        url="/v1/chat/completions",
                    )
                    file.write(batch_model.model_dump_json() + "\n")

and response json
https://github.com/instructor-ai/instructor/blob/main/instructor/batch.py#L48-L72

                if "tool_calls" in data["response"]["body"]["choices"][0]["message"]:
                    # OpenAI format
                    res.append(
                        response_model(
                            **json.loads(
                                data["response"]["body"]["choices"][0]["message"][
                                    "tool_calls"
                                ][0]["function"]["arguments"]
                            )
                        )
                    )
                else:
                    # Anthropic format
                    res.append(
                        response_model(
                            **json.loads(
                                data["result"]["message"]["content"][0]["text"]
                            )
                        )
                    )

So we can use the handle_response_model function, or do it ourselves:

# Anthropic
_, kwargs = handle_response_model(
    response_model=response_model, mode=instructor.Mode.ANTHROPIC_JSON
)

# OpenAI
_, kwargs = handle_response_model(
    response_model=response_model, mode=instructor.Mode.TOOLS
)

https://github.com/instructor-ai/instructor/blob/main/instructor/process_response.py#L735
Uses handle_anthropic_json
https://github.com/instructor-ai/instructor/blob/main/instructor/process_response.py#L360

Which actually does this
https://github.com/instructor-ai/instructor/blob/main/instructor/process_response.py#L374-L389

def handle_anthropic_json(
    response_model: type[T], new_kwargs: dict[str, Any]
) -> tuple[type[T], dict[str, Any]]:
    system_messages = extract_system_messages(new_kwargs.get("messages", []))

    if system_messages:
        new_kwargs["system"] = combine_system_messages(
            new_kwargs.get("system"), system_messages
        )

    new_kwargs["messages"] = [
        m for m in new_kwargs.get("messages", []) if m["role"] != "system"
    ]

    json_schema_message = dedent(
        f"""
        As a genius expert, your task is to understand the content and provide
        the parsed objects in json that match the following json_schema:\n

        {json.dumps(response_model.model_json_schema(), indent=2, ensure_ascii=False)}

        Make sure to return an instance of the JSON, not the schema itself
        """
    )

    new_kwargs["system"] = combine_system_messages(
        new_kwargs.get("system"), [{"type": "text", "text": json_schema_message}]
    )

    return response_model, new_kwargs

So it is just prompting in the system message with the schema.
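The effect can be sketched without instructor: build a system message embedding the model's JSON schema (wording here is illustrative, not instructor's exact prompt):

```python
import json

def json_schema_system_message(schema: dict) -> str:
    # Embed the JSON schema in the system prompt, ANTHROPIC_JSON-style.
    return (
        "Provide the parsed objects in json that match the following json_schema:\n\n"
        + json.dumps(schema, indent=2, ensure_ascii=False)
        + "\n\nMake sure to return an instance of the JSON, not the schema itself."
    )

schema = {"type": "object", "properties": {"title": {"type": "string"}}, "required": ["title"]}
system_prompt = json_schema_system_message(schema)
```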

There is also a tool use mode, Mode.ANTHROPIC_TOOLS, but for some reason the instructor batch CLI uses Mode.ANTHROPIC_JSON

https://github.com/instructor-ai/instructor/blob/main/instructor/process_response.py#L336

def handle_anthropic_tools(
    response_model: type[T], new_kwargs: dict[str, Any]
) -> tuple[type[T], dict[str, Any]]:
    tool_descriptions = response_model.anthropic_schema
    new_kwargs["tools"] = [tool_descriptions]
    new_kwargs["tool_choice"] = {
        "type": "tool",
        "name": response_model.__name__,
    }

    system_messages = extract_system_messages(new_kwargs.get("messages", []))

    if system_messages:
        new_kwargs["system"] = combine_system_messages(
            new_kwargs.get("system"), system_messages
        )

    new_kwargs["messages"] = [
        m for m in new_kwargs.get("messages", []) if m["role"] != "system"
    ]

    return response_model, new_kwargs

Anthropic docs on json output
https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency
https://docs.anthropic.com/en/docs/build-with-claude/tool-use#json-mode

The documentation on tool use describes how the system prompt is constructed:

When you call the Anthropic API with the tools parameter, we construct a special system prompt from the tool definitions, tool configuration, and any user-specified system prompt. The constructed prompt is designed to instruct the model to use the specified tool(s) and provide the necessary context for the tool to operate properly:

In this environment you have access to a set of tools you can use to answer the user's question.
{{ FORMATTING INSTRUCTIONS }}
String and scalar parameters should be specified as is, while lists and objects should use JSON format. Note that spaces for string values are not stripped. The output is not expected to be valid XML and is parsed with regular expressions.
Here are the functions available in JSONSchema format:
{{ TOOL DEFINITIONS IN JSON SCHEMA }}
{{ USER SYSTEM PROMPT }}
{{ TOOL CONFIGURATION }}

For structured output, we would want to force tool use. They suggest using tool use whenever JSON output is needed, regardless of whether an actual tool is involved.

This is exactly what Mode.ANTHROPIC_TOOLS does
https://github.com/instructor-ai/instructor/blob/main/instructor/process_response.py#L340-L344

    tool_descriptions = response_model.anthropic_schema
    new_kwargs["tools"] = [tool_descriptions]
    new_kwargs["tool_choice"] = {
        "type": "tool",
        "name": response_model.__name__,
    }
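Without instructor, the same forced-tool-use kwargs can be sketched directly (the tool schema below is illustrative; response_model.anthropic_schema would normally supply it):

```python
def structured_output_kwargs(name: str, input_schema: dict) -> dict:
    # Force the model to call a single "tool" whose input schema is the
    # desired output structure, so the response is guaranteed JSON.
    return {
        "tools": [{"name": name, "description": f"Extract a {name}.", "input_schema": input_schema}],
        "tool_choice": {"type": "tool", "name": name},
    }

kwargs = structured_output_kwargs(
    "Recipe",
    {"type": "object", "properties": {"title": {"type": "string"}}, "required": ["title"]},
)
```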

@RyanMarten (Contributor, Author)

https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/increase-consistency#prefill-claudes-response

We might want to prefill the response with the start of the model json schema (as suggested in the doc above)
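A sketch of that prefill trick: append an assistant turn that starts with "{", then prepend the prefill back when parsing the response (helper names are ours):

```python
import json

PREFILL = "{"

def with_prefill(messages):
    # Add an assistant turn that already opens the JSON object, so the
    # model continues from "{" instead of writing a preamble.
    return messages + [{"role": "assistant", "content": PREFILL}]

def parse_prefilled(response_text: str) -> dict:
    # The API response continues from the prefill, so prepend it back
    # before parsing.
    return json.loads(PREFILL + response_text)

messages = with_prefill([{"role": "user", "content": "Give me a dessert recipe as JSON."}])
parsed = parse_prefilled('"title": "Classic Chocolate Chip Cookies"}')
```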

Running pytest -s tests/test_batch.py::test_anthropic_batch_structured_output

1/5 responses failed because the output included Claude's friendly preamble before the JSON:

WARNING  bespokelabs.curator.llm.prompt_formatter:prompt_formatter.py:152 Failed to parse response as JSON: Here's a recipe for a classic Chocolate Chip Cookies dessert:

{
    "title": "Classic Chocolate Chip Cookies",
    "ingredients": [
        "2 1/4 cups all-purpose flour",
        "1 teaspoon baking soda",
        "1 teaspoon salt",
        "1 cup butter, softened",
        "3/4 cup white sugar",
        "3/4 cup brown sugar",
        "2 large eggs",
        "2 teaspoons vanilla extract",
        "2 cups chocolate chips"
    ],
    "cook_time": 12
}, skipping this response.
WARNING  bespokelabs.curator.request_processor.base_request_processor:base_request_processor.py:392 1 requests failed.
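A lenient workaround sketch (not curator's actual parsing): slice from the first "{" to the last "}" before parsing, dropping any preamble:

```python
import json

def extract_json_object(text: str) -> dict:
    # Slice from the first "{" to the last "}" and parse that span,
    # ignoring any conversational preamble around it.
    start = text.index("{")
    end = text.rindex("}") + 1
    return json.loads(text[start:end])

sample = (
    "Here's a recipe for a classic Chocolate Chip Cookies dessert:\n\n"
    '{"title": "Classic Chocolate Chip Cookies", "cook_time": 12}'
)
recipe = extract_json_object(sample)
```

Prefilling avoids the preamble at the source, while this kind of salvage parsing only mitigates it after the fact.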
