
Include usage in streaming response now possible #55

Closed · ionflow opened this issue May 9, 2024 · 2 comments

@ionflow commented May 9, 2024

It would be great to enable this new API feature in zod-stream.

stream_options: {"include_usage": true}

Currently, if you include these options in a request, you get a TypeError: Cannot read properties of undefined (reading 'delta')

  const response = await client.chat.completions.create({
    max_tokens: maxTokens || undefined,
    messages,
    model: model || 'gpt-4-turbo',
    response_format: { type: json ? 'json_object' : 'text' },
    response_model: {
      name: responseModel?.name,
      schema: responseModel?.schema,
    },
    stream: stream || false,
    stream_options: stream ? { include_usage: true } : undefined,
    temperature: temperature || 0,
    tool_choice: tools ? 'auto' : undefined,
    tools: tools || undefined,
  });
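
For context, the error comes from the shape of the final chunk: with include_usage set, OpenAI sends one last chunk whose choices array is empty and whose usage field is populated, so anything that reads chunk.choices[0].delta unguarded throws. A minimal standalone sketch of that behavior (illustrative only, not zod-stream internals):

  import OpenAI from 'openai';

  const client = new OpenAI();

  const completion = await client.chat.completions.create({
    messages: [{ role: 'user', content: 'hi' }],
    model: 'gpt-4-turbo',
    stream: true,
    stream_options: { include_usage: true },
  });

  for await (const chunk of completion) {
    const choice = chunk.choices[0];
    if (!choice) {
      // The usage-only terminal chunk has choices: []; reading
      // chunk.choices[0].delta here is exactly what throws.
      console.log('usage:', chunk.usage);
      continue;
    }
    process.stdout.write(choice.delta.content ?? '');
  }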
@roodboi (Contributor) commented May 9, 2024

oh so sick!

I'll get it up ASAP

@roodboi (Contributor) commented May 17, 2024

OK, I added a small bit of missing defense to handle this error here:

c22f0a8#diff-a940976bb69b8ae2a687d8f68c6a81385384c55d244a875e020db391b89c1cbfR65
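
The gist of that kind of defense, sketched below (the linked diff is the authoritative change):

  import OpenAI from "openai"

  // Sketch only, not the actual diff: optional chaining so the
  // usage-only final chunk (empty choices array) no longer blows
  // up when something reads .delta
  function getDeltaContent(chunk: OpenAI.ChatCompletionChunk): string {
    return chunk?.choices?.[0]?.delta?.content ?? ""
  }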

I tried a few different approaches to see if there was a reasonable way to read usage behind a flag in zod-stream, but because of the nature of what we do in here, it was hard to find a good way to include it directly.

I have a PR up in instructor that will enable including it. The approach is just teeing the stream and reading the usage separately, so as not to interfere with or add any extra work to the JSON parsing flow:

instructor-ai/instructor-js#176

Thanks for bubbling this up! I had no idea they even released it!

PS: the approach I'm taking in instructor looks like this:

  import OpenAI from "openai"
  import { Stream } from "openai/streaming"
  import ZodStream, { OAIStream, withResponseModel, type CompletionMeta } from "zod-stream"

  // Excerpted from an async generator method, so `this`, `params`,
  // `response_model`, and `requestOptions` come from the surrounding scope.
  const completionParams = withResponseModel({
    params: {
      ...params,
      stream: true
    } as OpenAI.ChatCompletionCreateParams,
    response_model,
    mode: this.mode
  })

  const streamClient = new ZodStream({})

  let streamUsage: CompletionMeta["usage"] | undefined

  // Drain one branch of the teed stream, capturing only the usage chunk.
  async function checkForUsage(reader: Stream<OpenAI.ChatCompletionChunk>) {
    for await (const chunk of reader) {
      if ("usage" in chunk) {
        streamUsage = chunk.usage as CompletionMeta["usage"]
      }
    }
  }

  const structuredStream = await streamClient.create({
    completionPromise: async () => {
      const completion = await this.client.chat.completions.create(
        {
          ...completionParams,
          stream: true
        },
        requestOptions
      )

      if (
        completionParams?.stream &&
        "stream_options" in completionParams &&
        completion instanceof Stream
      ) {
        // Tee the stream: one branch is read for usage, the other
        // feeds the JSON parsing flow untouched.
        const [completion1, completion2] = completion.tee()

        checkForUsage(completion1)

        return OAIStream({
          res: completion2
        })
      }

      return OAIStream({
        res: completion as unknown as AsyncIterable<OpenAI.ChatCompletionChunk>
      })
    },
    response_model
  })

  // Attach whatever usage has been captured to each chunk's _meta.
  for await (const chunk of structuredStream) {
    yield {
      ...chunk,
      _meta: {
        usage: streamUsage ?? undefined,
        ...(chunk?._meta ?? {})
      }
    }
  }
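
On the consumer side, _meta.usage only fills in once the usage chunk (the last thing OpenAI sends) has been read off the teed branch, so in practice you read it after the loop finishes. A hypothetical consumer, where stream stands in for whatever calls the generator above:

  // Hypothetical consumer; the only field assumed from the snippet
  // above is _meta.usage.
  let usage: CompletionMeta["usage"] | undefined

  for await (const chunk of stream) {
    usage = chunk._meta?.usage ?? usage
  }

  console.log("total tokens:", usage?.total_tokens)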

roodboi closed this as completed May 17, 2024