Bedrock Embeddings ThrottlingException #2905

Closed
Jtewen opened this issue Oct 13, 2023 · 3 comments · Fixed by #2908
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments


Jtewen commented Oct 13, 2023

When attempting to generate a vector store with the fromDocuments method on approximately 10,000 documents (specifically using the HNSWLib class), it hits the rate limit very quickly. This is most likely because BedrockEmbeddings.embedDocuments simply runs a Promise.all over every document at once.
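
Roughly, the pattern looks like this (a simplified sketch, not the actual library source; embedText below is a stand-in for the per-text Bedrock call):

// Every text is embedded at once with no concurrency cap, so ~10,000 documents
// become ~10,000 simultaneous Bedrock requests and a ThrottlingException.
declare function embedText(text: string): Promise<number[]>;

async function embedDocuments(texts: string[]): Promise<number[][]> {
    return Promise.all(texts.map((text) => embedText(text)));
}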

dosubot bot added the auto:bug label Oct 13, 2023

dosubot bot commented Oct 13, 2023

🤖

Hello,

I understand you're encountering a ThrottlingException when trying to generate a vector store using the fromDocuments method on the HNSWLib class for a large number of documents. This is likely due to the BedrockEmbeddings.embedDocuments method running all promises at once.

A similar issue was encountered and resolved in the past for OpenAIEmbeddings. The solution was to add a concurrency setting that controls how many requests can be pending at the same time; this was implemented in version 0.0.130 of langchainjs. You can find more details in issue #2140 and the follow-up #2294.
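
For reference, a minimal sketch of that earlier approach (this assumes an OPENAI_API_KEY is available in the environment; maxConcurrency caps how many embedding requests can be in flight at once, and extra calls are queued):

import { OpenAIEmbeddings } from 'langchain/embeddings/openai';

// Limit concurrent embedding requests to 5; additional calls wait in a queue
// instead of being fired all at once. Assumes OPENAI_API_KEY is set.
const embeddings = new OpenAIEmbeddings({ maxConcurrency: 5 });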

To apply a similar solution to your case, you can modify the fromDocuments method to limit the number of concurrent promises. Here's how you can do it:

First, install the 'bluebird' library:

npm install bluebird

Then, modify the 'fromDocuments' method in the 'HNSWLib' class:

import * as Bluebird from 'bluebird';
import { Document } from 'langchain/document';
import { Embeddings } from 'langchain/embeddings/base';

// ...

static async fromDocuments(docs: Document[], embeddings: Embeddings, concurrency: number = 1000): Promise<HNSWLib> {
    const vectorStore = new HNSWLib(embeddings, { space: 'cosine' });
    // Embed and add documents in parallel, but never more than `concurrency` at a time.
    await Bluebird.map(docs, async (doc) => {
        const vector = await embeddings.embedQuery(doc.pageContent);
        await vectorStore.addVectors([vector], [doc]);
    }, { concurrency });
    return vectorStore;
}

In this code, 'Bluebird.map' is used to process the documents in parallel but with a limit on the number of concurrent operations. The 'concurrency' parameter controls the number of promises that can be pending at the same time. You can adjust this value based on your rate limit.

Finally, use the modified 'fromDocuments' method:

const vectorStore = await HNSWLib.fromDocuments(docs, new OpenAIEmbeddings(), 500);

In this example, the concurrency level is set to 500. This means that at most 500 documents will be processed at the same time.

As for the rate limit that applies to the BedrockEmbeddings.embedDocuments method, I wasn't able to find this information in the langchainjs repository. You might want to check the Amazon Bedrock documentation or contact AWS support for this information.

I hope this helps! Let me know if you have any other questions.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@jacoblee93
Collaborator

I'll look into fixing this to better use concurrency, thank you for opening this!

jacoblee93 self-assigned this Oct 13, 2023
@bracesproul
Member

@Jtewen fix should be merged soon!
