Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix: Consume all result pages in DynamoDbLogicalDb.batchLoad #176

Merged
merged 6 commits into from
Apr 2, 2024

Conversation

szabado-faire
Copy link
Contributor

@szabado-faire szabado-faire commented Apr 2, 2024

This fixes a somewhat subtle bug I realized tempest2 has.

From the enhanced client docs (emphasis mine):

Partial results. A single call to DynamoDb has restraints on how much data can be retrieved. If those limits are exceeded, the call yields a partial result. This may also be the case if provisional throughput is exceeded or there is an internal DynamoDb processing failure. The operation automatically retries any unprocessed keys returned from DynamoDb in subsequent calls for pages.

Tempest currently only gets the first page of results.

From the BatchGetItem docs, the reasons that partial results can be returned are:

the response size limit is exceeded, the table's provisioned throughput is exceeded, more than 1MB per partition is requested, or an internal processing failure occurs.

The end result is that tempest silently drops some results from batchLoad when these conditions occur because it only consumes the first page of the iterator.

This might also happen in DynamoDbQueryable and DynamoDbScannable as they also only consume the first page of the iterator - I haven't done enough digging there to confirm.

Seemingly this isn't the case with tempest, as the mapper it uses handles unprocessedKeys under the hood

@kyeotic
Copy link
Collaborator

kyeotic commented Apr 2, 2024

When I run this test locally I get an error

software.amazon.awssdk.services.dynamodb.model.DynamoDbException: Item size has exceeded the maximum allowed size (Service: DynamoDb, Status Code: 400, Request ID: 4e345a42-5d67-4d3c-82b5-34abe51e4288)
	at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:125)

@szabado-faire
Copy link
Contributor Author

@kyeotic not sure why a 300kB string caused problems, the max item size should be 400kB.

Dropping it to 200kB resolved that exception, and I confirmed that the test still fails if I revert the fix

@kyeotic
Copy link
Collaborator

kyeotic commented Apr 2, 2024

Looks good! Nice job @szabado-faire

@kyeotic kyeotic merged commit 95a97b2 into cashapp:main Apr 2, 2024
2 checks passed
@szabado-faire szabado-faire deleted the szabado/apr_2/consume_iterator branch April 2, 2024 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants