In a batch request each message should be in its own batchspec #171

Closed
SimonWoolf opened this issue Feb 28, 2024 · 4 comments · Fixed by #180
Labels: bug

Comments

@SimonWoolf (Member) commented Feb 28, 2024

The Kafka connector currently uses the batch publish API, but every message sent to a single channel is put into one BatchSpec, meaning we treat them atomically and send them as a single ProtocolMessage. (Indeed, if there's only one channel, every message in the request ends up in a single BatchSpec, which makes it a fairly pointless use of the batch publish API.)

This causes a problem when the total ProtocolMessage size gets too big. The connector could stop accumulating once that size exceeds 64 kB -- but unless it really needs atomicity (and given it's just accumulating until it has 100 messages, it clearly doesn't), there's a much simpler fix: put each message into its own BatchSpec.
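For illustration, here's a minimal sketch of the two request-body shapes, using plain maps to stand in for the documented BatchSpec structure of `channels` plus `messages`; the class and method names are hypothetical, not connector or ably-java code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the batch publish request-body shapes; not connector code.
public class BatchSpecShapes {

    // Current behaviour: all messages destined for a channel share one BatchSpec,
    // so the whole group is treated atomically and sent as a single ProtocolMessage.
    static List<Map<String, Object>> oneSpecPerChannel(String channel,
                                                       List<Map<String, Object>> messages) {
        return List.of(Map.<String, Object>of(
                "channels", List.of(channel),
                "messages", messages));
    }

    // Proposed fix: one BatchSpec per message, so nothing forces the whole group
    // into a single oversized ProtocolMessage.
    static List<Map<String, Object>> oneSpecPerMessage(String channel,
                                                       List<Map<String, Object>> messages) {
        List<Map<String, Object>> specs = new ArrayList<>();
        for (Map<String, Object> message : messages) {
            specs.add(Map.<String, Object>of(
                    "channels", List.of(channel),
                    "messages", List.of(message)));
        }
        return specs;
    }
}
```

Both shapes are valid batch publish requests; the difference is only in what gets treated as an atomic unit.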


@mattheworiordan (Member) commented:
Whilst we could fix this, I think this is the wrong place to fix the problem. IMHO we should fix our Batch API, not make it more complicated to publish to our batch API endpoint. See https://ably.atlassian.net/browse/PMR-404. WDYT?

@SimonWoolf (Member, Author) commented:

> IMHO we should fix our Batch API, not make it more complicated to publish to our batch API endpoint

This is an odd framing. The batch publish API has clearly defined and well-documented semantics. There are possible changes we could make to the API in future versions, sure, but if the Kafka connector is using the current version wrongly -- which it is -- that's a bug in the Kafka connector.

PMR-404 doesn't really make much in the way of concrete suggestions for how to 'fix' the current batch API (wrt atomicity behaviour); it just complains about the current behaviour, so I'm not quite clear on exactly what change you're imagining. But I've replied on that ticket.

More practically, any major change to the semantics of the batch publish API (proposing it, agreeing on it, implementing it server-side, then changing ably-java to use it, which would also need a new ably-java major version) would take time, and in the meantime the Kafka connector would still be doing something clearly broken.

@ttypic added the bug label on Mar 6, 2024
@mattheworiordan (Member) commented Mar 6, 2024

> This is an odd framing. The batch publish API has clearly defined and well-documented semantics. There are possible changes we could make to the API in future versions, sure, but if the Kafka connector is using the current version wrongly -- which it is -- that's a bug in the Kafka connector.

You are right. Apologies, I should have been far clearer.

For the avoidance of doubt, I agree we should fix this issue and look at the batch API improvements separately. I recall explicitly having conversations with @subkanthi and @jaley about this requirement, and we wrote a spec ahead of the work being done, so I am honestly quite surprised the per-channel limits have not been implemented. For reference, here is what was shared, which, if implemented, would have meant this issue did not exist. I've checked the tests and, AFAICT, this was not implemented.

[Screenshot: excerpt from the spec shared at the time, 2024-03-06]

> PMR-404 doesn't really make much in the way of concrete suggestions for how to 'fix' the current batch API (wrt atomicity behaviour); it just complains about the current behaviour, so I'm not quite clear on exactly what change you're imagining. But I've replied on that ticket.

I think we need to be clear on the problem and the goals for the API before we propose a technical solution. I don't see a problem in that, but appreciate your comments on PMR-404 and I will review.

@jaley (Contributor) commented Mar 6, 2024

Hey folks!

Testing my memory slightly, but a few things feel familiar:

  • I think the reason we're currently not sending each message in its own BatchSpec is that this makes offset committing on the Kafka side more problematic. If you know Ably accepted a batch atomically, you can commit the offset in Kafka when that request completes successfully. What happens if you post them all separately? I don't recall, to be honest, but I think it was something along those lines.
  • The batch and total request size tracking was definitely in the original plan, but I recall it didn't make it into the final release. The initial implementation @subkanthi put together tracked the size of the JVM objects (i.e. strings in UTF-16, etc.), which wasn't quite right because it's the serialised size that matters here. Keeping track of the serialised size of all buffered messages isn't completely trivial, and I recall in conversations with @paddybyers we thought that functionality should probably live in ably-java, if anywhere. Any client of the Batch API needs to respect the limits, so having some kind of BatchBuilder interface that tracks how much data you're putting into batches and requests might be nice (see the sketch after this list), rather than expecting all clients to implement it. The Kafka connector could then make use of that.
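
To make that concrete, here's a rough sketch of what such a BatchBuilder might look like. Everything here is hypothetical: the 100-message cap and ~64 kB threshold are the figures mentioned in this thread, and serialised size is approximated as the UTF-8 byte length of pre-serialised specs.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; not an existing ably-java class.
public class BatchBuilder {
    private static final int MAX_MESSAGES = 100;      // per-request message cap discussed above
    private static final int MAX_BYTES = 64 * 1024;   // rough ProtocolMessage size limit

    private final List<String> serialisedSpecs = new ArrayList<>();
    private int bufferedBytes = 0;

    /**
     * Adds one already-serialised BatchSpec. Returns a completed batch when adding
     * this spec would exceed either limit, otherwise null.
     */
    public List<String> add(String serialisedSpec) {
        int size = serialisedSpec.getBytes(StandardCharsets.UTF_8).length;
        List<String> completed = null;
        if (!serialisedSpecs.isEmpty()
                && (serialisedSpecs.size() >= MAX_MESSAGES || bufferedBytes + size > MAX_BYTES)) {
            completed = flush();
        }
        serialisedSpecs.add(serialisedSpec);
        bufferedBytes += size;
        return completed;
    }

    /** Drains everything currently buffered as one batch request. */
    public List<String> flush() {
        List<String> batch = new ArrayList<>(serialisedSpecs);
        serialisedSpecs.clear();
        bufferedBytes = 0;
        return batch;
    }
}
```

The point is just that the size accounting lives in one place, rather than every Batch API client re-implementing it.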

We shipped without it in the end, as time was getting on and we felt that, with the ability to control the number of messages per batch, it was somewhat serviceable as it was, though it was a significant annoyance. The other major outstanding limitation, by the way, was the lack of idempotency, as I think this makes it very difficult or impossible to avoid message duplication when redeploying the connector. I left a ticket here: #143
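
For context on the idempotency point (an aside, not something settled in this thread): one common approach is to derive a deterministic message id from each record's Kafka coordinates, so that a replay after a connector redeploy publishes with the same id and can be de-duplicated by an idempotency-aware publish path. A hypothetical sketch:

```java
import org.apache.kafka.connect.sink.SinkRecord;

// Hypothetical helper: derives a deterministic message id from the record's Kafka
// coordinates so that replays after a connector redeploy carry the same id.
final class MessageIds {
    private MessageIds() {}

    static String forRecord(SinkRecord record) {
        return record.topic() + ":" + record.kafkaPartition() + ":" + record.kafkaOffset();
    }
}
```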

Hope you're all doing well over there!
