KREST-2746 Reflect slow responses from Kafka back to the http client #1043
Conversation
@dimitarndimitrov The failing unit test I mentioned is testWriteToChunkedOutputAfterTimeout (although the new 429 test fails in the same way too; I've not been testing that one).
Force-pushed from baaf2ea to d3ef908.
@dimitarndimitrov Ignore the first commit; I hadn't tidied up properly. This one shows the hang behaviour from EasyMock, where the close on the mappingIterator never returns. The logs look like this:

With the debugger running it looks like this instead, and the test passes. There is a difference (after I said there wasn't) :) the debug line writing to the sink. That line only appears if I have a breakpoint in the async write-to-sink method (e.g. line 471); otherwise I see the same behaviour with and without the debugger.
So that's a nice big clue I'm going to go investigate :)
Force-pushed from 700f956 to 3c4fe45.
Force-pushed from 3c4fe45 to 9a95641.
@dimitarndimitrov @AndrewJSchofield This PR could do with merging before mid-November if possible, so that we can deal with Kafka back pressure before increasing (or removing) any byte-based rate limits for produce. It would be great if I could get some feedback by the end of October.
@@ -196,12 +216,24 @@ public final void resume(AsyncResponse asyncResponse) {
          CompletableFuture.completedFuture(
              ResultOrError.error(EXCEPTION_MAPPER.toErrorResponse(e))));
    } finally {
      // if there are still outstanding response to send back, for example hasNext has returned
Remove the MAX_CLOSE_RETRIES
private void triggerDelayedClose(
    ScheduledExecutorService executorService, AsyncResponseQueue responseQueue) {
  if (executorService == null) {
    executorService = Executors.newSingleThreadScheduledExecutor();
executorService is null here, so this is not an object reference we can mutate in place; assigning to the parameter only changes the local variable. You need to, e.g., return the executor service and have the caller set its original field to the returned value.
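A minimal sketch of the shape being suggested, reusing the names from the snippet above with the scheduling body elided (not the PR's actual code):

```java
private ScheduledExecutorService triggerDelayedClose(
    ScheduledExecutorService executorService, AsyncResponseQueue responseQueue) {
  if (executorService == null) {
    // Reassigning the parameter only updates this method's local copy,
    // so the new executor must be handed back to the caller.
    executorService = Executors.newSingleThreadScheduledExecutor();
  }
  // ... schedule the delayed close of responseQueue as before ...
  return executorService;
}

// Caller side (hypothetical field name):
// this.executorService = triggerDelayedClose(this.executorService, responseQueue);
```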
@@ -347,14 +438,16 @@ private void asyncResume(AsyncResponse asyncResponse) {
     asyncResponse.resume(Response.ok(sink).build());
   }

-  private volatile boolean sinkClosed = false;
+  volatile boolean sinkClosed = false;
Why is this no longer private?
-  private volatile boolean sinkClosed = false;
+  volatile boolean sinkClosed = false;

+  private volatile AtomicInteger queueDepth = new AtomicInteger(0);
Shouldn't be volatile
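For illustration, the declaration the comment appears to be asking for (assuming the field reference itself is never reassigned):

```java
// AtomicInteger already provides atomic, visible updates; only the field
// reference would need volatile, and it never changes, so final suffices.
private final AtomicInteger queueDepth = new AtomicInteger(0);
```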
@@ -839,7 +841,7 @@ private static ProduceAction getProduceAction(
     replay(produceControllerProvider, produceController);

     StreamingResponseFactory streamingResponseFactory =
-        new StreamingResponseFactory(chunkedOutputFactory, FIVE_SECONDS_MS, FIVE_SECONDS_MS);
+        new StreamingResponseFactory(chunkedOutputFactory, FIVE_SECONDS_MS, FIVE_MS, DEPTH, DEPTH);
Rename these to something like DURATION or GRACE_DURATION etc.; at the moment you need to go and look up the values to tell them apart.
// no third message
expect(requestsMappingIterator.hasNext()).andReturn(false);

requestsMappingIterator.close(); // call from thread executor
You can't wrap a call that returns void in expect(), so these calls have to be recorded and played back one by one :'(
Add a comment explaining this, because it looks bonkers otherwise.
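For context, EasyMock records expectations on a void method by invoking it directly while the mock is in record mode, optionally followed by expectLastCall(). A rough illustration using the iterator mock from the snippet above:

```java
// expect(...) only accepts calls that return a value, so a void method such
// as close() is recorded by simply calling it on the mock in record mode.
requestsMappingIterator.close();
// Optionally attach details (call count, andThrow, etc.) to that last call.
expectLastCall().once();
```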
expect(mockedChunkedOutput.isClosed()).andReturn(false);
mockedChunkedOutput.write(sucessResult);
Two c's in success.
I have reviewed the code with @ehumber and am happy to approve. I do not think this is a sensible way to handle streamed REST produce requests which overrun the KafkaProducer request limits, but this does at least prevent heap exhaustion when it happens. I think there's a significant refactor due in this code at some point.
Sorry I didn't manage to get this tidied up in time :(. I made a PR from a branch in confluentinc/kafka-rest, so if my original branch from my fork goes missing when I leave, hopefully you still have the code. For this PR I recommend you don't merge it until you need to. That will most probably be when you look at rate limiting around REST produce, and potentially remove it in REST and rely on backpressure from Kafka to keep the requests at a sensible limit.
This PR adds code to reflect Kafka rate limiting back to the HTTP client.
While Kafka's produce API response does tell the Java producer client that it is being throttled, that information is not exposed to the user of the client (i.e. Kafka REST), so we have to infer the throttling another way.
Kafka throttles by delaying its responses to the client, so if we see a growing backlog of requests waiting to be sent to Kafka, we can assume Kafka is throttling REST.
When this happens (or we have a backlog of calls to Kafka for some other reason), we are obliged to send responses back to the HTTP client in the same order as the requests arrived with us. So all we can do, once we hit the "throttle from this point" queue depth, is append 429 responses to the end of the response queue. The corresponding requests are not sent to Kafka, which should reduce the traffic reaching Kafka, possibly enough to bring it back under the throttle limit.
If the queue depth doesn't shrink sufficiently and instead grows to the maximum limit, then the connection is closed once the grace period from my previous (disconnect clients nicely) PR has expired.
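A rough sketch of the two thresholds described above, reusing names that appear in the review snippets (queueDepth, triggerDelayedClose, ResultOrError) and hypothetical helpers elsewhere; this is an illustration, not the PR's actual code:

```java
int depth = queueDepth.incrementAndGet();
if (depth >= disconnectQueueDepth) {            // hypothetical max-depth limit
  // The backlog kept growing: schedule the disconnect, honouring the grace
  // period introduced by the earlier "disconnect clients nicely" PR.
  executorService = triggerDelayedClose(executorService, responseQueue);
} else if (depth >= throttleQueueDepth) {       // hypothetical 429 threshold
  // Don't send this record to Kafka; instead append a 429 to the response
  // queue so responses still go back to the HTTP client in request order.
  responseQueue.push(                           // hypothetical queue method
      CompletableFuture.completedFuture(
          ResultOrError.error(tooManyRequestsErrorResponse())));
} else {
  // Below both thresholds: forward the record to Kafka as normal.
  produceToKafka(record);                       // hypothetical helper
}
```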