We recently hit a ceiling on the maximum throughput of the s3-sqs connector. In the fastest possible case, a message can be fetched from the target SQS queue only once per second, because:

- the lowest allowed value for the fetch interval is 1, and
- the scheduling unit is SECONDS (hence the suggestion below).

In other words, the connector can fetch at most one message per second. If messages are pushed into the SQS queue faster than that, i.e., the message generation rate exceeds 1 message per second, the queue size grows without bound. In addition, the processing resources on the Spark cluster side end up considerably underutilized.
I suggest changing the scheduling unit to MILLISECONDS to resolve this issue.
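For illustration, a minimal sketch of what that change might look like, assuming the connector drives its fetch loop with a `ScheduledExecutorService` (the names `fetchIntervalMs` and `fetchMessages` are hypothetical, not the connector's actual identifiers):

```scala
import java.util.concurrent.{Executors, TimeUnit}

object SqsFetchSchedulerSketch {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  // Hypothetical names for illustration only.
  def start(fetchIntervalMs: Long)(fetchMessages: () => Unit): Unit = {
    scheduler.scheduleAtFixedRate(
      new Runnable { def run(): Unit = fetchMessages() },
      0L,                    // initial delay
      fetchIntervalMs,       // period between fetches
      TimeUnit.MILLISECONDS  // with SECONDS, the rate is capped at 1 fetch/s
    )
  }
}
```

With the unit in milliseconds, an interval of, say, 100 would allow up to 10 fetches per second instead of 1.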
I am facing the same issue: the source publishes 500 files per second, and those need to be processed every minute. Our Spark logic can process these files in under a minute when run in batch mode, but we would like to use this approach to further reduce the listing time.
Is there any way to increase the throughput of the SQS reader?
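One lever that is independent of the scheduling unit: a single SQS `ReceiveMessage` call can return up to 10 messages, so batching each fetch multiplies the effective rate. A minimal sketch with the AWS SDK for Java v1 (the queue URL is a placeholder):

```scala
import com.amazonaws.services.sqs.AmazonSQSClientBuilder
import com.amazonaws.services.sqs.model.ReceiveMessageRequest
import scala.jdk.CollectionConverters._

object SqsBatchReceiveSketch {
  def main(args: Array[String]): Unit = {
    val sqs = AmazonSQSClientBuilder.defaultClient()
    // Placeholder queue URL; substitute your own.
    val queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"

    val request = new ReceiveMessageRequest(queueUrl)
      .withMaxNumberOfMessages(10) // SQS hard limit: at most 10 messages per call
      .withWaitTimeSeconds(1)      // long polling cuts down on empty responses

    // One call can now drain up to 10 messages instead of 1.
    sqs.receiveMessage(request).getMessages.asScala.foreach(m => println(m.getBody))
  }
}
```

Even at the current 1-second fetch interval, batching would raise the ceiling from 1 to 10 messages per second; combined with millisecond scheduling, throughput scales much further.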