We recently hit a ceiling on the maximum throughput of the s3-sqs connector. In the fastest possible case, a message can be fetched from the target SQS queue only once per second, because:

- the lowest allowed value for the fetch interval is 1, and
- the scheduling unit is SECONDS (hence the suggestion below).

In other words, the connector can fetch at most one message per second. If messages are pushed into the SQS queue faster than that, i.e., the message generation rate exceeds 1 message per second, the queue size grows without bound. In addition, the processing resources on the Spark cluster side end up considerably underutilized.
I suggest changing the scheduling unit to MILLISECONDS to resolve this issue.
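For illustration, a minimal sketch of what that change might look like, assuming the connector drives its fetch loop with a `ScheduledExecutorService` (the names `fetchIntervalMs` and `fetchMessages` are hypothetical, not the connector's actual identifiers):

```scala
import java.util.concurrent.{Executors, TimeUnit}

object SqsFetchSchedulerSketch {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  // Hypothetical names for illustration only.
  def start(fetchIntervalMs: Long)(fetchMessages: () => Unit): Unit = {
    scheduler.scheduleAtFixedRate(
      new Runnable { def run(): Unit = fetchMessages() },
      0L,                    // initial delay
      fetchIntervalMs,       // period between fetches
      TimeUnit.MILLISECONDS  // with SECONDS, the rate is capped at 1 fetch/s
    )
  }
}
```

With the unit in milliseconds, an interval of, say, 100 would allow up to 10 fetches per second instead of 1.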
I am facing the same issue: the source publishes 500 files per second, and those need to be processed every minute. Our Spark logic can process these files in under a minute when run in batch mode, but we would like to use this approach to further reduce the listing time.
Is there any way to increase the throughput of the SQS reader?
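One lever that is independent of the scheduling unit: a single SQS `ReceiveMessage` call can return up to 10 messages, so batching each fetch multiplies the effective rate. A minimal sketch with the AWS SDK for Java v1 (the queue URL is a placeholder):

```scala
import com.amazonaws.services.sqs.AmazonSQSClientBuilder
import com.amazonaws.services.sqs.model.ReceiveMessageRequest
import scala.jdk.CollectionConverters._

object SqsBatchReceiveSketch {
  def main(args: Array[String]): Unit = {
    val sqs = AmazonSQSClientBuilder.defaultClient()
    // Placeholder queue URL; substitute your own.
    val queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"

    val request = new ReceiveMessageRequest(queueUrl)
      .withMaxNumberOfMessages(10) // SQS hard limit: at most 10 messages per call
      .withWaitTimeSeconds(1)      // long polling cuts down on empty responses

    // One call can now drain up to 10 messages instead of 1.
    sqs.receiveMessage(request).getMessages.asScala.foreach(m => println(m.getBody))
  }
}
```

Even at the current 1-second fetch interval, batching would raise the ceiling from 1 to 10 messages per second; combined with millisecond scheduling, throughput scales much further.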