This repository has been archived by the owner on May 7, 2024. It is now read-only.

Solution scalability #64

Open

NickAtSentry opened this issue Oct 27, 2021 · 3 comments

@NickAtSentry

We have encountered an issue with how this solution scales as voicemail traffic increases.

The handleRequest method of KVSProcessRecordingLambda loops through all records returned from the stream and processes each one serially. Most of the time, the work done in this Lambda is short-lived: not all CTR records on the stream are voicemails that need to be processed, and often the batch is small.

However, each time a voicemail does get processed, it takes a significant amount of time that scales with the length of the voicemail message. Eventually, under the right load conditions, this Lambda function will time out. When that happens, the solution retries the same batch of work and ends up in a spiral, falling further behind until other action is taken.

We have engaged with AWS enterprise support on this issue and have a temporary workaround, but the correct long-term solution is to separate the concerns and break apart KVSProcessRecordingLambda. One function should process a batch of work from the stream and quickly determine whether additional work is needed. A separate Lambda function can then handle the work of storing a recording and can be invoked as needed. This can be done through asynchronous Lambda invocation or, for more resiliency and processing control, by writing to an SQS queue from which the second Lambda picks up work. A rough sketch of the dispatcher half follows.
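To make the idea concrete, here is a minimal sketch of what the dispatcher Lambda could look like. This is an illustration, not code from the solution: the `VOICEMAIL_QUEUE_URL` environment variable and the `isVoicemailCtr` check are hypothetical, and the real filtering logic would inspect whatever CTR attributes the solution uses to flag voicemails.

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KinesisEvent;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

import java.nio.charset.StandardCharsets;

// Thin "dispatcher" Lambda: drains the Kinesis batch quickly and offloads
// each voicemail to an SQS queue for a separate worker Lambda to process.
public class CtrDispatchHandler implements RequestHandler<KinesisEvent, Void> {

    private static final SqsClient SQS = SqsClient.create();
    // Hypothetical environment variable naming the worker's queue.
    private static final String QUEUE_URL = System.getenv("VOICEMAIL_QUEUE_URL");

    @Override
    public Void handleRequest(KinesisEvent event, Context context) {
        for (KinesisEvent.KinesisEventRecord record : event.getRecords()) {
            String ctrJson = StandardCharsets.UTF_8
                    .decode(record.getKinesis().getData())
                    .toString();

            // Cheap check only: most CTRs are not voicemails and are skipped.
            if (!isVoicemailCtr(ctrJson)) {
                continue;
            }

            // Hand the slow work (fetching from KVS and storing the recording)
            // to the worker via SQS, so this batch completes in milliseconds
            // regardless of voicemail length.
            SQS.sendMessage(SendMessageRequest.builder()
                    .queueUrl(QUEUE_URL)
                    .messageBody(ctrJson)
                    .build());
        }
        return null;
    }

    private boolean isVoicemailCtr(String ctrJson) {
        // Illustrative placeholder: the real check would inspect the CTR
        // attributes that mark a contact as a voicemail.
        return ctrJson.contains("\"vm_flag\"");
    }
}
```

With this split, a timeout or crash while storing one recording only retries that one SQS message (with dead-letter handling available), instead of forcing the whole Kinesis batch to be reprocessed.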

@dfw100

dfw100 commented Nov 8, 2021

Hi NickAtSentry, thanks for the feedback. We have added it to the backlog, but at this point it's hard to tell when we'll get to it. We will update our changelog if it is supported in the future.

Also, if you are able to resolve this, we would encourage you to send a pull request.

@dfw100 dfw100 closed this as completed Nov 8, 2021
@NickAtSentry
Author

Why was this issue closed? It remains a problem with this solution, and one that could very likely impact customers other than us. How would a customer considering this solution understand and account for the scalability concerns? Is the backlog visible to customers?

@khastation khastation pinned this issue Jul 25, 2022
@khastation khastation reopened this Jul 25, 2022
@davelemons
Contributor

Just an update: we are working on a new version of this solution that will resolve this issue. No guarantees, but we are aiming to release it in Q1 2024.
