Use proper threading to encourage work completion of AMQP subscribers in a predictable manner. #95

TreyE · 2023-09-13T20:34:30Z

Underlying Issues and Justification

Currently Event Source is characterized, when it comes to AMQP, by three properties:

It is single-process, but multi-threaded.
It runs all consumers for AMQP events in the same process, but in (theoretically) different threads.
It is 'greedy': it attempts to allow its consumers to process multiple messages simultaneously.

However, letting multiple consumers work simultaneously without coordination in a multi-threaded environment creates a problem: the system can switch away from a thread while its consumer is in the middle of processing a message, and there is no guarantee it will return to that message. This usually isn't a problem for event_source under low load, but it becomes one when:

Event Source is facing a high volume of messages.
The messages are of different types, meaning multiple subscribers are not only receiving messages but also processing those different types simultaneously under different consumers and threads.
One type of worker performs a complex, work-intensive task.

Under these circumstances, nothing prevents a worker from being interrupted, and AMQP subscribers have no way to coordinate when their work may be interrupted. A worker can therefore be suspended while processing a work-intensive task, with no guarantee it will ever be resumed.

This can result in:

Event Source workers beginning work they may never complete, but leaving the message in the 'unacked' state.
Process bloat, as multiple Event Source workers are interrupted while performing their work and don't finish the work - thus never releasing the memory.
Unpredictable system behaviour - if starting work gives no guarantee of when or how it will finish, messages and their associated work can be processed at arbitrary, unpredictable times.

The Fix

This can be fixed by marking the unit of work performed by an Event Source worker as atomic - so that it cannot be interrupted.

However, two constraints must be respected in order not to cripple performance:

Only prevent interruption during the minimal portion of worker execution needed to ensure the unit of work is completed successfully.
Use a re-entrant synchronization primitive to avoid deadlocks.
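To illustrate why re-entrancy matters, here is a minimal sketch using only Ruby's standard library, not Event Source code. It contrasts a plain Mutex, which raises ThreadError when the thread that already holds it tries to lock it again, with a Monitor, which lets the owning thread re-enter. The variable names are hypothetical.

```ruby
require "monitor"

# A plain Mutex is not re-entrant: locking it again from the thread that
# already holds it raises ThreadError instead of proceeding.
mutex = Mutex.new
mutex_error = nil
begin
  mutex.synchronize do
    mutex.synchronize { :never_reached }
  end
rescue ThreadError => e
  mutex_error = e
end

# A Monitor is re-entrant: the owning thread may nest synchronize calls,
# so a handler that calls another synchronized helper does not lock itself out.
monitor = Monitor.new
nested_result = monitor.synchronize do
  monitor.synchronize { :completed }
end
```

This matters whenever a synchronized unit of work might call into other code guarded by the same lock.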

In this case, the solution offered here is a Ruby Monitor, synchronized only around the portion of the AMQP subscriber where work is actually being performed.
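As a rough sketch of that shape, and not the actual change in this PR: the class, constant, and method names below are hypothetical stand-ins, and `payload` substitutes for a decoded AMQP message. Only the unit of work sits inside the monitor; cheap preparation stays outside it.

```ruby
require "monitor"

# Hypothetical stand-in for an AMQP subscriber; not Event Source's real class.
class SketchSubscriber
  # Assumption: a single monitor shared by every subscriber in the process.
  WORK_MONITOR = Monitor.new

  attr_reader :handled

  def initialize
    @handled = []
  end

  # `payload` stands in for the body of a received message.
  def on_message(payload)
    # Cheap preparation happens outside the lock.
    parsed = payload.to_s.strip

    # Only the unit of work itself is serialized across threads.
    WORK_MONITOR.synchronize do
      @handled << parsed.upcase
    end
  end
end

# Simulate several consumer threads delivering messages at once.
subscriber = SketchSubscriber.new
threads = 4.times.map do |i|
  Thread.new { subscriber.on_message(" message-#{i} ") }
end
threads.each(&:join)
```

Keeping the synchronized block this narrow follows the first constraint above: other threads wait only while a unit of work is actually in progress.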

This ticket is tracked as: https://www.pivotaltracker.com/story/show/186036844

Caveats

Please note that although this fix introduces a monitor that can be reused later, it does not attempt to manage or constrain the behaviour of the HTTP worker portion of Event Source. I was less certain of how that might behave in isolation and would rather exercise caution and handle that issue in a separate submission.

Use proper threading to encourage work completion of AMQP subscribers.

0016215

TreyE force-pushed the finish_what_you_started branch from 6dd2e24 to 0016215 Compare September 13, 2023 20:49

TreyE requested review from polographer and raghuramg September 20, 2023 16:40

TreyE added the bug Something isn't working label Oct 16, 2023

Merge branch 'trunk' into finish_what_you_started

27d63f7
