Set the result after cleaning the queue to reduce stuck transaction #448
We see spikes of `ResultNotFound` errors from time to time. We cannot reproduce them reliably, but we think they might be coming from ARQ/Redis. Our asyncio tasks are not blocked, and not every job gets `ResultNotFound` during the spikes.

My current theory is that we have lock contention in Redis in the `finish_job` function. Deleting the `job_id` from the queue could be blocked for longer than the result TTL (which we have set to 10s); by the time the pipeline executes, the result key would already have expired.

This change simply moves the result key set to after we delete the `job_id` from the busy queue, a step that might block on a lock. If my theory is right, we should stop seeing the `ResultNotFound` error.
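
To make the hypothesis concrete, here is a toy sketch of the ordering change. It is not ARQ or Redis code: `FakeStore`, `slow_cleanup`, the `result:` key prefix, and both `finish_*` functions are hypothetical names, and the simulated clock only models the theory above (a result TTL that starts counting before a delayed queue cleanup completes). It says nothing about how Redis actually schedules commands.

```python
class FakeStore:
    """In-memory stand-in for a key-value store with per-key TTLs (not Redis)."""

    def __init__(self):
        self.now = 0.0   # simulated clock, in seconds
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        # Store the value and record when it should expire.
        self._data[key] = (value, self.now + ttl)

    def get(self, key):
        # Return None for missing or expired keys.
        entry = self._data.get(key)
        if entry is None or entry[1] <= self.now:
            return None
        return entry[0]

    def slow_cleanup(self, delay):
        # Models a queue cleanup that is held up for `delay` seconds.
        self.now += delay


RESULT_TTL = 10.0  # seconds, the result TTL mentioned in the description


def finish_result_first(store, job_id, result, cleanup_delay):
    # Old order (as hypothesised): write the result, then clean the queue.
    store.set("result:" + job_id, result, RESULT_TTL)
    store.slow_cleanup(cleanup_delay)


def finish_cleanup_first(store, job_id, result, cleanup_delay):
    # New order: clean the queue first, then write the result.
    store.slow_cleanup(cleanup_delay)
    store.set("result:" + job_id, result, RESULT_TTL)


def result_visible_after(finish, cleanup_delay):
    # Run one finish strategy on a fresh store and check the result key.
    store = FakeStore()
    finish(store, "job-1", "ok", cleanup_delay)
    return store.get("result:job-1") is not None
```

In this model, a cleanup delay longer than `RESULT_TTL` makes the result invisible under the old order, while the new order still leaves it readable. Whether production behaves this way is exactly what the change is meant to test.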