Set the result after cleaning the queue to reduce stuck transaction #448

Closed
wants to merge 1 commit into from

@isra17 isra17 commented Apr 25, 2024

We are seeing occasional spikes of ResultNotFound errors. We can't reproduce them reliably, but we think they might be coming from ARQ/Redis. Our asyncio tasks are not blocked, and not every job gets ResultNotFound during the spikes.

So my current theory is that we have lock contention in Redis in the finish_job function. It's possible that deleting the job_id from the queue is blocked for longer than the result TTL (which we have set to 10s), so by the time the pipeline executes, the result key will already have expired.

This change simply moves the result key set to after the deletion of the job_id from the busy queue, which might block on a lock. If my theory is right, we should stop seeing the ResultNotFound error.
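
To make the theory concrete, here is a toy model of the two orderings. It is only a sketch of the hypothesis: `FakeClock`, `ToyStore`, and `finish_job_toy` are made-up names for illustration, not ARQ or Redis APIs, and the "stall" is simulated by advancing a fake clock. It shows what would happen if the TTL started counting before a slow cleanup step. It does not show that ARQ or Redis actually behaves this way.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class FakeClock:
    """Manually advanced clock so the example is deterministic."""
    now: float = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


@dataclass
class ToyStore:
    """In-memory key-value store with per-key expiry. This is NOT Redis."""
    clock: FakeClock
    _data: Dict[str, Tuple[Any, float]] = field(default_factory=dict)

    def set(self, key: str, value: Any, ttl: float) -> None:
        # The TTL starts counting at the moment the value is written.
        self._data[key] = (value, self.clock.now + ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock.now >= expires_at:
            del self._data[key]  # expired, behaves as if the key was never set
            return None
        return value


def finish_job_toy(
    store: ToyStore,
    clock: FakeClock,
    job_id: str,
    result: Any,
    *,
    result_ttl: float,
    cleanup_delay: float,
    set_result_first: bool,
) -> Optional[Any]:
    """Hypothetical model of the two orderings; returns what a reader would see."""
    key = f"result:{job_id}"
    if set_result_first:
        store.set(key, result, result_ttl)
        clock.advance(cleanup_delay)  # queue cleanup stalls after the TTL started
    else:
        clock.advance(cleanup_delay)  # queue cleanup stalls before the result is written
        store.set(key, result, result_ttl)
    return store.get(key)


if __name__ == "__main__":
    for first in (True, False):
        clock = FakeClock()
        store = ToyStore(clock)
        seen = finish_job_toy(
            store, clock, "job-1", "ok",
            result_ttl=10, cleanup_delay=12, set_result_first=first,
        )
        print(f"set_result_first={first}: {seen!r}")
```

With a 10s TTL and a 12s simulated stall, the result is gone when it is written first and still readable when it is written after the cleanup, which is the behavior this change is betting on.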

codecov bot commented Apr 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.47%. Comparing base (94cd878) to head (8df6e1a).
Report is 10 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #448      +/-   ##
==========================================
+ Coverage   96.27%   96.47%   +0.19%     
==========================================
  Files          11       11              
  Lines        1074     1078       +4     
  Branches      209      190      -19     
==========================================
+ Hits         1034     1040       +6     
  Misses         19       19              
+ Partials       21       19       -2     
Files Coverage Δ
arq/worker.py 97.37% <100.00%> (+0.20%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9d7944b...8df6e1a.

isra17 commented Apr 26, 2024

Never mind, this doesn't fix it. I'm going to research the Redis transaction model a bit more first ;)

@isra17 isra17 closed this Apr 26, 2024