[24.0] More efficient change_state queries, maybe fix deadlock #17632
Here's the deadlock:

```
Traceback (most recent call last):
  File "/home/runner/work/galaxy/galaxy/galaxy root/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/home/runner/work/galaxy/galaxy/galaxy root/.venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.DeadlockDetected: deadlock detected
DETAIL:  Process 317 waits for ShareLock on transaction 1057; blocked by process 318.
Process 318 waits for ShareLock on transaction 1056; blocked by process 317.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (0,7) in relation "dataset"

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/jobs/runners/__init__.py", line 203, in put
    queue_job = job_wrapper.enqueue()
  File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/jobs/__init__.py", line 1589, in enqueue
    self.change_state(model.Job.states.QUEUED, flush=False, job=job)
  File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/jobs/__init__.py", line 1547, in change_state
    job.update_output_states(self.app.application_stack.supports_skip_locked())
  File "/home/runner/work/galaxy/galaxy/galaxy root/lib/galaxy/model/__init__.py", line 2053, in update_output_states
    sa_session.execute(statement, params)
  File "/home/runner/work/galaxy/galaxy/galaxy root/.venv/lib/python3.8/site-packages/sqlalchemy/orm/session.py", line 1717, in execute
    result = conn._execute_20(statement, params or {}, execution_options)
  File "/home/runner/work/galaxy/galaxy/galaxy root/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1710, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/home/runner/work/galaxy/galaxy/galaxy root/.venv/lib/python3.8/site-packages/sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/home/runner/work/galaxy/galaxy/galaxy root/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
    ret = self._execute_context(
  File "/home/runner/work/galaxy/galaxy/galaxy root/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
    self._handle_dbapi_exception(
  File "/home/runner/work/galaxy/galaxy/galaxy root/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
    util.raise_(
  File "/home/runner/work/galaxy/galaxy/galaxy root/.venv/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/home/runner/work/galaxy/galaxy/galaxy root/.venv/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/home/runner/work/galaxy/galaxy/galaxy root/.venv/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL:  Process 317 waits for ShareLock on transaction 1057; blocked by process 318.
Process 318 waits for ShareLock on transaction 1056; blocked by process 317.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (0,7) in relation "dataset"

[SQL:
UPDATE dataset
SET
    state = %(state)s,
    update_time = %(update_time)s
WHERE
    id IN (
        SELECT hda.dataset_id FROM history_dataset_association hda
        INNER JOIN job_to_output_dataset jtod
        ON jtod.dataset_id = hda.id AND jtod.job_id = %(job_id)s
    );
]
[parameters: {'state': 'queued', 'update_time': datetime.datetime(2024, 3, 7, 12, 29, 10, 229364), 'job_id': 3}]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
```

The likely culprit for the deadlock is that `__EXTRACT_DATASET__` deals with the same dataset as the tool that created the collection `__EXTRACT_DATASET__` is running on, so both jobs might be attempting to update the output state at the same time.

My thinking is that by filtering on the job_id, we won't change the dataset state when `change_state` runs for `__EXTRACT_DATASET__`.
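To make the scope of the quoted UPDATE concrete, here is a minimal sketch that runs the same statement against an in-memory SQLite database. The three-table schema, the sample rows, and the helper names `build_demo_db` and `update_output_states` are assumptions made for illustration only; they are not Galaxy's actual models or code. The sketch does not reproduce the deadlock, which needs two concurrent PostgreSQL transactions. It only shows which `dataset` rows the statement touches for a given `job_id`.

```python
import sqlite3

# Hypothetical, heavily simplified schema: only the columns the quoted UPDATE uses.
SCHEMA = """
CREATE TABLE dataset (id INTEGER PRIMARY KEY, state TEXT, update_time TEXT);
CREATE TABLE history_dataset_association (id INTEGER PRIMARY KEY, dataset_id INTEGER);
CREATE TABLE job_to_output_dataset (id INTEGER PRIMARY KEY, job_id INTEGER, dataset_id INTEGER);
"""

# Statement from the traceback, rewritten with SQLite-style named parameters.
UPDATE_OUTPUT_STATES = """
UPDATE dataset
SET state = :state, update_time = :update_time
WHERE id IN (
    SELECT hda.dataset_id
    FROM history_dataset_association hda
    INNER JOIN job_to_output_dataset jtod
        ON jtod.dataset_id = hda.id AND jtod.job_id = :job_id
)
"""


def build_demo_db():
    """Create an in-memory database with two datasets owned by two different jobs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dataset (id, state) VALUES (?, ?)",
        [(1, "new"), (2, "new")],
    )
    conn.executemany(
        "INSERT INTO history_dataset_association (id, dataset_id) VALUES (?, ?)",
        [(10, 1), (20, 2)],
    )
    # Job 3 outputs HDA 10 (dataset 1); job 4 outputs HDA 20 (dataset 2).
    conn.executemany(
        "INSERT INTO job_to_output_dataset (job_id, dataset_id) VALUES (?, ?)",
        [(3, 10), (4, 20)],
    )
    return conn


def update_output_states(conn, job_id, state):
    """Run the UPDATE for one job and return the number of dataset rows changed."""
    params = {"state": state, "update_time": "2024-03-07 12:29:10", "job_id": job_id}
    return conn.execute(UPDATE_OUTPUT_STATES, params).rowcount


conn = build_demo_db()
updated = update_output_states(conn, job_id=3, state="queued")
states = dict(conn.execute("SELECT id, state FROM dataset ORDER BY id"))
print(updated, states)
```

In this toy setup only the dataset linked to job 3 through `job_to_output_dataset` changes state. Checking the returned row count per call is also a cheap way to compare two variants of the statement when they are suspected of selecting different rows.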
Unfortunately, they are not equivalent. I haven't looked at the details, but here's what I've found: the first UPDATE statement is executed 5 times in the failing test. The 3rd and 4th passes each select 1 row for updating in the old version and 0 rows in the new version.
Because we didn't add the job association in the right place.
These test failures look related.
Looks great! Holding fingers crossed :)