Intermittent deadlock when Closing a tree #1017

Open
mark-rushakoff opened this issue Dec 13, 2024 · 0 comments
I frequently hit this deadlock in tests that run multiple SDK apps in the same process, on iavl tag v1.3.2. I am working with the latest cosmos-sdk commit, where async pruning is hard-coded to true.

The relevant code snippets are:

iavl/nodedb.go

Lines 1122 to 1129 in d89d5d2

func (ndb *nodeDB) Close() error {
	ndb.mtx.Lock()
	defer ndb.mtx.Unlock()
	ndb.cancel()
	if ndb.opts.AsyncPruning {
		<-ndb.done // wait for the pruning process to finish
	}

and

iavl/nodedb.go

Lines 599 to 608 in d89d5d2

func (ndb *nodeDB) startPruning() {
	for {
		select {
		case <-ndb.ctx.Done():
			ndb.done <- struct{}{}
			return
		default:
			ndb.mtx.Lock()
			toVersion := ndb.pruneVersion
			ndb.mtx.Unlock()

(*nodeDB).startPruning runs in its own goroutine, created during newNodeDB. (*nodeDB).Close is called on a separate goroutine, e.g. when closing an SDK commitment store. The deadlock unfolds as follows:

  1. The Close goroutine acquires the lock on ndb.mtx
  2. Concurrently, the startPruning goroutine enters the default case and attempts to call ndb.mtx.Lock(), but it cannot acquire the lock until the Close goroutine releases it
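  3. The Close goroutine, still holding ndb.mtx, calls ndb.cancel() and then blocks reading from ndb.done; the startPruning goroutine can never get past ndb.mtx.Lock() to observe ndb.ctx.Done() and send on ndb.done, so neither goroutine makes progress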
mark-rushakoff added a commit to gordian-engine/gcosmos that referenced this issue Dec 13, 2024
The gcapp handles this now, and the presence of the extra close in
gserver made it more likely that we would run into cosmos/iavl#1017.