
Concurrent block processing #931

Open
RCasatta wants to merge 2 commits into master from par_for_blocks

Conversation

RCasatta
Contributor

By changing the signature of the for_blocks function, avoiding FnMut and returning data instead, we can process blocks in parallel.
This requires changing the code that uses the function, because at the moment the closures rely on mutating their context.

The PR is at a very early stage: there are known bugs, some marked with TODO, so reviewers should avoid an in-depth review for now. The goal is just to demonstrate feasibility and get early feedback on the concept.
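
A minimal sketch of the idea (illustrative, generic names; not the actual electrs for_blocks signature):

use rayon::prelude::*;

// Before: the FnMut closure mutates its captured context, so the calls
// have to run sequentially.
fn for_blocks_seq<T>(blocks: Vec<T>, mut func: impl FnMut(T)) {
    for block in blocks {
        func(block);
    }
}

// After: the Fn closure returns data instead of mutating shared state,
// so the calls are independent and can run on rayon's thread pool;
// the caller merges the returned values afterwards.
fn for_blocks_par<T, R>(blocks: Vec<T>, func: impl Fn(T) -> R + Sync + Send) -> Vec<R>
where
    T: Send,
    R: Send,
{
    blocks.into_par_iter().map(func).collect()
}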

On my machine, with a node with an empty mempool, the time taken by
./history.py 1FeexV6bAHb8ybZjqQMjJrcCrHGW9sb6uF --no-merkle-proofs
is reduced from about 3.5s to 2.5s.

src/p2p.rs Outdated
}
Ok(())
let x: Vec<_> = result.into_par_iter().map(|(a, b)| func(a, b)).collect(); // TODO restore observe
Contributor Author

This should instead start processing each block as soon as its data arrives, rather than waiting until all the data has arrived.
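
One possible direction, sketched under assumptions (rayon's par_bridge, generic names; not the PR's code): feed blocks through a channel and let rayon pick each one up as soon as it is received.

use std::sync::mpsc;
use rayon::iter::{ParallelBridge, ParallelIterator};

// par_bridge() hands each item to a worker as soon as it comes off the
// channel, instead of waiting for a full Vec to be collected first.
// Note: results come back in completion order, so the caller would still
// need to map them back to their block hashes/heights.
fn process_as_received<T, R, F>(rx: mpsc::Receiver<T>, func: F) -> Vec<R>
where
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync + Send,
{
    rx.into_iter().par_bridge().map(func).collect()
}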

@romanz romanz self-assigned this Aug 29, 2023
@RCasatta
Contributor Author

This comment still applies, but it could be addressed in a subsequent PR.
The other known TODOs should be addressed.

Comment on lines +31 to +33
if self.height < other.height {
self.tip_row = other.tip_row
}
Contributor Author

This logic could lead to intermediate invalid states (with some middle heights still missing).
However, since all batches are always merged, the final state should be okay.

@RCasatta RCasatta marked this pull request as ready for review September 8, 2023 09:08
src/tracker.rs (resolved)
src/p2p.rs (outdated, resolved)
@RCasatta
Contributor Author

RCasatta commented Oct 7, 2023

Is something needed on my end?
I see CI failures but I am not sure what is causing them; any suggestion is welcome.

self.spending_rows.extend(other.spending_rows.into_iter());
self.txid_rows.extend(other.txid_rows.into_iter());
if self.height < other.height {
self.tip_row = other.tip_row
Owner

Should we also update self.height?

Contributor Author

@RCasatta RCasatta Oct 13, 2023

I don't think so -> #931 (comment)

It is not ideal at all, but I am not sure whether there are better options, or whether I should keep this and add a comment.

@romanz
Owner

romanz commented Oct 14, 2023

I see CI failures but I am not sure what is causing them; any suggestion is welcome.

IIUC, it seems to be related to this PR... could you please take a look?

src/p2p.rs Outdated
let _ = send.send(r);
});
}
let result: Result<Vec<_>, std::sync::mpsc::RecvError> =
Owner

Is it OK to block here (from rayon's point of view)?
Asking because, when using this patch, it seems that a task is spawned but then never runs:

[2023-11-20T19:44:54.785Z DEBUG electrs::p2p] peer inventory: [Block(0x423cfe9f541e1425b02b93d8db56f8fecdcb0ce8f58d33102d205c73f6bb037e)]
[2023-11-20T19:44:54.786Z DEBUG electrs::p2p] got 1 new headers
[2023-11-20T19:44:54.786Z INFO  electrs::index] indexing 1 blocks: [111..111]
[2023-11-20T19:44:54.786Z DEBUG electrs::p2p] loading 1 blocks
[2023-11-20T19:44:54.786Z DEBUG electrs::p2p] spawning task for 423cfe9f541e1425b02b93d8db56f8fecdcb0ce8f58d33102d205c73f6bb037e
[2023-11-20T19:44:54.786Z DEBUG electrs::p2p] waiting for 1 blocks
[2023-11-20T19:44:54.786Z DEBUG electrs::p2p] processing 423cfe9f541e1425b02b93d8db56f8fecdcb0ce8f58d33102d205c73f6bb037e
[2023-11-20T19:44:54.786Z DEBUG electrs::index] writing 3 funding and 11 spending rows from 2 transactions, 1 blocks
[2023-11-20T19:44:54.843Z INFO  electrs::signals] notified via SIG10
[2023-11-20T19:44:54.866Z INFO  electrs::chain] chain updated: tip=423cfe9f541e1425b02b93d8db56f8fecdcb0ce8f58d33102d205c73f6bb037e, height=111
[2023-11-20T19:44:54.866Z DEBUG electrs::p2p] loading 1 blocks
[2023-11-20T19:44:54.866Z DEBUG electrs::p2p] spawning task for 423cfe9f541e1425b02b93d8db56f8fecdcb0ce8f58d33102d205c73f6bb037e
[2023-11-20T19:44:54.866Z DEBUG electrs::p2p] waiting for 1 blocks
<it seems to get stuck here>
[2023-11-20T19:45:48.880Z DEBUG electrs::server] 0: recv {"jsonrpc":"2.0","method":"blockchain.estimatefee","id":283,"params":[25]}
[2023-11-20T19:45:48.880Z DEBUG electrs::server] 0: recv {"jsonrpc":"2.0","method":"blockchain.estimatefee","id":284,"params":[10]}
[2023-11-20T19:45:48.880Z DEBUG electrs::server] 0: recv {"jsonrpc":"2.0","method":"blockchain.estimatefee","id":285,"params":[5]}
[2023-11-20T19:45:48.880Z DEBUG electrs::server] 0: recv {"jsonrpc":"2.0","method":"blockchain.estimatefee","id":286,"params":[2]}
[2023-11-20T19:45:48.889Z DEBUG electrs::server] 0: recv {"jsonrpc":"2.0","method":"mempool.get_fee_histogram","id":287}

Not sure, but it may be related to rayon-rs/rayon#835... WDYT?

Contributor Author

@RCasatta RCasatta Nov 20, 2023

It seems a good catch! Did you have a chance to try the patch with rayon::in_place_scope instead of rayon::scope, as suggested in the issue? I'll try it tomorrow anyway :)

Another alternative could be to use an Arc<Mutex<Vec<R>>> instead of the channel (so that the blocking happens inside the spawn).
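
A rough sketch of that Mutex alternative, under assumptions (generic names; not the PR's code). Inside a rayon scope a plain reference to the Mutex is enough, so the Arc is not strictly needed:

use std::sync::Mutex;

// Each spawned task computes its result and pushes it into a shared Vec;
// the only lock is taken inside the spawn, and the scope itself is the
// single join point.
fn collect_with_mutex<T, R, F>(items: Vec<T>, func: F) -> Vec<R>
where
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync,
{
    let results = Mutex::new(Vec::with_capacity(items.len()));
    rayon::scope(|s| {
        for item in items {
            let results = &results;
            let func = &func;
            s.spawn(move |_| {
                let r = func(item);
                results.lock().unwrap().push(r);
            });
        }
    });
    results.into_inner().unwrap()
}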

Contributor Author

Unfortunately, I can't reproduce the deadlock locally (I can see the output of another debug print after the recv); maybe it depends on the number of threads used by rayon.
I am going to push a temporary commit to use the CI for this.

Contributor Author

@RCasatta RCasatta Nov 21, 2023

Unfortunately, the CI is not happy with in_place_scope, nor with the Mutex version: https://github.com/RCasatta/electrs/actions/runs/6941492681/job/18882464601

Contributor Author

I can't reproduce the deadlock locally (I can see the output of another debug print after the recv)

By "locally" I mean without Docker...

How can I see the logs with the docker command as it is run in the CI (docker run -v $PWD/contrib/:/contrib -v $PWD/tests/:/tests --rm electrs:tests bash /tests/run.sh)?

Owner

How can I see the logs with the docker command as it is run in the CI (docker run -v $PWD/contrib/:/contrib -v $PWD/tests/:/tests --rm electrs:tests bash /tests/run.sh)?

electrs logs are stored under /data/electrs (inside the container):

2> data/electrs/regtest-debug.log &

Contributor Author

Sorry for the dumb question, but if the container dies after an error, how can I check inside the container?

Owner

Sorry, the logs are indeed unavailable when the container fails in CI...

I have added two more commits (8326249 & acaebd0), so it should be possible to reproduce the issue locally by:

$ docker build -f Dockerfile.ci . --rm -t electrs:tests-bug
$ docker run -v $PWD/contrib/:/contrib -v /tmp/data:/data -v $PWD/tests/:/tests --rm electrs:tests-bug bash -x /tests/run.sh

The logs should be under /tmp/data/ after the container exits.

Contributor Author

Thanks, in the meantime I launched a couple of jobs on the CI.
The bad news is that the in_place_scope version seems to deadlock just the same: https://github.com/RCasatta/electrs/actions/runs/6947160675
The good news is that the Mutex version isn't deadlocking: https://github.com/RCasatta/electrs/actions/runs/6946996655
But before cleaning up the Mutex version, I think a mutex-free solution might be possible...

Contributor Author

But before cleaning up the Mutex version, I think a mutex-free solution might be possible...

I was considering using a Vec<OnceLock<R>> for the thread results, but OnceLock has an MSRV of 1.70.
So I changed the Mutex<Vec<R>> to a Vec<Mutex<Option<R>>>, so that there are no blocking calls except at the end of the rayon scope.
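
A minimal sketch of that layout, under assumptions (generic names; not the PR's code): one pre-allocated slot per block, each task writes only its own slot, and the results come back in the original order.

use std::sync::Mutex;

// One Mutex per slot, so tasks never contend on the same lock and nothing
// blocks except the implicit join at the end of the rayon scope.
fn collect_per_slot<T, R, F>(items: Vec<T>, func: F) -> Vec<R>
where
    T: Send,
    R: Send,
    F: Fn(T) -> R + Sync,
{
    let slots: Vec<Mutex<Option<R>>> = items.iter().map(|_| Mutex::new(None)).collect();
    rayon::scope(|s| {
        for (slot, item) in slots.iter().zip(items) {
            let func = &func;
            s.spawn(move |_| {
                // each task fills its own slot
                *slot.lock().unwrap() = Some(func(item));
            });
        }
    });
    // all tasks have joined, so every slot is filled, in input order
    slots
        .into_iter()
        .map(|m| m.into_inner().unwrap().expect("slot filled by its task"))
        .collect()
}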

requires changing the signature of the function from FnMut to Fn and returning
data instead of relying on mutating the context
@romanz
Owner

romanz commented Nov 28, 2023

Thanks @RCasatta! Will review at the weekend :)

})?;
self.blocks_duration
.observe_duration("process", || func(hash, block));
let mut result = Vec::with_capacity(blockhashes_len);
Owner

If possible, can we use 5aba2a1 to simplify this method?

Contributor Author

Sure! It is much better; I didn't think of that.

@romanz romanz force-pushed the par_for_blocks branch 2 times, most recently from 18a31e3 to f7c8cca Compare December 2, 2023 10:25
src/tracker.rs (resolved)
})?;
for (b, h) in inputs_filtering {
let e = result.entry(b).or_default();
e.extend(h);
Owner

It may happen that the same transaction both funds and spends the same scripthash - so we probably need to "merge" the funding and spending TxEntry instances -> f43de8f
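
A toy illustration of the pitfall, with simplified types (not the electrs TxEntry): the funding and spending results for the same scripthash must extend a single entry rather than overwrite each other.

use std::collections::HashMap;

#[derive(Default, Debug)]
struct Entry {
    funding: Vec<u32>,  // txids, simplified to u32 for the example
    spending: Vec<u32>,
}

fn main() {
    let mut result: HashMap<&str, Entry> = HashMap::new();

    // txid 7 funds scripthash "abc"
    result.entry("abc").or_default().funding.push(7);

    // the same txid 7 also spends from "abc": extend the existing entry
    // instead of replacing it, so neither side of the history is lost
    result.entry("abc").or_default().spending.push(7);

    assert_eq!(result["abc"].funding, vec![7u32]);
    assert_eq!(result["abc"].spending, vec![7u32]);
}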

Contributor Author

LGTM. Sorry for introducing so many bugs in the proposed PR, but it's pretty hard to remember all these cases; ideally some tests should enforce them...

Owner

No problem, I definitely should add more tests :)
