Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sqlite slasher backend impl #4666

Closed
wants to merge 25 commits into from

Conversation

eserilev
Copy link
Collaborator

@eserilev eserilev commented Aug 25, 2023

Issue Addressed

#4424

Proposed Changes

add sqlite impl to slasher backend

Additional Info

DB Schema

CREATE TABLE {} (
    id    INTEGER PRIMARY KEY AUTOINCREMENT,
    key   BLOB UNIQUE,
    value BLOB
);
  • the ID is also used in the cursor to track the current value, next value etc.
  • I believe the key unique constraint is required. the put function does a INSERT OR REPLACE to replace value if key already exists

@eserilev eserilev changed the base branch from stable to unstable August 25, 2023 19:16
@eserilev
Copy link
Collaborator Author

still some code quality improvements to make here, but the general functionality is all here and the tests are passing

@eserilev
Copy link
Collaborator Author

eserilev commented Sep 30, 2023

I made some decent progress here. I'm using the rusqlite builtin Transaction to execute db queries and only committing when commit is called. I also made some improvements to prune_attestations in a similar vein to what was done for the redb impl. I also plan to potentially make some improvements to the database schema. The slasher tests are all passing and everything is almost ready for another round of benchmarks

But theres one issue here that I cant seem to resolve. Rusqlite doesnt seem to play nice in multi threaded situations.

future cannot be sent between threads safely
within `Slasher<<T as BeaconChainTypes>::EthSpec>`, the trait `Sync` is not implemented for `RefCell<rusqlite::inner_connection::InnerConnection>`
if you want to do aliasing and mutation between multiple threads, use `std::sync::RwLock` instead

The main issue here is that the sqlite database connection that i set as part of Environment struct cant be shared across threads safely. This ends up resulting in a bunch of compiler errors upstream in the beacon_node crate.

I'm pretty sure I need conn as part of the Environment struct so that I can properly create the rusqlite transaction within begin_rw_txn. If I try creating the database connection within begin_rw_txn I'll be attempting to reference a temporary db connection when creating Transaction. I tried playing around with mutexs and arcs but I dont see how they could be compatible within begin_rw_txn (i'll still be attempting to reference temp vars when creating the txn in some way)

I experimented with the r2d2 crate to see if I could resolve these compiler errors, but so far thats been a bit of a dead end. I'll keep digging and see if I can come up a solution here

@michaelsproul
Copy link
Member

We use an r2d2 connection pool for sqlite in the validator client, and I'm fairly sure it solves the Send+Sync problem. I'm afk now but can help get that working next week.

@michaelsproul
Copy link
Member

Great to hear about the progress! I'm excited!

@eserilev
Copy link
Collaborator Author

eserilev commented Oct 9, 2023

I was able to get batch times down to around 50 seconds by altering two pragma values:
journal_mode: wal and synchronous: NORMAL (https://www.sqlite.org/pragma.html)

Updated metrics:
https://snapshots.raintank.io/dashboard/snapshot/GJ1PseLyV5RLTDcvoJuSx93jruw6Vsc3

theres a few more tweaks I'll be trying over the following days to hopefully get batch times down further

@eserilev
Copy link
Collaborator Author

my most recent changes have gotten batch times down to around 23 seconds

https://snapshots.raintank.io/dashboard/snapshot/hIwPCJmC26Yjg2REEAdb0ueJFPpyNz22

These changes include

  • Removing the ID primary key
  • using prepare_cached which prepares a SQL statement for execution, returning a previously prepared statement from the cache if one is available.
  • Updating locking_mode pragma to EXCLUSIVE
  • Updating cache_size pragma to 10000

There may still be some optimizations to be made in delete_while, though I'm not sure that will get us down to sub 5 second batch times

@dapplion dapplion added the ready-for-review The code is ready for review label Jan 19, 2024
@eserilev
Copy link
Collaborator Author

closing this for now, I havent been able to get batch times down below 20 seconds.

will try revisiting in the future

@eserilev eserilev closed this Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database ready-for-review The code is ready for review slasher
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants