Add more slasher backends (redb, sqlite, postgres) #4424

Open
3 tasks
michaelsproul opened this issue Jun 22, 2023 · 12 comments
Labels
database · slasher · waiting-on-author (The reviewer has suggested changes and awaits their implementation.)

Comments

@michaelsproul (Member) commented Jun 22, 2023

Description

It would be great to have even more choice for slasher database backends beyond LMDB and MDBX. This could also be a good way for us to try out a couple of new DBs to see if they'd be suitable for the beacon node itself (which currently only supports LevelDB).

The candidates I'd like to try are:

  - [ ] redb
  - [ ] sqlite
  - [ ] postgres

I think redb would be the easiest, as it's most similar to the existing backends. Then sqlite, because it can be embedded in-process and we already use it in the VC via rusqlite. Most difficult would be postgres, because we'd either need to assume that a postgres server is running (yuck) or run one ourselves in a Docker container or something (also kind of yuck).
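For a sense of that similarity, here is a minimal sketch of redb's standard key-value API (the path, table name, and key/value types are illustrative choices, not anything taken from the slasher):

    // Minimal redb usage sketch: named tables of byte keys/values, with writes
    // staged in a transaction and made durable on commit, much like LMDB/MDBX.
    use redb::{Database, Error, ReadableTable, TableDefinition};

    const ATTESTERS: TableDefinition<&[u8], &[u8]> = TableDefinition::new("attesters");

    fn main() -> Result<(), Error> {
        let db = Database::create("slasher_redb_demo.redb")?;

        // Stage writes in a transaction, then commit them atomically.
        let write_txn = db.begin_write()?;
        {
            let mut table = write_txn.open_table(ATTESTERS)?;
            table.insert(b"key".as_slice(), b"value".as_slice())?;
        }
        write_txn.commit()?;

        // Reads go through a separate read transaction.
        let read_txn = db.begin_read()?;
        let table = read_txn.open_table(ATTESTERS)?;
        assert!(table.get(b"key".as_slice())?.is_some());
        Ok(())
    }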

Steps to resolve

  1. Add a new variant for the database to the DatabaseBackend enum.
  2. Add a new file, e.g. slasher/src/database/redb_impl.rs, with types named similarly to the other existing backends (see lmdb_impl.rs).
  3. Add variants to the enums in interface.rs for the new backend.
  4. Get the slasher tests to pass with DEFAULT_BACKEND set to the new backend, by running cargo test --release -p slasher. We can worry about the feature flag stuff later. (A rough sketch of steps 1–3 is below.)
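The module layout and type names below are assumptions about how the existing backends are structured — only DatabaseBackend, interface.rs, and redb_impl.rs come from the steps above — so treat this as a rough sketch of steps 1–3 rather than the actual code:

    // Rough sketch of steps 1–3; names other than `DatabaseBackend`,
    // `interface.rs` and `redb_impl.rs` are illustrative.
    #![allow(dead_code)]

    mod config {
        // Step 1: add a variant for the new backend.
        pub enum DatabaseBackend {
            Lmdb,
            Mdbx,
            Redb, // new
        }
    }

    mod redb_impl {
        // Step 2: slasher/src/database/redb_impl.rs, mirroring the
        // types in lmdb_impl.rs.
        pub struct Environment;   // would wrap a redb::Database
        pub struct Database;      // handle to one named table
        pub struct RwTransaction; // a pending batch of writes
    }

    mod interface {
        // Step 3: add a matching variant to each dispatch enum in interface.rs.
        pub enum Environment {
            // Lmdb(lmdb_impl::Environment),
            Redb(super::redb_impl::Environment),
        }
    }

    fn main() {}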
@eserilev (Collaborator) commented Jun 23, 2023

I think I saw you mention this as a project idea in the Ethereum Protocol Fellowship repo. I plan on participating in the upcoming cohort and would be very interested in picking this up as my project.

@Gua00va (Contributor) commented Jun 24, 2023

> I think I saw you mention this as a project idea in the Ethereum Protocol Fellowship repo. I plan on participating in the upcoming cohort and would be very interested in picking this up as my project.

I am also interested in taking this as my project idea. Would you mind working as a team?

@eserilev (Collaborator) commented Jun 24, 2023

> I am also interested in taking this as my project idea. Would you mind working as a team?

yeah! would be great to work on this as a team

@Gua00va (Contributor) commented Jun 24, 2023

> I am also interested in taking this as my project idea. Would you mind working as a team?
>
> yeah! would be great to work on this as a team

Great! Ping me on the Discord server. I have the same username.

@michaelsproul (Member, Author)

Now that we have a couple of new backends in the works, namely Redb and Sqlite, it would be great to see a performance shoot-out. For the slasher the key metric is the time taken to process each batch, which is recorded in Prometheus as slasher_process_batch_time. We have a Grafana dashboard for this here: https://github.com/sigp/lighthouse-metrics/blob/master/dashboards/Slasher.json

A good test would be to start up a slasher on the same machine with each of the backends -- LMDB, Redb, Sqlite -- and compare the performance. To prevent them from competing for CPU and I/O it's probably best to run the slashers one at a time (not concurrently). Running each one for a few hours should be enough to get a feel for its performance. If one or both of the new backends are competitive in this test we can then look into running a longer test (2 weeks+), in which we can measure the total disk usage and the long-term performance.

I have a shared execution node on mainnet (running eleel) which I could grant access to for this experiment. If this is something you're interested in doing @eserilev, DM me your SSH pubkey on Discord.

@eserilev (Collaborator)

@michaelsproul this sounds great, I'm sending you my SSH pubkey now. Thank you!

@eserilev (Collaborator)

sqlite metrics: https://snapshots.raintank.io/dashboard/snapshot/TW4SxQCSehrE8534trSC0Kc9vIt2szLg?orgId=2&refresh=5s

ended up reaching 640 MB in DB size after about 2.5 hours

going to run lmdb and mdbx next to compare

@michaelsproul (Member, Author)

The batch processing time for sqlite looks like it was 30 minutes and then an hour! LMDB is usually around 1 second

@eserilev (Collaborator) commented Sep 19, 2023

yeah, looking at the lmdb metrics: https://snapshots.raintank.io/dashboard/snapshot/ivmfD00gWTrTYYgNryBL2eBtfWzd3to0
the batching times are radically different.

One issue I'm seeing is that in the sqlite implementation I immediately write to the DB each time put is called; commit actually doesn't do anything and is just a placeholder at the moment.

I could instead append to a query string at each call to put and only write to the DB when commit is called.

Another potential issue is my schema:

 "CREATE TABLE IF NOT EXISTS {} (
                id    INTEGER PRIMARY KEY AUTOINCREMENT,
                key   BLOB UNIQUE,
                value BLOB
 );"

I'm going to spend the next few days tweaking the sqlite implementation to see if I can make any improvements here. I'll also post the results for redb shortly.

@michaelsproul (Member, Author)

Nice. I think the Rust Sqlite crate should have a native Transaction type we can use to stage the writes, and then commit. That could make a small difference.

Separate tables with different indices could also help. I wonder if we could use a BLOB as the primary key?
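A hedged sketch of both ideas using rusqlite — stage puts in a Transaction and commit once per batch, with a BLOB primary key instead of the AUTOINCREMENT id plus UNIQUE index (the table name, path, and WITHOUT ROWID choice are illustrative, not the actual implementation):

    use rusqlite::{params, Connection, Result};

    fn main() -> Result<()> {
        let mut conn = Connection::open("slasher_sqlite_demo.db")?; // hypothetical path

        // BLOB primary key variant of the schema posted above.
        conn.execute_batch(
            "CREATE TABLE IF NOT EXISTS attesters (
                 key   BLOB PRIMARY KEY,
                 value BLOB
             ) WITHOUT ROWID;",
        )?;

        // `put` calls would stage statements inside one transaction...
        let tx = conn.transaction()?;
        for i in 0u64..1_000 {
            tx.execute(
                "INSERT OR REPLACE INTO attesters (key, value) VALUES (?1, ?2)",
                params![i.to_be_bytes().to_vec(), vec![0u8; 32]],
            )?;
        }
        // ...and `commit` would flush the whole batch in one go.
        tx.commit()?;
        Ok(())
    }

SQLite runs each standalone statement in its own implicit transaction, so batching like this avoids a journal sync per put, which may be where the very long batch times are coming from.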

@eserilev (Collaborator) commented Sep 24, 2023

redb metrics: https://snapshots.raintank.io/dashboard/snapshot/CWAriorThh0qxe0BtsUeSGoi7TK2KZCq

batch times are around 5 seconds. I need to refactor the code a bit to clean things up, and there may be some other small optimizations I can make here. But the 5 second batch time seems like a decent result.

I'm not sure why the Database size graph looks like a step function

Also I saw this error message a few times:

Unable to obtain indexed form of attestation for slasher, error: UnknownTargetRoot(0x3df8840b40a4675c7f082dc5e89b6075932b1d67f7ac8e7c95b7fec7ad0ee194), attestation_root: 0x5912802b8ab4e26f8d1f8eb223c6ae88b38f7d27953feed66ce8d95ba7a60653, service: beacon
Sep 24 12:26:03.466

Not sure if there's a bug in the redb implementation; I'll try to dig in and see if I can recreate the issue on a subsequent run.

@michaelsproul (Member, Author)

Sweet! That's definitely in the right ballpark. I'll help review your Redb PR and hopefully we can merge it soon-ish!

@dapplion added the waiting-on-author label on Jan 19, 2024