Improve chainbase mapped and heap behavior #1691
I haven't done an in-depth review yet, just some initial comments.

We should mix in the new mode as part of `db-modes-test`; edit: just realized we aren't exposing it as an option, so that might not work.
The new `mapped` mode is only available on Linux: it really shouldn't be available as an option on other platforms, and certainly shouldn't be the default on other platforms.
But wait, there's more to the above: it seems not all CPU platforms support this soft-dirty feature on Linux. Shockingly, as far as I can tell, ARMv8 does not, because I do not see `CONFIG_HAVE_ARCH_SOFT_DIRTY` set. ARMv8 is major enough that we shouldn't break it, so if we can't find a run-time way to detect a working default, maybe we need to `ifdef` this just to x86 for now.
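One possible run-time detection, sketched under the assumption of a Linux build with `/proc` mounted: clear the soft-dirty bits through `/proc/self/clear_refs`, write to a page, and check bit 55 (the soft-dirty flag) of that page's `/proc/self/pagemap` entry. On a kernel without soft-dirty support the bit should never get set, so the probe reports `false`. All helper names are hypothetical. Note that writing `4` to `clear_refs` clears the bits for the whole process, so such a probe would have to run before the database is mapped; whether it fully replaces an x86-only `ifdef` would still need to be verified on ARMv8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__linux__)
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#endif

// Bit 55 of a /proc/<pid>/pagemap entry is the soft-dirty flag.
constexpr bool pagemap_entry_is_soft_dirty(std::uint64_t entry) {
    return ((entry >> 55) & 1u) != 0;
}

#if defined(__linux__)
// Reads the 64-bit /proc/self/pagemap entry for the page containing `addr`.
static bool read_pagemap_entry(const void* addr, std::uint64_t& entry) {
    const int fd = ::open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) return false;
    const auto page_size = static_cast<std::uintptr_t>(::sysconf(_SC_PAGESIZE));
    const auto index = reinterpret_cast<std::uintptr_t>(addr) / page_size;
    const auto offset = static_cast<off_t>(index * sizeof(std::uint64_t));
    const ssize_t n = ::pread(fd, &entry, sizeof(entry), offset);
    ::close(fd);
    return n == static_cast<ssize_t>(sizeof(entry));
}

// Writing "4" to clear_refs clears the soft-dirty bits of every page of this process.
static bool clear_soft_dirty_bits() {
    const int fd = ::open("/proc/self/clear_refs", O_WRONLY);
    if (fd < 0) return false;
    const bool ok = ::write(fd, "4", 1) == 1;
    ::close(fd);
    return ok;
}
#endif

// Hypothetical probe: true only if a write after clearing really sets the soft-dirty bit.
bool soft_dirty_tracking_works() {
#if defined(__linux__)
    const auto page_size = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
    void* p = ::mmap(nullptr, page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return false;
    auto* bytes = static_cast<volatile char*>(p);
    bytes[0] = 1;  // fault the page in
    bool works = false;
    std::uint64_t entry = 0;
    if (clear_soft_dirty_bits()) {
        bytes[0] = 2;  // dirty the page again after the clear
        works = read_pagemap_entry(p, entry) && pagemap_entry_is_soft_dirty(entry);
    }
    ::munmap(p, page_size);
    return works;
#else
    return false;  // the pagemap interface is Linux-only
#endif
}
```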
This new `mapped` mode will largely interfere with users running state on `tmpfs` (in the sense of needlessly burning more memory). I am not sure what we should do to prevent this. Refuse to start in `mapped` mode if the state is found on `tmpfs`? Silently treat `mapped` as `mapped_shared` if the state is found on `tmpfs`?
```cpp
// when loading a snapshot, all the state will be modified, so use the `shared` mode instead
// of `copy_on_write` to lower memory requirements
if (snapshot_path && chain_config->db_map_mode == pinnable_mapped_file::mapped)
   chain_config->db_map_mode = pinnable_mapped_file::mapped_shared;
```
It would be nice to keep the new `mapped` mode for loading snapshots, because I think it would be a huge perf boost (I've seen comments from some users that it takes 30 minutes to load a WAX snapshot, but it takes me less than 5 minutes in `heap` mode; I'm pretty sure the disk is grinding for those users).

If we're worried about leaving 100% dirty pages after loading a snapshot, maybe one option is for chainbase to expose a `flush()` call that (synchronously) performs the write-out so that all the pages are clean. nodeos would then call it after loading the snapshot but before continuing.
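A sketch of what such a `flush()` could boil down to, assuming the state sits in a `MAP_SHARED` file mapping (as in `mapped_shared`): `msync(..., MS_SYNC)` blocks until the dirty pages have been written to the file, after which they are clean. With a `MAP_PRIVATE` (copy-on-write) mapping, `msync` does not write the private copies back, so dirty pages would first have to be copied into a shared mapping. `flush_shared_mapping` and `flush_demo` are hypothetical names, not chainbase's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Synchronously writes all dirty pages of a MAP_SHARED mapping back to its file.
// `base` must be page-aligned, which mmap() guarantees for the start of a mapping.
bool flush_shared_mapping(void* base, std::size_t length) {
    return ::msync(base, length, MS_SYNC) == 0;
}

// Demo: map a temp file shared, modify it, flush, and read the byte back.
// (pread would also see the byte through the page cache without msync; msync is
// what makes the write durable and leaves the pages clean.)
bool flush_demo() {
    char path[] = "/tmp/flush_demoXXXXXX";
    const int fd = ::mkstemp(path);
    if (fd < 0) return false;
    ::unlink(path);  // the file is removed once fd is closed
    const auto len = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
    bool ok = ::ftruncate(fd, static_cast<off_t>(len)) == 0;
    void* base = ok ? ::mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)
                    : MAP_FAILED;
    if (base != MAP_FAILED) {
        static_cast<char*>(base)[0] = 'x';
        ok = flush_shared_mapping(base, len);
        char c = 0;
        ok = ok && ::pread(fd, &c, 1, 0) == 1 && c == 'x';
        ::munmap(base, len);
    } else {
        ok = false;
    }
    ::close(fd);
    return ok;
}
```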
> It would be nice to maintain the new `mapped` mode for loading snapshots because I think it would be a huge perf boost (I've seen comments from some users it takes 30 minutes to load a WAX snapshot, but it takes me less than 5 minutes in `heap` mode.. I'm pretty sure it's disk grinding for those users).
This would be useful when you have just enough RAM to hold all the state, but not quite enough for `heap` mode. I'm a little concerned that we may get more crashes this way. Maybe I can detect the available RAM and, based on the configured size of the disk db, decide whether it makes sense to use the new `mapped` mode (for example, go for it if `RAM_size > 1.1 x chain-state-db-size-mb`).
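A sketch of that heuristic, assuming a Linux/glibc (or macOS) build where `sysconf(_SC_PHYS_PAGES)` is available; it is a widespread extension rather than strict POSIX. The 1.1 factor comes from the comment above. `physical_ram_bytes` and `prefer_cow_mapped` are hypothetical names, and the comparison is done in integers (`ram * 10 > db * 11`) to avoid floating point.

```cpp
#include <cassert>
#include <cstdint>
#include <unistd.h>

// Total physical RAM in bytes, or 0 if it cannot be determined.
std::uint64_t physical_ram_bytes() {
    const long pages = ::sysconf(_SC_PHYS_PAGES);
    const long page_size = ::sysconf(_SC_PAGESIZE);
    if (pages <= 0 || page_size <= 0) return 0;
    return static_cast<std::uint64_t>(pages) * static_cast<std::uint64_t>(page_size);
}

// RAM_size > 1.1 x db_size, expressed without floating point.
bool prefer_cow_mapped(std::uint64_t ram_bytes, std::uint64_t db_size_bytes) {
    return ram_bytes * 10 > db_size_bytes * 11;
}
```

A call site might look like `prefer_cow_mapped(physical_ram_bytes(), std::uint64_t{db_size_mb} * 1024 * 1024)`, where `db_size_mb` stands for the `chain-state-db-size-mb` setting.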
> If we're worried about leaving 100% dirty pages after loading a snapshot, maybe one option is chainbase could expose a `flush()` call that (synchronously) performs the write out so that all the pages are clean, and nodeos calls that after loading the snapshot but before continuing.
Yes, that's a great idea. We can still use the soft-dirty tracking as is: since the state db size is still configured much larger than the db size actually used, it will make the flush faster.
Should add an `ilog`, or maybe even a `wlog`, noting that the mode was changed from the one specified.
Upon reflection, I think the best compromise when loading a snapshot is:

- load the snapshot in `mapped_shared` mode. Yes, it is probably slower, but it minimizes the odds of running out of memory. With the new `mapped` mode, we need memory for both the data currently read from the snapshot and the full chainbase db.
- when the snapshot is done loading, use a new API, as suggested by Matt, which flushes the still-dirty pages to disk, and restart with a new `copy_on_write` mapping.

Do you guys agree?
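The two-phase plan above can be sketched with plain `mmap` calls, assuming a file descriptor for the state file; the function names are hypothetical and this is not chainbase's API. Phase 1 writes through a `MAP_SHARED` mapping (like `mapped_shared`); phase 2 flushes with `msync(MS_SYNC)`, unmaps, and remaps the same file `MAP_PRIVATE`, so later writes land in copy-on-write pages and never reach the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Phase 1: writes go straight to the file's pages (like `mapped_shared`).
void* map_shared(int fd, std::size_t len) {
    return ::mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

// Phase 2: flush so every page is clean on disk, always unmap the shared view,
// then remap the file copy-on-write (like the new `mapped` mode).
void* flush_and_remap_private(int fd, void* shared, std::size_t len) {
    const bool flushed = ::msync(shared, len, MS_SYNC) == 0;
    ::munmap(shared, len);
    if (!flushed) return MAP_FAILED;
    return ::mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
}

// Demo: the "snapshot" write reaches the file; a later copy-on-write write does not.
bool two_phase_demo() {
    char path[] = "/tmp/two_phaseXXXXXX";
    const int fd = ::mkstemp(path);
    if (fd < 0) return false;
    ::unlink(path);  // the file is removed once fd is closed
    const auto len = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
    bool ok = false;
    if (::ftruncate(fd, static_cast<off_t>(len)) == 0) {
        void* shared = map_shared(fd, len);
        if (shared != MAP_FAILED) {
            static_cast<char*>(shared)[0] = 's';  // snapshot load
            void* priv = flush_and_remap_private(fd, shared, len);
            if (priv != MAP_FAILED) {
                static_cast<char*>(priv)[0] = 'p';  // normal operation
                char on_disk = 0;
                ok = ::pread(fd, &on_disk, 1, 0) == 1 && on_disk == 's' &&
                     static_cast<char*>(priv)[0] == 'p';
                ::munmap(priv, len);
            }
        }
    }
    ::close(fd);
    return ok;
}
```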
I just checked in the implementation of the above.
@heifner I updated the switch to `mapped_shared` to be temporary (just while loading the snapshot), so I don't think an `ilog` is necessary, but I can add it if you think it might still be useful.
No, not needed if it honors the configuration after snapshot load.
Yes, makes sense, will do. Actually the new [update] I misunderstood. I added the
Yes, my bad, I forgot that. Thanks for catching it.
We could do one of your suggestions, preferably refuse to start, and suggest not having the state on
- test both `mapped` and `mapped_shared` modes
- don't try to use the `pagemap` feature on platforms where it is not available
I think so. If the state is not flushed, then it will not match the blocklog, fork-database, code-cache, and any future state we may have.

Yes, that's also what I thought, and what the code currently does.

@spoonincode I think this is true as well for
libraries/chain/controller.cpp (outdated)
```cpp
const auto guard = my->conf.state_guard_size;
EOS_ASSERT(free >= guard, database_guard_exception, "database free: ${f}, guard size: ${g}", ("f", free)("g",guard));

// give chainbase a chance to write some pages to disk if memory becomes scarce.
if (auto flushed_pages = mutable_db().check_memory_and_flush_if_needed()) {
```
I think this call is only safe during the write window (at least, if it's actually clearing dirty bits). Are we sure `validate_db_available_size()` is only called during the write window? This one I'm not sure about, due to read-only trx:
leap/libraries/chain/controller.cpp, lines 2989 to 2997 in 19f78f9
```cpp
transaction_trace_ptr controller::push_transaction( const transaction_metadata_ptr& trx,
                                                    fc::time_point block_deadline, fc::microseconds max_transaction_time,
                                                    uint32_t billed_cpu_time_us, bool explicit_billed_cpu_time,
                                                    int64_t subjective_cpu_bill_us ) {
   validate_db_available_size();
   EOS_ASSERT( get_read_mode() != db_read_mode::IRREVERSIBLE, transaction_type_exception, "push transaction not allowed in irreversible mode" );
   EOS_ASSERT( trx && !trx->implicit() && !trx->scheduled(), transaction_type_exception, "Implicit/Scheduled transaction not allowed" );
   return my->push_transaction(trx, block_deadline, max_transaction_time, billed_cpu_time_us, explicit_billed_cpu_time, subjective_cpu_bill_us );
}
```
Good catch. This is called from multiple read-only threads for read-only trx execution.
Fixed.
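A toy model of the fix, with hypothetical names: the flush is skipped unless the write window is open. It assumes, as discussed above, that read-only trx threads never run while the write window is open, so a `true` flag implies the caller is the main thread. The flush itself is passed in as a callback standing in for `check_memory_and_flush_if_needed()`; this is not the controller's actual code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>

class flush_gate {
public:
    void set_write_window(bool open) { write_window_.store(open, std::memory_order_release); }
    bool is_write_window() const { return write_window_.load(std::memory_order_acquire); }

    // Returns the number of pages flushed, or 0 when called outside the write window
    // (for example from validate_db_available_size() on a read-only trx thread).
    std::size_t maybe_flush(const std::function<std::size_t()>& flush) const {
        if (!is_write_window()) return 0;
        return flush();
    }

private:
    std::atomic<bool> write_window_{false};
};
```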
`@@ -80,6 +80,8 @@ int snapshot_actions::run_subcommand() {`

```cpp
cfg.state_size = opt->db_size * 1024 * 1024;
cfg.state_guard_size = opt->guard_size * 1024 * 1024;
cfg.eosvmoc_tierup = wasm_interface::vm_oc_enable::oc_none; // wasm not used, no use to fire up oc

cfg.db_map_mode = pinnable_mapped_file::map_mode::mapped;
```
I don't think we need this line anymore.
Right, that's the default. Removed.
We should document this new mode in leap/plugins/chain_plugin/chain_plugin.cpp, lines 347 to 354 in 5c9ebe7.
Sure, will do after my current meeting.

Done.
```cpp
if (is_write_window()) {
   if (auto flushed_pages = mutable_db().check_memory_and_flush_if_needed()) {
      ilog("CHAINBASE: flushed ${p} pages to disk to decrease memory pressure", ("p", flushed_pages));
   }
```
I'm not really sure why we'd leave this in, since it's `#if 0`'d out on the other side. Kind of deceptive.
It will do something in the next PR, so the deception will not be long-lasting.
Resolves #1650.

Includes the not-yet-merged chainbase PR #21.
Issue with `heap` mode: at Leap startup and exit, we perform a `memcpy` between two mappings of the full database size. This causes significant pressure on the file cache when physical RAM is less than 2x the database size.
=> Solution: use a series of smaller mappings for the copy, reducing RAM contention.
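A sketch of the chunked copy, assuming a plain file descriptor and a destination buffer already sized to the file; `copy_file_in_windows` and `windowed_copy_demo` are hypothetical names, not chainbase's functions. Each window is mapped, copied, and unmapped before the next one, so at most one window of the file is mapped at any time. The window size must be a multiple of the page size, because `mmap` offsets have to be page-aligned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Copies `file_size` bytes of `fd` into `dest` through read-only mappings of at most
// `window` bytes each (`window` must be a multiple of the page size).
bool copy_file_in_windows(int fd, std::size_t file_size, char* dest, std::size_t window) {
    for (std::size_t off = 0; off < file_size; off += window) {
        const std::size_t n = std::min(window, file_size - off);
        void* src = ::mmap(nullptr, n, PROT_READ, MAP_SHARED, fd, static_cast<off_t>(off));
        if (src == MAP_FAILED) return false;
        std::memcpy(dest + off, src, n);
        ::munmap(src, n);  // drop this window before mapping the next one
    }
    return true;
}

// Demo: write a file whose size is not a multiple of the window, copy it back, compare.
bool windowed_copy_demo() {
    char path[] = "/tmp/windowed_copyXXXXXX";
    const int fd = ::mkstemp(path);
    if (fd < 0) return false;
    ::unlink(path);  // the file is removed once fd is closed
    const auto page = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
    const std::size_t size = page * 5 + 123;
    std::vector<char> original(size);
    for (std::size_t i = 0; i < size; ++i) original[i] = static_cast<char>(i * 31 + 7);
    bool ok = ::pwrite(fd, original.data(), size, 0) == static_cast<ssize_t>(size);
    std::vector<char> copy(size, 0);
    ok = ok && copy_file_in_windows(fd, size, copy.data(), page * 2);
    ::close(fd);
    return ok && copy == original;
}
```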
Issue with `mapped` mode: when Leap is running, Linux will grind the disk into dust with its default dirty-page writeback algorithm. It's also a perf killer.
=> Solution: map the file with `MAP_PRIVATE` (so changes are not written back to the file), and on exit copy just the modified pages back using a series of RW mappings (information on which pages are dirty is gathered using the `pagemap` interface). Should we still set the `dirty` bit in the db file at startup?

This new implementation of `mapped` mode, as well as the existing `heap` and `locked` modes, does not allow sharing an opened database in RW mode with other instances in RO mode (a capability not used in Leap afaik). In order not to completely remove the sharing functionality (for which a test exists in chainbase's `test.cpp`), a new `mapped_shared` mode is introduced, which is the same as the old `mapped` mode.

Also:
- C++20
- `debian:buster` to `debian:bullseye` to get C++20 support (`std::span`)

Performance test results (GitHub Actions)
| Running: | `main` branch | `gh_1650` branch |
|---|---|---|
| resultAvgTps | | |
Performance test results (Run locally)

| Running: | `main` branch | `gh_1650` branch |
|---|---|---|
| resultAvgTps | | |