-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating defragmentation to defrag both bloomfiltertype and bloomfiler structs #24
Conversation
2bf0ce8
to
d490857
Compare
Added Defragging for Bloom as well. This involved also setting it in BloomFilter to be inside a box in order to gain ownership Example output of showing the addresses of bloom change when defragged
|
src/wrapper/bloom_callback.rs
Outdated
let bloom_filter_box = bloom_filter_type.filters.remove(0); | ||
let bloom_filter = Box::into_raw(bloom_filter_box); | ||
let defrag_result = unsafe { | ||
raw::RedisModule_DefragAlloc.unwrap()( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all the places we use this API, we should handle the case where it returns null
re-using the same allocation / structure
You can use is_null()
on a pointer that you have
src/wrapper/bloom_callback.rs
Outdated
|
||
let inner_bloom = mem::replace( | ||
&mut defragged_filter.bloom, | ||
Box::new(bloomfilter::Bloom::new(1, 1)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to avoid additional allocations and frees from the defrag code path.
Is there an alternative approach to get ownership of the inner bloom? any other std::mem
function that can be used?
Would an Option
on the inner bloom work instead of a Box
? This allows us to use take
and obtain ownership safely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other option is using a LazyStatic that contains a mutex of an Option of a Box of a Bloom :D
lazy_static! {
static ref DEFRAG_BLOOM_FILTER: Mutex<Option<Box<Bloom<[u8]>>>> = Mutex::new(Some(Box::new(Bloom::<[u8]>::new(1, 1))));
}
With this, we can re-use the same allocation for every bloom operation - this is not allocated / freed and instead we swap it back and forth to help obtain ownership.
bda4fc6
to
3ea5c06
Compare
@@ -112,7 +112,7 @@ impl ValkeyDataType for BloomFilterType { | |||
num_items as u32, | |||
capacity as u32, | |||
); | |||
filters.push(filter); | |||
filters.push(Box::new(filter)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this to avoid re-assigning the ownership of filter variable? I don't understand the reason for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we put the filter in the box this allows us to take it out later and have ownership which means we can update the pointers address to actually change it. Without putting it into the box we couldn't actually change the addresses associated with it. Documentation for box
@@ -14,6 +14,7 @@ pub const BLOOM_FP_RATE_DEFAULT: f64 = 0.001; | |||
pub const BLOOM_FP_RATE_MIN: f64 = 0.0; | |||
pub const BLOOM_FP_RATE_MAX: f64 = 1.0; | |||
|
|||
pub const BLOOM_DEFRAG_DEAFULT: bool = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will a user/operator change this value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user can add this to the start up arguments:
--bf.bloom-defrag-enabled no
This will skip all the code related to bloom defragmenting and instantly just return 0
@@ -128,22 +139,113 @@ pub unsafe extern "C" fn bloom_free_effort( | |||
curr_item.free_effort() | |||
} | |||
|
|||
lazy_static! { | |||
static ref DEFRAG_BLOOM_FILTER: Mutex<Option<Box<Bloom<[u8]>>>> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document the intention here? I am not able to reverse engineer it from code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why ref?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used as a safe way to take out the Bloom object without dropping it. This is a temporary bloom object which we put in while we attempt to defrag the original Bloom object. The ref is used to show it is a reference to the temp bloom object instead of the data of the temp object
d917921
to
17b961d
Compare
|
||
use valkey_module::raw; | ||
|
||
pub struct Defrag { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker. When you have time, let's contribute this to valkeymodule-rs. Once it is released, we can use it from there
/// | ||
/// Returns an `i32` where: | ||
/// * 0 indicates successful complete defragmentation. | ||
/// * 1 indicates incomplete defragmentation (not all filters were defragged). | ||
pub unsafe extern "C" fn bloom_defrag( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is looking good functionally (and from testing). Thank you!
Next, we can think about monitoring:
For monitoring, we can consider adding hit and miss metrics for the different layers of allocations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add the metrics for hits and misses in a seperate pr
fc59870
to
d8b41c2
Compare
…mexists and add_items_till_capacity. Refactored metric tracking as well to reduce code repetition Signed-off-by: zackcam <[email protected]>
e3573fa
to
bd24972
Compare
@@ -79,6 +80,9 @@ impl ValkeyDataType for BloomFilterType { | |||
return None; | |||
}; | |||
let is_seed_random = is_seed_random_u64 == 1; | |||
|
|||
let mut filters = Vec::with_capacity(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a short documentation explaining why we use this - Vec::with_capacity(1)
here? I think I had made this comment earlier
src/bloom/data_type.rs
Outdated
mem::size_of::<BloomFilterType>(), | ||
std::sync::atomic::Ordering::Relaxed, | ||
); | ||
|
||
BLOOM_NUM_OBJECTS.fetch_add(1, std::sync::atomic::Ordering::Relaxed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this line into the function bloom_filter_type_incr_metrics_on_new_create
? It is used in every caller site of the bloom_filter_type_incr_metrics_on_new_create
and can help condense metrics code
BLOOM_NUM_OBJECTS.fetch_add(1, std::sync::atomic::Ordering::Relaxed);
You can update the following caller sites:
- new_reserved
- load_from_rdb
- create_copy_from
- decode_bloom_filter
src/wrapper/defrag.rs
Outdated
/// # Safety | ||
/// | ||
/// This function is temporary and will be removed once implemented in valkeymodule-rs . | ||
pub unsafe fn cursorset(&self, cursor: u64) -> i32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Naming:
set_cursor
might make this more readable / Rust like - Question: What is the return code here? What should the caller site do with this i32? if this is meant to be a void, can we remove the return code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return code is useful. But we can update it from i32 to use https://docs.rs/valkey-module/latest/valkey_module/raw/enum.Status.html#variant.Ok to follow the convention of this SDK
src/wrapper/defrag.rs
Outdated
/// # Safety | ||
/// | ||
/// This function is temporary and will be removed once implemented in valkeymodule-rs . | ||
pub unsafe fn cursorget(&self, cursor: *mut u64) -> i32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Naming:
get_cursor
might make this more readable / Rust like - Can this simply return an u64 instead of accepting a mutable pointer? It will a simpler interface to use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe have this return Result<u64, ()>
or Option<u64>
tests/valkey_bloom_test_case.py
Outdated
print(total_memory_bites) | ||
print(expected_memory) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this
assert total_memory_bites == expected_memory | ||
assert num_objects == expected_num_objects | ||
assert num_filters == expected_num_filters | ||
assert num_items == expected_num_items | ||
assert sum_capacity == expected_sum_capacity | ||
|
||
def parse_valkey_info(self, section): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a comment for documentation here
bd24972
to
c6dd7ad
Compare
Signed-off-by: zackcam <[email protected]>
c6dd7ad
to
7de8717
Compare
Overview
Updated the defragmentation callback. Now it defrags the inner BloomFilter by looping over the filters and calling defrag method on each filter. Afterwards we call the defrag on the BloomFilterType and update that value as well.
In order to allow defragmentation on the inner BloomFilter from the BloomFilterType we needed to update the vec in type to hold a box of a BloomFilter this way we could get ownership of the BloomFilter object and update it as necessary. Sometimes due to the nature of rust the just unallocated addresses could be used by a following defrag call and reallocation
Example output of showing the addresses change when defragged
Command Ran:
bf.insert scale CAPACITY 10 ITEMS 1 2 3 4 5 6 7 8 9 11 12 13 14 15 16 17