Replace recursive tree traversal #58

Merged
merged 25 commits into from
Mar 18, 2021

@wngr wngr commented Mar 16, 2021

Closes https://github.com/Actyx/Cosmos/issues/5824
Closes https://github.com/Actyx/Cosmos/issues/6153
next step is to filter_chunked_reverse ..

let digest: [u8; 32] = value.hash().digest().try_into()?;
Ok(Self(digest))
}
}
Contributor Author

.. just some code shuffling in the tests

@wngr wngr requested a review from rklaehn March 16, 2021 15:25
@wngr wngr force-pushed the ow/iter-iter-iter branch from ed83a94 to 5f7ce1e Compare March 16, 2021 20:31
@wngr wngr force-pushed the ow/iter-iter-iter branch from 5f7ce1e to a105154 Compare March 16, 2021 20:42
@wngr wngr force-pushed the ow/iter-iter-iter branch from 24032ca to c91803c Compare March 16, 2021 21:35
@wngr wngr force-pushed the ow/iter-iter-iter branch from f949d3f to cab7d66 Compare March 17, 2021 09:04
Contributor

@rklaehn rklaehn left a comment

First of all: AWESOME!!!

Now, some remarks:

  • I think it is not clear whether it is worth it to have the iterator do double duty as forward and backward iterator. There are so many mode switches that it might be nicer to just have a bit more duplicate code and have a separate forward and backward iterator. Maybe the code can be DRYed in another way.

  • what would be awesome would be if you could combine the two stacks into one, and use a struct with named fields for that. It seems that one stack is "lagging" the other one, but other than that they seem to be used in the same way. The depth is going to be the same at all times +-1. So a single stack might make the code more readable...

@wngr wngr force-pushed the ow/iter-iter-iter branch from f6fff3c to accd906 Compare March 17, 2021 13:31
Contributor Author

wngr commented Mar 17, 2021

@rklaehn I combined both stacks into one, and created a named struct to use it with: TraverseState. I tried to DRY up the code a bit with that, but did not split up the forward and backwards case -- you think this is sufficiently clear now or should I split it up (or better: you have an idea to make it clearer but not split it up :-))?
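
The single-stack idea can be illustrated with a minimal sketch. This is not the code from the PR: `Node` and `LeafIter` are hypothetical simplified types, and only the name `TraverseState` is taken from the comment above; its fields here are assumptions.

```rust
/// Simplified stand-in for a tree node (hypothetical, not banyan's types).
#[derive(Debug)]
pub enum Node {
    Leaf(u32),
    Branch(Vec<Node>),
}

/// One stack frame: the node being visited and the next child to descend into.
/// The name follows the PR; the fields are assumptions made for this sketch.
struct TraverseState<'a> {
    node: &'a Node,
    position: usize,
}

/// Depth-first, left-to-right iterator over leaf values using one explicit stack.
pub struct LeafIter<'a> {
    stack: Vec<TraverseState<'a>>,
}

impl<'a> LeafIter<'a> {
    pub fn new(root: &'a Node) -> Self {
        Self { stack: vec![TraverseState { node: root, position: 0 }] }
    }
}

impl<'a> Iterator for LeafIter<'a> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        loop {
            // An empty stack means the traversal is finished.
            let head = self.stack.last_mut()?;
            let node: &'a Node = head.node;
            match node {
                Node::Leaf(value) => {
                    self.stack.pop();
                    return Some(*value);
                }
                Node::Branch(children) => match children.get(head.position) {
                    // Remember where to resume, then descend into the child.
                    Some(child) => {
                        head.position += 1;
                        self.stack.push(TraverseState { node: child, position: 0 });
                    }
                    // All children visited: go back up to the parent.
                    None => {
                        self.stack.pop();
                    }
                },
            }
        }
    }
}
```

Because the per-level state lives in a `Vec` on the heap, the depth of the tree no longer translates into call-stack depth.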

@wngr wngr force-pushed the ow/iter-iter-iter branch from accd906 to 257be22 Compare March 17, 2021 13:35
@wngr wngr force-pushed the ow/iter-iter-iter branch from 257be22 to 99af728 Compare March 17, 2021 13:39
Mode::Backward => *pos == usize::MAX,
};
if new_branch {
if head.filter.is_none() {
Contributor

Much nicer this way, I think.

@@ -162,44 +183,39 @@ where
self.query
.intersecting(start_offset, index, &mut q_matching);
debug_assert_eq!(branch.children.len(), q_matching.len());
-let _ = std::mem::replace(matching, q_matching);
+head.filter.replace(q_matching);

if matches!(self.mode, Mode::Backward) {
Contributor

I think it would be clearer here to set the position in both cases, instead of using the default if Mode is Forward.

Contributor

@rklaehn rklaehn left a comment

Looks very good. Will probably approve. Just checking this out to look at it some more.

Contributor

rklaehn commented Mar 17, 2021

I think this would benefit from a test that builds a super-degenerate tree, aka a linked list, with 10000 items and then tries to traverse it. I think we should be able to make this reliably fail with the previous impl. I will write it.
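
The shape of such a test might look like the sketch below. It is an assumption-laden illustration, not the test that was eventually written: the arena-backed `Tree` type and its methods are hypothetical, and only the depth of 10000 comes from the comment above.

```rust
/// Arena-backed tree: `nodes[i]` lists the children of node `i` as indices.
/// Hypothetical layout, chosen so that dropping the tree does not recurse either.
pub struct Tree {
    nodes: Vec<Vec<usize>>,
}

impl Tree {
    /// Builds a "linked list" tree: node `i` has the single child `i + 1`.
    pub fn degenerate(len: usize) -> Self {
        let nodes = (0..len)
            .map(|i| if i + 1 < len { vec![i + 1] } else { Vec::new() })
            .collect();
        Self { nodes }
    }

    /// Visits every node with an explicit stack; returns (node count, max depth).
    pub fn traverse(&self) -> (usize, usize) {
        if self.nodes.is_empty() {
            return (0, 0);
        }
        // Each entry is (node index, depth); node 0 is the root at depth 1.
        let mut stack = vec![(0usize, 1usize)];
        let (mut count, mut max_depth) = (0, 0);
        while let Some((node, depth)) = stack.pop() {
            count += 1;
            max_depth = max_depth.max(depth);
            // Push in reverse so children are visited left to right.
            for &child in self.nodes[node].iter().rev() {
                stack.push((child, depth + 1));
            }
        }
        (count, max_depth)
    }
}
```

A test would then assert that `Tree::degenerate(10_000).traverse()` visits all 10000 nodes; a recursive traversal with large stack frames is the variant expected to overflow on such input.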

Contributor

rklaehn commented Mar 17, 2021

I think there is a bug with the range for the placeholder. Also, I have experimented with how to simplify the stack frame. See #59

index: Arc<Index<T>>,
// If `index` points to a branch node, `position` points to the currently
// traversed child
position: usize,
Contributor

It is probably worth noting that this is not always the child offset: it is the child offset in forward traversal, but the child offset + 1 in backward traversal.

I guess you have already tried to make this an isize and have it be exactly the position?

Contributor

I tried this out here: 10bfa3d

Contributor Author

Cool! Let's get it in!
I haven't tried that. At some point I kind of lost sight ..

Contributor

rklaehn commented Mar 17, 2021

Here is an experiment, making position an isize so it can refer directly to the child:

https://github.com/Actyx/banyan/tree/rkl/experiments
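
As a rough illustration of the signed-position idea (not the code on that branch), the sketch below keeps `position` as an `isize` that always names the child to visit next. `ChildCursor` and `next_child` are hypothetical names; `Mode::Forward` and `Mode::Backward` mirror the variants visible in the diff.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Mode {
    Forward,
    Backward,
}

/// Cursor over the children of one branch. `position` is always the index of
/// the child to visit next; it is out of range once the branch is exhausted
/// (`len` when going forward, `-1` when going backward).
pub struct ChildCursor {
    mode: Mode,
    len: isize,
    position: isize,
}

impl ChildCursor {
    pub fn new(mode: Mode, len: usize) -> Self {
        let len = len as isize;
        // The start position is set explicitly for both modes.
        let position = match mode {
            Mode::Forward => 0,
            Mode::Backward => len - 1,
        };
        Self { mode, len, position }
    }

    /// Returns the index of the next child, or `None` when the branch is done.
    pub fn next_child(&mut self) -> Option<usize> {
        if self.position < 0 || self.position >= self.len {
            return None;
        }
        let current = self.position as usize;
        self.position += match self.mode {
            Mode::Forward => 1,
            Mode::Backward => -1,
        };
        Some(current)
    }
}
```

With a signed position there is no need for an off-by-one convention or a `usize::MAX` sentinel in the backward case, at the cost of a cast when indexing.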

Contributor

@rklaehn rklaehn left a comment

I might still try to DRY this, especially the forward/backwards branches. But I think it reads quite well now. Great work.

Member

@rkuhn rkuhn left a comment

Yes, quite readable, nice and tidy! Just a few nitpicks.

type Item = Result<FilteredChunk<T, V, E>>;

fn next(&mut self) -> Option<Self::Item> {
let res: FilteredChunk<T, V, E> = loop {
Member

given that I’ll find a break x somewhere in here, maybe just return Some(Ok(x)) instead?

Contributor Author

Hmm, that's a matter of preference I guess. I like the break approach better. Less boilerplate in the loop.
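
For readers unfamiliar with the two styles being compared, here is a small self-contained sketch. The `Evens` iterator is a made-up example unrelated to banyan; it only shows a `loop` used as an expression whose value is supplied by `break`, with the wrapping in `Some(Ok(..))` done once after the loop.

```rust
/// Yields the even numbers from a slice, wrapped in `Ok` (hypothetical example).
pub struct Evens<'a> {
    items: std::slice::Iter<'a, u32>,
}

impl<'a> Evens<'a> {
    pub fn new(items: &'a [u32]) -> Self {
        Self { items: items.iter() }
    }
}

impl<'a> Iterator for Evens<'a> {
    type Item = Result<u32, String>;

    fn next(&mut self) -> Option<Self::Item> {
        // `loop` is an expression: `break item` makes `item` its value.
        let res: u32 = loop {
            // `?` returns `None` from `next` once the input is exhausted.
            let item = *self.items.next()?;
            if item % 2 == 0 {
                break item;
                // The alternative style would be `return Some(Ok(item));` here,
                // with no binding and no wrapping after the loop.
            }
        };
        Some(Ok(res))
    }
}
```

Both styles behave the same; the `break` form keeps the `Some(Ok(..))` wrapping in one place when the loop has several exit points.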

continue;
}

match self.forest.load_node(&head.index) {
Member

side remark: it feels weird to apply a “load node” function to an index data structure when in my head the index is part of said node — maybe I got this wrong

Contributor

no, it is indeed a bit weird. All nodes in banyan are "split in half". It is not very object oriented: data that you would logically group together is stored separately so that it can be ordered by probability / frequency of access.

extra: (self.mk_extra)(index.as_index_ref()),
};

break placeholder;
Member

why is it necessary to emit these?

Contributor

the placeholders?

If you skip a subtree because either the filter does not match, or the subtree has been purged, you still want a placeholder to indicate that you have skipped offsets x..x+n. There is also the mk_extra fn that can extract some info from the tree node, e.g. the last lamport.

Contributor

Note that iter_filtered_chunked is a low level API that can be used in multiple ways. If you want just the data, you can just flatten them away.
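
A simplified sketch of how a consumer might treat placeholders is shown below. The `Chunk` type is a hypothetical stand-in for the real chunk type (it omits the `extra` field and uses plain tuples for data); the assumption is that a placeholder is a chunk with an offset range but no data.

```rust
use std::ops::Range;

/// Simplified chunk: an offset range plus whatever data matched within it.
/// In this sketch a placeholder is a chunk whose `data` is empty.
#[derive(Debug, Clone, PartialEq)]
pub struct Chunk {
    pub range: Range<u64>,
    pub data: Vec<(u64, String)>,
}

/// Keeps only the data, flattening the placeholders away.
pub fn flatten(chunks: impl IntoIterator<Item = Chunk>) -> Vec<(u64, String)> {
    chunks.into_iter().flat_map(|c| c.data).collect()
}

/// Checks that the chunk ranges cover `0..end` without gaps. Placeholders are
/// what make this possible for a consumer that tracks how far it has read.
pub fn covers(chunks: &[Chunk], end: u64) -> bool {
    let mut next = 0;
    for c in chunks {
        if c.range.start != next {
            return false;
        }
        next = c.range.end;
    }
    next == end
}
```

A consumer that only wants events calls something like `flatten`; one that needs to know which offsets were skipped inspects the ranges instead.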

// tree by using an appropriate mk_extra fn, or check
// `data.len()`.
Ok(_) => {
let TraverseState { index, .. } = self.stack.pop().expect("not empty");
Member

This assumes that the tree always has at least one branch, correct? With time-based expiry I could imagine a tree that is pruned completely empty.

Contributor

A tree where all events have been pruned will be a branch containing a summary of the pruned data, including the offset range. You can never go from a non-empty tree to an empty tree.

The initial empty tree is peeled off with the struct Tree, which is an option of a non empty tree.
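
The "option of a non-empty tree" pattern can be sketched as follows. The types here are deliberately tiny and hypothetical (`NonEmptyTree` with a single `root` field, `initial_stack` as an invented helper); they only show why code operating on a non-empty tree can rely on a root being present.

```rust
/// Stand-in for a tree that is guaranteed to have a root (hypothetical;
/// a real root node would carry an index, summaries and so on).
pub struct NonEmptyTree {
    pub root: u32,
}

/// The possibly-empty tree wraps the non-empty one in an `Option`.
pub struct Tree(Option<NonEmptyTree>);

impl Tree {
    pub fn empty() -> Self {
        Tree(None)
    }

    pub fn new(root: u32) -> Self {
        Tree(Some(NonEmptyTree { root }))
    }

    /// Initial traversal stack: empty for an empty tree, one root entry
    /// otherwise. An iterator that stops on an empty stack therefore never
    /// reaches code that expects a frame to be present.
    pub fn initial_stack(&self) -> Vec<u32> {
        self.0.iter().map(|t| t.root).collect()
    }
}
```

The empty case is handled once at the `Option` boundary, so the traversal itself only has to reason about trees that have at least a root.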

Contributor

But - it would be nicer to more explicitly spell out the invariants. So might be worth trying out https://crates.io/crates/contracts for banyan - if there is a way to switch the stuff off without switching to release mode...

Contributor

Made an issue: #60

Contributor Author

In other words: If the root node is a PrunedBranch, we will end up in this case with Ok(PrunedBranch(index)) and everything is fine. This code won't panic.

@wngr wngr merged commit ea1924a into master Mar 18, 2021
@wngr wngr deleted the ow/iter-iter-iter branch March 18, 2021 12:54