use binary search to get seq from offset #200
Benchmark results
Here are the previous benchmarks to compare with: #197. There's no noticeable difference, but do we have a benchmark that measures this case (a big index that receives a few updates)?
This looks like a great optimization. I'm wondering if we can use bsb, which is already included?
Good point! I switched to using `bsb.eq` in `getSeqFromOffset`. I took a look at its source code and it's basically the same as what I had, so that's good. Now the PR is a net negative in lines of code changed. 😎 I also added another commit that uses the seq `index.count` instead of `index.tarr.length`.
Excellent. This is a good improvement.
Context
I'm preparing for #199 and one thing I realized is that I'll need a quick way of getting the seq from any given offset, e.g. during compaction there is going to be an `unshiftedOffset` and I'd like to discover the `unshiftedSeq`.
Problem
We don't have a helper function for that, and currently we're just doing a linear for-loop to look for the offset.
Another problem is that the "end" of the for-loop is `tarr.length`, which I think is wrong because the tarr grows much more than the `count` does (it doubles in size every time it "grows", but most of it is still unused). Instead we need to use `index.count`.
Solution
Introduce `getSeqFromOffset`, which does a binary search and uses `indexes['seq'].count` as the end marker.
This should also improve performance in cases where the jitdb indexes are many and large but we just need to update the tip of the index. This way we avoid scanning the whole index in `O(n)` and can do the lookup in `O(log n)`. I'm not sure we have any benchmark for this, but it should be a pretty common thing to happen in production.
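To make the idea concrete, here is a minimal sketch of a binary search bounded by `count` rather than `tarr.length`. It is not the code from this PR: the index shape (`{ tarr, count }`), the sample offsets, and the `-1` return value for a missing offset are assumptions for illustration, and it assumes offsets are stored in ascending order by seq.

```javascript
// Hypothetical shape of a 'seq' index: `tarr` is an over-allocated typed
// array of offsets, `count` is the number of slots actually in use.
function getSeqFromOffset(seqIndex, offset) {
  let lo = 0
  let hi = seqIndex.count - 1 // not tarr.length - 1: the tail is unused
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1
    const value = seqIndex.tarr[mid]
    if (value === offset) return mid // the position in the index is the seq
    if (value < offset) lo = mid + 1
    else hi = mid - 1
  }
  return -1 // offset not found in the used part of the index
}

// Example data (made up): capacity 8, only 5 entries in use.
// The 3 unused slots hold 0, which would break the ascending order
// if the search range extended to tarr.length.
const seqIndex = { tarr: new Uint32Array(8), count: 5 }
seqIndex.tarr.set([0, 120, 305, 512, 890])
```

With this sample data, `getSeqFromOffset(seqIndex, 305)` returns `2`, and an offset that is not in the index, such as `400`, returns `-1`. Bounding the search by `count` matters for correctness as well as speed, because the unused tail of `tarr` is not sorted relative to the used part.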