use binary search to get seq from offset #200

staltz · 2022-03-29T12:47:22Z

Context

I'm preparing for #199 and one thing I realized is that I'll need a quick way of getting the seq from any given offset, e.g. during compaction there is going to be unshiftedOffset and I'd like to discover unshiftedSeq.

Problem

We don't have a helper function for that and currently we're just doing a linear for-loop to look for the offset.

Another problem is that the "end" of the for-loop is tarr.length, which I think is wrong because the tarr grows much more than the count does (like it doubles in size every time it "grows" but most of it is still unused). Instead we need to use index.count.

Solution

Introduce getSeqFromOffset that does a binary search and uses indexes['seq'].count as the end marker.

This should also have a performance improvement for those cases when the jitdb indexes are many and large but we just need to update the tip of the index. This way we can avoid scanning the whole index in O(n) and can do O(log n). I'm not sure we have any benchmark for this but should be a pretty common thing to happen in production.

github-actions · 2022-03-29T12:54:35Z

Benchmark results

Part	Speed	Heap Change	Samples
Count 1 big index (3rd run)	0.31ms ± 0.04ms	-5.65 kB ± 15.37 kB	52
Create an index twice concurrently	579.51ms ± 2.25ms	31.17 kB ± 61.11 kB	95
Load core indexes	0.83ms ± 0.01ms	93.75 B ± 183.36 B	9274
Load two indexes concurrently	720.66ms ± 13.28ms	-90.18 kB ± 139.83 kB	17
Paginate 10 results	25.18ms ± 1.26ms	9.95 kB ± 18.95 kB	28
Paginate 20000 msgs with pageSize=5	6177.78ms ± 77.45ms	-123.13 kB ± 4086.27 kB	5
Paginate 20000 msgs with pageSize=500	594.5ms ± 23.94ms	-1.52 kB ± 712.31 kB	20
Query 1 big index (1st run)	811.31ms ± 11.76ms	1.67 kB ± 94.22 kB	67
Query 1 big index (2nd run)	281.26ms ± 5.11ms	-30.71 kB ± 29.98 kB	47
Query 3 indexes (1st run)	1137.74ms ± 16.27ms	24.41 kB ± 149 kB	47
Query 3 indexes (2nd run)	216.3ms ± 3.1ms	-96.49 kB ± 122.63 kB	54
Query a prefix map (1st run)	308.65ms ± 4.08ms	49.13 kB ± 513.65 kB	20
Query a prefix map (2nd run)	14.32ms ± 0.42ms	-3.9 kB ± 16.56 kB	24

staltz · 2022-03-29T13:00:04Z

Here's previous benchmarks to compare with: #197

But there's no noticeable difference. But do we have a benchmark to measure this case? (A big index receives a bit of updates)

arj03 · 2022-03-29T13:05:23Z

This looks like a great optimization. I'm wondering if we can't we use bsb that is already included?

staltz · 2022-03-29T13:45:37Z

Good point! I switched to using bsb.eq in getSeqFromOffset, and I took a look at the source code, it's basically the same, so that's good. Now the PR is a net negative in lines of code changed. 😎

I also added another commit use seq index.count instead of index.tarr.length.

github-actions · 2022-03-29T13:52:24Z

Benchmark results

Part	Speed	Heap Change	Samples
Count 1 big index (3rd run)	0.41ms ± 0.04ms	-3.24 kB ± 15.44 kB	44
Create an index twice concurrently	757.32ms ± 7ms	46.4 kB ± 73.81 kB	72
Load core indexes	1.46ms ± 0.02ms	94.27 B ± 248.4 B	6504
Load two indexes concurrently	709.12ms ± 16.92ms	57.29 kB ± 142.61 kB	17
Paginate 10 results	24.43ms ± 1.05ms	972 B ± 18893.48 B	28
Paginate 20000 msgs with pageSize=5	5856.57ms ± 250.37ms	-441.06 kB ± 4406.04 kB	5
Paginate 20000 msgs with pageSize=500	455.57ms ± 8.72ms	8.86 kB ± 669.42 kB	22
Query 1 big index (1st run)	822.93ms ± 8.8ms	-5.34 kB ± 84.71 kB	66
Query 1 big index (2nd run)	264.16ms ± 1.51ms	15.44 kB ± 31.24 kB	52
Query 3 indexes (1st run)	854.47ms ± 9.14ms	-7.25 kB ± 108.83 kB	64
Query 3 indexes (2nd run)	282.62ms ± 1.95ms	-27.61 kB ± 172.07 kB	41
Query a prefix map (1st run)	272.8ms ± 5.3ms	10.09 kB ± 179.72 kB	22
Query a prefix map (2nd run)	16.26ms ± 0.45ms	1.2 kB ± 18.66 kB	21

arj03

Excellent. This is a good improvement.

github-actions · 2022-03-29T14:39:41Z

Benchmark results

Part	Speed	Heap Change	Samples
Count 1 big index (3rd run)	0.35ms ± 0.03ms	-5.7 kB ± 18.5 kB	51
Create an index twice concurrently	585.93ms ± 16.1ms	1.02 kB ± 52.52 kB	94
Load core indexes	1.21ms ± 0.01ms	98.8 B ± 208.69 B	7478
Load two indexes concurrently	700.31ms ± 15.45ms	90.45 kB ± 100.59 kB	18
Paginate 10 results	37.91ms ± 2.34ms	47.24 kB ± 42.95 kB	22
Paginate 20000 msgs with pageSize=5	6090.69ms ± 60.36ms	2.98 MB ± 2.67 MB	5
Paginate 20000 msgs with pageSize=500	540.77ms ± 27.94ms	565.28 kB ± 582.67 kB	20
Query 1 big index (1st run)	819.2ms ± 7.73ms	-33.3 kB ± 75.36 kB	66
Query 1 big index (2nd run)	313.45ms ± 2.57ms	-6.5 kB ± 33.52 kB	43
Query 3 indexes (1st run)	1026.27ms ± 9.03ms	37.59 kB ± 137.64 kB	53
Query 3 indexes (2nd run)	226.9ms ± 2.62ms	-30 kB ± 201.89 kB	52
Query a prefix map (1st run)	294.02ms ± 4.54ms	28.23 kB ± 44.66 kB	21
Query a prefix map (2nd run)	13.75ms ± 0.55ms	-7.06 kB ± 14.91 kB	24

use binary search to get seq from offset

fcadfc4

staltz requested a review from arj03 March 29, 2022 12:49

staltz added 2 commits March 29, 2022 16:43

use bsb.eq in getSeqFromOffset

020e699

use seq index.count instead of index.tarr.length

bcba09e

arj03 approved these changes Mar 29, 2022

View reviewed changes

staltz merged commit a786bc2 into master Mar 29, 2022

staltz deleted the binary-search-seq branch March 29, 2022 14:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use binary search to get seq from offset #200

use binary search to get seq from offset #200

staltz commented Mar 29, 2022 •

edited

Loading

github-actions bot commented Mar 29, 2022

staltz commented Mar 29, 2022

arj03 commented Mar 29, 2022

staltz commented Mar 29, 2022 •

edited

Loading

github-actions bot commented Mar 29, 2022

arj03 left a comment

github-actions bot commented Mar 29, 2022

use binary search to get seq from offset #200

use binary search to get seq from offset #200

Conversation

staltz commented Mar 29, 2022 • edited Loading

Context

Problem

Solution

github-actions bot commented Mar 29, 2022

Benchmark results

staltz commented Mar 29, 2022

arj03 commented Mar 29, 2022

staltz commented Mar 29, 2022 • edited Loading

github-actions bot commented Mar 29, 2022

Benchmark results

arj03 left a comment

Choose a reason for hiding this comment

github-actions bot commented Mar 29, 2022

Benchmark results

staltz commented Mar 29, 2022 •

edited

Loading

staltz commented Mar 29, 2022 •

edited

Loading