
Faster sort #123

Merged
merged 8 commits into from
Feb 20, 2021

Conversation

@arj03 (Member) commented Feb 19, 2021

This is a PR to improve the performance of paginated queries with a large number of results. The main bottleneck was sorting the result set before slicing it. I tried a lot of different sorting techniques, including radix sort, but each had a drawback that made it not quite right. Instead I thought: our old friend Lemire must have thought about this problem, and lo and behold he has: We can sort and slice...

The idea is to use a priority queue instead of sorting the whole array, and to pluck elements from the queue instead. I was a bit concerned about the clone() call in the sortedCache, but that turns out to still be valuable.
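The core of the trick can be sketched like this. This is a minimal stand-alone illustration of the priority-queue idea, not the actual jitdb code (the real implementation uses Lemire's FastPriorityQueue library rather than a hand-rolled heap):

```javascript
// Keep only the `limit` best items in a bounded binary max-heap instead
// of sorting the whole result set and slicing it afterwards.
// Cost: O(n log limit) instead of O(n log n), with O(limit) extra memory.
function topK(items, limit, cmp) {
  const heap = [] // max-heap per `cmp`: heap[0] is the worst item kept

  const siftUp = (i) => {
    while (i > 0) {
      const parent = (i - 1) >> 1
      if (cmp(heap[i], heap[parent]) <= 0) break
      ;[heap[i], heap[parent]] = [heap[parent], heap[i]]
      i = parent
    }
  }

  const siftDown = (i) => {
    for (;;) {
      const l = 2 * i + 1
      const r = 2 * i + 2
      let biggest = i
      if (l < heap.length && cmp(heap[l], heap[biggest]) > 0) biggest = l
      if (r < heap.length && cmp(heap[r], heap[biggest]) > 0) biggest = r
      if (biggest === i) break
      ;[heap[i], heap[biggest]] = [heap[biggest], heap[i]]
      i = biggest
    }
  }

  for (const item of items) {
    if (heap.length < limit) {
      heap.push(item)
      siftUp(heap.length - 1)
    } else if (cmp(item, heap[0]) < 0) {
      heap[0] = item // better than the current worst: replace it
      siftDown(0)
    }
  }

  return heap.sort(cmp) // only `limit` items are left to sort
}
```

For a page of 25 messages out of tens of thousands, only 25 items ever need to be fully ordered, which is where the speedup over sort-then-slice comes from.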

Old

public (post + contact): 545 ms
public (post + contact) page 2: 551 ms (total for page 1+2)
private (post + contact): 50 ms

New

public (post + contact): 214 ms
public (post + contact) page 2: 241 ms (total for page 1+2)
private (post + contact): 50 ms

@arj03 (Member Author) commented Feb 19, 2021

Need to link this db2 benchmark issue.

I'll look at the tests; they were working fine locally.

@arj03 (Member Author) commented Feb 19, 2021

Failing test:

[
  {
    previous: null,
    sequence: 1,
    author: '@KNsYec80tCESsq7Q/x21ujtqts78gZ/LUflLw8dqByg=.ed25519',
    timestamp: 1613719231934,
    hash: 'sha256',
    content: { type: 'post', text: '1st' },
    signature: '/OgsgJW+/DPtj/JGlOWGAkRcV4c8ibtBwfRmdlPiRZyYz1yMofLYVc5Cn346/gxuFD2pOuSo8UYZ3i8dVDBdCA==.sig.ed25519'
  },
  {
    previous: '%vONc1qtCUZp73PgM4yMrpmEFknhq+WPRjRUoI35StxA=.sha256',
    sequence: 3,
    author: '@KNsYec80tCESsq7Q/x21ujtqts78gZ/LUflLw8dqByg=.ed25519',
    timestamp: 1613719229937,
    hash: 'sha256',
    content: { type: 'post', text: '3rd' },
    signature: 'DF5p0+NheKkPQe4W8gF8im6/znglACrebabUqU9lyEdcwvjtCFwbp5nMd1TQKoHV5yHz5hvWdiLCbeRU0X2yDw==.sig.ed25519'
  },
  {
    previous: '%Z2bIU8EzqRp787LIAsoZ401RBdCachb0DR0xEQoasBY=.sha256',
    sequence: 2,
    author: '@KNsYec80tCESsq7Q/x21ujtqts78gZ/LUflLw8dqByg=.ed25519',
    timestamp: 1613719230936,
    hash: 'sha256',
    content: { type: 'post', text: '2nd' },
    signature: 'vc30K0Pd6dVSn7vDgQQla7u9TkmWNrDquka22+ltbeaolUVQyron+aZSxP9ujBHyy4eFCRZTsUFWHywIJNVsAA==.sig.ed25519'
  }
]

Let's see if it is better now.

@arj03 (Member Author) commented Feb 19, 2021

Right, the problem is that we take the min of the arrival and declared timestamps, and if you are unlucky the arrival timestamp on 2 of them will be the same. So the test doesn't really work reliably.
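To make the problem concrete, here is a small sketch (the records and field names are made up for illustration; this is not the actual jitdb comparator): when min(arrival, declared) produces the same key for two messages, a priority queue makes no promise about which comes out first, so a deterministic test needs an explicit tiebreaker.

```javascript
// Hypothetical messages with an arrival (receive-time) and a declared
// (author-claimed) timestamp; names and values are illustrative only.
const msgs = [
  { seq: 1, arrival: 800, declared: 900 },   // key: 800
  { seq: 2, arrival: 1000, declared: 1500 }, // key: 1000 -- ties with seq 3
  { seq: 3, arrival: 1200, declared: 1000 }, // key: 1000 -- ties with seq 2
]

// The sort key: the minimum of the two timestamps.
const sortKey = (m) => Math.min(m.arrival, m.declared)

// seq 2 and seq 3 share key 1000, so a comparator on the key alone
// leaves their relative order up to the sorting algorithm. Breaking
// ties on the sequence number makes the order deterministic:
const byKeyThenSeq = (a, b) => sortKey(a) - sortKey(b) || a.seq - b.seq

const sorted = [...msgs].sort(byKeyThenSeq)
```

With the tiebreaker, the order is the same no matter which algorithm or data structure produced it.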

@github-actions

Benchmark results

Part Duration
Load core indexes 10ms
Query 1 big index (1st run) 1041ms
Query 1 big index (2nd run) 317ms
Count 1 big index (3rd run) 1ms
Create an index twice concurrently 329ms
Query 3 indexes (1st run) 602ms
Query 3 indexes (2nd run) 266ms
Load two indexes concurrently 489ms
Paginate 1 big index 9496ms
Query a prefix map (1st run) 270ms
Query a prefix map (2nd run) 10ms

@staltz (Member) commented Feb 19, 2021

Very neat idea with a priority queue! I'll review carefully later. I might be able to help with sortedCache since I introduced that.

By the way, I also thought that the sorting could be done gradually. Like, if we are doing paginate with pageSize=1 every time, then basically we are just looking for maximums. I know that insertion sort allows us to do it gradually, but I wonder if there are faster algorithms that do it gradually. Then we could spread that sorting workload across the many paginate calls.
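That amortization can be sketched with a plain binary heap (illustrative only; lazyPaginate is a made-up name, not what this PR implements): heapifying costs O(n) up front, and each page then only pays O(pageSize · log n) pops, so the sorting work is spread across the paginate calls.

```javascript
// Lazily paginate `items` in `cmp` order: heapify once, then pop
// `pageSize` items per yielded page instead of sorting everything.
function* lazyPaginate(items, pageSize, cmp) {
  const heap = [...items]
  const siftDown = (i, n) => {
    for (;;) {
      const l = 2 * i + 1
      const r = 2 * i + 2
      let best = i
      if (l < n && cmp(heap[l], heap[best]) < 0) best = l
      if (r < n && cmp(heap[r], heap[best]) < 0) best = r
      if (best === i) return
      ;[heap[i], heap[best]] = [heap[best], heap[i]]
      i = best
    }
  }
  // Bottom-up heapify: O(n). heap[0] is the best item per `cmp`.
  for (let i = (heap.length >> 1) - 1; i >= 0; i--) siftDown(i, heap.length)

  // Each page pops pageSize roots: O(pageSize * log n) per call.
  let n = heap.length
  while (n > 0) {
    const page = []
    while (page.length < pageSize && n > 0) {
      page.push(heap[0])
      heap[0] = heap[n - 1]
      n--
      siftDown(0, n)
    }
    yield page
  }
}
```

For example, `[...lazyPaginate([3, 1, 4, 1, 5], 2, (a, b) => a - b)]` yields the pages in sorted order without ever fully sorting the input up front.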

@arj03 (Member Author) commented Feb 19, 2021

I put this into the browser and have been clicking around. Maybe it's just psychological, but this feels snappier than before. The numbers are also good. Public is now 40ms.

@staltz (Member) commented Feb 19, 2021

Great news!

@arj03 (Member Author) commented Feb 20, 2021

Should have mentioned in the other one, but with the latest changes from @staltz this sometimes goes as low as 164 ms (first page) and 180 ms (total, 2x25 messages). It is a bit all over the place, but somewhere between 164 and 240 ms for the first one.

@github-actions

Benchmark results

Part Duration
Load core indexes 17ms
Query 1 big index (1st run) 1040ms
Query 1 big index (2nd run) 336ms
Count 1 big index (3rd run) 1ms
Create an index twice concurrently 318ms
Query 3 indexes (1st run) 563ms
Query 3 indexes (2nd run) 251ms
Load two indexes concurrently 462ms
Paginate 1 big index 22757ms
Query a prefix map (1st run) 316ms
Query a prefix map (2nd run) 18ms

@arj03 (Member Author) commented Feb 20, 2021

Strange that this seems to be slower on the jitdb benchmark; I'll try to run it locally.

@arj03 (Member Author) commented Feb 20, 2021

I see the same here locally. For reference Paginate 1 big index was 269ms before.

@staltz (Member) commented Feb 20, 2021

So strange... (I'm on mobile)

@arj03 (Member Author) commented Feb 20, 2021

In the esc meeting right now, but afterwards I'll have a look at the benchmarks here to see what is up, and see if we can get them better aligned with real-world tests.

@staltz (Member) commented Feb 20, 2021

Yeah, I'll look into asyncAOL stream bugs in the meanwhile.

@staltz (Member) commented Feb 20, 2021

I think these numbers are real. The "paginate 1 huge index" benchmark goes through 4000 msgs, and each of those takes roughly 10ms, sometimes less, which individually sounds like little, but in aggregate it's indeed 20s+.

@arj03 (Member Author) commented Feb 20, 2021

Right, 5 * 4000. So the question is: is that a good benchmark to have? I mean, is it a realistic case? I'm thinking something like 25 * 10 is maybe a better benchmark to have.

@staltz (Member) commented Feb 20, 2021

On master branch, I put some markers to measure performance of doing the cache lookup and then a simple arr.slice(), and individually it takes about 0.007ms.

@staltz (Member) commented Feb 20, 2021

> Right 5 * 4000. So the question is, is that a good benchmark to have? I mean is it a realistic case. I'm thinking something like 25 * 10 is maybe a better benchmark to have.

Let me take a look at whether any of the queries I do in Manyverse looks like this. But I'm inclined to say that yes, it matters, because to build ssb-threads we want to go through several posts as quickly as possible. All things considered, on the master branch we had 1300ms for the first public query, then 1100ms to go through 4000 msgs, so 2400ms in total. In this PR the total adds up to 24000ms, 10x larger.

I do like the FastPriorityQueue idea, I just think we need to be careful with it. So let's not merge just yet.

@arj03 (Member Author) commented Feb 20, 2021

Ok, maybe you can increase the chunksize a bit?

@staltz (Member) commented Feb 20, 2021

Hmm, that's a decent idea. I wonder if I could make a pull-stream operator that makes it transparent, so that I can pull one item at a time while underneath it's actually using pagination with a large page size.

I did a quick study, see below. Sensible page sizes are in the hundreds, 500 was a sweet spot. This is for the "paginate one huge index" benchmark.

Total items = 20k in all of the scenarios below

Page size Number of pages Duration
1 20000 36532ms
5 4000 18500ms
10 2000 8870ms
20 1000 4557ms
50 400 2206ms
100 200 1171ms
200 100 657ms
500 40 398ms
1000 20 349ms
2000 10 320ms
5000 4 291ms
20000 1 246ms

@staltz (Member) commented Feb 20, 2021

I approve this PR, but I'd like to add a benchmark that uses pageSize=5 and another that uses pageSize=1000, just so that we keep tracking the performance of these different ranges of values.

@arj03 (Member Author) commented Feb 20, 2021

Please do, I'm not in a super rush to get this merged.

It's a bit of a trade-off here. I like the idea of a chunked reader; I wonder if we could do something general enough to work in the async iterator case as well. Almost sounds like someone must have thought of that problem before 🤔

@staltz (Member) commented Feb 20, 2021

> I like the idea of a chunked reader, I wonder if we could do something that would be general enough to work in the async iterator case as well. Almost sounds like someone must have thought of that problem before 🤔

Now that I think about it, I don't think we need a new operator, we can just do

pull(
  query(
    fromDB(db),
    and(equal(seekType, 'post', { indexType: 'type' })),
    paginate(PAGESIZE),
    toPullStream()
  ),
  pull.take(NUMPAGES),
  pull.map(pull.values), // this: turn each page (an array) into a pull-stream
  pull.flatten(),        // and this: flatten the stream of streams into items
  pull.drain(op)
) 

For async iterators, people can do it manually or use an async-iterable library such as https://github.com/ReactiveX/IxJS. We don't need to cover those use cases, because JITDB's responsibility ends after the toPullStream() or toAsyncIterator().

@github-actions

Benchmark results

Part Duration
Load core indexes 6ms
Query 1 big index (1st run) 1023ms
Query 1 big index (2nd run) 322ms
Count 1 big index (3rd run) 1ms
Create an index twice concurrently 325ms
Query 3 indexes (1st run) 573ms
Query 3 indexes (2nd run) 278ms
Load two indexes concurrently 514ms
Paginate 20000 msgs with pageSize=5 20943ms
Paginate 20000 msgs with pageSize=500 432ms
Query a prefix map (1st run) 274ms
Query a prefix map (2nd run) 17ms

@staltz merged commit 0f475d3 into master on Feb 20, 2021
@staltz (Member) commented Feb 20, 2021

Great PR, great collaboration, and great that most of our hurdles are now behind us!

@staltz deleted the faster-sort branch on February 20, 2021 at 20:28
@github-actions

Benchmark results

Part Duration
Load core indexes 8ms
Query 1 big index (1st run) 998ms
Query 1 big index (2nd run) 332ms
Count 1 big index (3rd run) 0ms
Create an index twice concurrently 366ms
Query 3 indexes (1st run) 587ms
Query 3 indexes (2nd run) 272ms
Load two indexes concurrently 476ms
Paginate 20000 msgs with pageSize=5 21661ms
Paginate 20000 msgs with pageSize=500 413ms
Query a prefix map (1st run) 257ms
Query a prefix map (2nd run) 17ms

@staltz staltz mentioned this pull request Aug 17, 2021