IntSet.fromList and its benchmarks #652

jwaldmann · 2019-07-02T19:47:27Z

I experimented with IntSet.fromList via binary fold union, see jwaldmann@975781b
(cf. #330 but without computing runs)
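
For readers who have not opened the commit, here is a minimal sketch of what a binary fold of unions could look like. It uses only the public `Data.IntSet` API and is not the code from the linked commit; the name `fromListBinary` and the helper `pairUp` are hypothetical.

```haskell
import qualified Data.IntSet as IS

-- Hypothetical name; a sketch of "fromList via binary fold union".
-- Every element becomes a singleton set, then neighbouring sets are
-- unioned pairwise, round after round, until one set remains.
fromListBinary :: [Int] -> IS.IntSet
fromListBinary = go . map IS.singleton
  where
    go []  = IS.empty
    go [s] = s
    go ss  = go (pairUp ss)

    -- Union adjacent pairs; an odd leftover set is passed through unchanged.
    pairUp (a : b : rest) = IS.union a b : pairUp rest
    pairUp rest           = rest
```

Each round halves the number of sets, so there are about log2(n) rounds, and the unions in later rounds operate on larger sets. That is presumably where dense inputs would benefit, but this is a guess rather than a measured result.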

I thought I had a huge improvement, but then found that it is an artifact of the benchmarks: all the data is dense (contiguous numbers, even numbers, odd numbers). For these inputs, my implementation cuts the runtime of fromList nearly in half. But for sparse data (square numbers, pseudo-random numbers) it does not.

On the other hand, it does not increase runtime much, so perhaps there's a way to make use of the idea. That's for later.

For now, I suggest that the benchmarks be extended with some sparse sets.
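
As a rough illustration of the kinds of inputs meant here, the following sketch generates dense and sparse key lists. All names are hypothetical, and the pseudo-random generator is a simple linear congruential generator chosen only for illustration (it relies on `Int` arithmetic wrapping on overflow, as it does in GHC); it is not what the existing benchmarks use.

```haskell
-- Dense inputs, like those in the current benchmarks.
contiguous, evens :: Int -> [Int]
contiguous n = [1 .. n]
evens n      = [2, 4 .. 2 * n]

-- Sparse input: the first n square numbers.
squares :: Int -> [Int]
squares n = [k * k | k <- [1 .. n]]

-- Sparse input: n pseudo-random keys from a linear congruential
-- generator. The seed and constants are arbitrary illustrative choices.
pseudoRandom :: Int -> [Int]
pseudoRandom n = take n (drop 1 (iterate step 42))
  where
    step x = x * 6364136223846793005 + 1442695040888963407

-- Named inputs that a benchmark harness could iterate over.
benchInputs :: Int -> [(String, [Int])]
benchInputs n =
  [ ("contiguous",    contiguous n)
  , ("evens",         evens n)
  , ("squares",       squares n)
  , ("pseudo-random", pseudoRandom n)
  ]
```

Each list in `benchInputs n` has exactly n elements, so timings for `fromList` across the four cases would be comparable by input length.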


treeowl · 2019-07-02T20:00:15Z

That does sound like a serious benchmark weakness! I would conjecture that the low-hanging fruit for IntSet.fromList is the case where multiple values in a row belong in the same leaf, and that we might also be able to do better (for both sets and maps) by waiting to ascend to a parent until we need to. The general idea is to pass the rest of the list down the tree when inserting a value and return the tail of things that don't go there.
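
The "consume what belongs here, return the tail" idea can be sketched at the leaf level with the public API alone. This is only an illustration under assumptions: it assumes a leaf covers 64 consecutive keys (the bitmap width on a 64-bit platform), it builds each leaf as a small `IntSet` rather than a raw bitmap, and the names `leafPrefix`, `fillLeaf`, and `fromListByLeaf` are hypothetical. It does not implement the "wait to ascend to a parent" part.

```haskell
import Data.Bits (complement, (.&.))
import qualified Data.IntSet as IS
import Data.List (foldl')

-- Assumption: one leaf holds the 64 keys that share all but the low 6 bits.
leafPrefix :: Int -> Int
leafPrefix x = x .&. complement 63

-- Consume the leading run of keys that fall into the leaf with prefix p.
-- Returns the set built from that run and the unconsumed tail of the list.
fillLeaf :: Int -> [Int] -> (IS.IntSet, [Int])
fillLeaf p xs =
  let (same, rest) = span (\y -> leafPrefix y == p) xs
  in  (foldl' (flip IS.insert) IS.empty same, rest)

-- Build the set one leaf-sized run at a time.
fromListByLeaf :: [Int] -> IS.IntSet
fromListByLeaf = go IS.empty
  where
    go acc [] = acc
    go acc xs@(x : _) =
      let (leaf, rest) = fillLeaf (leafPrefix x) xs
          acc'         = IS.union acc leaf
      in  acc' `seq` go acc' rest
```

For input such as `[64, 65, 66, 200, 1]`, the first three keys share a prefix and are gathered in one step before a single union into the accumulator; sparse input degrades to one union per element.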

treeowl · 2019-07-02T23:39:33Z

It would probably be easier to make an attempt on IntMap. IntSet is similar, but with an extra layer of complexity.

treeowl · 2019-07-03T01:44:30Z

@jwaldmann I just opened #653 to try to be more clever about list conversions for IntMap. The benchmarks there suffer from the same limitations as the IntSet ones, but I also tried square numbers and either way I got serious improvements. I don't, however, trust my benchmarking skills, and I haven't tried any sort of pseudorandom number generation. Do you have the time to take a glance and see if you can duplicate my results? If that works as well as it seems, we should surely use the same approach for IntSet.

jwaldmann · 2019-07-03T06:27:54Z

"see if I can duplicate" Yes, I will look into it over the next days (not today).

jwaldmann · 2019-07-04T17:16:55Z

regarding benchmarks and their use of gauge, see vincenthz/hs-gauge#95

sjakobi added the IntSet and benchmarking labels Jul 15, 2020

sjakobi mentioned this issue Aug 14, 2020 in "improve benchmarks for Data.IntMap" #657 (Open)
