Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IntSet.fromList and its benchmarks #652

Open
jwaldmann opened this issue Jul 2, 2019 · 5 comments
Open

IntSet.fromList and its benchmarks #652

jwaldmann opened this issue Jul 2, 2019 · 5 comments

Comments

@jwaldmann
Copy link
Contributor

I experimented with IntSet.fromList via binary fold union, see jwaldmann@975781b
(cf. #330 but without computing runs)

I thought I had a huge improvement - but then found that this is due to the benchmarks: all the data is dense (contiguous numbers, even numbers, odd numbers). For these, my implementation cuts runtimes (of fromList) nearly in half. But for sparse data (square numbers, pseudo-random numbers) it does not.

On the other hand it does not increase runtime much so perhaps there's a way to make use of the idea. That's for later.

For now, I suggest that benchmarks be extended by some sparse sets.

@treeowl
Copy link
Contributor

treeowl commented Jul 2, 2019

That does sound like a serious benchmark weakness! I would conjecture that the low-hanging fruit for IntSet.fromList is the case where multiple values in a row belong in the same leaf, and that we might also be able to do better (for both sets and maps) by waiting to ascend to a parent until we need to. The general idea is to pass the rest of the list down the tree when inserting a value and return the tail of things that don't go there.

@treeowl
Copy link
Contributor

treeowl commented Jul 2, 2019

It would probably be easier to make an attempt on IntMap. IntSet is similar, but with an extra layer of complexity.

@treeowl
Copy link
Contributor

treeowl commented Jul 3, 2019

@jwaldmann I just opened #653 to try to be more clever about list conversions for IntMap. The benchmarks there suffer from the same limitations as the IntSet ones, but I also tried square numbers and either way I got serious improvements. I don't, however, trust my benchmarking skills, and I haven't tried any sort of pseudorandom number generation. Do you have the time to take a glance and see if you can duplicate my results? If that works as well as it seems, we should surely use the same approach for IntSet.

@jwaldmann
Copy link
Contributor Author

"see if I can duplicate" Yes, I will look into it over the next days (not today).

@jwaldmann
Copy link
Contributor Author

regarding benchmarks and their use of gauge, see vincenthz/hs-gauge#95

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants