
Improve fromAscList and friends. #658

Merged
merged 3 commits into from
Jul 15, 2019

Conversation


@int-e int-e commented Jul 6, 2019

The relevant ticket is #654.

I have some supporting material, including more benchmarks, here:
https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6


treeowl commented Jul 6, 2019

fromMonoListWithKey won't inline, so fromAscList for lazy maps will leave stupid thunks in the structure. You can restructure fromMonoListWithKey to inline it with a known f and then mark the exported functions NOINLINE.


int-e commented Jul 6, 2019

fromMonoListWithKey won't inline, so fromAscList for lazy maps will leave stupid thunks in the structure. You can restructure fromMonoListWithKey to inline it with a known f and then mark the exported functions NOINLINE.

You're probably right; I'll verify tomorrow. I forgot about the pitfall that functions used only once are inlined anyway, so my benchmark attempt for this scenario was flawed.


int-e commented Jul 7, 2019

As expected, you were right about the extra thunks. Interestingly, that was not a regression: the previous fromAscList also produced extra thunks.

I've sketched a test for such thunks in c88dc77... it's not yet properly split by module (Data.{,Int}Map{,.Strict}). I should probably split that out into a separate pull request, but I want to see if the current code passes the test for all relevant compilers first.


treeowl commented Jul 7, 2019

The approach I was suggesting was more like

fromMonoListWithKey f = \ xs0 -> go xs0
  where
    go [] = Nil
    ...

I'm not sure how that will compare with your way in the non-inlined case (fromAscListWithKey), but it should be the same for the inlined case fromAscList.


int-e commented Jul 7, 2019

The approach I was suggesting was more like [...]

That's what I tried at first, but it didn't cure the creation of thunks by fromAscList. So I added an {-# INLINE #-} pragma and realized that the auxiliary go was no longer required.
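For illustration, a minimal standalone sketch (toy names and toy list-based logic, not the actual containers internals) of the pattern under discussion: an {-# INLINE #-} on the higher-order worker lets GHC specialize it at each call site with a known f, while the exported wrapper carries NOINLINE to keep code size in check.

```haskell
-- Toy illustration of the INLINE-worker / NOINLINE-wrapper pattern; the
-- names `combineAdjacent` and `lastWins` are invented for this sketch.
module Main where

-- Higher-order worker: combine adjacent entries with equal keys using `f`.
combineAdjacent :: Eq k => (v -> v -> v) -> [(k, v)] -> [(k, v)]
combineAdjacent f = go
  where
    go ((k1, v1) : (k2, v2) : more)
      | k1 == k2  = go ((k1, f v1 v2) : more)
      | otherwise = (k1, v1) : go ((k2, v2) : more)
    go xs = xs
{-# INLINE combineAdjacent #-}

-- Exported wrapper with a known `f`; thanks to the INLINE above, GHC can
-- specialize the worker here instead of passing `f` around at runtime.
lastWins :: Eq k => [(k, v)] -> [(k, v)]
lastWins = combineAdjacent (\_old new -> new)
{-# NOINLINE lastWins #-}

main :: IO ()
main = print (lastWins [(1 :: Int, 'a'), (1, 'b'), (2, 'c')])
```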


int-e commented Jul 7, 2019

New benchmark result: for this particular benchmark the code is now 10% slower than before inlining fromMonoListWithKey, which is inexplicable because the code path where f would matter is never taken.

fromList                                 mean 242.6 μs  ( +- 994.5 ns  )
fromAscList                              mean 91.39 μs  ( +- 734.3 ns  )
fromDistinctAscList                      mean 91.29 μs  ( +- 473.5 ns  )

In contrast, running the code from https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6#file-intmapfal2-hs, the previous code was a tiny bit slower.

Note that none of these benchmarks actually uses duplicate keys. I cooked up a benchmark with duplicate keys using the following generator for keys:

    keys = map ((sk*) . (`div` 3) . (*2)) (if sz < 0 then [2*sz `div` 3.. -sz `div` 3] else [0..sz])
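As a standalone sanity check of that generator (with arbitrarily chosen sk = 1 and sz = 10 here, and only the sz >= 0 branch reproduced), the keys come out nondecreasing with roughly every third key repeating its predecessor:

```haskell
-- Standalone sketch of the duplicate-key generator quoted above; `sk` and
-- `sz` are arbitrary example values, and only the `sz >= 0` branch is kept.
keys :: [Int]
keys = map ((sk *) . (`div` 3) . (* 2)) [0 .. sz]
  where
    sk = 1
    sz = 10

-- (2*k) `div` 3 takes the same value at k = 3*i and k = 3*i+1, so about one
-- key in three duplicates its predecessor while the list stays sorted.
main :: IO ()
main = print keys
```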

In that case, avoiding the thunks actually results in a speedup, especially for larger map sizes:

[1,1]/fromAscList                        mean 13.65 ns  ( +- 62.14 ps  )
[1,1]/fromAscList1a                      mean 12.36 ns  ( +- 53.82 ps  )
[10,1]/fromAscList                       mean 122.2 ns  ( +- 1.424 ns  )
[10,1]/fromAscList1a                     mean 102.1 ns  ( +- 693.7 ps  )
[-10,51791]/fromAscList                  mean 139.7 ns  ( +- 1.247 ns  )
[-10,51791]/fromAscList1a                mean 117.8 ns  ( +- 763.6 ps  )
[100,1]/fromAscList                      mean 1.304 μs  ( +- 29.38 ns  )
[100,1]/fromAscList1a                    mean 1.232 μs  ( +- 67.38 ns  )
[-100,51791]/fromAscList                 mean 1.402 μs  ( +- 49.40 ns  )
[-100,51791]/fromAscList1a               mean 1.316 μs  ( +- 57.71 ns  )
[1000,1]/fromAscList                     mean 13.90 μs  ( +- 173.5 ns  )
[1000,1]/fromAscList1a                   mean 12.29 μs  ( +- 187.5 ns  )
[-1000,51791]/fromAscList                mean 15.12 μs  ( +- 545.2 ns  )
[-1000,51791]/fromAscList1a              mean 13.69 μs  ( +- 650.7 ns  )
[10000,1]/fromAscList                    mean 208.6 μs  ( +- 1.979 μs  )
[10000,1]/fromAscList1a                  mean 155.3 μs  ( +- 5.096 μs  )
[-10000,51791]/fromAscList               mean 226.5 μs  ( +- 4.236 μs  )
[-10000,51791]/fromAscList1a             mean 167.3 μs  ( +- 3.117 μs  )
[100000,1]/fromAscList                   mean 8.410 ms  ( +- 204.8 μs  )
[100000,1]/fromAscList1a                 mean 6.218 ms  ( +- 203.4 μs  )
[-100000,51791]/fromAscList              mean 8.877 ms  ( +- 322.3 μs  )
[-100000,51791]/fromAscList1a            mean 6.312 ms  ( +- 228.8 μs  )

Here fromAscList is the version without an inlined fromMonoListWithKey, and I've checked that fromAscList1a matches the speed of the inlined one.


treeowl commented Jul 7, 2019

I keep wondering if we can do a bit better. In particular, I gather that highestBitMask, and therefore branchMask, are relatively expensive as bitwise operations go. What if we didn't need to use so many? Suppose we track the Mask of a subtree as well as a key within. If we take our new key, xor it with the one in the subtree, and mask off the result using the subtree's mask, we should be able to .|. that with its negation to get a new Mask (probably throwing in a constant shift somewhere).

@treeowl treeowl closed this Jul 7, 2019
@treeowl treeowl reopened this Jul 7, 2019

treeowl commented Jul 7, 2019

Sorry about that; I accidentally hit "close and comment" when I wasn't even finished writing my non-closing comment.


treeowl commented Jul 7, 2019

But I also realize my last substantive comment was all wrong... Hrm. Never mind.


int-e commented Jul 7, 2019

I suppose the trick from #653 (comment) is applicable here as well.


int-e commented Jul 8, 2019

I suppose the trick from #653 (comment) is applicable here as well.

While applicable, it doesn't seem to pay off here. Selected benchmarks follow (the new versions are fromAscList2 and fromAscList2a); other sizes show a small slowdown as well:

[10000,1]/fromAscList1a                  mean 236.0 μs  ( +- 1.631 μs  )
[10000,1]/fromAscList2                   mean 268.3 μs  ( +- 22.73 μs  )
[10000,1]/fromAscList2a                  mean 266.5 μs  ( +- 21.94 μs  )

[-10000,51791]/fromAscList1a             mean 255.4 μs  ( +- 5.849 μs  )
[-10000,51791]/fromAscList2              mean 309.0 μs  ( +- 22.05 μs  )
[-10000,51791]/fromAscList2a             mean 293.4 μs  ( +- 21.61 μs  )

The code is not pretty either:

fromAscList2a :: [(Key,a)] -> IntMap a
fromAscList2a []              = Nil
fromAscList2a ((kx,vx) : zs1) = addAll' kx vx zs1
  where
    -- `addAll'` collects all keys equal to `kx` into a single value,
    -- and then proceeds with `addAll`.
    addAll' !kx vx [] = Tip kx vx
    addAll' !kx vx ((ky,vy) : zs)
        | kx == ky
        = addAll' ky vy zs
        | m <- branchMask ky kx
        = case addMany' m ky vy zs of
            InsertedNil  ty           -> link ky ty kx (Tip kx vx)
            InsertedCons ty kz vz zs' -> addAll kx (link ky ty kx (Tip kx vx)) kz vz zs'

    -- for `addAll` and `addMany`, kx is /a/ key inside the tree `tx`
    -- `addAll` consumes the rest of the list, adding to the tree `tx`
    addAll !kx !tx !ky vy zs
        | m <- branchMask ky kx
        = case addMany' m ky vy zs of
            InsertedNil  ty           -> link ky ty kx tx
            InsertedCons ty kz vz zs' -> addAll kx (link ky ty kx tx) kz vz zs'

    -- `addMany'` is similar to `addAll'`, but proceeds with `addMany`.
    addMany' !m !kx vx [] = InsertedNil (Tip kx vx)
    addMany' !m !kx vx ((ky,vy) : zs)
        | kx == ky
        = addMany' m ky vy zs
        | mask kx m /= mask ky m
        = InsertedCons (Tip kx vx) ky vy zs
        | otherwise
        = case addMany' (branchMask ky kx) ky vy zs of
            InsertedNil  ty           -> InsertedNil (link ky ty kx (Tip kx vx))
            InsertedCons ty kz vz zs' -> addMany m kx (link ky ty kx (Tip kx vx)) kz vz zs'

    -- `addMany` adds to `tx` all keys whose prefix w.r.t. `m` agrees with `kx`.
    addMany !m !kx tx !ky vy zs
        | mask kx m /= mask ky m
        = InsertedCons tx ky vy zs
        | otherwise
        = case addMany' (branchMask ky kx) ky vy zs of
            InsertedNil  ty           -> InsertedNil (link ky ty kx tx)
            InsertedCons ty kz vz zs' -> addMany m kx (link ky ty kx tx) kz vz zs'

data Inserted' a
   = InsertedNil  !(IntMap a)
   | InsertedCons !(IntMap a) !Key a ![(Key,a)]


treeowl commented Jul 8, 2019

Even if there's no reliably detectable advantage to the CSE'd version, it should at least produce slightly smaller code, which is better. I'm a bit mystified about why the fancy Inserted representation would help there and not here.


int-e commented Jul 8, 2019

Last action for tonight... I've pushed the CSE changes. (I'm making separate patches for each data structure because in the end I want to squash everything into the first three commits that I pushed.)

I'm also mystified by what exactly the benefits and drawbacks of the Inserted' trick are.


treeowl commented Jul 8, 2019 via email

@int-e int-e mentioned this pull request Jul 8, 2019

treeowl commented Jul 14, 2019

Do you consider this ready to merge? If so, could you squash whatever commits you want?

benchmark summary:

old:
fromList                                 mean 86.33 μs  ( +- 557.8 ns  )
fromAscList                              mean 42.01 μs  ( +- 154.4 ns  )
fromDistinctAscList                      mean 15.74 μs  ( +- 90.03 ns  )

new:
fromList                                 mean 83.00 μs  ( +- 1.147 μs  )
fromAscList                              mean 14.40 μs  ( +- 367.6 ns  )
fromDistinctAscList                      mean 14.56 μs  ( +- 486.9 ns  )

int-e commented Jul 14, 2019

Do you consider this ready to merge? If so, could you squash whatever commits you want?

I think it's ready to go.


treeowl commented Jul 14, 2019

One more request: could you use a custom 2-value type instead of Bool to indicate the distinctness requirement?

benchmark summary:

old:
fromList                                 mean 244.4 μs  ( +- 7.088 μs  )
fromAscList                              mean 178.6 μs  ( +- 1.211 μs  )
fromDistinctAscList                      mean 105.5 μs  ( +- 1.048 μs  )

new:
fromList                                 mean 225.2 μs  ( +- 2.345 μs  )
fromAscList                              mean 84.71 μs  ( +- 1.011 μs  )
fromDistinctAscList                      mean 84.20 μs  ( +- 945.0 ns  )
- no benchmarks, but the code is analogous to Data.IntMap.from*AscList*

int-e commented Jul 15, 2019

One more request: could you use a custom 2-value type instead of Bool to indicate the distinctness requirement?

Done. I struggled with the naming for a bit; in the end I went with data Distinct = Distinct | Nondistinct, but I don't particularly like it.
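For illustration, a minimal sketch of the idea (the type matches the comment above; buildAsc and its list-based body are hypothetical stand-ins for the real internal worker): the dedicated two-value type names the invariant that a bare Bool would leave implicit at call sites.

```haskell
-- The two-value type from the comment above.
data Distinct = Distinct | Nondistinct

-- Hypothetical worker: the flag records whether the caller guarantees
-- strictly increasing keys, so the duplicate-combining pass can be skipped.
buildAsc :: Eq k => Distinct -> [(k, v)] -> [(k, v)]
buildAsc Distinct    kvs = kvs          -- no duplicates possible
buildAsc Nondistinct kvs = dedup kvs    -- later bindings win
  where
    dedup ((k1, _) : rest@((k2, _) : _))
      | k1 == k2 = dedup rest
    dedup (kv : rest) = kv : dedup rest
    dedup []          = []

main :: IO ()
main = print (buildAsc Nondistinct [(1 :: Int, 'a'), (1, 'b'), (2, 'c')])
```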


treeowl commented Jul 15, 2019

It's not the best, but we can fix it later. At least it says something!

@treeowl treeowl merged commit 77c8e5f into haskell:master Jul 15, 2019

treeowl commented Jul 15, 2019

Many thanks!
