
Improve fromAscList and friends. #658

Merged
merged 3 commits into from
Jul 15, 2019

Conversation


@int-e int-e commented Jul 6, 2019

The relevant ticket is #654.

I have some supporting material, including more benchmarks, here:
https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6


treeowl commented Jul 6, 2019

fromMonoListWithKey won't inline, so fromAscList for lazy maps will leave stupid thunks in the structure. You can restructure fromMonoListWithKey to inline it with a known f and then mark the exported functions NOINLINE.


int-e commented Jul 6, 2019

fromMonoListWithKey won't inline, so fromAscList for lazy maps will leave stupid thunks in the structure. You can restructure fromMonoListWithKey to inline it with a known f and then mark the exported functions NOINLINE.

You're probably right; I'll verify tomorrow. I forgot about the pitfall that functions used only once are inlined anyway, so my benchmark attempt for this scenario was flawed.


int-e commented Jul 7, 2019

As expected, you were right about the extra thunks. Interestingly, that was not a regression: the previous fromAscList also produced extra thunks.

I've sketched a test for such thunks in c88dc77... it's not yet properly split by module (Data.{,Int}Map{,.Strict}). I should probably split that out into a separate pull request, but I want to see if the current code passes the test for all relevant compilers first.


treeowl commented Jul 7, 2019

The approach I was suggesting was more like

fromMonoListWithKey f = \ xs0 -> go xs0
  where
    go [] = Nil
    ...

I'm not sure how that will compare with your way in the non-inlined case (fromAscListWithKey), but it should be the same for the inlined case fromAscList.


int-e commented Jul 7, 2019

The approach I was suggesting was more like [...]

That's what I tried at first, but it didn't cure the creation of thunks by fromAscList. So I added an {-# INLINE #-} pragma and realized that the auxiliary go was no longer required.
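For illustration, a minimal standalone sketch (toy names and toy list-based logic, not the actual containers internals) of the pattern under discussion: an {-# INLINE #-} on the higher-order worker lets GHC specialize it at each call site with a known f, while the exported wrapper carries NOINLINE to keep code size in check.

```haskell
-- Toy illustration of the INLINE-worker / NOINLINE-wrapper pattern; the
-- names `combineAdjacent` and `lastWins` are invented for this sketch.
module Main where

-- Higher-order worker: combine adjacent entries with equal keys using `f`.
combineAdjacent :: Eq k => (v -> v -> v) -> [(k, v)] -> [(k, v)]
combineAdjacent f = go
  where
    go ((k1, v1) : (k2, v2) : more)
      | k1 == k2  = go ((k1, f v1 v2) : more)
      | otherwise = (k1, v1) : go ((k2, v2) : more)
    go xs = xs
{-# INLINE combineAdjacent #-}

-- Exported wrapper with a known `f`; thanks to the INLINE above, GHC can
-- specialize the worker here instead of passing `f` around at runtime.
lastWins :: Eq k => [(k, v)] -> [(k, v)]
lastWins = combineAdjacent (\_old new -> new)
{-# NOINLINE lastWins #-}

main :: IO ()
main = print (lastWins [(1 :: Int, 'a'), (1, 'b'), (2, 'c')])
```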


int-e commented Jul 7, 2019

New benchmark result: for this particular benchmark the code is now 10% slower than before inlining fromMonoListWithKey, which is inexplicable because the code path where f would matter is never taken.

fromList                                 mean 242.6 μs  ( +- 994.5 ns  )
fromAscList                              mean 91.39 μs  ( +- 734.3 ns  )
fromDistinctAscList                      mean 91.29 μs  ( +- 473.5 ns  )

In contrast, running the code from https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6#file-intmapfal2-hs, the previous code was a tiny bit slower.

Note that none of these benchmarks actually uses duplicate keys. I cooked up a benchmark with duplicate keys using the following generator for keys:

    keys = map ((sk*) . (`div` 3) . (*2)) (if sz < 0 then [2*sz `div` 3.. -sz `div` 3] else [0..sz])
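As a standalone sanity check of that generator (with arbitrarily chosen sk = 1 and sz = 10 here, and only the sz >= 0 branch reproduced), the keys come out nondecreasing with roughly every third key repeating its predecessor:

```haskell
-- Standalone sketch of the duplicate-key generator quoted above; `sk` and
-- `sz` are arbitrary example values, and only the `sz >= 0` branch is kept.
keys :: [Int]
keys = map ((sk *) . (`div` 3) . (* 2)) [0 .. sz]
  where
    sk = 1
    sz = 10

-- (2*k) `div` 3 takes the same value at k = 3*i and k = 3*i+1, so about one
-- key in three duplicates its predecessor while the list stays sorted.
main :: IO ()
main = print keys
```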

In that case, avoiding the thunks actually results in a speedup, especially for larger map sizes:

[1,1]/fromAscList                        mean 13.65 ns  ( +- 62.14 ps  )
[1,1]/fromAscList1a                      mean 12.36 ns  ( +- 53.82 ps  )
[10,1]/fromAscList                       mean 122.2 ns  ( +- 1.424 ns  )
[10,1]/fromAscList1a                     mean 102.1 ns  ( +- 693.7 ps  )
[-10,51791]/fromAscList                  mean 139.7 ns  ( +- 1.247 ns  )
[-10,51791]/fromAscList1a                mean 117.8 ns  ( +- 763.6 ps  )
[100,1]/fromAscList                      mean 1.304 μs  ( +- 29.38 ns  )
[100,1]/fromAscList1a                    mean 1.232 μs  ( +- 67.38 ns  )
[-100,51791]/fromAscList                 mean 1.402 μs  ( +- 49.40 ns  )
[-100,51791]/fromAscList1a               mean 1.316 μs  ( +- 57.71 ns  )
[1000,1]/fromAscList                     mean 13.90 μs  ( +- 173.5 ns  )
[1000,1]/fromAscList1a                   mean 12.29 μs  ( +- 187.5 ns  )
[-1000,51791]/fromAscList                mean 15.12 μs  ( +- 545.2 ns  )
[-1000,51791]/fromAscList1a              mean 13.69 μs  ( +- 650.7 ns  )
[10000,1]/fromAscList                    mean 208.6 μs  ( +- 1.979 μs  )
[10000,1]/fromAscList1a                  mean 155.3 μs  ( +- 5.096 μs  )
[-10000,51791]/fromAscList               mean 226.5 μs  ( +- 4.236 μs  )
[-10000,51791]/fromAscList1a             mean 167.3 μs  ( +- 3.117 μs  )
[100000,1]/fromAscList                   mean 8.410 ms  ( +- 204.8 μs  )
[100000,1]/fromAscList1a                 mean 6.218 ms  ( +- 203.4 μs  )
[-100000,51791]/fromAscList              mean 8.877 ms  ( +- 322.3 μs  )
[-100000,51791]/fromAscList1a            mean 6.312 ms  ( +- 228.8 μs  )

Here fromAscList is the version without an inlined fromMonoListWithKey, and I've checked that fromAscList1a matches the speed of the inlined one.


treeowl commented Jul 7, 2019

I keep wondering if we can do a bit better. In particular, I gather that highestBitMask, and therefore branchMask, are relatively expensive as bitwise operations go. What if we didn't need to use so many? Suppose we track the Mask of a subtree as well as a key within. If we take our new key, xor it with the one in the subtree, and mask off the result using the subtree's mask, we should be able to .|. that with its negation to get a new Mask (probably throwing in a constant shift somewhere).

@treeowl treeowl closed this Jul 7, 2019
@treeowl treeowl reopened this Jul 7, 2019

treeowl commented Jul 7, 2019

Sorry about that; I accidentally hit "close and comment" when I wasn't even finished writing my non-closing comment.


treeowl commented Jul 7, 2019

But I also realize my last substantive comment was all wrong... Hrm. Never mind.


int-e commented Jul 7, 2019

I suppose the trick from #653 (comment) is applicable here as well.


int-e commented Jul 8, 2019

I suppose the trick from #653 (comment) is applicable here as well.

While applicable, it doesn't seem to pay off here. Selected benchmarks follow (the new versions are fromAscList2 and fromAscList2a); other sizes show a small slowdown as well:

[10000,1]/fromAscList1a                  mean 236.0 μs  ( +- 1.631 μs  )
[10000,1]/fromAscList2                   mean 268.3 μs  ( +- 22.73 μs  )
[10000,1]/fromAscList2a                  mean 266.5 μs  ( +- 21.94 μs  )

[-10000,51791]/fromAscList1a             mean 255.4 μs  ( +- 5.849 μs  )
[-10000,51791]/fromAscList2              mean 309.0 μs  ( +- 22.05 μs  )
[-10000,51791]/fromAscList2a             mean 293.4 μs  ( +- 21.61 μs  )

The code is not pretty either:

fromAscList2a :: [(Key,a)] -> IntMap a
fromAscList2a []              = Nil
fromAscList2a ((kx,vx) : zs1) = addAll' kx vx zs1
  where
    -- `addAll'` collects all keys equal to `kx` into a single value,
    -- and then proceeds with `addAll`.
    addAll' !kx vx [] = Tip kx vx
    addAll' !kx vx ((ky,vy) : zs)
        | kx == ky
        = addAll' ky vy zs
        | m <- branchMask ky kx
        = case addMany' m ky vy zs of
            InsertedNil  ty           -> link ky ty kx (Tip kx vx)
            InsertedCons ty kz vz zs' -> addAll kx (link ky ty kx (Tip kx vx)) kz vz zs'

    -- for `addAll` and `addMany`, kx is /a/ key inside the tree `tx`
    -- `addAll` consumes the rest of the list, adding to the tree `tx`
    addAll !kx !tx !ky vy zs
        | m <- branchMask ky kx
        = case addMany' m ky vy zs of
            InsertedNil  ty           -> link ky ty kx tx
            InsertedCons ty kz vz zs' -> addAll kx (link ky ty kx tx) kz vz zs'

    -- `addMany'` is similar to `addAll'`, but proceeds with `addMany`.
    addMany' !m !kx vx [] = InsertedNil (Tip kx vx)
    addMany' !m !kx vx ((ky,vy) : zs)
        | kx == ky
        = addMany' m ky vy zs
        | mask kx m /= mask ky m
        = InsertedCons (Tip kx vx) ky vy zs
        | otherwise
        = case addMany' (branchMask ky kx) ky vy zs of
            InsertedNil  ty           -> InsertedNil (link ky ty kx (Tip kx vx))
            InsertedCons ty kz vz zs' -> addMany m kx (link ky ty kx (Tip kx vx)) kz vz zs'

    -- `addMany` adds to `tx` all keys whose prefix w.r.t. `m` agrees with `kx`.
    addMany !m !kx tx !ky vy zs
        | mask kx m /= mask ky m
        = InsertedCons tx ky vy zs
        | otherwise
        = case addMany' (branchMask ky kx) ky vy zs of
            InsertedNil  ty           -> InsertedNil (link ky ty kx tx)
            InsertedCons ty kz vz zs' -> addMany m kx (link ky ty kx tx) kz vz zs'

data Inserted' a
   = InsertedNil  !(IntMap a)
   | InsertedCons !(IntMap a) !Key a ![(Key,a)]


treeowl commented Jul 8, 2019

Even if there's no reliably detectable advantage to the CSE'd version, it should at least produce slightly smaller code, which is better. I'm a bit mystified about why the fancy Inserted representation would help there and not here.


int-e commented Jul 8, 2019

Last action for tonight... I've pushed the CSE changes. (I'm making separate patches for each data structure because in the end I want to squash everything into the first three commits that I pushed.)

I'm also mystified by what exactly the benefits and drawbacks of the Inserted' trick are.


treeowl commented Jul 8, 2019 via email

@int-e int-e mentioned this pull request Jul 8, 2019

treeowl commented Jul 14, 2019

Do you consider this ready to merge? If so, could you squash whatever commits you want?

benchmark summary:

old:
fromList                                 mean 86.33 μs  ( +- 557.8 ns  )
fromAscList                              mean 42.01 μs  ( +- 154.4 ns  )
fromDistinctAscList                      mean 15.74 μs  ( +- 90.03 ns  )

new:
fromList                                 mean 83.00 μs  ( +- 1.147 μs  )
fromAscList                              mean 14.40 μs  ( +- 367.6 ns  )
fromDistinctAscList                      mean 14.56 μs  ( +- 486.9 ns  )

int-e commented Jul 14, 2019

Do you consider this ready to merge? If so, could you squash whatever commits you want?

I think it's ready to go.


treeowl commented Jul 14, 2019

One more request: could you use a custom 2-value type instead of Bool to indicate the distinctness requirement?

benchmark summary:

old:
fromList                                 mean 244.4 μs  ( +- 7.088 μs  )
fromAscList                              mean 178.6 μs  ( +- 1.211 μs  )
fromDistinctAscList                      mean 105.5 μs  ( +- 1.048 μs  )

new:
fromList                                 mean 225.2 μs  ( +- 2.345 μs  )
fromAscList                              mean 84.71 μs  ( +- 1.011 μs  )
fromDistinctAscList                      mean 84.20 μs  ( +- 945.0 ns  )
- no benchmarks, but the code is analogous to Data.IntMap.from*AscList*

int-e commented Jul 15, 2019

One more request: could you use a custom 2-value type instead of Bool to indicate the distinctness requirement?

Done. I struggled with the naming for a bit; in the end I went with data Distinct = Distinct | Nondistinct, but I don't particularly like it.
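For illustration, a minimal sketch of the idea (the type matches the comment above; buildAsc and its list-based body are hypothetical stand-ins for the real internal worker): the dedicated two-value type names the invariant that a bare Bool would leave implicit at call sites.

```haskell
-- The two-value type from the comment above.
data Distinct = Distinct | Nondistinct

-- Hypothetical worker: the flag records whether the caller guarantees
-- strictly increasing keys, so the duplicate-combining pass can be skipped.
buildAsc :: Eq k => Distinct -> [(k, v)] -> [(k, v)]
buildAsc Distinct    kvs = kvs          -- no duplicates possible
buildAsc Nondistinct kvs = dedup kvs    -- later bindings win
  where
    dedup ((k1, _) : rest@((k2, _) : _))
      | k1 == k2 = dedup rest
    dedup (kv : rest) = kv : dedup rest
    dedup []          = []

main :: IO ()
main = print (buildAsc Nondistinct [(1 :: Int, 'a'), (1, 'b'), (2, 'c')])
```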


treeowl commented Jul 15, 2019

It's not the best, but we can fix it later. At least it says something!

@treeowl treeowl merged commit 77c8e5f into haskell:master Jul 15, 2019

treeowl commented Jul 15, 2019

Many thanks!
