Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make union*, difference*, and intersect* run linearly by sorting elements first #203

Closed
wants to merge 8 commits into from
Closed

Conversation

JordanMartinez
Copy link
Contributor

@JordanMartinez JordanMartinez commented Dec 31, 2020

Fixes #192

@JordanMartinez JordanMartinez changed the title Implement differenceBy Make difference and intersectBy run linearly by sorting elements first Dec 31, 2020
@JordanMartinez
Copy link
Contributor Author

I've updated the intersectionBy and difference/differenceBy to work and pass CI. I'm still not sure how union differs from alt/<|>, so I'm going to wait until that question is clarified.

@JordanMartinez JordanMartinez changed the title Make difference and intersectBy run linearly by sorting elements first Make union*, difference*, and intersect* run linearly by sorting elements first Dec 31, 2020
@JordanMartinez
Copy link
Contributor Author

This PR is ready for review

@hdgarrood
Copy link
Contributor

I haven't reviewed properly yet but just so you know, sorting based on a comparison function which only compares two elements at a time is always at least O(n log n), which is beyond linear. Not by that much, admittedly, since log n grows very slowly, but it's incorrect to say that it is linear if we're sorting.

@JordanMartinez
Copy link
Contributor Author

Hmm..... Besides the eq to compare change and the lack of STFn, is there anything else that might make this benchmark misleading? Everything is slower by quite a bit.

Assume the following values for each benchmark below

let 
  shortNats = Array.range 0 100
  longNats = Array.range 0 10000
  mod3Eq x y = (x `mod` 3) == (y `mod` 3)
  mod3Cmp x y = compare (x `mod` 3) (y `mod` 3)
  
-- `master` uses `mod3Eq` whereas this PR uses `mod3Cmp`

unionBy

benchUnionBy = do
    log $ "unionBy (" <> show (Array.length shortNats) <> ")"
    benchWith 1000 \_ -> Array.unionBy mod3Eq shortNats shortNats

    log $ "unionBy (" <> show (Array.length longNats) <> ")"
    benchWith 100 \_ -> Array.unionBy mod3Eq longNats longNats
unionBy -- master branch
---------------
unionBy (101)
mean   = 60.73 μs
stddev = 81.23 μs
min    = 34.67 μs
max    = 1.36 ms
unionBy (10001)
mean   = 3.40 ms
stddev = 1.56 ms
min    = 2.98 ms
max    = 18.20 ms

unionBy -- this PR
---------------
unionBy (101)
mean   = 208.75 μs
stddev = 197.98 μs
min    = 162.88 μs
max    = 3.23 ms
unionBy (10001)
mean   = 32.38 ms
stddev = 6.30 ms
min    = 28.67 ms
max    = 73.98 ms

intersectBy

  benchIntersectBy = do
    log $ "intersectBy (" <> show (Array.length shortNats) <> ")"
    benchWith 1000 \_ -> Array.intersectBy mod3Eq shortNats shortNats

    log $ "intersectBy (" <> show (Array.length longNats) <> ")"
    benchWith 100 \_ -> Array.intersectBy mod3Eq longNats longNats
intersectBy -- current master
---------------
intersectBy (101)
mean   = 16.93 μs
stddev = 39.69 μs
min    = 7.69 μs
max    = 789.20 μs
intersectBy (10001)
mean   = 760.82 μs
stddev = 549.92 μs
min    = 626.33 μs
max    = 6.13 ms

intersectBy -- this PR
---------------
intersectBy (101)
mean   = 205.14 μs
stddev = 150.04 μs
min    = 158.94 μs
max    = 2.21 ms
intersectBy (10001)
mean   = 31.81 ms
stddev = 5.34 ms
min    = 28.55 ms
max    = 62.24 ms

difference

  benchDifference = do
    log $ "difference (" <> show (Array.length shortNats) <> ")"
    benchWith 1000 \_ -> Array.difference shortNats shortNats

    log $ "difference (" <> show (Array.length longNats) <> ")"
    benchWith 100 \_ -> Array.difference longNats longNats
difference -- current master
---------------
difference (101)
mean   = 79.57 μs
stddev = 48.39 μs
min    = 63.89 μs
max    = 742.35 μs
difference (10001)
mean   = 492.35 ms
stddev = 24.31 ms
min    = 463.04 ms
max    = 594.56 ms

difference -- this PR
---------------
difference (101)
mean   = 142.91 μs
stddev = 136.28 μs
min    = 106.35 μs
max    = 1.73 ms
difference (10001)
mean   = 21.30 ms
stddev = 3.75 ms
min    = 18.21 ms
max    = 44.52 ms

Comment on lines 1209 to 1210
-- combineIndex compare [0] [0, 1]
-- == [t3 true 0 0, t3 false 1 0, t3 false 2 1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- combineIndex compare [0] [0, 1]
-- == [t3 true 0 0, t3 false 1 0, t3 false 2 1]
-- combineIndex compare [7] [9, 5]
-- == [t3 true 0 7, t3 false 1 9, t3 false 2 5]

Easier to follow example with values that are distinct from indices

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then there aren't values that exist in both arrays.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since combineIndex doesn't care about duplicates, I don't think we need the example to involve duplicates.
This would be like saying that [1,2] <> [3,4] == [1,2,3,4] is a bad example for append because it doesn't clarify that duplicates are preserved.

Could compromise with something like [7] [7, 5].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since combineIndex doesn't care about duplicates, I don't think we need the example to involve duplicates.

🤦‍♂️ You're right! I forgot that this function doesn't need to care unlike the ones I'm working on. I'll make that fix.

-- ```
combineIndex :: forall a. Array a -> Array a -> Array (Tuple Boolean (Tuple Int a))
combineIndex left right = ST.run do
out <- STA.new
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any benefits to preallocating the array to a known size? Tried to find a clear answer for this in JS. And if there is a benefit that we want to take advantage of, we'd need to expand the Array.ST API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've wondered about that myself...

src/Data/Array.purs Outdated Show resolved Hide resolved
let rightLen = length right
ST.for 0 rightLen \idx -> do
let val = unsafePartial $ unsafeIndex right idx
void $ STA.push (Tuple false (Tuple (leftLen + idx) val)) out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any performance penalties to using a record instead of a tuple? Luckily, this tuple happens to be type-safe because of the distinct types (Boolean, Int, a).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. I also tried it out with data Tuple3 a b c = Tuple3 a b c and that sped things up only slightly.

unionBy :: forall a. (a -> a -> Boolean) -> Array a -> Array a -> Array a
unionBy eq xs ys = xs <> foldl (flip (deleteBy eq)) (nubByEq eq ys) xs
unionBy :: forall a. (a -> a -> Ordering) -> Array a -> Array a -> Array a
unionBy cmp left right = map snd $ ST.run do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be more efficient to take advantage of the of the fact that the output is always the first array plus some other stuff appended to it. Here's some non-ST pseudo-ish code describing that option:

union left right = left <> other where
  other =
    combineIndex right left -- intentionally putting right first and assuming stable sort
    # sortBy valueFunc
    # groupBy valueFunc
    # map head
    # filter keepRightOnly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the runtime cost of this approach versus mine?

I made the rather stupid assumption that mine would be n+m because when Harry said the current version was n*m, I thought he was implying we should make the code linear. I didn't actually analyze my code using Big O notation to see how many steps it takes.

Copy link
Contributor

@milesfrain milesfrain Jan 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe all reasonable approaches are O(n log n) (where n is the sum of array sizes). But benchmarking could still reveal a 3x speedup with a different approach, which would still be O(n log n). I'll play around with this and report back with findings. The choice of input data could also significantly change the relative performance of different algorithms.

@hdgarrood
Copy link
Contributor

I think these code changes should wait until we have reached a consensus on how these functions should actually work - there are still unknowns to do with things the handling of duplicates. I’m not sure how much it helps to change union to use Ord now if we plan to change its behaviour with respect to duplicates later, as that later change would still be breaking even if the type is the same.

@JordanMartinez
Copy link
Contributor Author

So, will this change be merged before v0.14.0? Or should we discuss it further and include it in v0.15.0?

@JordanMartinez
Copy link
Contributor Author

Minor optimization by only working with indices and copying values only once to the final array.

@hdgarrood
Copy link
Contributor

I’m tempted to leave it for 0.15.0 personally.

@JordanMartinez
Copy link
Contributor Author

I think this shouldn't be implemented until we have STFn. At the very least, the performance hit doesn't make it worth it in my mind. If one wanted this behavior, they could implement it locally in their project using this PR as a basis for what to do (and likely they would use FFI to get around the performance issue).

@JordanMartinez
Copy link
Contributor Author

I'm going to mark arrays as done in the core libs tracking issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use Ord by default for all set-like operations
3 participants