Use Ord by default for all set-like operations #192

hdgarrood · 2020-12-18T05:44:42Z

We already made nub use Ord by default in #91, but there are a few other unnecessarily quadratic functions in this library still which probably deserve the same treatment. I think these are union, intersect, and difference.

As well as being quadratic, union and difference have a strange-seeming behaviour where duplicates are only preserved in the first argument:

union [0, 0] [1, 1] == [0, 0, 1]

See #56 for more info. We ought to be careful about changing this behaviour but I think we should consider it, since I think it’s very likely to trip people up and we will be making breaking changes to these functions anyway if we do this.

The text was updated successfully, but these errors were encountered:

hdgarrood · 2020-12-18T06:25:27Z

If we go ahead with this, the same changes should propagate through to the lists library too.

milesfrain · 2020-12-18T19:52:06Z

duplicates are only preserved in the first argument

Yes, this seems really strange, even if it does match how things are done in Haskell. I'm curious if any projects rely on this asymmetric duplicate-preserving behavior.

If we match Set behavior and remove all duplicates, then we get simpler code (just append then nub), and we can do property-based testing against Set.

Another option is to thin-out these libraries a bit and recommend using Set instead. Potential downsides are inconvenience, loss of ordering, and worse performance (would be interesting to benchmark). And we'd still need the Ord-free versions, since those aren't Set-compatible.

Perhaps we should open a discourse thread to discuss these proposals, since this affects multiple libraries.

hdgarrood · 2020-12-19T09:11:42Z

The main reason I don't really want to just deprecate these functions and tell people to just use sets instead is that arrays have ordering, whereas sets don't, which means that the set-like Array/List operations can preserve ordering, and this might be useful in some cases.

JordanMartinez · 2020-12-30T18:50:13Z

This is the last issue in this repo that doesn't currently have a PR opened for it.

milesfrain · 2020-12-30T20:28:14Z

I'm thinking we should table all these container (List, Array, etc.) updates for 0.15.
I think it would be more efficient to do a systematic update of all our container libraries, where we start with a proposal of what we want our final result to be, rather than through a bunch of piecemeal PRs.

JordanMartinez · 2020-12-31T20:34:52Z

A few questions I have.

First, how is union different from <|>?

Second, in the following examples of difference/\\, what should the output be?

[1, 2, 3, 4, 3, 2] \\ [1]
-- I assume [2, 3, 4, 3, 2]

[1, 2, 3, 4, 3, 2] \\ [2]
[1, 2, 3, 4, 3, 2] \\ [2, 2]
-- It depends on whether each 2 in the right array
-- removes only one or all 2 element(s) in the left array

milesfrain · 2020-12-31T22:23:07Z

how is union different from <|>?

union does a set-like union of the two arrays. <|> is the same as append/<>.
union keeps any duplicates that already exist in the first array and removes duplicates from the second array. Keeping these duplicates seems like the lesser-of-evils, since you're forced to break one of these rules when applying set operations on non-sets:

a union b == b union a  -- we break this rule
a union empty == a      -- we follow this rule
a union a == a          -- we follow this rule
a union (a <> a) == a   -- we follow this rule

I think difference behaves reasonably for arrays too, but it's still somewhat of an imperfect set-like operation.

Example input/output:

  logShow $ union [1,2,3,3] [2,2,4,4]
  logShow $ [1,2,3,3] <|> [2,2,4,4]
  logShow $ [1,2,3,3] <> [2,2,4,4]
  logShow $ [1, 2, 3, 4, 3, 2] \\ [1]
  logShow $ [1, 2, 3, 4, 3, 2] \\ [2]
  logShow $ [1, 2, 3, 4, 3, 2] \\ [2, 2]

[1,2,3,3,4]
[1,2,3,3,2,2,4,4]
[1,2,3,3,2,2,4,4]
[2,3,4,3,2]
[1,3,4,3,2]
[1,3,4,3]

https://try.ps.ai/?gist=aea5bdaf7669d2fd00c8d80cecc01b38

milesfrain · 2020-12-31T22:31:59Z

The main reason I don't really want to just deprecate these functions and tell people to just use sets instead is that arrays have ordering, whereas sets don't, which means that the set-like Array/List operations can preserve ordering, and this might be useful in some cases.

I think an ordered set would be even better. http://hackage.haskell.org/package/ordered-containers-0.2.2/docs/Data-Set-Ordered.html
But no objections to keeping the set-like operations for arrays.

JordanMartinez · 2020-12-31T22:40:20Z

ordered set would be even better.

Isn't better subjective to the problem one is solving?

milesfrain · 2020-12-31T23:07:02Z

Better if we assume these requirements:

maintain ordering
not having any weirdness with set-like operations

Of course, there can be other requirements, but I didn't see anything else mentioned.

JordanMartinez · 2020-12-31T23:52:27Z

a union b == b union a  -- we break this rule
a union empty == a      -- we follow this rule
a union a == a          -- we follow this rule
a union (a <> a) == a   -- we follow this rule

logShow $ union [1,2,3,3] [2,2,4,4]

Thanks @milesfrain. These examples helped! Pretty sure #203 is ready to go now.

razcore-rad · 2021-01-14T11:39:11Z

Should other containers than sets behave like sets during operations like union, intersect etc?

I personally would assume that an-order preserving container like Array would behave like this:

 -- union is <>
[ 0, 0, 1, 1 ] `union` [ 2, 2, 1 ] == [ 0, 0, 1, 1 ] <> [ 2, 2, 1 ] == [ 0, 0, 1, 1, 2, 2, 1 ]

-- intersect - occurrences are matched
[ 0, 0, 1, 1, 2, 2 ] `intersect` [ 1, 2, 2 ] == [ 1, 2, 2 ]

-- difference - occurrences are matched like `intersect`
[ 0, 0, 1, 2, 2 ] `difference` [ 1, 1, 2 ] == [ 0, 0, 2 ]

All preserving order so union isn't commutative. But the idea is that since these containers do preserve order, and allow duplicates, unlike the set container, then it makes sense that union isn't commutative. And I'd say that this interpretation of the operations makes sense cause of this. But I'm no expert in math so... there's that too.

milesfrain · 2021-01-14T20:50:58Z

union is <>

If that is the case, then I don't think we should offer union at all.

But there is another option for union that builds on your matching proposal for intersect and difference:

union a b = difference a b <> intersect a b <> difference b a

a = [0, 0, 1, 1, 2, 2]
b = [1, 2, 2, 3]

difference a b == [0, 0, 1]
intersect a b == [1, 2, 2]
difference b a == [3]

union a b == [0, 0, 1, 1, 2, 2, 3]
union b a == [3, 1, 2, 2, 0, 0, 1]
sort(union a b) == sort(union b a)

We could include index tracking and recovery to preserve ordering, so:

union b a == [1, 2, 2, 3, 0, 0, 1]

Edit: I realize my above definition could be simplified to:

union a b = a <> difference b a

Which appears to be pretty close to the existing behavior of:

union a b = a <> nub (difference b a)

I think the nub-free version is better-behaved.

But my number-one preference is to eliminate these set-like functions from Array, which rely on nonintuitive matching, and add an ordered set collection.

razcore-rad · 2021-01-15T06:38:26Z

I think there's a strong argument to be made that these functions only make sense for sets so I'm not opposed to removing them for array. I was just offering what I thought too be a better intuition for theirs functionality for this data type. And union as you point out could be improved.

As for ordered set are you talking about this? https://pursuit.purescript.org/packages/purescript-ordered-set/0.4.0

milesfrain · 2021-01-15T07:21:17Z

As for ordered set are you talking about this? https://pursuit.purescript.org/packages/purescript-ordered-set/0.4.0

Oh, nice! Didn't find that on my initial survey.

hdgarrood · 2021-01-15T17:25:28Z

As far as I'm aware, in all of these discussions, we haven't yet had one person indicate that they're currently using union, intersection, or difference, so I am starting to come around to removing them too.

natefaubion · 2021-01-15T17:26:40Z

We use intersection and difference in a few places. I don't think we use union.

hdgarrood · 2021-01-15T17:29:25Z

Ah, fantastic. I'm curious: Would you miss them if we removed them? Would you notice if their behaviour around the handling of duplicates changed? Would it be awkward to refactor that code to use the ordered-set package instead?

hdgarrood · 2021-01-15T17:32:16Z

(Just to be clear, this isn't sarcastic: I'm pleased that we can hear from someone who is actually using these functions)

natefaubion · 2021-01-15T17:42:58Z

I think we are assuming there are no duplicates, we just need the ordering and convenience of Arrays, likely. I don't think we should remove them without having a recommended replacement. We have not had any issues with the existing implementation.

razcore-rad · 2021-01-16T08:09:53Z

I think the discussion is more about the way they should behave, like if we were to take the concepts from scratch, what would be the "natural" behavior. Not that they can be used the way they are implemented as is, cause sure we can use them as they are, as long as they're consistent in some way we can find usages.

It strikes me that the current behavior is weird, that is removing duplicates from one array and not the other in some operations.

Don't you think that your particular case is better covered by purescript-ordered-set instead?

hdgarrood · 2021-01-16T12:46:47Z

The problem with suggesting ordered-set as an alternative is that a) it’s a third party library and it’s hard to be sure how well maintained it’s going to be, and b) that package provides no O(1) conversion to Array, and converting from Array is necessarily O(n log n); if these functions work for arrays already no conversion is necessary. The obvious response to a) is that we should bring ordered-set into core but that’s not appealing to me either, as a huge part of the reason it’s taken us so long to get 0.14 released is that there is so much stuff in core already.

Since this is such a widely used module, and as long as we can’t identify one ideal way the Array functions should behave, I think my preference would now be to leave the behaviour alone, but change them to use Ord for better performance in the next round of breaking changes.

JordanMartinez · 2022-03-25T02:21:51Z

Following up with this, what needs to change?

natefaubion · 2022-03-25T03:04:38Z

It would require someone rewriting these functions in terms of Ord. Without anyone volunteering to do that, we should defer this.

JordanMartinez · 2022-03-25T15:39:53Z

Ah, I remember this now. I tried to resolve this issue in #203 but stopped because performance suffered greatly: #203 (comment)

hdgarrood mentioned this issue Dec 18, 2020

Match nub* functions with Array purescript/purescript-lists#179

Merged

3 tasks

kl0tl mentioned this issue Dec 19, 2020

Core library updates for 0.14 purescript/purescript#3942

Closed

57 tasks

milesfrain mentioned this issue Dec 23, 2020

Improve consistency with other libraries purescript/purescript-lists#184

Open

JordanMartinez mentioned this issue Dec 31, 2020

Make union*, difference*, and intersect* run linearly by sorting elements first #203

Closed

JordanMartinez added purs-0.15 A reminder to address this issue or merge this PR before we release PureScript v0.15.0 type: breaking change A change that requires a major version bump. labels Dec 1, 2021

JordanMartinez added the status: blocked This issue or PR is blocked by something and cannot make progress. label Mar 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use Ord by default for all set-like operations #192

Use Ord by default for all set-like operations #192

hdgarrood commented Dec 18, 2020

hdgarrood commented Dec 18, 2020

milesfrain commented Dec 18, 2020

hdgarrood commented Dec 19, 2020

JordanMartinez commented Dec 30, 2020

milesfrain commented Dec 30, 2020 •

edited

Loading

JordanMartinez commented Dec 31, 2020

milesfrain commented Dec 31, 2020

milesfrain commented Dec 31, 2020

JordanMartinez commented Dec 31, 2020

milesfrain commented Dec 31, 2020

JordanMartinez commented Dec 31, 2020

razcore-rad commented Jan 14, 2021

milesfrain commented Jan 14, 2021 •

edited

Loading

razcore-rad commented Jan 15, 2021

milesfrain commented Jan 15, 2021

hdgarrood commented Jan 15, 2021

natefaubion commented Jan 15, 2021

hdgarrood commented Jan 15, 2021

hdgarrood commented Jan 15, 2021

natefaubion commented Jan 15, 2021

razcore-rad commented Jan 16, 2021

hdgarrood commented Jan 16, 2021

JordanMartinez commented Mar 25, 2022

natefaubion commented Mar 25, 2022

JordanMartinez commented Mar 25, 2022

Use Ord by default for all set-like operations #192

Use Ord by default for all set-like operations #192

Comments

hdgarrood commented Dec 18, 2020

hdgarrood commented Dec 18, 2020

milesfrain commented Dec 18, 2020

hdgarrood commented Dec 19, 2020

JordanMartinez commented Dec 30, 2020

milesfrain commented Dec 30, 2020 • edited Loading

JordanMartinez commented Dec 31, 2020

milesfrain commented Dec 31, 2020

milesfrain commented Dec 31, 2020

JordanMartinez commented Dec 31, 2020

milesfrain commented Dec 31, 2020

JordanMartinez commented Dec 31, 2020

razcore-rad commented Jan 14, 2021

milesfrain commented Jan 14, 2021 • edited Loading

razcore-rad commented Jan 15, 2021

milesfrain commented Jan 15, 2021

hdgarrood commented Jan 15, 2021

natefaubion commented Jan 15, 2021

hdgarrood commented Jan 15, 2021

hdgarrood commented Jan 15, 2021

natefaubion commented Jan 15, 2021

razcore-rad commented Jan 16, 2021

hdgarrood commented Jan 16, 2021

JordanMartinez commented Mar 25, 2022

natefaubion commented Mar 25, 2022

JordanMartinez commented Mar 25, 2022

milesfrain commented Dec 30, 2020 •

edited

Loading

milesfrain commented Jan 14, 2021 •

edited

Loading