-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Ord by default for all set-like operations #192
Comments
If we go ahead with this, the same changes should propagate through to the |
Yes, this seems really strange, even if it does match how things are done in Haskell. I'm curious if any projects rely on this asymmetric duplicate-preserving behavior. If we match Another option is to thin-out these libraries a bit and recommend using Perhaps we should open a discourse thread to discuss these proposals, since this affects multiple libraries. |
The main reason I don't really want to just deprecate these functions and tell people to just use sets instead is that arrays have ordering, whereas sets don't, which means that the set-like Array/List operations can preserve ordering, and this might be useful in some cases. |
This is the last issue in this repo that doesn't currently have a PR opened for it. |
I'm thinking we should table all these container ( |
A few questions I have. First, how is Second, in the following examples of [1, 2, 3, 4, 3, 2] \\ [1]
-- I assume [2, 3, 4, 3, 2]
[1, 2, 3, 4, 3, 2] \\ [2]
[1, 2, 3, 4, 3, 2] \\ [2, 2]
-- It depends on whether each 2 in the right array
-- removes only one or all 2 element(s) in the left array |
a union b == b union a -- we break this rule
a union empty == a -- we follow this rule
a union a == a -- we follow this rule
a union (a <> a) == a -- we follow this rule I think Example input/output: logShow $ union [1,2,3,3] [2,2,4,4]
logShow $ [1,2,3,3] <|> [2,2,4,4]
logShow $ [1,2,3,3] <> [2,2,4,4]
logShow $ [1, 2, 3, 4, 3, 2] \\ [1]
logShow $ [1, 2, 3, 4, 3, 2] \\ [2]
logShow $ [1, 2, 3, 4, 3, 2] \\ [2, 2]
|
I think an ordered set would be even better. http://hackage.haskell.org/package/ordered-containers-0.2.2/docs/Data-Set-Ordered.html |
Isn't |
Better if we assume these requirements:
Of course, there can be other requirements, but I didn't see anything else mentioned. |
Thanks @milesfrain. These examples helped! Pretty sure #203 is ready to go now. |
Should other containers than sets behave like sets during operations like I personally would assume that an-order preserving container like -- union is <>
[ 0, 0, 1, 1 ] `union` [ 2, 2, 1 ] == [ 0, 0, 1, 1 ] <> [ 2, 2, 1 ] == [ 0, 0, 1, 1, 2, 2, 1 ]
-- intersect - occurrences are matched
[ 0, 0, 1, 1, 2, 2 ] `intersect` [ 1, 2, 2 ] == [ 1, 2, 2 ]
-- difference - occurrences are matched like `intersect`
[ 0, 0, 1, 2, 2 ] `difference` [ 1, 1, 2 ] == [ 0, 0, 2 ] All preserving order so |
If that is the case, then I don't think we should offer But there is another option for union a b = difference a b <> intersect a b <> difference b a
a = [0, 0, 1, 1, 2, 2]
b = [1, 2, 2, 3]
difference a b == [0, 0, 1]
intersect a b == [1, 2, 2]
difference b a == [3]
union a b == [0, 0, 1, 1, 2, 2, 3]
union b a == [3, 1, 2, 2, 0, 0, 1]
sort(union a b) == sort(union b a) We could include index tracking and recovery to preserve ordering, so:
Edit: I realize my above definition could be simplified to: union a b = a <> difference b a Which appears to be pretty close to the existing behavior of: union a b = a <> nub (difference b a) I think the nub-free version is better-behaved. But my number-one preference is to eliminate these set-like functions from Array, which rely on nonintuitive matching, and add an ordered set collection. |
I think there's a strong argument to be made that these functions only make sense for sets so I'm not opposed to removing them for array. I was just offering what I thought too be a better intuition for theirs functionality for this data type. And As for ordered set are you talking about this? https://pursuit.purescript.org/packages/purescript-ordered-set/0.4.0 |
Oh, nice! Didn't find that on my initial survey. |
As far as I'm aware, in all of these discussions, we haven't yet had one person indicate that they're currently using union, intersection, or difference, so I am starting to come around to removing them too. |
We use |
Ah, fantastic. I'm curious: Would you miss them if we removed them? Would you notice if their behaviour around the handling of duplicates changed? Would it be awkward to refactor that code to use the ordered-set package instead? |
(Just to be clear, this isn't sarcastic: I'm pleased that we can hear from someone who is actually using these functions) |
I think we are assuming there are no duplicates, we just need the ordering and convenience of Arrays, likely. I don't think we should remove them without having a recommended replacement. We have not had any issues with the existing implementation. |
I think the discussion is more about the way they should behave, like if we were to take the concepts from scratch, what would be the "natural" behavior. Not that they can be used the way they are implemented as is, cause sure we can use them as they are, as long as they're consistent in some way we can find usages. It strikes me that the current behavior is weird, that is removing duplicates from one array and not the other in some operations. Don't you think that your particular case is better covered by purescript-ordered-set instead? |
The problem with suggesting ordered-set as an alternative is that a) it’s a third party library and it’s hard to be sure how well maintained it’s going to be, and b) that package provides no O(1) conversion to Array, and converting from Array is necessarily O(n log n); if these functions work for arrays already no conversion is necessary. The obvious response to a) is that we should bring ordered-set into core but that’s not appealing to me either, as a huge part of the reason it’s taken us so long to get 0.14 released is that there is so much stuff in core already. Since this is such a widely used module, and as long as we can’t identify one ideal way the Array functions should behave, I think my preference would now be to leave the behaviour alone, but change them to use Ord for better performance in the next round of breaking changes. |
Following up with this, what needs to change? |
It would require someone rewriting these functions in terms of Ord. Without anyone volunteering to do that, we should defer this. |
Ah, I remember this now. I tried to resolve this issue in #203 but stopped because performance suffered greatly: #203 (comment) |
We already made
nub
use Ord by default in #91, but there are a few other unnecessarily quadratic functions in this library still which probably deserve the same treatment. I think these areunion
,intersect
, anddifference
.As well as being quadratic,
union
anddifference
have a strange-seeming behaviour where duplicates are only preserved in the first argument:See #56 for more info. We ought to be careful about changing this behaviour but I think we should consider it, since I think it’s very likely to trip people up and we will be making breaking changes to these functions anyway if we do this.
The text was updated successfully, but these errors were encountered: