Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

model: Fix the Set contract for data classes that implement Comparable #6229

Closed
wants to merge 1 commit into from

Conversation

sschuberth
Copy link
Member

Kotlin's data classes have auto-generated equals() functions that take all primary properties into account. When implementing the Comparable interface for those classes, i.e. to give them a defined order in a SortedSet, special attention needs to be put on compareTo() to return 0 if and only if equals() returns true. This is required because "a sorted set performs all element comparisons using its compareTo method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal" 1. This means that sorted sets are considered equal if for all their respective elements compareTo() returns 0. So if compareTo() is implemented to not consider all properties for an equality check, this can lead to unexpected results, for example when using Kotest's data class matchers.

Fix this and obey the general contract of the Set interface by taking all properties into account if the properties to sort on are equal. As the ordering does not matter if not all properties are equal but the properties to sort on are, arbitrarily return 1 from compareTo() in that case.

Signed-off-by: Sebastian Schuberth [email protected]

…able`

Kotlin's data classes have auto-generated `equals()` functions that take
all primary properties into account. When implementing the `Comparable`
interface for those classes, i.e. to give them a defined order in a
`SortedSet`, special attention needs to be put on `compareTo()` to
return 0 if and only if `equals()` returns true. This is required because
"a sorted set performs all element comparisons using its compareTo
method, so two elements that are deemed equal by this method are, from the
standpoint of the sorted set, equal" [1]. This means that sorted sets are
considered equal if for all their respective elements `compareTo()`
returns 0. So if `compareTo()` is implemented to not consider all
properties for an equality check, this can lead to unexpected results, for
example when using Kotest's data class matchers.

Fix this and obey the general contract of the `Set` interface by taking
all properties into account if the properties to sort on are equal. As
the ordering does not matter if not all properties are equal but the
properties to sort on are, arbitrarily return 1 from `compareTo()` in
that case.

[1]: https://docs.oracle.com/javase/8/docs/api/java/util/SortedSet.html

Signed-off-by: Sebastian Schuberth <[email protected]>
when {
id < other.id -> -1
id > other.id -> 1
else -> if (this == other) 0 else 1 // Return arbitrary inequality.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this commit do impact the performance negatively.

In this particular case it leads to recursion: the whole subtree rooted at this and other is compared, which I doubt is a good idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the whole subtree rooted at this and other is compared, which I doubt is a good idea.

Actually, that's exactly the point in order to be able to use Kotest's data class matchers e.g. on whole OrtResults. This would allow to get rid of the fragile text line comparison which is really hard to grasp on CI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the text comparison also makes it very easy to see the effect of changes during development and to update the expected results files, if the plan is to change the matching for the functional analyzer tests we need to consider this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the text comparison also makes it very easy to see the effect of changes

True, and it makes it easy to take over new expected results from actual results. My main point is that I believe Kotest's data class matchers should work with out code base in case someone decides to use them in tests, but currently they don't as soon as a SortedSet is involved anywhere in the hierarchy.

model/src/main/kotlin/Scope.kt Show resolved Hide resolved
metadata.id < other.metadata.id -> -1
metadata.id > other.metadata.id -> 1
else -> if (this == other) 0 else 1 // Return arbitrary inequality.
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this commit, I would again like to bring up the question whether
using a combination of the following two for the task of sorting things in serialization is the right approach:

  1. Sorted collections
  2. Implementing Comparable

E.g.: If implementing Comparable was avoided (for ordering things in the serializer), would these kind of problems then be not present?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If implementing Comparable was avoided (for ordering things in the serializer), would these kind of problems then be not present?

Yes. I was also thinking about not using any sorted sets in the model classes at all, and only sort upon serialization, as sorting should only matter for human readability anyway. Doing this should also improve insertion performance into the (now non-sorted) sets. However, I did not find an elegant way to sort collections on serialization without writing custom serializers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we agree that this could be good to do. I've tried to write this down in an issue, see #6235.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you prefer to move this PR forward?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know yet. This is just a draft to play with (I wanted to see what tests it breaks, if any).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I did not find an elegant way to sort collections on serialization without writing custom serializers.

There's actually an existing issue in Jackson for making this easily configurable, see FasterXML/jackson-databind#3166.

@sschuberth
Copy link
Member Author

I'm closing this in favor of #6244.

@sschuberth sschuberth closed this Dec 19, 2022
@sschuberth sschuberth deleted the fix-set-contract branch December 19, 2022 12:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants