poor implementation of hashCode of LinkedHashMap #2733

ghost · 2023-02-05T16:09:21Z

System.out.println(LinkedHashMap.of("a",1, "b",2).hashCode());

System.out.println(LinkedHashMap.of("a",2, "b",1 ).hashCode());

Both maps have the same hashCode and this is wrong. I found this bug while participating in Advent of Code. An algorithm was extremely slow because of that. Java implementations is right

danieldietrich · 2023-02-05T18:27:42Z

Thank you for your finding and the minimal example! What do you mean by "Java implementation is right"? Do you suggest to use the hash algorithm used by Java's LinkedHashMap?

ghost · 2023-02-05T18:34:16Z

I mean Java native hashCode implementation for HashMap returns different hashcodes for that example. The problem I faced is that I had a grid modeled with a linkedhashmap. The grid had always the same elements BUT in different positions. The current implementation returns always the same hashCode no matter the positions of the elements. Since I was storing the grids in a Set and all of them had the same hashcode, the performance was really bad...

jarlah · 2023-04-26T13:20:38Z

I made a stab at this, but I'm not sure if its the right way to "fix it".

wrandelshofer · 2023-05-07T15:43:20Z

@jarlah You can not change hashCode without changing equals. The result of these two methods must agree. The contract for these two methods is defined in java.lang.Object. So, always change them both, and you are good.

I believe, there is a contradiction between the API of the interface Traversable and the classes that implement it and their unit tests. (?)

Interface Traversable states that hashCode and equals is different for collections with predictable iteration order and for collections with arbitrary iteration order.
Traversable.hashCode
Traversable.equals

However, TreeMap and LinkedHashMap implement hashCode and equals for unordered collections:
TreeMap.hashCode
TreeMap.equals
LinkedHashMap.hashCode
LinkedHashMap.equals

Maybe I am just confused about the Javadoc in Traversable?

TreeMap.isOrdered() returns true, TreeMap.isSequential() returns false. I believe this makes TreeMap a collection with predictable iteration sequence.

LinkedHashMap.isOrdered() returns false, LinkedHashMap.isSequential() returns true. I believe this makes LinkedHashMap a collection with predictable iteration sequence.

jarlah · 2023-05-07T17:15:59Z

@jarlah You can not change hashCode without changing equals. The result of these two methods must agree. The contract for these two methods is defined in java.lang.Object. So, always change them both, and you are good.

Yes, of course. What i did was just to make a PR hightlighting how easy it was to change the behaviour. But it totally is not ready.

wrandelshofer · 2023-05-07T17:17:22Z

@jarlah What do you think about the Javadoc in Traversable.hashCode and Traversable.equals?
Is my interpretation that there is a contradiction correct? Or am I misinterpreting it?

jarlah · 2023-05-08T05:49:24Z

I think vavr is making itself a disservice by redifining such concepts. But i guess its ok, because vavrs collections will only be used and compapred to with vavr collections.

I see your point @wrandelshofer that the doc talks about predictable iteration order and arbitrary iteration order. But i cannot say what it means. Or if you are right.

What i can say however is that to fix my equals in my PR i would need to sort both collections and compare the two sorted collections. I dont think thats very effective, and im worried about the performance impliciations.

wrandelshofer · 2023-05-08T05:59:20Z

Yes, I can not tell whether it is correct or incorrect. I would err on the side of the implementation, and assume that the hashValue/equals depends on the collection types Set/Seq/Map.

I - personally - would not go in the direction of your fix. It is convenient to be able to check sets and maps for equality/hashCode regardless of their iteration order. But I am not the designer. So, it is a viable design direction, of course.

wrandelshofer · 2023-05-08T10:54:10Z

I think vavr is making itself a disservice by redifining such concepts. But i guess its ok, because vavrs collections will only be used and compapred to with vavr collections.

For me - ideally vavr Collections can be swapped in and out with java.util Collections. The only difference being that the vavr Collections have persistent mutability, and have no API methods that can throw UnsupportedOperationException.

SkittishSloth · 2023-07-12T20:52:04Z

I understand that the hash is computed via the underlying state, but - to me - the confusion arises in how those two hash maps - {"a": 1, "b": 2} and {"a": 2, "b": 1} - have the same underlying state? Wouldn't the keys and the values be associated with each other, regardless of any ordering concerns? In that case, why wouldn't the hash code calculation take the key-value relation into consideration?

Playing with this a little because I'm bored and stalling on my "real" work, I can confirm that the only thing that seems to matter for the hash code is that all the keys and values are included - ordering of the keys doesn't matter. So {"b": 1, "a": 2} would have the same hash code as the other examples. This also isn't an edge case for a 2-pair map; I wasn't expecting that it was, but I did it with 3 pairs with the same results.

It also doesn't appear to be a fluke where those permutations just happen to generate the same hash code - which would be unlikely, but there's no reason it would be impossible. It's acceptable - and even expected - that random states for a given object will have the same hash code, even if they're not equal; the important thing is that they have the same hash code when they are equal. So you could always return 1 as your hash code and you'd meet the contract requirements. You'd just have horrible performance if you used it as a key in a hash map. (And there's probably other places that use the hash code, but that's the obvious one at the moment.)

From @wrandelshofer:

@jarlah You can not change hashCode without changing equals. The result of these two methods must agree. The contract for these two methods is defined in java.lang.Object. So, always change them both, and you are good.

It's a little more subtle than this; in this particular case, where it appears that hashCode has an incorrect implementation but equals is correct, then by all means update hashCode and leave equals alone. The important thing is - like you said - they must agree, meaning that they're based on the same internal state. Since equals appears to take key-value relations into account, but hashCode doesn't, I'm going to say they're not based on the same state.

As an aside, while equals takes key-value pairs into consideration, it doesn't seem to take the order of those pairs into consideration, which I would expect from a "linked" collection. So {"a": 1, "b": 2} and {"b": 2, "a": 1} are considered equal, although they shouldn't be. (Given the earlier comments regarding isSequential and isOrdered, I believe the expected result should be that they're not equal.)

pivovarit · 2024-08-23T06:47:03Z

I have an idea how to approach this, PoC: #2803

wrandelshofer · 2024-08-28T15:18:52Z

I looked at your Pull Request #2803.
Shouldn't this fix include a change in LinkedHashMap.hashCode()?
Because in this method we currently have:
return Collections.hashUnordered(this);

But this should be
return Collections.hashOrdered(this);

vavr/src/main/java/io/vavr/collection/LinkedHashMap.java

Line 974 in 5ba70a2

return Collections.hashUnordered(this);

pivovarit · 2024-08-28T15:20:08Z

Here's the explanation: #2803 (comment)

wrandelshofer · 2024-08-28T15:23:40Z

I see your explanation. The explanation is fine.

However, LinkedHashMap extends from Traversable. And Traversable defines the contract of the hashCode() Method.

vavr/src/main/java/io/vavr/collection/Traversable.java

Lines 686 to 698 in 5ba70a2

    
                * Collections with predictable iteration order are hashed as follows: 
        
                * 
        
                * <pre>{@code 
        
                * int hash = 1; 
        
                * for (T t : this) { hash = hash * 31 + Objects.hashCode(t); } 
        
                * }</pre> 
        
                * 
        
                * Collections with arbitrary iteration order are hashed in a way such that the hash of a fixed number of elements is independent of their iteration order. 
        
                * 
        
                * <pre>{@code 
        
                * int hash = 1; 
        
                * for (T t : this) { hash += Objects.hashCode(t); } 
        
                * }</pre>

Maybe, in your pull request, you should change the contract in Traversable. Because LinkedHashMap does not fulfill the contract.

pivovarit · 2024-08-28T15:27:42Z

Ok, that's a great point - I did not realize that Traversable contract defined hashing rules. In this case, this is an easy choice - we need to obey the contract. Thanks!

wrandelshofer · 2024-08-28T15:31:40Z

Or, alternatively, move the contract for equals/hashCode out of Traversable into more specific interfaces.

In the Java JDK API, the Iterable and Collection interfaces do not define equals/hashCode. This happens only in more specific interfaces: in the List, Map, Set interfaces.

pivovarit · 2024-08-28T15:38:05Z

We'll probably need to do both:

for a major release: move the contract into specific interfaces
for a minor/bugfix release, Collections.hashOrdered

The current implementation of the hashCode() method for `Tuple2` has a significant flaw that can easily lead to hash collisions, particularly in data structures like `LinkedHashMap`. ```java LinkedHashMap<String, Integer> m1 = LinkedHashMap.of("a", 1, "b", 2); LinkedHashMap<String, Integer> m2 = LinkedHashMap.of("a", 2, "b", 1); System.out.println(m1.hashCode() == m2.hashCode()); // true ``` In this case, _m1_ and _m2_ should ideally produce different hash codes, but they don't. This is because the current implementation of `Tuple#hashCode()` simply sums the hash codes of all elements, which can result in identical hash codes for different tuples. A potential solution is to apply the XOR (^) operator to the hash codes of all elements. XOR is order-sensitive, so it would resolve the issue in the example above, where the order of elements differs between tuples. However, XOR has its limitations and is only a suitable solution for tuples with up to two elements. This is because XOR is a commutative and associative operation, meaning that the XOR of multiple elements can still result in rather bad collisions: ```java Tuple3<Integer, Integer, Integer> t1 = Tuple.of(1, 2, 3); Tuple3<Integer, Integer, Integer> t2 = Tuple.of(1, 3, 2); System.out.println(t1.hashCode() == t2.hashCode()); // true ``` The new implementation is not perfect as well: ```java HashMap<Integer, Integer> hm1 = HashMap.of(0, 1, 1, 2); HashMap<Integer, Integer> hm2 = HashMap.of(0, 2, 1, 3); System.out.println(hm1.hashCode() == hm2.hashCode()); // true ``` But it is imperfect in the same way as standard library: ```java java.util.HashMap<Integer, Integer> jhm1 = new java.util.HashMap<>(); jhm1.put(0, 1); jhm1.put(1, 2); java.util.HashMap<Integer, Integer> jhm2 = new java.util.HashMap<>(); jhm2.put(0, 2); jhm2.put(1, 3); System.out.println(jhm1.hashCode() == jhm2.hashCode()); // true ``` ---- Related: #2733

pivovarit · 2024-08-28T20:22:00Z

Well, it's slightly more complicated since ordered hashcode breaks equals/hashcode contracts. Yes, this could be fixed by adjusting the equals() implementation, but this is a relatively big library that is used by many people who would not appreciate such a change even in a major version. I need more data to make an informed decision. Stay tuned

wrandelshofer · 2024-08-29T14:15:34Z

I just noticed that LinkedHashSet also uses Collections.hashUnordered().

vavr/src/main/java/io/vavr/collection/LinkedHashSet.java

Lines 966 to 968 in b1ce608

    
           public int hashCode() { 
        
               return Collections.hashUnordered(this); 
        
           }

Maybe the most sensible thing would be to change the Javadoc in Traversable.

danieldietrich added the design/refactoring/improvement label Feb 5, 2023

jarlah mentioned this issue Apr 26, 2023

discussion: should hashcode be calculated based on order #2742

Closed

pivovarit mentioned this issue Aug 22, 2024

Improve hashCode() implementation for Tuple2 #2803

Merged

pivovarit mentioned this issue Aug 28, 2024

Ordered LinkedHashMap#hashCode() implementation #2806

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

poor implementation of hashCode of LinkedHashMap #2733

poor implementation of hashCode of LinkedHashMap #2733

ghost commented Feb 5, 2023 •

edited by ghost

Loading

danieldietrich commented Feb 5, 2023

ghost commented Feb 5, 2023 •

edited by ghost

Loading

jarlah commented Apr 26, 2023

wrandelshofer commented May 7, 2023 •

edited

Loading

jarlah commented May 7, 2023

wrandelshofer commented May 7, 2023

jarlah commented May 8, 2023

wrandelshofer commented May 8, 2023

wrandelshofer commented May 8, 2023

SkittishSloth commented Jul 12, 2023

pivovarit commented Aug 23, 2024

wrandelshofer commented Aug 28, 2024

pivovarit commented Aug 28, 2024

wrandelshofer commented Aug 28, 2024

pivovarit commented Aug 28, 2024 •

edited

Loading

wrandelshofer commented Aug 28, 2024

pivovarit commented Aug 28, 2024

pivovarit commented Aug 28, 2024

wrandelshofer commented Aug 29, 2024 •

edited

Loading

poor implementation of hashCode of LinkedHashMap #2733

poor implementation of hashCode of LinkedHashMap #2733

Comments

ghost commented Feb 5, 2023 • edited by ghost Loading

danieldietrich commented Feb 5, 2023

ghost commented Feb 5, 2023 • edited by ghost Loading

jarlah commented Apr 26, 2023

wrandelshofer commented May 7, 2023 • edited Loading

jarlah commented May 7, 2023

wrandelshofer commented May 7, 2023

jarlah commented May 8, 2023

wrandelshofer commented May 8, 2023

wrandelshofer commented May 8, 2023

SkittishSloth commented Jul 12, 2023

pivovarit commented Aug 23, 2024

wrandelshofer commented Aug 28, 2024

pivovarit commented Aug 28, 2024

wrandelshofer commented Aug 28, 2024

pivovarit commented Aug 28, 2024 • edited Loading

wrandelshofer commented Aug 28, 2024

pivovarit commented Aug 28, 2024

pivovarit commented Aug 28, 2024

wrandelshofer commented Aug 29, 2024 • edited Loading

ghost commented Feb 5, 2023 •

edited by ghost

Loading

ghost commented Feb 5, 2023 •

edited by ghost

Loading

wrandelshofer commented May 7, 2023 •

edited

Loading

pivovarit commented Aug 28, 2024 •

edited

Loading

wrandelshofer commented Aug 29, 2024 •

edited

Loading