Entry set puzzler

5

My colleague Chris Dennis, sent this puzzler around today:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class EntrySetSize {

  public static void main(String[] args) {
    printSetSize(new ConcurrentHashMap());
    printSetSize(new HashMap());
  }

  private static void printSetSize(Map map) {
    map.put("hello", "world");
    map.put("world", "hello");
    Set set = new HashSet(map.entrySet());
    System.out.print(set.size() + " ");
  }
}

The question is what does this program print when executed?

  • a) 2 2
  • b) 1 2
  • c) 0 2
  • d) something else</li>

It seems like the answer should be a) and you will actually see that in JDK 1.6+. However, in JDK 1.5 you’ll see b). Understanding why requires a bit of a jaunt through the CHM code, where dragons lurk. Also, it’s actually possible to see several possible answers (at least a and b) in JDK 1.5 depending on what keys and values you use.

map.entrySet()

This call on a CHM does the following:

    public Set<Map.Entry<K,V>> entrySet() {
        Set<Map.Entry<K,V>> es = entrySet;
        return (es != null) ? es : (entrySet = (Set<Map.Entry<K,V>>) (Set) new EntrySet());
    }

All that does is create (and cache) an EntrySet instance, which is an inner (non-static) class specific to CHM. It has access to the real CHM state and implements the interface using that state.

new HashSet()

The HashSet is constructed with the EntrySet instance. A new HashSet with an initialization collection will do this:

    public HashSet(Collection<? extends E> c) {
        map = new HashMap<E,Object>(Math.max((int) (c.size()/.75f) + 1, 16));
        addAll(c);
    }

So, the state from the source collection is added through addAll().

HashSet.addAll()

    public boolean addAll(Collection<? extends E> c) {
        boolean modified = false;
        Iterator<? extends E> e = c.iterator();
        while (e.hasNext()) {
            if (add(e.next()))
                modified = true;
            }
            return modified;
        }

There is some extra stuff here to manage the modified flag needed for fail-fast iteration but basically, this code walks through the collection’s iterator calling next(), then add() on the set under construction. Here’s where things get interesting. We need to look at what the EntrySet uses for an iterator and what happens when you call next().

EntrySet.iterator()

        public Iterator<Map.Entry<K,V>> iterator() {
            return new EntryIterator();
        }

Here we return yet another inner non-static class EntryIterator. EntryIterator is defined as follows:

final class EntryIterator extends HashIterator implements Map.Entry<K,V>, Iterator<Entry<K,V>>

This should give you some pause. We fully expect to see “extends HashIterator” – there are several key/value/entry iterator flavors buried in here and they share most of their implementation in HashIterator. We also fully expect to see “implements Iterator<Entry<K,V>>” since that’s what we’re doing. But seeing “implements Map.Entry<K,V>” should make you go huh? This iterator walks over Entry<K,V> instances in the CHM yet it itself is typed as an entry?

What does it do on Iterator.next()?

EntryIterator.next()

public Map.Entry<K,V> next() {
nextEntry();
return this;
}

Here we see the iterator advance to the next entry (it walks through segments at the high level and entries at the low level) but in the end it returns this?

The key here is that the EntrySet’s iterator just stores the current key/value during iteration and returns itself as the entry object for every single actual entry. If you modify that entry, it writes through appropriately to the CHM. Presumably, this saves on the creation and destruction of lots of little Map.Entry instances that will be used just during iteration and typically never looked at again.

Therefore, when the EntrySet iterator is walked through in HashSet.addAll(), AND because of how EntryIterator.hashCode() is implemented:

        public int hashCode() {
            // If not acting as entry, just use default.
            if (lastReturned == null)
                return super.hashCode();

            Object k = getKey();
            Object v = getValue();
            return ((k == null) ? 0 : k.hashCode()) ^
                   ((v == null) ? 0 : v.hashCode());
        }

Here the hashCode is created as the bitwise OR of the key hash code and the value hash code. So, regardless of what the key and value hashcodes are, a ^ b == b ^ a. In our case we used two strings “hello” and “world” swapped. Thus the second entry instance will be seen as an identical hash code (and instance) and will simply put over the first entry instance in the HashSet.

A subtly different bug will happen in the case where this property (of swapped key/value) doesn’t occur. In that case, both entries will end up in the HashSet (size == 2 as expected) but they are the same instance and were put into the set with different hash codes. It’s always a bad deal to put something in a HashSet or HashMap and then modify the object in ways that modify the hashCode(). This breaks assumptions in the HashMap class that can cause lost elements in the collection and other weird behavior.

As I mentioned, this same instance behavior is probably surprising to most people. It’s doc’ed here in Sun bug #6312706; as a general problem in several maps (CHM, IdentityHashMap, and EnumMap). That bug is marked as “fix understood” which is kind of surprising because this behavior is already changed and fixed in JDK 1.6. In 1.6, the EntryIterator is implemented as:

    final class EntryIterator
	extends HashIterator
	implements Iterator<Entry<K,V>>
    {
        public Map.Entry<K,V> next() {
            HashEntry<K,V> e = super.nextEntry();
            return new WriteThroughEntry(e.key, e.value);
        }
    }

Here EntryIterator does not implement Map.Entry and next() returns a new instance of WriteThroughEntry, which causes the HashSet to be much happier on creation.

Interestingly, another smart colleague noted that the existing Terracotta ConcurrentHashMap instrumentation fixes the 1.5 implementation by creating a wrapper object in the iterator as well.

Comments

5 Responses to “Entry set puzzler”
  1. It isn’t fixed in IdentityHashMap.

  2. EnumMap is still borked as well. Good write-up btw.

  3. James Roper says:

    Thus the second entry instance will be seen as an identical hash code (and instance) and will simply put over the first entry instance in the HashSet.

    That’s not quite correct. You could implement hashCode to always return 0, thats still a valid hashCode function, it just means that every entry in the map will collide, and so make any operations on the map run in n time (ie very slow) because every entry will hash to the same position. Java’s HashMap implementation uses chaining to resolve collisions, that is, each entry is a linked list to other entries that collide with it.

    The reason it gets overwritten is because after the two entries get hashed to the same place, it then iterates through the entries linked list, checking for identity equality first, then object equality:


    int i = indexFor(hash, table.length);
    for (Entry e = table[i]; e != null; e = e.next) {
    Object k;
    if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
    V oldValue = e.value;
    e.value = value;
    e.recordAccess(this);
    return oldValue;
    }
    }

    Because the iterator is returning itself each time, the if statement will always return true after the first entry is added, and so subsequent entries will always overwrite it.

    The same bug can occur, just not with the same level of reproducability, if the key and value are not swapped, because even if their hashCodes are different, they may still end up colliding (the default initial capacity of a HashMap is only 16, so there’s a 1 in 16 chance that they will collide).

  4. Also, have a look at the contents of the set (using IdHashMap and EnumMap) if you add elements that do not have colliding hashcodes. (print the toString())

  5. anonymous says:

    Sorry, but this isnt a puzzler. Just a bug.