LBQ + GC = Slow


There was an interesting exchange today on the concurrency-interest mailing list about LinkedBlockingQueue, one of the most important and commonly used queue implementations in java.util.concurrent. LBQ is an optionally-bounded thread-safe queue based on a linked list.
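
For those who haven't used it, here is a minimal, made-up producer/consumer sketch of the typical usage pattern (the class and message names are invented for illustration): a bounded queue where put() blocks once the bound is reached and take() blocks while the queue is empty.

import java.util.concurrent.LinkedBlockingQueue;

public class LBQExample {
    public static void main(String[] args) throws InterruptedException {
        // Bounded to 1000 elements: put() blocks when full, take() blocks when empty.
        final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<String>(1000);

        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < 10; i++)
                        queue.put("message-" + i);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();

        for (int i = 0; i < 10; i++)
            System.out.println(queue.take());
        producer.join();
    }
}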

The key method in question is the extract() method used to remove the head of the queue:

private E extract() {
    Node<E> first = head.next;
    head = first;
    E x = first.item;
    first.item = null;
    return x;
}

This method releases the first item out into the world: head.next becomes the new head and the old head Node is discarded. The issue is that the old head is pulled off, used, and soon becomes garbage, but it still retains its next link to the Node that replaced it.

This is bad because it means there is a chain of references running back through all the old Nodes even after they have been removed from the queue and thrown away. The GC will of course reclaim those Nodes eventually, since they are garbage.

But the garbage collectors we use are generational: they frequently sweep through the young objects and reclaim the vast number of them that are short-lived. The survivors are carried into the next generation, and objects that live long enough are eventually promoted to the old generation, which is collected far less frequently by a full GC.

The problem with the queue Nodes is that if they live in the queue long enough (i.e. if the queue is big), some of them will be promoted into the old gen, and at that point the reference from the old gen into the young gen prevents the whole chain of dead Nodes from being reclaimed by a young-generation collection. This is highly inefficient for the GC.
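
To make the effect concrete, here is a rough, hypothetical sketch of the kind of workload that triggers it (this is not the test from the mailing list; the class name, element sizes, and iteration counts are invented). The initial backlog keeps Nodes alive long enough to be tenured; the subsequent put/take churn then leaves chains of dead Nodes anchored from the old gen. Running something like this with -verbose:gc on an unpatched JDK would be expected to show full collections once the backlog has been tenured.

import java.util.concurrent.LinkedBlockingQueue;

public class LBQChurn {
    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<byte[]> queue = new LinkedBlockingQueue<byte[]>();

        // Build a large backlog so the oldest Nodes survive several young
        // collections and get promoted to the old generation.
        for (int i = 0; i < 200000; i++)
            queue.put(new byte[32]);

        // Churn: each take() discards a Node, but the discarded Node still
        // points at the Node that replaced it as head. A tenured dead Node
        // therefore keeps the younger dead Nodes behind it reachable until
        // a full GC runs.
        for (int i = 0; i < 2000000; i++) {
            queue.put(new byte[32]);
            queue.take();
        }
        System.out.println("done");
    }
}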

This problem can of course be solved simply by nulling the Node next reference:

private E extract() {
    Node<E> first = head.next;
    head.next = null; // Kill this reference
    head = first;
    E x = first.item;
    first.item = null;
    return x;
}

This breaks the chain and prevents the reference from the old gen into the young gen, which makes a huge difference in GC behavior and performance. Here are some results from a test (quoted from the mailing-list thread) demonstrating the problem:

Test 1 took: 1595ms
Test 2 took: 2639ms
Exception in thread "main" java.lang.AssertionError: Full GC has
occured during the test: 47 times
	at TestLBQ.main(TestLBQ.java:72)

My workaround of null'ing head.next before moving the head works perfectly and displays (as expected):
Test 1 took: 1629ms
Test 2 took: 1592ms

With the original version (the one in the JDK), the second pass through the test is significantly slower and a full GC occurred 47 times; with the fixed version, the second pass is faster and no full GC occurred at all.

Even worse was a test with the new G1 garbage collector in Java 7. This new collector takes the generational idea and pushes it even further. Unfortunately, that means it’s even more affected by this problem (while in general it is better for many apps):

JDK7 version:
Test 1 took: 5456ms
Test 2 took: 5575ms

With null assignment:
Test 1 took: 1698ms
Test 2 took: 1602ms

It seems this was not the first discovery of this issue as a couple of people were already using modified LBQs with this change. Hopefully this issue will be fixed in Java 7 at least.

Comments

  1. Markus says:

    I agree that this is an interesting bug. It's a good example of a typical memory-usage problem. Rule: when pooling objects, always think about cleaning up large objects as early as possible.

    Maybe I should write a series of blogs about typical memory usage problems, similar to your great series about concurrency problems :)

  2. Ismael Juma says:

    Hi,

    Note that Doug Lea has already committed the suggested change to the CVS repository where the concurrency libraries are hosted. It just needs to be backported into OpenJDK now.

    Ismael

  3. Josef Svenningsson says:

    Very interesting bug! It’s remarkable what a big impact this stray reference has.
    I should mention, though, that it has been known for quite some time that queues and generational garbage collection don't work that well together. You can read more about these things in the work by Lars Hansen, for instance here.

  4. Alex says:

    @Markus: That would be great!
    @Ismael: Yes, that is good to hear.
    @Josef: Thanks for the link…

  5. Vince says:

    Wow, this is actually hitting us in production from what I can tell. I’ve had to deal with random FullGCs despite having almost no long-lived objects in our system.

    Sure enough, I stumble upon this article (thank you!) and we’re using LinkedBlockingQueues at the core of our ‘event framework’.

    No ‘memory leaks’ in dev, but sure enough, in production the queues grow and ‘shrink’ a lot faster, and the items in our queue hold references to a lot of temporary garbage….

  6. kirk says:

    Nice entry. Setting a variable to null to help GC is rarely the right solution in my experience. As you've noted, objects in old are included in the GC root set for a young gen collection. Unfortunately this means that dead objects in old will cause objects in young to survive when they really shouldn't. However, there is no way to correct this unless you mark tenured as well as young. So this phenomenon, called nepotism, is much more common than most realize.

    In this case one would hope that HotSpot could recognize that anything referenced by queue shouldn’t be promoted or at worst, pushed back into young (something it is capable of doing).

    Now to the bench itself. I'm not sure how representative this bench is as a reason to have people setting objects to null. I generally find that setting objects to null to help GC (or for just about any other reason) implies that they are too broadly scoped; that's a code smell for redesign or refactoring. In this case I suspect that any object caught in the queue is going to survive at least one GC. Also, when collections are promoted they tend to cause survivor space to be flushed: read premature tenuring. In this case, it's simply the shape of the collection that is working against the microbenchmark. You can get even worse effects by using different collections and different patterns to dereference contained elements. Some of the effects can be quite dramatic.

  7. As far as I can see, ConcurrentLinkedQueue has the same issue:

    public E poll() {
        for (;;) {
            Node<E> h = head;
            Node<E> t = tail;
            Node<E> first = h.getNext();
            if (h == head) {
                if (h == t) {
                    if (first == null)
                        return null;
                    else
                        casTail(t, first);
                } else if (casHead(h, first)) {
                    E item = first.getItem();
                    // null "next" of old head node "h" here
                    if (item != null) {
                        first.setItem(null);
                        return item;
                    }
                    // else skip over deleted item, continue loop
                }
            }
        }
    }

  8. Update to my comment: The “casHead(..)” method is the more suitable place to clear the next pointer.


    private boolean casHead(Node<E> cmp, Node<E> val) {
        return headUpdater.compareAndSet(this, cmp, val);
    }

    would become


    private boolean casHead(Node<E> cmp, Node<E> val) {
        final boolean result = headUpdater.compareAndSet(this, cmp, val);
        if (result)
            cmp.setNext(null); // cmp is the old head; clear its link once it has been replaced
        return result;
    }

  9. Vince says:

    Can someone please explain why this might be worse with G1?

    From what I understand, G1 marks the whole heap each collection, so shouldn't it see that everything is dead right off the bat? I thought the problem was that the dead nodes in the tenured generation were keeping everything alive even in the new gen, at least until you do an expensive full (compacting?) gc?

    Maybe the bug is over my head and I’m missing it…

  10. Alex says:

    I certainly am not a G1 guru so I’m just guessing here. But I think the problem is references across generations. In G1, instead of 2-3 generations, there are many many more. So, the problem of bogus references across generations is that much more likely and (incorrectly) prevents garbage from being reclaimed across generations.

  11. Ryan Eccles says:

    I’ve heard Tony Printezis refer to this promotion problem as nepotism.

  12. Amit says:

    Good read, but I found that this issue is fixed in JDK 1.6 (at least as of update 23):

    private E dequeue() {
        // assert takeLock.isHeldByCurrentThread();
        Node<E> h = head;
        Node<E> first = h.next;
        h.next = h; // help GC
        head = first;
        E x = first.item;
        first.item = null;
        return x;
    }