Iterator idioms

4

Here is some old school iterator style (written here with generics for comparison with later examples):

Iterator<Foo> iter = list.iterator();
while(iter.hasNext()) {
Foo foo = iter.next();
foo.bar();
}

A better alternative is to move the Iterator construction into the loop. This is better because it reduces the scope of the iterator instance to just the loop. Using this style eliminates a class of potential bugs / confusing code that can arise from reusing that iterator instance after the loop ends.

for(Iterator<Foo> iter = list.iterator(); iter.hasNext(); ) {
Foo foo = iter.next();
foo.bar();
}

Of course, in Java 5+ you can push that even farther by hiding the use of the iterator completely:

for(Foo foo : list) {
foo.bar();
}

This also assumes that the list was created with a generic collection and is typed as List<Foo>. This is the sweet spot for generics and the new Java 5 for loop.

If you drop down a level, it’s interesting to look at the bytecode comparison as well. The fun thing is that these three are basically all equivalent at the bytecode level. The notion of “block scope” is really a matter of Java language scope, not really one that translates into the bytecode. The bytecode for these three examples is essentially identical:

L0
ALOAD 1: list
INVOKEINTERFACE List.iterator() : Iterator
ASTORE 2: iter
L1
GOTO L2
L3
ALOAD 2: iter
INVOKEINTERFACE Iterator.next() : Object
CHECKCAST Foo
ASTORE 3: foo
L4
ALOAD 3: foo
INVOKEVIRTUAL Foo.bar() : void
L2
ALOAD 2: iter
INVOKEINTERFACE Iterator.hasNext() : boolean
IFNE L3
L5
RETURN
L6

For those not conversant at the bytecode level, let’s walk through the first column. In the byte code, everything comes down to the local variables (which are numbered) and the stack. When a method is called, the stack contains a local variable #0, which is always “this” and subsequent local variables for each argument passed to the method.

  • L0 – Label 0, beginning of method
  • ALOAD 1: list – Read local variable #1 (the list) and push onto the stack
  • INVOKEINTERFACE List.iterator() : Iterator – Invoke the interface method on the object on the stack and place the result back on the stack.
  • ASTORE 2: iter – Store the top of the stack into local variable #2
  • L1 – Label 1, beginning of method
  • GOTO L2 – Go to end loop termination check. (And here you thought Java didn’t have goto!)
  • L3 – Label 3, beginning of loop
  • ALOAD 2: iter – Load local variable #2 (iter) onto stack
  • INVOKEINTERFACE Iterator.next() : Object – Invoke the next method on iter and place result on stack
  • CHECKCAST Foo – Check that the returned object is a Foo
  • ASTORE 3: foo – Store foo from stack into local variable 3
  • L4 – Label 4, ready to call foo.bar()
  • ALOAD 3: foo – Load local var #3 (foo) onto stack
  • INVOKEVIRTUAL Foo.bar() : void – Invoke foo.bar()
  • L2 – Label 2, check loop termination condition
  • ALOAD 2: iter – Load iter onto stack
  • INVOKEINTERFACE Iterator.hasNext() : boolean – Invoke hasNext()
  • IFNE L3 – If result is not equal to 0 (false), then go to label L3
  • L5 – Label 5, done with the loop
  • RETURN – Return from method
  • L6 – Label 6, end of method

Notice how the notion of scope in the loop is completely absent from this code?

FYI, the original byte code had LINENUMBER bytecode instructions which I removed to shorten things up. Many of the labels exist solely for use by the LINENUMBER instructions added in debug mode, so many of these labels are not strictly necessary.

Comments

4 Responses to “Iterator idioms”
  1. How does one generate the bytecode? Which JDK tool did you use? (A blindspot in my Java education, obviously!)

  2. Alex says:

    Great question! I should have mentioned it. The tool you already have is javap, which is part of the JDK. Javap take a class and will show the bytecode, signatures, etc. You would run it here with something like:

    javap -c -classpath ~/project test.TestIterator

    The -c does disassembly (shows bytecode). Otherwise, it will print method signatures by default. There are other flags as help if you run javap -help.

    The tool I actually used is the fantastic ASM ByteCode Outline plugin for Eclipse. It lets you look at the bytecode for a selected method and can even show you the code to generate that bytecode using ASM, which is fairly indispensable when working at Terracotta. :)

  3. Although not surprising, this is a very interesting insight to what happens with your Java code. It actually gives a hint that Java source code optimization (e.g. chaining calls) to reduce the number of lines does nothing but obfuscate your intentions. Maybe interesting to see some other code patterns. Like me, often assigning a pile of final variables before calling a method.

  4. Guillaume Poirier says:

    There is actually a difference in the bytecode generated by the while and for idoms, it’s just that it’s subtle and doesn’t show in your exemple.

    Take this example:

    - for idiom :

    List list = Arrays.asList(“hello”, “world”);
    for (String s : list) {
    System.out.println(s);
    }
    String name = “World”;
    String str = “Hello, ” + name;
    System.out.println(str);

    - while idom:

    List list = Arrays.asList(“hello”, “world”);
    Iterator it = list.iterator();
    while (it.hasNext()) {
    String s = it.next();
    System.out.println(s);
    }
    String name = “World”;
    String str = “Hello, ” + name;
    System.out.println(str);

    In that exemple, the generated bycode will differ, the while idom will require more memory on its stack frame for local variables than the for idom. The for idom uses 3 memory addresses and the while uses 4, because it cannot reuse the slot that was used for “it”.

    It’s really a small difference though, the most important advantage of the for idom is really at compile time, that you you can safely reuse the “it” name for future variables and have it isolated from the one used by the control loop.

Speak Your Mind

Tell us what you're thinking...
and oh, if you want a pic to show with your comment, go get a gravatar!