Printing your Java source stats

Eric Burke posed a challenge for next week to post the longest human-coded method.

I mentioned in his comments that you can use JavaNCSS for this. Just in case that wasn’t quite enough detail, here’s a brief sketch of how to do so. You’ll need to make some tweaks for your environment but I think you’ll get the idea.

Install JavaNCSS

  1. Go to the link above, download JavaNCSS and expand it
  2. export JAVA_HOME=…
  3. export JAVANCSS_HOME=…
  4. export PATH=$PATH:$JAVANCSS_HOME/bin

Install Xalan

This is just to run some XSLT stylesheets on the output XML file.

  1. Download Xalan from Apache and expand it somewhere
  2. export XALAN_HOME=…
  3. export CLASSPATH=$XALAN_HOME/xalan.jar:\
    $XALAN_HOME/xercesImpl.jar:\
    $XALAN_HOME/xml-apis.jar:\
    $XALAN_HOME/serializer.jar

Run JavaNCSS on your source tree

  1. cd …root of all source files…
  2. javancss -all -xml -out $JAVANCSS_HOME/ncss.xml -recursive *

Transform the output

  1. cd $JAVANCSS_HOME
  2. java org.apache.xalan.xslt.Process -in ncss.xml -xsl xslt/wfncss.xsl -out ncss.html

You will now have an ncss.html file generated from your output that you can use to look for huge methods (and other interesting bad stuff).
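
If you run this more than once, the run and transform steps above can be wrapped in a small shell function. This is only a sketch: it assumes JAVANCSS_HOME, PATH and CLASSPATH are already set as described earlier, and the names ncss_report and DRY_RUN are made up for illustration (they are not part of JavaNCSS or Xalan).

```shell
#!/bin/sh
# Sketch: run JavaNCSS over one source root, then render the XML with Xalan.
# ncss_report and DRY_RUN are hypothetical names chosen for this example.

ncss_report() {
    src_root=$1
    if [ -z "$src_root" ] || [ -z "$JAVANCSS_HOME" ]; then
        echo "usage: JAVANCSS_HOME=... ncss_report <source-root>" >&2
        return 2
    fi

    # With DRY_RUN=1 the commands are printed instead of executed.
    run=
    [ "${DRY_RUN:-0}" = 1 ] && run=echo

    # Step 1: collect the stats as XML (same command as in the list above).
    (
        cd "$src_root" || exit 1
        $run javancss -all -xml -out "$JAVANCSS_HOME/ncss.xml" -recursive *
    ) || return 1

    # Step 2: transform the XML into HTML with the wfncss stylesheet.
    (
        cd "$JAVANCSS_HOME" || exit 1
        $run java org.apache.xalan.xslt.Process -in ncss.xml -xsl xslt/wfncss.xsl -out ncss.html
    )
}
```

Call it with the root of your source files as the only argument. Setting DRY_RUN=1 first prints the two commands without running them, which is a cheap way to check your paths.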

Caveats

If you want to be more precise about including many source trees, more work is involved. I wrote an Ant task to make this easier (the referenced one didn’t work for me on modern Ant versions) but I don’t have the code anymore. Ant does such a nice job making it easy to collect source flexibly! I’ll leave this as an exercise for the reader….
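
Short of an Ant task, one rough way to cover several source trees is to collect the .java files yourself with find and hand them to javancss. The sketch below assumes javancss accepts explicit file names as arguments, as the * in the command above suggests. collect_java_sources and the two example directories are hypothetical.

```shell
#!/bin/sh
# Sketch: list every .java file under one or more source roots.
# collect_java_sources is a hypothetical helper, not part of JavaNCSS.

collect_java_sources() {
    for root in "$@"; do
        # Regular files only, matched by extension.
        find "$root" -type f -name '*.java'
    done | sort
}

# Possible usage (core/src and plugins/src are made-up directory names).
# The unquoted substitution breaks on paths containing spaces, and a very
# large tree may exceed the shell's argument-length limit:
#
#   javancss -all -xml -out "$JAVANCSS_HOME/ncss.xml" \
#       $(collect_java_sources core/src plugins/src)
```

Because the file list is built with find, you can add the usual filters (for example -not -path for generated code) before it ever reaches javancss.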

There are a bunch of XSLT reports in $JAVANCSS_HOME/xslt. You might find others more interesting. The one I chose just outputs violations of the default warning levels. It doesn’t sort the output as nicely as you’d wish, but should cut it down enough to scan it by eye. Or, you can modify it to sort as desired. Another exercise for the reader…
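
To compare the bundled reports quickly, you could run every stylesheet in that directory over the same XML file. The following is a sketch under a few assumptions: render_all_reports and DRY_RUN are names invented here, and the .html extension is a guess that may be wrong for stylesheets that emit something other than HTML.

```shell
#!/bin/sh
# Sketch: apply each .xsl file in a directory to one JavaNCSS XML file.
# render_all_reports and DRY_RUN are hypothetical names for this example.

render_all_reports() {
    xml=$1        # e.g. "$JAVANCSS_HOME/ncss.xml"
    xslt_dir=$2   # e.g. "$JAVANCSS_HOME/xslt"
    out_dir=$3    # where the rendered reports should go

    # With DRY_RUN=1 the commands are printed instead of executed.
    run=
    [ "${DRY_RUN:-0}" = 1 ] && run=echo

    for xsl in "$xslt_dir"/*.xsl; do
        [ -e "$xsl" ] || continue          # directory had no stylesheets
        name=$(basename "$xsl" .xsl)
        # Same Xalan invocation as above, one output file per stylesheet.
        $run java org.apache.xalan.xslt.Process \
            -in "$xml" -xsl "$xsl" -out "$out_dir/ncss-$name.html"
    done
}
```

Each stylesheet produces its own ncss-NAME file in the output directory, so you can open them side by side and keep whichever report is most useful.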

Hope that helped. You might learn some other interesting stuff here as well if you’ve never looked at your source stats!

PS. 707 was the longest human-coded method I found in our source.

Comments

2 Responses to “Printing your Java source stats”
  1. It’s interesting how often programmers come back to “lines of code” in various situations. It seems obvious nowadays that LoC is no good measure of an engineer’s productivity (although I am sure some companies still use it), but it does seem to be a good measure of understandability.
    I don’t know that method of yours, the one as long as an airplane’s model number, but I assume it could be broken down into several sections that play separate roles and were stuffed into one method to share some resources. Most of the time, such methods can be replaced with a stateful object used in a Builder-like pattern, or parts of the code can be moved out into sub-methods (which will most likely be used only once, but object-oriented coding is not only about reuse).
    In my early years, when I was coding in Smalltalk, we had a rule that a method should span at most two pages on screen. Of course, this was not a hard rule, just a way to support readability and understandability. However, one guy, for the sake of keeping the rule, simply split his method into several short ones and chained them by ending each method with a call to the next piece.

  2. Alex says:

    Yeah, there is clearly a correlation between LOC, complexity, and ability to test. I have read papers in software engineering where people did some fairly sophisticated analysis of previously released branches of software, bug rates, SCM change histories, etc., to devise a scheme to predict which methods/classes were the “most likely to fail”. Not surprisingly, there was a high correlation with large methods and classes.

    I can actually point you to the method in question in this case, which is a repackaged version of ASM, specifically ClassReader which is the class on the front end of the bytecode traversal / modification.

    This class has an accept() method that is 100s of lines long and is basically just a very long procedure for reading/visiting a class’s bytecode. I don’t know for sure, but I suspect there is little reuse to be had from breaking this method down further. The method seems to suffer from having both a serial procedure with many, many steps and big switch statements over bytecode choices. Both of those are good opportunities to break the code down into understandable chunks.

    However, I sometimes wonder whether, in such code, you lose the flow (and thus hurt understandability). In this case, I think the line count is high enough that breaking it down would be a net benefit.

    Probably the other, bigger benefit would be that you could actually test the chunks at a much finer granularity, which is basically impossible with 100s of lines of code in a method.
