Implementing a scripting language with Antlr (Part 3: Tree Walker)

19

In Part 1 and Part 2 of this series, we focused on writing the lexer and parser for our scripting language. In this entry, we focus on writing a tree walking grammar to build custom objects from our AST. When we closed the previous entry, we had modified our parser to generate the AST we wished to build but were reviewing our options for actually building our own objects from that AST.

Before we dive into that, let’s first step back and write some simple objects that will hold the output state of this transformation. The objects are Script, which holds a series of named Blocks, each with a list of Commands, which have a name and properties. In the interest of brevity, I’m not going to show the code here (see the links at the end). I defined a toString() on each object that should reproduce a valid script (we’ll use that for testing later).

Tree grammar rules are written with a lisp-ish notation #(node child1 child2 …). This will match a node (based on token) if it has children child1, child2, etc. Deeper structures can be embedded within this.

Let’s build this tree grammar from the top down. First we’ll start with the script node itself. We’ll look for a SCRIPT token with child tokens that are BLOCKs.

class ScriptWalker extends TreeParser;

	script returns [Script script]
	{
	    script = new Script();
	}
	    :  #( SCRIPT (block[script] )* );

This tree parser has a single rule (“script”) which matches the SCRIPT node, then matches any number of “block” sub-rules. We see some new notation here as well. The script rule has a returns [Script script] section that defines the return type of this rule (and the generated Java method). The [ ] are used instead of ( ) as the ( ) is used as a grouping construct in the rule definitions instead. Then we see a code block or action in { } that initializes the Script instance. Within the rule itself, we see a reference to the block rule and the script variable is passed as an argument to the rule. The script variable will collect Block objects as they are found.

Let’s continue defining the tree grammar by defining the block rule:

	block [Script script]
	{
	    Block block = null;
	}
	    :  #(BLOCK
		     (
		        b:STRING  {  block = new Block(b.getText()); }
			(command[block])*
		    )
		)
		{ script.addBlock(block); }
		;

Here the block rule takes a parameter in [ ], which is the Script to add to. We look for a BLOCK token with a child block name (the STRING token) and any number of commands, as defined in a separate command rule.

Let’s finish it out with the command and property rules:

	command [Block block]
	{
	    Command command = null;
	}
	    :  #(COMMAND
		    (
		        c:STRING
                        {
			    command = new Command(c.getText());
			    block.addCommand(command);
			}
			(property[command])*
		    )
		);

	property [Command command]
		: #( PROPERTY
			(
		    	    p1:STRING (p2:STRING)?
			    {
				if(p2 == null) {
				    command.addProperty(p1.getText());
				} else {
			   	    command.addProperty(p1.getText(), p2.getText());
				}
			    }
			)
		);

These should now be pretty familiar. One note is on the property rule – the second STRING child node is optional (only there if using the key=value syntax) and that’s indicated with the ?, just as in the other grammars.

One item of interest here is what happens if the string is a quoted string. In that case, the quotes will be included in the token. So, we need to strip the quotes if they exist. To do this, you can add an arbitrary chunk of code after the definition of the ScriptWalker that defines a helper method:

class ScriptWalker extends TreeParser;

{
    private String removeQuotes(String value) {
        if(value != null && value.startsWith("\"")) {
            return value.substring(1, value.length()-1);
        }
        return value;
    }
}

And then we just wrap a call to removeQuotes() around each call to getText() in the prior functions. And that’s it! We’re all done with our ScriptWalker. Here’s a dumb test class to try parsing the file into an AST, then running the walker to generate objects, then print the objects back out as a script:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.Reader;

import antlr.CommonAST;

public class TestWalker {

    public static void main(String args[]) throws Exception {
        System.out.println("Walking: " + args[0]);

        Reader reader = new BufferedReader(new FileReader(args[0]));
        ScriptLexer lexer = new ScriptLexer(reader);
        ScriptParser parser = new ScriptParser(lexer);

        parser.script();
        CommonAST t = (CommonAST)parser.getAST();

        // Traverse the tree created by the parser
        ScriptWalker walker = new ScriptWalker();
        Script script = walker.script(t);
        System.out.println(script);
    }
}

I’ve defined my tree parser in the same file along with my lexer and parser, so I need to call antlr.Tool again to regenerate everything. You’ll see a new generated file ScriptWalker.java show up. Compile, then run java TestWalker test.script to try it out:

Walking: src/test.script
morning {
  wake_up;
  eat bacon;
  brush_teeth;
  wash hair=true minutes=10;
}

evening {
  read book="How To Be a Ninja";
  brush_teeth;
  sleep hours=8;
}

Success!! We can now move on to writing an interpreter for this generic scripting language. For the complete lexer, parser, and tree parser, see script.g in the file list below. Everything is in one file and even with comments, it’s not too long.

For a language as simple as this, you might wonder why you would go to the trouble of learning and using a tool like Antlr or JavaCC. For one reason, that code is tedious, error-prone, and very likely to be a maintenance nightmare. For another, languages often start simple and increase in complexity with new requirements. Generally speaking, a simple change in the language will result in a simple change in the grammar. With a hand-written parser, often a simple change in the language can be a daunting task.

Hopefully you had fun working through this example with me. I’m considering writing a follow-up series to show the actual interpreter code as well. If you would be interested in such a series, drop me a note by email (see upper-right corner) or comment. Thanks for reading!

Links to the rest of the series:

Comments

19 Responses to “Implementing a scripting language with Antlr (Part 3: Tree Walker)”
  1. Patrick Wright says:

    Yes, please continue with the interpreter. You write well and it would be great to have an end-to-end example. Most postings I’ve seen in this area are either too complex or incomplete. Don’t stop! It’s good stuff. Cheers Patrick

  2. Oh Yeah says:

    Definitely, please continue the series with the interpreter implementation!

  3. Paul says:

    I’d be very interested in seeing this series continue to talk about the actual interpreter!

  4. Florian says:

    It was much fun reading part 1 to 3. Very good job! It would be great if you would continue this series. This would round up the whole thing.

  5. Bob says:

    Great Stuff! Please, Please continue with the actual interpreter (and please do it soon) Cheers for a great job

  6. Jen says:

    I’d be interested in seeing the interpreter as well. Thanks for this helpful article!

  7. Olle Jonsson says:

    Yes, everybody loves this series.

    (A question: I guess this article series is using ANTLR v2 syntax, is that right? Some of the syntax details have a different aroma than in v3, covered in the ANTLR reference book, which currently is “only” in PDF beta.)

    http://www.pragmaticprogrammer.com/titles/tpantlr

  8. Alex says:

    Yep, this is all Antler v2 syntax. I hope to update everything to v3 at some point.

    Also, to anyone that has requested the rest of the interpreter, I am working on pulling it out and making it accessible, but there is enough code that I’m struggling with how to make it accessible via blog. Once I get some time and figure all that out, I’ll post it here.

    Alex

  9. Deniz Oguz says:

    I also like the series of articles. Please continue.

  10. Qwang says:

    Very good introduction of Antlr. Please continue.

  11. JC says:

    Very well done. I believe this would be useful for teaching me to do write the search engine query parser.

  12. Naimal says:

    Hi,
    I read your very informative article. I tried to read many before but as mentioned in one of the post earlier, most of the time they will end up just a generated parser and thats all…But in this case we are having something very useful. I’m eagerly waiting for the actual interpreter.
    thanks

  13. Jerome says:

    Interesting read, please continue with the interpreter!

  14. corku says:

    when oh when will the interpreter come!

    still excellent stuff!

  15. Alex says:

    Sadly, I suspect the interpreter will never see the light of day. This example is based on code I wrote at a former job and while I do have a simplified version of it somewhere, I don’t know that I’ll ever get around to pulling it out or updating to Antlr 3. Maybe I will for a No Fluff conference talk at some point. If I do, I’ll be sure to post it.

  16. Lawrence Tsang says:

    Good explanation for an ANTLR starter. I’ve got a very practical experience with your scripts. Although the scripts run on ANTLR v2 (i use “antlr-2.7.7.jar” included in ANTLR v3.0.1), they trigger a deeper understanding of the parser-generating process.

    It’s best if the “interpreter” part could come. No matter it is in ANTLR v2 or v3.

    Good work.

Trackbacks

Check out what others are saying about this post...
  1. [...] Implementing a scripting language with Antlr (Part 3: Tree Walker) [...]

  2. [...] moich poszukiwaniach trafiłem na tutorial Alexa Millera, niestety używa on składni antlr’a 2. W moich bojach udało mi się uruchomić opisywane [...]

  3. [...] course, so I sought help from my good friend Bob. We chatted a bit about it, and he pointed me to an article about another feature of ANTLR, tree walker grammars. I had already skimmed the article, but reread [...]



Speak Your Mind

Tell us what you're thinking...
and oh, if you want a pic to show with your comment, go get a gravatar!