📘 AST (abstract syntax tree) and attributes in Perl 6 grammars

Now we are ready to simplify the grammar again. Earlier, we split the assignment and printout rules into two alternatives each, because without the split it was not possible to tell which branch had been triggered: you either needed to read the value from the value token, or get the name of the variable from the identifier token and look it up in the variable storage.

Perl 6’s grammars offer a great mechanism that is common in language parsing theory, the abstract syntax tree, shortened as AST.

First of all, update the rules and remove the alternatives from some of them. The only rule containing two branches is the expression rule.

rule assignment {
    <identifier> '=' <expression>
}
rule printout {
    'print' <expression>
}
rule expression {
    | <identifier>
    | <value>
}

The syntax tree that is built during the parse phase can contain the results of the calculations made in the previous steps. The Match object has a field, ast, dedicated to keeping the calculated value of each node. You can simply read that value to get the result of the previously completed actions. The tree is called abstract because how the value is calculated is not very important. What is important is that when the action is triggered, you have a single place with the result you need to complete an action.

The action can save its own result (and thus pass it further on the tree) by calling the $/.make method. The data you save there are accessible via the made field, which has the synonym ast.
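
As a minimal sketch of how make and made work together, consider a one-token grammar. The names Number and NumberActions are hypothetical and are not part of the language parser built in this chapter.

```raku
# Hypothetical one-token grammar, used only to illustrate make and made.
grammar Number {
    token TOP { \d+ }
}

class NumberActions {
    method TOP($/) {
        # Attach the numeric value of the matched text to this node.
        $/.make(+$/);
    }
}

my $match = Number.parse('42', :actions(NumberActions));
say $match.made;   # 42, the value stored by make
say $match.ast;    # 42, the same value under its other name
```

Both calls read the same slot of the Match object, so the choice between made and ast is a matter of taste.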

Let’s fill the attribute of the syntax tree for the identifier and value tokens. The match with an identifier produces the variable name; when the value is found, the action generates a number. Here are the methods of the actions’ class.

method identifier($/) {
    $/.make(~$0);
}
method value($/) {
    $/.make(+$0);
}

Move one step higher, where we build the value of the expression. It can be either a variable value or an integer.

As the expression rule has two alternatives, the first task will be to understand which one matches. For that, check the presence of the corresponding fields in the $/ object.

(If you use the recommended variable name $/ in the signature of the action method, you may access its fields differently. The full syntax is $/<identifier>, but there is an alternative version $<identifier>.)
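
The two spellings can be compared side by side in a small sketch. The grammar Demo and the class DemoActions are hypothetical names chosen for this illustration.

```raku
# Hypothetical grammar for illustration only.
grammar Demo {
    token TOP        { <identifier> }
    token identifier { <:alpha>+ }
}

class DemoActions {
    method TOP($/) {
        say ~$/<identifier>;   # full form: a subscript on the $/ variable
        say ~$<identifier>;    # shortcut for the same lookup
    }
}

my $m = Demo.parse('abc', :actions(DemoActions));
```

Both lines print abc. The shortcut relies on the variable being called $/; with any other name, such as $m above, only the explicit form $m<identifier> is available.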

The two branches of the expression method behave differently. For a number, it extracts the value directly from the captured substring. For a variable, it gets the value from the %var hash. In both cases, the result is stored in the AST using the make method.

method expression($/) {
    if $<identifier> {
        $/.make(%var{$<identifier>});
    }
    else {
        $/.make(+$<value>);
    }
}

To handle variables that have not been assigned yet, we can add the defined-or operator, which substitutes zero when the hash has no value for the name.

$/.make(%var{$<identifier>} // 0);
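
The behaviour of the defined-or operator on a hash can be checked in isolation. This is a small sketch with a stand-alone %var hash, not the storage of the parser itself.

```raku
my %var = x => 42;

say %var<x> // 0;      # 42: the stored value is defined
say %var<y> // 0;      # 0: there is no such key, so the right-hand side is used
say %var<y>:exists;    # False: the lookup did not create the key
```

Note that the operator only supplies a fallback for the expression; nothing is written back to the hash.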

Now, the expression will have a value attributed to it, but the source of the value is not known anymore. It can be a variable value or a constant from the file. This makes the assignment and printout actions simpler:

method printout($/) {
    say $<expression>.ast;
}

All you need for printing the value is to get it from the ast field.
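
The way a value travels from a child node to its parent can be sketched with a tiny grammar. Doubler and DoublerActions are hypothetical names; the doubling is there only to make the parent's use of the child's value visible.

```raku
# Hypothetical grammar: TOP consists of a single value.
grammar Doubler {
    token TOP   { <value> }
    token value { \d+ }
}

class DoublerActions {
    # The child stores a number on its own node.
    method value($/) { $/.make(+$/) }

    # The parent reads the child's stored number and builds its own result.
    method TOP($/)   { $/.make($<value>.made * 2) }
}

my $result = Doubler.parse('21', :actions(DoublerActions)).made;
say $result;   # 42
```

The parent never looks at the matched text; it only needs the value that the child left behind.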

For the assignment, it is a bit more complex but can still be written in a single line.

method assignment($/) {
    %var{$<identifier>} = $<expression>.made;
}

The method gets the $/ object and uses the values of its identifier and expression elements. The first one is converted to the string and becomes the key of the %var hash. From the second one, we get the value by fetching the made attribute.

Finally, let us stop using the global variable storage and move the hash into the actions class (we don’t need it in the grammar itself). It will thus be declared as has %!var; and used as a private attribute in the body of the actions: %!var{…}.

After this change, it is important to create an instance of the actions class before pairing it with a grammar:

Lang.parsefile(
    'test.lang',
    :actions(LangActions.new())
);
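
The reason an instance is needed is that a private attribute belongs to an object, not to the class. A minimal sketch with a hypothetical Storage class (its set and get methods are invented for this example) shows that each instance keeps its own hash.

```raku
# Hypothetical class that mimics the variable storage of the actions class.
class Storage {
    has %!var;

    method set($name, $value) { %!var{$name} = $value }
    method get($name)         { %!var{$name} // 0 }
}

my $first  = Storage.new;
my $second = Storage.new;

$first.set('x', 42);
say $first.get('x');    # 42
say $second.get('x');   # 0: the second instance has its own empty hash
```

A type object has no attribute storage of its own, which is why LangActions.new() is passed to the grammar.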

Here is the complete code of the parser with actions.

grammar Lang {
    rule TOP {
        ^ <statements> $
    }
    rule statements {
        <statement>+ %% ';'
    }
    rule statement {
        | <assignment>
        | <printout>
    }
    rule assignment {
        <identifier> '=' <expression>
    }
    rule printout {
        'print' <expression>
    }
    rule expression {
        | <identifier>
        | <value>
    }
    token identifier {
        (<:alpha>+)
    }
    token value {
        (\d+)
    }
}

class LangActions {
    has %!var;

    method assignment($/) {
        %!var{$<identifier>} = $<expression>.made;
    }
    method printout($/) {
        say $<expression>.ast;
    }
    method expression($/) {
        if $<identifier> {
            $/.make(%!var{$<identifier>} // 0);
        }
        else {
            $/.make(+$<value>);
        }
    }
    method identifier($/) {
        $/.make(~$0);
    }
    method value($/) {
        $/.make(+$0);
    }
} 

Lang.parsefile(
    'test.lang',
    :actions(LangActions.new())
);

📘 Actions in Perl 6 grammars

The grammars in Perl 6 allow actions in response to the rule or token matching. Actions are code blocks that are executed when the corresponding rule or token is found in the parsed text. Actions receive an object $/, where you can see the details of the match. For example, the value of $<identifier> will contain an object of the Match type with the information about the substring that actually was consumed by the grammar.

rule assignment {
    | <identifier> '=' <value>
          {say "$<identifier>=$<value>"}
    | <identifier> '=' <identifier>
}

If you update the grammar with the action above and run the programme against the same sample file, then you will see the substring x=42 in the output.

The Match objects are converted to strings when they are interpolated in double quotes, as in the given example: "$<identifier>=$<value>". To use the value outside a quoted string, coerce it explicitly with the unary ~ to get a string or with the unary + to get a number:

rule assignment {
    | <identifier> '=' <value>
          {%var{~$<identifier>} = +$<value>}
    | <identifier> '=' <identifier>
}
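
The difference between a Match object and its coerced values can be seen without a grammar. The sketch below uses a plain regex with named captures, which is an assumption made for brevity; the capture names simply mirror the tokens of the grammar.

```raku
# A plain regex with two named captures instead of grammar tokens.
'x=42' ~~ / $<identifier>=(<:alpha>+) '=' $<value>=(\d+) /;

say "$<identifier>=$<value>";   # x=42: interpolation stringifies the matches

my $name   = ~$<identifier>;    # explicit coercion to a string
my $number = +$<value>;         # explicit coercion to a number

say $<value>.^name;   # Match
say $name.^name;      # Str
say $number.^name;    # Int
```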

So far, we’ve got an action for assigning a value to a variable and can process the first line of the file. The variable storage will contain the pair {x => 42}.

In the second alternative of the assignment rule, the <identifier> name is mentioned twice, so $<identifier> holds a list of two matches, and you can refer to each of them as an array element.

rule assignment {
    | <identifier> '=' <value>
      {
          %var{~$<identifier>} = +$<value>
      }
    | <identifier> '=' <identifier>
      {
          %var{~$<identifier>[0]} =
          %var{~$<identifier>[1]}
      }
}

This addition to the code makes it possible to parse an assignment with two variables: y = x. The %var hash will contain both values: {x => 42, y => 42}.
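
The list of matches can be inspected directly. The following sketch uses a hypothetical grammar Assign that contains only the variable-to-variable form of the assignment.

```raku
# Hypothetical grammar with the same token used twice in one rule.
grammar Assign {
    rule TOP         { <identifier> '=' <identifier> }
    token identifier { <:alpha>+ }
}

my $m = Assign.parse('y = x');

say $m<identifier>.elems;   # 2
say ~$m<identifier>[0];     # y
say ~$m<identifier>[1];     # x
```

The elements appear in the order in which the tokens matched, so index 0 is the left-hand side of the assignment.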

Alternatively, capturing parentheses may be used. In this case, to access the captured substring, use special variables, such as $0:

rule assignment {
    | (<identifier>) '=' (<value>)
      {
           %var{$0} = +$1
      } 
    | (<identifier>) '=' (<identifier>)
      {
           %var{$0} = %var{$1}
      }
}

Here, the unary ~ is no longer required when the variable is used as a hash key, but the unary + before $1 is still needed to convert the Match object to a number.
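
A short sketch with a plain regex (used here instead of the grammar, for brevity) shows both effects: the key is stringified by the hash, while the value stays a number only because of the unary +.

```raku
my %var;

# Two positional captures: $0 is the name, $1 is the digits.
if 'x = 42' ~~ / (<:alpha>+) \s* '=' \s* (\d+) / {
    %var{$0} = +$1;
}

say %var;             # {x => 42}
say %var<x>.^name;    # Int
```

Without the unary +, the hash would store the Match object itself instead of an integer.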

Similarly, create the actions for printing.

rule printout {
    | 'print' <value>
      {
          say +$<value>
      }
    | 'print' <identifier>
      {
          say %var{$<identifier>}
      }
}

Now, the grammar is able to do all the actions required by the language design, and it will print the requested values:

42
42
7

Once we use capturing parentheses in the rules, the parse tree contains entries named 0 and 1, together with the named entries, such as identifier. You can clearly see it when parsing the y = x string:

statement => 「y = x」
 assignment => 「y = x」
  0 => 「y」
   identifier => 「y」
  1 => 「x」
   identifier => 「x」

An updated parser looks like this:

my %var; 

grammar Lang {
    rule TOP {
        ^ <statements> $
    }
    rule statements {
        <statement>+ %% ';'
    }
    rule statement {
        | <assignment>
        | <printout>
    }
    rule assignment {
        | (<identifier>) '=' (<value>)
          {
              %var{$0} = +$1
          }
        | (<identifier>) '=' (<identifier>)
          {
              %var{$0} = %var{$1}
          }
    }
    rule printout {
        | 'print' <value>
          {
              say +$<value>
          }
        | 'print' <identifier>
          {
              say %var{$<identifier>}
          }
    }
    token identifier {
        <:alpha>+
    }
    token value {
        \d+
    }
}

Lang.parsefile('test.lang');

For convenience, it is possible to put the code of actions in a separate class. This helps a lot when the actions are more complex and contain more than one or two lines of code.

To create an external action, create a class, which will later be referenced via the :actions parameter in the call of the parse or parsefile methods of the grammar. As with the actions embedded in the grammar rules, the actions in an external class receive the $/ object of the Match type.

First, we will practise on a small isolated example and then return to our custom language parser.

grammar G {
    rule TOP {^ \d+ $}
}

class A {
    method TOP($/) {say ~$/}
}

G.parse("42", :actions(A));

Both the grammar G and the action class A have a method called TOP. The common name connects the action with the corresponding rule. When the grammar parses the provided test string and consumes the value of 42 by the ^ \d+ $ rule, the A::TOP action is triggered, and the $/ argument is passed to it, which is immediately printed.
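
The same name matching applies to every rule and token, not only to TOP. The sketch below, with a hypothetical grammar Sum and a hypothetical @calls array used as a log, records the order in which the actions fire.

```raku
my @calls;   # hypothetical log of the actions that were triggered

grammar Sum {
    rule TOP     { <number> '+' <number> }
    token number { \d+ }
}

class SumActions {
    method number($/) { @calls.push('number ' ~ $/) }
    method TOP($/)    { @calls.push('TOP ' ~ $/) }
}

Sum.parse('3 + 4', :actions(SumActions));
.say for @calls;
```

The log shows number 3, then number 4, and finally TOP 3 + 4: an action runs when its own rule or token has matched, so the actions of the inner tokens complete before the action of the enclosing rule.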