🦋11. Compiler stages and targets in Perl 6

Welcome to the new year! Today, let us switch for a while from the discussion about obsolete messages to something different.

Stages

If you followed the exercises in the previous posts, you might have noticed that some statistics was printed in the console when compiling Rakudo:

Stage start      :   0.000
Stage parse      :  44.914
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   4.245
Stage mast       :   9.476
Stage mbc        :   0.200

You could have also noticed that the bigger the file you changed, the slower it is compiled, up to dozens of seconds when you modify Grammar.pm.

It is also possible to see the statistics for your own programs. The --stagestats command-line option does the job:

$ ./perl6 --stagestats -e'say 42'
Stage start      :   0.000
Stage parse      :   0.065
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   0.001
Stage mast       :   0.003
Stage mbc        :   0.000
Stage moar       :   0.000
42

So, let’s look at these stages. Roughly, half of them is about Perl 6, and half is about MoarVM. In the case Rakudo is configured to work with the JVM backend, the output will differ in the second half.

The Perl 6 part is clearly visible in the src/main.nqp file:

# Create and configure compiler object.
my $comp := Perl6::Compiler.new();
$comp.language('perl6');
$comp.parsegrammar(Perl6::Grammar);
$comp.parseactions(Perl6::Actions);
$comp.addstage('syntaxcheck', :before);
$comp.addstage('optimize', :after);
hll-config($comp.config);
nqp::bindhllsym('perl6', '$COMPILER_CONFIG', $comp.config);

Look at the selected lines. If you have played with Perl 6 Grammars, you know that big grammars are usually split into two parts: the grammar itself and the actions. The Perl 6 compiler does exactly the same thing for the Perl 6 grammar. There are two files: src/Perl6/Grammar.nqp and src/Perl6/Actions.nqp.

When looking at src/main.nqp, it is not quite clear that there are eight stages. Add the following line to the file:

for ($comp.stages()) { nqp::say($_) }

Now, recompile Rakudo and run any program:

$ ./perl6 -e'say 42'
start
parse
syntaxcheck
ast
optimize
mast
mbc
moar
42

Here they are.

The names of the first three stages—start, parse, and syntaxcheck—are quite self-explanatory. The ast stage is the stage of building an abstract syntax tree, which is then optimized in the optimize stage.

At this point, your Perl 6 program has been transformed into the abstract syntax tree and is about to be passed to the backend, MoarVM virtual machine in our case. The stages names start with m. The mast stage is the stage of the MoarVM assembly (not abstract) syntax tree, mbc stands for MoarVM bytecode and moar is when the VM executes the code.

Targets

Now that we know the stages of the Perl 6 program workflow, let’s make use of them. The --target option lets the compiler to stop at the given stage and display the result of it. This option supports the following values: parse, syntaxcheck, ast, optimize, and mast. With those options, Rakudo prints the output as a tree, and you can see how the program changes at different stages.

Even for small programs, the output, especially with the abstract syntax tree or an assembly tree of the VM is quite verbose. Let’s look at the parse tree of the ‘Hello, World!’ program, for example:

$ ./perl6 --target=parse -e'say "Hello, World!"'
- statementlist: say "Hello, World!"
  - statement: 1 matches
    - EXPR: say "Hello, World!"
      - args:  "Hello, World!"
        - arglist: "Hello, World!"
          - EXPR: "Hello, World!"
            - value: "Hello, World!"
              - quote: "Hello, World!"
                - nibble: Hello, World!
      - longname: say
        - name: say
          - identifier: say
          - morename:  isa NQPArray
        - colonpair:  isa NQPArray

All the names here correspond to rules, tokens, or methods of the Grammar. You can find them in src/Perl6/Grammar.nqp. As an exercise, try predicting if the name is a method, or a rule, or a token. Say, a value should be a token, as it is supposed to be a compact string, while a statementlist is a rule.

🦋1. The proto keyword in Perl 6

Today, we are looking precisely at the proto keyword. It gives a hint for the compiler about your intention to create multi-subs.

Example 1

Consider an example of the function that either flips a string or negates an integer.

multi sub f(Int $x) {
    return -$x;
}

multi sub f(Str $x) {
    return $x.flip;
}

say f(42);      # -42
say f('Hello'); # olleH

What if we create another variant of the function that takes two arguments.

multi sub f($a, $b) {
    return $a + $b;
}

say f(1, 2); # 3

This code perfectly works, but it looks like its harmony is broken. Even if the name of the function says nothing about what it does, we intended to have a function that somehow returns a ‘reflected’ version of its argument. The function that adds up two numbers does not fit this idea.

So, it is time to clearly announce the intention with the help of the proto keyword.

proto sub f($x) {*}

Now, an attempt of calling the two-argument function won’t compile:

===SORRY!=== Error while compiling proto.pl
Calling f(Int, Int) will never work with proto signature ($x)
at proto.pl:15
------> say f(1,2)

The calls of the one-argument variants work perfectly. The proto-definition creates a pattern for the function f: its name is f, and it takes one scalar argument. Multi-functions specify the behaviour and narrow their expertise to either integers or strings.

Example 2

Another example involves a proto-definition with two typed arguments in the function signature.

proto sub g(Int $x, Int $y) {*}

In this example, the function returns a sum of the two integers. When one of the numbers is much bigger than the other, the smaller number is just ignored as being not significant enough:

multi sub g(Int $x, Int $y) {
   return $x + $y;
}

multi sub g(Int $x, Int $y where {$y > 1_000_000 * $x}) {
   return $y;
}

Call the function with integer arguments and see how Perl 6 picks the correct variant:

say g(1, 2);          # 3
say g(3, 10_000_000); # 10000000

Didn’t you forget that the prototype insists on two integers? Try it out passing floating-point numbers:

say g(pi, e);

We got a compile-time error:

===SORRY!=== Error while compiling proto-int.pl
Calling g(Num, Num) will never work with proto signature (Int $x, Int $y)
at proto-int.pl:13
------> say ⏏g(pi, e);

The prototype has caught the error in the function usage. What happens if there is no proto for the g sub? The function is still not called, but the error message is different. It happens at run-time this time:

Cannot resolve caller g(3.14159265358979e0, 2.71828182845905e0); none of these signatures match:
 (Int $x, Int $y)
 (Int $x, Int $y where { ... })
 in block <unit> at proto-int.pl line 13

We still have no acceptable signature for the floating-point numbers, but the compiler cannot see that until the program flow reaches the code.