13. Let 1 + 2 * 3 = 9

Is it easy to break the behaviour of Perl 6? Well, the answer probably depends on what exactly you want to break.

Playing with operator precedence, I wanted to change the rules of the arithmetic operators + and * so that they are executed in a different order, namely, addition first, multiplication second.

Sounds like an easy task. Go to src/Perl6/Grammar.nqp and change a couple of lines that set the precedence of the + and * infixes:

- token infix:sym<*>    { <sym> <O(|%multiplicative)> }
+ token infix:sym<*>    { <sym> <O(|%additive)> }
. . .
- token infix:sym<+>    { <sym> <O(|%additive)> }
+ token infix:sym<+>    { <sym> <O(|%multiplicative)> }
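
Before recompiling anything, the effect of that swap can be modelled outside of Rakudo. The following is only a Python sketch of a precedence-climbing evaluator, not the way Rakudo's operator-precedence parser is implemented; the names evaluate, NORMAL and SWAPPED are made up for the illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tokens, precedence):
    """Evaluate a flat token list such as [1, '+', 2, '*', 3]."""
    def parse(pos, min_prec):
        left = tokens[pos]
        pos += 1
        # Keep consuming operators that bind at least as tightly as min_prec.
        while pos < len(tokens) and precedence[tokens[pos]] >= min_prec:
            op = tokens[pos]
            # Left-associative: the right operand may only contain tighter operators.
            right, pos = parse(pos + 1, precedence[op] + 1)
            left = OPS[op](left, right)
        return left, pos

    value, _ = parse(0, 0)
    return value

NORMAL = {"+": 1, "*": 2}    # multiplication binds tighter
SWAPPED = {"+": 2, "*": 1}   # the patched grammar: addition binds tighter

print(evaluate([1, "+", 2, "*", 3], NORMAL))   # 7
print(evaluate([1, "+", 2, "*", 3], SWAPPED))  # 9
```

Only the precedence table changes between the two calls, which is the same kind of change as the two edited lines in the grammar.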

Ready? Compile!

Recompiling the grammar takes a long time, so at first it looks promising, but after a few seconds, the compilation stops with an error:

Month out of range. Is: -935111296, should be in 1..12

Makefile:517: recipe for target 'perl6-m' failed
make: *** [perl6-m] Error 1

Month out of range?? Oh, we changed the rules of the Universe, and before Perl 6 is even compiled, the new rules of arithmetic are already applied.

OK, let’s add some anaesthesia and suppress the error message. The code that checks for the correct month value is located in src/core/DateTime.pm, namely, inside the DateTime constructor. Comment that line out:

method !new-from-positional(DateTime:
    Int() $year,
    Int() $month,
    Int() $day,
    Int() $hour,
    Int() $minute,
        $second,
        %extra,
    :$timezone = 0,
    :&formatter,
) {
    # (1..12).in-range($month,'Month');
    (1 .. self.DAYS-IN-MONTH($year,$month)).in-range($day,'Day');
    (0..23).in-range($hour,'Hour');
    (0..59).in-range($minute,'Minute');
    (^61).in-range($second,'Second');
    . . .

This time, the month range check doesn’t stop us from going further but another error breaks in:

MVMArray: Index out of bounds

Makefile:517: recipe for target 'perl6-m' failed
make: *** [perl6-m] Error 1

Looks cryptic. MVMArray is a MoarVM array, obviously. So, we not only broke Perl 6 but MoarVM, too. Let’s go fix it.

The sources of MoarVM are located in a separate git repository at nqp/MoarVM. The message we saw can be found in nqp/MoarVM/src/6model/reprs/VMArray.c:

if (index < 0)
    MVM_exception_throw_adhoc(tc, "MVMArray: Index out of bounds");

There are two places like that, so let’s not guess which of them we need and preventatively change both of them to the following:

if (index < 0)
    index = 0;
    // MVM_exception_throw_adhoc(tc, "MVMArray: Index out of bounds");

(This is C, not Perl.)

From nqp/MoarVM, compile and re-install MoarVM and later try compiling Rakudo:

~/rakudo/nqp/MoarVM$ make
~/rakudo/nqp/MoarVM$ make install

~/rakudo/nqp/MoarVM$ cd ../..
~/rakudo$ make

This time, the error pops up immediately (as no NQP files are compiled):

Use of Nil in numeric context

Use of Nil in numeric context

Day out of range. Is: -51, should be in 1..0

Makefile:517: recipe for target 'perl6-m' failed
make: *** [perl6-m] Error 1

It looks like we can ignore Nils at the moment, but the DateTime hurts us again. We know the remedy:

# (1..12).in-range($month,'Month');
# (1 .. self.DAYS-IN-MONTH($year,$month)).in-range($day,'Day');

Yahoo! This time, the compilation process was calm and we got a new perl6 executable, which works as we wanted:

$ ./perl6 -e'say 1+2*3'
9

Don’t forget to restore the files before further experiments with Perl 6 🙂

Update

In the comment to this blog post, you can see a reference to the commit, which changes the way Rakudo checks the validity of the DateTime object. Instead of using the in-range method, simpler checks are used now, for example:

1 <= $month <= 12
    || X::OutOfRange.new(:what<Month>,:got($month),:range<1..12>).throw;
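
The same style of check, a chained comparison that throws on failure, can be sketched in Python. This is an illustration only: the OutOfRange class and check_month function are hypothetical names, and the message format is copied from the error shown earlier in this post.

```python
class OutOfRange(ValueError):
    """Hypothetical stand-in for X::OutOfRange."""
    def __init__(self, what, got, allowed):
        super().__init__(f"{what} out of range. Is: {got}, should be in {allowed}")

def check_month(month):
    # Chained comparison, the Python counterpart of 1 <= $month <= 12.
    if not 1 <= month <= 12:
        raise OutOfRange("Month", month, "1..12")
    return month

print(check_month(1))  # 1
try:
    check_month(-935111296)
except OutOfRange as error:
    print(error)  # Month out of range. Is: -935111296, should be in 1..12
```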

Here are the timings of two runs of a loop creating DateTime objects, before and after the update:

$ time ./perl6 -e'DateTime.new(2018,1,5,12,30,0) for ^500000'
real 0m7.261s
user 0m7.276s
sys 0m0.020s

. . .

$ time ./perl6 -e'DateTime.new(2018,1,5,12,30,0) for ^500000'
real 0m4.457s
user 0m4.476s
sys 0m0.012s

12. The beginning of the Grammar of Perl 6

Yesterday, we talked about the stages of the compiling process of a Perl 6 program and saw the parse tree of a simple ‘Hello, World!’ program. Today, our journey begins at the starting point of the Grammar.

So, here is the program:

say 'Hello, World!'

The grammar of Perl 6 is written in Not Quite Perl 6 and is located in Grammar.nqp 🙂 And that is amazing: if you know how to work with grammars, you will be able to read the heart of the language.

The Perl 6 Grammar is defined as follows:

grammar Perl6::Grammar is HLL::Grammar does STD {
    . . .
}

It is a class derived from HLL::Grammar (HLL stands for High-Level Language) and implements the STD (Standard) role. Let’s not focus on the hierarchy for now, though.

The Grammar has the TOP method. Notice that this is a method, not a rule or a token. The main feature of the method is that it is assumed that it contains some Perl 6 code, not regexes.

As we did earlier, let’s use our beloved method of reverse engineering by adding our own printing instructions to different places of Rakudo sources, recompiling it and watching how it works. The first target is the TOP method:

grammar Perl6::Grammar is HLL::Grammar does STD {
    my $sc_id := 0;
    method TOP() {
        nqp::say('At the TOP');
        . . .

As this is NQP, you need to call functions in the nqp:: namespace (although say is available without the namespace prefix, too). One of the notable differences between Perl 6 and NQP is the need to always have parentheses in function calls: if you omit them, the code won’t compile.

Perl inside regexes inside Perl

For training purposes, let’s try adding a similar instruction to the comp_unit token (compilation unit). This token is a part of the Grammar and is also one of the first methods called when parsing Perl 6.

The body of the above shown TOP method is written in NQP. The body of a token is another language, and you should use regexes instead. Thus, to embed an instruction in Perl (or NQP), you need to switch the language.

There are two options: use a code block in curly braces or the colon-prefixed syntax that is very widely used in Rakudo sources to declare variables.

token comp_unit {
    {
        nqp::say('comp_unit');
    }
    :my $x := nqp::say('Var in grammar');
    . . .

Notice that in NQP, the binding operator := has to be used in place of the assignment =.

Statement list

So, back to the grammar. In the output that the --target=parse command-line option produces, we can see a statementlist node at the top of the parse tree. Let us look at its implementation in the Grammar. With some simplifications, it looks very lightweight:

rule statementlist($*statement_level = 0) {
    . . .
    <.ws>
    [
    | $
    | <?before <.[\)\]\}]>>
    | [ <statement> <.eat_terminator> ]*
    ]
    . . .
}

Basically, it says that a statement list is a list of zero or more statements. Square brackets in Perl 6 grammars create a non-capturing group, and we see three alternatives inside. One of the alternatives is just the end of data, another one is the end of the block (e. g., ending with a closing curly brace). For the sake of art, an additional vertical bar is added before the first alternative too.
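
To make the shape of that rule more tangible, here is a rough Python sketch of the same idea: skip whitespace, stop at the end of data or before a closing bracket, otherwise read statements one by one. It is a toy under loud assumptions: a "statement" is just a run of characters up to a semicolon or closing bracket, quotes are not handled, and the function name is invented.

```python
def statement_list(src, pos=0):
    """Return (statements, position) for zero or more ';'-separated statements."""
    statements = []
    while True:
        # The <.ws> part: skip whitespace.
        while pos < len(src) and src[pos].isspace():
            pos += 1
        # Two of the alternatives: end of data, or just before a closing bracket.
        if pos == len(src) or src[pos] in ")]}":
            return statements, pos
        # A toy statement: everything up to ';' or a closing bracket.
        end = pos
        while end < len(src) and src[end] not in ";)]}":
            end += 1
        statements.append(src[pos:end].strip())
        pos = end
        # A toy terminator: an optional semicolon.
        if pos < len(src) and src[pos] == ";":
            pos += 1

print(statement_list("say 1; say 2"))  # (['say 1', 'say 2'], 12)
print(statement_list("a; b }"))        # (['a', 'b'], 5)
print(statement_list(""))              # ([], 0)
```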

The top-level rule is simple but the rest is becoming more and more complex. For example, let’s have a quick look at the eat terminator:

token eat_terminator {
    || ';'
    || <?MARKED('endstmt')> <.ws>
    || <?before ')' | ']' | '}' >
    || $
    || <?stopper>
    || <?before [if|while|for|loop|repeat|given|when] » > {
       $/.'!clear_highwater'(); self.typed_panic(
          'X::Syntax::Confused', reason => "Missing semicolon" ) }
    || { $/.typed_panic( 'X::Syntax::Confused', reason => "Confused" ) }
}

And this is just a small separator between the statements 🙂
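
The || alternatives in that token are tried in order, and the first one that matches wins. A simplified Python sketch of that ordering follows; it deliberately leaves out the MARKED('endstmt') and stopper branches, and the function name and exception type are stand-ins rather than anything from Rakudo.

```python
import re

KEYWORD = re.compile(r"(?:if|while|for|loop|repeat|given|when)\b")

def eat_terminator(src, pos):
    """Return the position after the terminator, or raise SyntaxError."""
    if src.startswith(";", pos):
        return pos + 1                  # the semicolon is consumed
    if pos < len(src) and src[pos] in ")]}":
        return pos                      # lookahead only, nothing is consumed
    if pos == len(src):
        return pos                      # end of input
    if KEYWORD.match(src, pos):
        raise SyntaxError("Missing semicolon")
    raise SyntaxError("Confused")       # the last resort

print(eat_terminator("a;b", 1))  # 2
print(eat_terminator("a}", 1))   # 1
```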

The grammar file is more than 5500 lines of code; it is not possible to discuss and understand it all in a single blog post. Let us stop here for today and continue with easier stuff tomorrow.

11. Compiler stages and targets in Perl 6

Welcome to the new year! Today, let us switch for a while from the discussion about obsolete messages to something different.

Stages

If you followed the exercises in the previous posts, you might have noticed that some statistics were printed in the console when compiling Rakudo:

Stage start      :   0.000
Stage parse      :  44.914
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   4.245
Stage mast       :   9.476
Stage mbc        :   0.200

You could have also noticed that the bigger the file you changed, the slower it compiles, up to dozens of seconds when you modify Grammar.nqp.

It is also possible to see the statistics for your own programs. The --stagestats command-line option does the job:

$ ./perl6 --stagestats -e'say 42'
Stage start      :   0.000
Stage parse      :   0.065
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   0.001
Stage mast       :   0.003
Stage mbc        :   0.000
Stage moar       :   0.000
42
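
The idea behind such a report is simple: run the stages one after another and measure each of them. Here is a minimal Python sketch of that idea; the toy stages and the function names are invented, and only the layout of the report lines is copied from the output above.

```python
import time

def run_with_stats(stages, data):
    """Run (name, function) stages in order and time each of them."""
    timings = []
    for name, func in stages:
        started = time.perf_counter()
        data = func(data)
        timings.append((name, time.perf_counter() - started))
    return data, timings

def report(timings):
    """Format the timings in the 'Stage name : seconds' layout."""
    width = max(len(name) for name, _ in timings)
    return [f"Stage {name:<{width}}:{elapsed:8.3f}" for name, elapsed in timings]

# Toy stages: split the source into words, then count them.
result, timings = run_with_stats([("parse", str.split), ("count", len)], "say 42")
print(result)  # 2
for line in report(timings):
    print(line)
```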

So, let’s look at these stages. Roughly, half of them are about Perl 6, and half are about MoarVM. If Rakudo is configured to work with the JVM backend, the output will differ in the second half.

The Perl 6 part is clearly visible in the src/main.nqp file:

# Create and configure compiler object.
my $comp := Perl6::Compiler.new();
$comp.language('perl6');
$comp.parsegrammar(Perl6::Grammar);
$comp.parseactions(Perl6::Actions);
$comp.addstage('syntaxcheck', :before<ast>);
$comp.addstage('optimize', :after<ast>);
hll-config($comp.config);
nqp::bindhllsym('perl6', '$COMPILER_CONFIG', $comp.config);

Look at the parsegrammar and parseactions lines. If you have played with Perl 6 Grammars, you know that big grammars are usually split into two parts: the grammar itself and the actions. The Perl 6 compiler does exactly the same thing for the Perl 6 grammar. There are two files: src/Perl6/Grammar.nqp and src/Perl6/Actions.nqp.

When looking at src/main.nqp, it is not quite clear that there are eight stages. Add the following line to the file:

for ($comp.stages()) { nqp::say($_) }

Now, recompile Rakudo and run any program:

$ ./perl6 -e'say 42'
start
parse
syntaxcheck
ast
optimize
mast
mbc
moar
42

Here they are.
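
The printed order suggests that syntaxcheck was inserted before the ast stage and optimize after it. A Python sketch of such an insertion is below. It assumes that the list initially contains start, parse, ast, mast, mbc and moar; that starting list and the add_stage helper are assumptions made for the illustration, not the actual NQP code.

```python
def add_stage(stages, name, before=None, after=None):
    """Return a new list with name inserted relative to an existing stage."""
    result = list(stages)
    if before is not None:
        result.insert(result.index(before), name)
    elif after is not None:
        result.insert(result.index(after) + 1, name)
    else:
        result.append(name)
    return result

BASE = ["start", "parse", "ast", "mast", "mbc", "moar"]  # assumed starting list
stages = add_stage(BASE, "syntaxcheck", before="ast")
stages = add_stage(stages, "optimize", after="ast")
print(" ".join(stages))
# start parse syntaxcheck ast optimize mast mbc moar
```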

The names of the first three stages—start, parse, and syntaxcheck—are quite self-explanatory. The ast stage is the stage of building an abstract syntax tree, which is then optimized in the optimize stage.

At this point, your Perl 6 program has been transformed into the abstract syntax tree and is about to be passed to the backend, the MoarVM virtual machine in our case. The names of the remaining stages start with m. The mast stage is the stage of the MoarVM assembly (not abstract) syntax tree, mbc stands for MoarVM bytecode, and moar is when the VM executes the code.

Targets

Now that we know the stages of the Perl 6 program workflow, let’s make use of them. The --target option lets the compiler stop at the given stage and display its result. This option supports the following values: parse, syntaxcheck, ast, optimize, and mast. With those options, Rakudo prints the output as a tree, and you can see how the program changes at different stages.
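
Stopping at a given stage is easy to picture with a toy pipeline. The Python sketch below is only a model: the three stage functions are invented and do nothing like the real parse, ast and optimize stages, but the early exit works the way the --target option is described above.

```python
def parse(source):
    return source.split()

def ast(words):
    return {"call": words[0], "args": words[1:]}

def optimize(tree):
    # A toy "optimization": turn the literal arguments into integers.
    return {"call": tree["call"], "args": [int(arg) for arg in tree["args"]]}

STAGES = [("parse", parse), ("ast", ast), ("optimize", optimize)]

def compile_until(source, target=None):
    """Run the stages in order and stop after the one named by target."""
    data = source
    for name, func in STAGES:
        data = func(data)
        if name == target:
            break
    return data

print(compile_until("say 42", target="parse"))  # ['say', '42']
print(compile_until("say 42", target="ast"))    # {'call': 'say', 'args': ['42']}
print(compile_until("say 42"))                  # {'call': 'say', 'args': [42]}
```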

Even for small programs, the output, especially the abstract syntax tree or the assembly tree of the VM, is quite verbose. Let’s look at the parse tree of the ‘Hello, World!’ program, for example:

$ ./perl6 --target=parse -e'say "Hello, World!"'
- statementlist: say "Hello, World!"
  - statement: 1 matches
    - EXPR: say "Hello, World!"
      - args:  "Hello, World!"
        - arglist: "Hello, World!"
          - EXPR: "Hello, World!"
            - value: "Hello, World!"
              - quote: "Hello, World!"
                - nibble: Hello, World!
      - longname: say
        - name: say
          - identifier: say
          - morename:  isa NQPArray
        - colonpair:  isa NQPArray

All the names here correspond to rules, tokens, or methods of the Grammar. You can find them in src/Perl6/Grammar.nqp. As an exercise, try predicting if the name is a method, or a rule, or a token. Say, a value should be a token, as it is supposed to be a compact string, while a statementlist is a rule.

10. Obsolete syntax error messages, part 2

Today, we continue exploring the error messages that Rakudo developers embedded to detect old Perl 5 constructions in the Perl 6 code.

The obs method

But first, let’s make a small experiment and add a call to the obs method in the rule parsing the for keyword.

rule statement_control:sym<for> {
    <sym><.kok> {}
    [ <?before 'my'? '$'\w+\s+'(' >
        <.obs('Hello', 'World!')> <.typed_panic: 'X::Syntax::P5'> ]?
    [ <?before '(' <.EXPR>? ';' <.EXPR>? ';' <.EXPR>? ')' >
        <.obs('C-style "for (;;)" loop', '"loop (;;)"')> ]?
    <xblock(1)>
}

The dot before the name of the method prevents creating a named element in the Match object. Actually, that is not that important here, since the obs call generates an exception anyway. In many other cases, the dot is very useful, of course.
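
A loose analogy from another regex dialect may help here. In Python, a named group ends up in the match result, while a non-capturing group matches the same text without leaving a trace. This is only a comparison of the two regex flavours, not a description of how Perl 6 builds its Match object.

```python
import re

capturing = re.compile(r"(?P<sym>for)(?P<ws>\s+)")  # both parts are named
silent = re.compile(r"(?P<sym>for)(?:\s+)")         # whitespace is matched but not kept

print(capturing.match("for x").groupdict())  # {'sym': 'for', 'ws': ' '}
print(silent.match("for x").groupdict())     # {'sym': 'for'}
```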

Compile Rakudo and feed it with the erroneous Perl 6 code:

$ ./perl6 -e'for my $x (@a) {}'
===SORRY!=== Error while compiling -e
Unsupported use of Hello; in Perl 6 please use World!
at -e:1
------> for ⏏my $x (@a) {}

As you see, we’ve generated some rubbish message but the X::Syntax::P5 exception did not have a chance to appear, as the parsing stopped at the place the obs method was called.

No foreach anymore

Another error message appears when you try using the foreach keyword:

$ ./perl6 -e'foreach @a {}'
===SORRY!=== Error while compiling -e
Unsupported use of 'foreach'; in Perl 6 please use 'for'
at -e:1
------> foreach⏏ @a {}

Notice that the compiler stopped even before figuring out that the @a variable is not defined.

Here is the rule that finds the outdated keyword:

rule statement_control:sym<foreach> {
    <sym><.end_keyword> <.obs("'foreach'", "'for'")>
}

The end_keyword method is a token that matches the right edge of the keyword; this is not a method to report about the end of support of the keyword 🙂 You can see this method in many other rules in the grammar.

token end_keyword {
    » <!before <.[ \( \\ ' \- ]> || \h* '=>'>
}
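
For readers who are more used to Perl 5 style regexes, here is an approximate Python counterpart of that token. Treat it as a sketch: \b stands in for the right word boundary », and [ \t]* for \h*, which is close but not identical. The END_KEYWORD and FOREACH names are invented.

```python
import re

# A word boundary that is not followed by ( \ ' - or by optional blanks and =>
END_KEYWORD = r"\b(?![(\\'\-]|[ \t]*=>)"
FOREACH = re.compile(r"foreach" + END_KEYWORD)

for text in ("foreach @a {}", "foreach => 1", "foreach-item", "foreachx"):
    print(text, "->", bool(FOREACH.match(text)))
# foreach @a {} -> True
# foreach => 1 -> False
# foreach-item -> False
# foreachx -> False
```

The three rejected strings show why the token exists: a pair key, a hyphenated identifier and a longer identifier merely start with the same letters as the keyword.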

No do anymore

Another potential mistake is using do blocks instead of the new repeat/while or repeat/until.

$ ./perl6 -e'do {} while 1'
===SORRY!=== Error while compiling -e
Unsupported use of do...while;
in Perl 6 please use repeat...while or repeat...until
at -e:1

This time, the logic for detecting the error is hidden deeply inside the statement token:

token statement($*LABEL = '') {
    . . .
    my $sp := $<EXPR><statement_prefix>;
    if $sp && $sp<sym> eq 'do' {
         my $s := $<statement_mod_loop><sym>;
         $/.obs("do..." ~ $s, "repeat...while or repeat...until");
    }
    . . .
}

The second symbol is taken from the $<statement_mod_loop><sym> value, so the error message contains the proper instruction for both do {} until and do {} for blocks.
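
How the message is assembled can be shown with a few lines of Python. This is a sketch with invented names (ObsoleteSyntax, obs, check_do_block); only the wording of the message is taken from the compiler output shown above.

```python
class ObsoleteSyntax(Exception):
    """Hypothetical stand-in for the exception raised by the obs method."""

def obs(old, new):
    raise ObsoleteSyntax(f"Unsupported use of {old}; in Perl 6 please use {new}")

def check_do_block(prefix, modifier):
    # The modifier (while, until, for) is glued to the "do..." part of the message.
    if prefix == "do":
        obs("do..." + modifier, "repeat...while or repeat...until")

for modifier in ("while", "until"):
    try:
        check_do_block("do", modifier)
    except ObsoleteSyntax as error:
        print(error)
# Unsupported use of do...while; in Perl 6 please use repeat...while or repeat...until
# Unsupported use of do...until; in Perl 6 please use repeat...while or repeat...until
```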

Let’s stop here for today. We’ll examine more obsolete syntax in the next year. Meanwhile, I wish you all the best and success with using Perl 6 in 2018!

6. The dd routine of Rakudo Perl 6

In Rakudo, there is a useful routine dd, which is not a part of Perl 6 itself. It dumps its argument(s) in a way that you immediately see the type and content of a variable. For example:

$ ./perl6 -e'my Bool $b = True; dd($b)'
Bool $b = Bool::True

It works well with data of other types, for example, with arrays:

$ ./perl6 -e'my @a = < a b c >; dd(@a)'
Array @a = ["a", "b", "c"]

Today, we will look at the definition of the dd routine.

It is located in the src/core/Any.pm module as part of the Any class. The code is quite small, so let us show it here:

sub dd(|) {
    my Mu $args := nqp::p6argvmarray();
    if nqp::elems($args) {
        while $args {
            my $var  := nqp::shift($args);
            my $name := try $var.VAR.?name;
            my $type := $var.WHAT.^name;
            my $what := $var.?is-lazy
              ?? $var[^10].perl.chop ~ "... lazy list)"
              !! $var.perl;
            note $name ?? "$type $name = $what" !! $what;
        }
    }
    else { # tell where we are
        note .name
          ?? "{lc .^name} {.name}{.signature.gist}"
          !! "{lc .^name} {.signature.gist}"
          with callframe(1).code;
    }
    return
}

Call with arguments

The vertical bar, which we have already seen earlier, is a signature that captures argument lists with no type checking. It is not possible to omit it and leave empty parentheses, as in that case the routine can only be called without arguments.

Inside, some NQP-magic happens but that is quite readable for us. If there are arguments, the routine loops over them, shifting the next argument in each cycle.

Then, there is an attempt to get the name, type and content:

my $name := try $var.VAR.?name;
my $type := $var.WHAT.^name;

Notice the presence of try and ? in the method call. We already saw the pattern when we were talking about string interpolation. The ?name is only called on an object if the method exists there, and does not generate an error if not.

The content is a bit more difficult thing:

my $what := $var.?is-lazy
    ?? $var[^10].perl.chop ~ "... lazy list)"
    !! $var.perl;

The result depends on whether an object is a lazy list or not. For example, try dumping an infinite range:

$ ./perl6 -e'dd 1..∞'
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10... lazy list)

Only the first ten items are listed. For a non-lazy object, the perl method is called.
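
The same trick, taking a fixed number of items from a potentially endless sequence, looks like this in Python. It is a sketch only: treating every iterator as "lazy" is an assumption of this example, and the dump function is an invented name.

```python
from itertools import count, islice

def dump(value, limit=10):
    """Show the first few items of an iterator, or the repr of anything else."""
    if hasattr(value, "__next__"):  # assumption: every iterator counts as lazy
        head = ", ".join(repr(item) for item in islice(value, limit))
        return f"({head}... lazy list)"
    return repr(value)

print(dump(count(1)))   # (1, 2, 3, 4, 5, 6, 7, 8, 9, 10... lazy list)
print(dump([1, 2, 3]))  # [1, 2, 3]
```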

Finally, the result is printed to STDERR:

note $name ?? "$type $name = $what" !! $what;

Call with no arguments

The second branch of the dd routine is triggered when there are no arguments. In that case, the routine tries to give some information about the place where it is called. Look at the following example:

sub f() { dd }
f;

The result of running this program shows the name and the signature of the function:

sub f()

A good use case can be thus to use dd in multi-functions instead of printing manual text messages.

multi sub f(Int) { dd }
multi sub f(Str) { dd }

f(42);
f('42');

Run the program, and it prints extremely useful debugging information:

sub f(Int)
sub f(Str)
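
Looking one call frame up is a technique that exists in other languages, too. Here is a Python sketch that reports the name and the parameters of the calling function. It relies on sys._getframe, which is specific to CPython, and the where function is an invented name with no relation to dd itself.

```python
import sys

def where():
    """Describe the function that called us, using its code object."""
    code = sys._getframe(1).f_code
    params = code.co_varnames[:code.co_argcount]
    return f"{code.co_name}({', '.join(params)})"

def f(number):
    return where()

def g(text, times):
    return where()

print(f(42))       # f(number)
print(g("42", 2))  # g(text, times)
```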

That’s all for today. See you tomorrow!

4. Exploring the Bool type in Perl 6, part 2

Today, we continue reading the source code of the Bool class, src/core/Bool.pm, and will look at the methods that calculate the next or the previous values, or increment and decrement the values. For the Boolean type, it sounds simple, but you still have to determine the behaviour of the edge cases.

pred and succ

In Perl 6, there are two complementary methods: pred and succ that should return, correspondingly, the preceding and the succeeding values. This is how they are defined for the Bool type:

Bool.^add_method('pred', my method pred() { Bool::False });
Bool.^add_method('succ', my method succ() { Bool::True });

As you see, these methods are regular (not multi) methods and do not distinguish between defined or undefined arguments. Nor does the result depend on the value!

If you take two Boolean variables, one set to False and another to True, the pred method returns False for both variables:

my Bool $f = False;
my Bool $t = True;
my Bool $u;

say $f.pred;    # False
say $t.pred;    # False
say $u.pred;    # False
say False.pred; # False
say True.pred;  # False

Similarly, the succ method always returns True:

say $f.succ;    # True
say $t.succ;    # True
say $u.succ;    # True
say False.succ; # True
say True.succ;  # True

Increment and decrement

The variety of the ++ and -- operations is even greater, as another dimension (prefix or postfix) is added.

First, the two prefixal forms:

multi sub prefix:<++>(Bool $a is rw) { $a = True; }
multi sub prefix:<-->(Bool $a is rw) { $a = False; }

When you read the sources, you slowly start to understand that many strangely behaving bits of the language may be well explained, because the developers have to think about huge combinations of arguments, variables, positions, etc., about which you may not even think when using the language.

The prefix forms simply set the value of the variable to either True or False, and it happens for both defined and undefined variables. The is rw trait allows modifying the argument.

Now, the postfix forms. This time, the state of the variable matters.

multi sub postfix:<++>(Bool:U $a is rw --> False) { $a = True }
multi sub postfix:<-->(Bool:U $a is rw) { $a = False; }

We see a new element of syntax—the return value is mentioned after an arrow in the sub signature:

(Bool:U $a is rw --> False)

The bodies of the operators that work on defined variables are wordier. If you look at the code carefully, you can see that it avoids assigning the new value to a variable if, for example, a variable containing True is incremented.

multi sub postfix:<++>(Bool:D $a is rw) {
    if $a {
        True
    }
    else {
        $a = True;
        False
    }
}


multi sub postfix:<-->(Bool:D $a is rw) {
    if $a {
        $a = False;
        True
    }
    else {
        False
    }
}

As you see, the changed value of the variable after the operation may be different from what the operator returns.
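
All the prefix and postfix cases can be summarised in a small model. The Python class below is a sketch that mirrors the definitions quoted above, with None playing the role of an undefined Bool; the BoolVar class and its method names are invented for this illustration.

```python
class BoolVar:
    """A container holding None (undefined), False or True."""
    def __init__(self, value=None):
        self.value = value

    def prefix_inc(self):        # ++$a
        self.value = True
        return True

    def prefix_dec(self):        # --$a
        self.value = False
        return False

    def postfix_inc(self):       # $a++
        if self.value:
            return True          # already True: nothing is assigned
        self.value = True        # undefined or False becomes True
        return False

    def postfix_dec(self):       # $a--
        if self.value is None:
            self.value = False   # undefined becomes False
            return False
        if self.value:
            self.value = False
            return True
        return False             # already False: nothing is assigned

u = BoolVar()
print(u.postfix_inc(), u.value)  # False True
t = BoolVar(True)
print(t.postfix_dec(), t.value)  # True False
```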

3. Playing with the code of Rakudo Perl 6

Yesterday, we looked at the two methods of the Bool class that return strings. The string representation that the functions produce is hardcoded in the source code.

Let’s use this observation and try changing the texts.

So, here is the fragment that we will modify:

Bool.^add_multi_method('gist', my multi method gist(Bool:D:) {
    self ?? 'True' !! 'False'
});

This gist method is used to stringify a defined variable.

To make things happen, you need to have the source code of Rakudo on your computer so that you can compile it. Clone the project from GitHub first:

$ git clone https://github.com/rakudo/rakudo.git

Compile with MoarVM:

$ cd rakudo
$ perl Configure.pl --gen-moar --gen-nqp --backends=moar
$ make

Having that done, you get the perl6 executable in the rakudo directory.

Now, open the src/core/Bool.pm file and change the strings of the gist method to use the Unicode thumbs instead of plain text:

Bool.^add_multi_method('gist', my multi method gist(Bool:D:) {
    self ?? '👍' !! '👎'
});

After saving the file, you need to recompile Rakudo. Bool.pm is in the list of files to be compiled in Makefile:

M_CORE_SOURCES = \
    src/core/core_prologue.pm\
    src/core/traits.pm\
    src/core/Positional.pm\
    . . .
    src/core/Bool.pm\
    . . .

Run make and get the updated perl6. Run it and enjoy the result:

:~/rakudo$ ./perl6
To exit type 'exit' or '^D'
> my Bool $b = True;
👍
> $b = !$b; 
👎
>

As an exercise, let us improve your local Perl 6 by adding a Str method for undefined values. By default, it does not exist, and we saw that yesterday. It means that an attempt to interpolate an undefined variable in a string will be rejected. Let’s make it better.

Interpolation uses the Str method. It is similar to both gist and perl, so you will have no difficulties in creating the new version.

This is what currently is in Perl 6:

Bool.^add_multi_method('Str', my multi method Str(Bool:D:) {
    self ?? 'True' !! 'False'
});

This is what you need to add:

Bool.^add_multi_method('Str', my multi method Str(Bool:U:) {
    '¯\_(ツ)_/¯'
});

Notice that self is not needed (and cannot be used) in the second variant.
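
Choosing a variant by the definedness of the invocant has a rough parallel in Python's dispatch on the argument type. The sketch below uses functools.singledispatch with None standing for the undefined Bool; it is an analogy with invented names, not a model of the Perl 6 multi-dispatcher.

```python
from functools import singledispatch

@singledispatch
def to_str(value):
    raise TypeError(f"no variant for {type(value).__name__}")

@to_str.register(bool)
def _(value):
    # The defined case, like Str(Bool:D:)
    return "True" if value else "False"

@to_str.register(type(None))
def _(value):
    # The undefined case, like Str(Bool:U:); the value itself is not used
    return r"¯\_(ツ)_/¯"

print(to_str(True))   # True
print(to_str(None))   # ¯\_(ツ)_/¯
```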

Compile and run perl6:

$ ./perl6
To exit type 'exit' or '^D'
> my Bool $b;
(Bool)
> "Here is my variable: $b"
Here is my variable: ¯\_(ツ)_/¯
>

It works as expected. Congratulations, you’ve just changed the behaviour of Perl 6 yourself!