🦋17. Parameterised roles in Perl 6

Today, a small excursus into the syntax. Did you know that roles in Perl 6 can have a parameter that makes them similar to generic templates in, say, C++? Here’s a small example:

role R {
    has $.value;
 
    method add($b) {
        $.value + $b.value
    }

    method div($b) {
        $.value / $b.value
    }
}

The R role defines an interface that has a value and two methods for arithmetical operations: add and div.

Now, create a class using the role, initialise two variables and use the methods to get the results:

class C does R {}

my C $x = C.new(value => 10);
my C $y = C.new(value => 3);

say $x.add($y); # 13
say $x.div($y); # 3.333333

Although the values here were integers, Perl did a good job and returned a rational number for the division. You can easily see it by calling the WHAT method:

say $x.add($y).WHAT; # (Int)
say $x.div($y).WHAT; # (Rat)

If you have two integers, the result of their division is always of the Rat type. The actual operator, which is triggered in this case, is the one from src/core/Rat.pm:

multi sub infix:</>(Int \a, Int \b) {
    DIVIDE_NUMBERS a, b, a, b
}

The DIVIDE_NUMBERS sub returns a Rat value.
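To see this behaviour outside of the role, here is a minimal sketch that compares the / operator with the built-in integer division operator div. The variable name $q is only an illustration.

```raku
# Dividing two Ints with / yields a Rat; div stays within Int.
my $q = 10 / 3;

say $q.WHAT;           # (Rat)
say $q.nude;           # (10 3), the numerator and the denominator
say (10 / 5).WHAT;     # (Rat), even when the result is a whole number

say 10 div 3;          # 3
say (10 div 3).WHAT;   # (Int)
```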

Defining a role

How can the C class be modified so that it performs integer division? One of the options is to use a parameterised role:

role R[::T] {
    has T $.value;
    
    method add($b) {
        T.new($.value + $b.value)
    }

    method div($b) {   
        T.new($.value / $b.value)
    }
}

The parameter in square brackets after the role name restricts both the type of the $.value attribute and the return type of the methods, which return a new object of the type T. Here, in the template of the role, T is just a name, which should later be specified when the role is used.

Using the role

So, let’s make it integer:

class N does R[Int] {}

Now the parts of the role that employ the T name replace it with Int, so the class is equivalent to the following definition:

class N {
    has Int $.value;
    
    method add($b) {
        Int.new($.value + $b.value)
    }

    method div($b) {   
        Int.new($.value / $b.value)
    }
}

The new class operates with integers, and the result of the division is an exact 3:

class N does R[Int] {}

my N $i = N.new(value => 10);
my N $j = N.new(value => 3);

say $i.add($j); # 13
say $i.div($j); # 3

It is also possible to force floating-point values by instructing the role accordingly:

class F does R[Num] {}

my F $x = F.new(value => 10e0);
my F $y = F.new(value => 3e0);

say $x.add($y); # 13
say $x.div($y); # 3.33333333333333

Notice that both values, including 13, are of the Num type now, not Int or Rat as it was before:

say $x.add($y).WHAT; # (Num)
say $x.div($y).WHAT; # (Num)
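As a further illustration of how the type parameter is substituted, here is a small sketch with a hypothetical role Box (the names Box, IntBox, StrBox and kind are made up for this example). It assumes that the attribute keeps its type constraint after the role is composed into a class.

```raku
# Box is a hypothetical role used only for this illustration.
role Box[::T] {
    has T $.item;
    method kind() { T.^name }   # T is replaced by the actual type
}

class IntBox does Box[Int] {}
class StrBox does Box[Str] {}

say IntBox.new(item => 42).kind;       # Int
say StrBox.new(item => 'text').kind;   # Str

# The attribute is typed, so a value of a wrong type is rejected.
my $result = try IntBox.new(item => 'text');
say $result.defined;                   # False
```

The same role body serves both classes; only the type passed in square brackets differs.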

🔬16. Unifying the implementation of ‘say’ in Perl 6

For the last two days, the topic of this blog was the internals of the say routine in Rakudo Perl 6. (By the way, the term routine is a good choice if you need to talk about both subs and methods.)

In src/core/io_operators.pm, other routines are also defined. The main focus of today is on the implementation details of print, say, put, and note for multiple arguments. Let us look at the functions having this signature: (**@args is raw).

multi sub print(**@args is raw) { 
    $*OUT.print: @args.join
}
multi sub put(**@args is raw) {
    my $out := $*OUT;
    $out.print: @args.join ~ $out.nl-out
}

multi sub note(**@args is raw) {
    my $err := $*ERR;
    my str $str;
    $str = nqp::concat($str,nqp::unbox_s(.gist)) for @args;
    $err.print(nqp::concat($str,$err.nl-out));
}

multi sub say(**@args is raw) {
    my str $str;
    my $iter := @args.iterator;
    nqp::until(
      nqp::eqaddr(($_ := $iter.pull-one), IterationEnd),
      $str = nqp::concat($str, nqp::unbox_s(.gist)));
    my $out := $*OUT;
    $out.print(nqp::concat($str,$out.nl-out));
}

I sorted the functions by the size of their bodies. As you can see, print has the simplest implementation, while say is way more complicated. Let us try to understand if it is possible to simplify it.

First, re-write the body of say in the way note is implemented. The main difference between the behaviour of say and note is the output stream: it is either standard output or standard error. By default, $*OUT and $*ERR dynamic variables are connected to STDOUT and STDERR.

Both say and note call the gist method to stringify the values. So, change the name of the variable and copy the rest.

multi sub say(**@args is raw) {
    my $out := $*OUT;
    my str $str;
    $str = nqp::concat($str,nqp::unbox_s(.gist)) for @args;
    $out.print(nqp::concat($str,$out.nl-out));
}

Try it out:

$ ./perl6 -e'say(Bool::True, 2, 3)'
True23

Seems to be OK, although such changes must be tested more thoroughly. So, let’s run the spec tests:

$ make spectest

This command initiates the tests from the Roast test suite—a huge set of tests covering thousands of syntax corners of Perl 6. The command above also downloads the test suite if needed. The whole run may take a few minutes.

In my case, the only difference between the run on a fresh Rakudo and the run after modifying say was a failure in t/spec/S07-hyperrace/basics.t. The failure did not reproduce in a second run, nor when I ran that test individually. So, I think, my change passed the test suite.

The body of say is now more compact but it is still bigger than the implementation of print or put. Let us take them as inspiration. What is missing there is a call to gist, which is easy to add, though:

multi sub say(**@args is raw) {
    my $out := $*OUT;
    $out.print: @args.map(*.gist).join ~ $out.nl-out;
}

To make sure nothing is broken, run the spec tests again.
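If you do not want to rebuild Rakudo just to try the idea, the same logic can be sketched in user space. The subs gist-line and my-say below are hypothetical stand-ins, not part of the core, and the is raw trait is left out for simplicity.

```raku
# Builds the line that say would print: the gists of all arguments,
# joined together and followed by the newline string of $*OUT.
sub gist-line(**@args) {
    @args.map(*.gist).join ~ $*OUT.nl-out
}

# A user-space imitation of the simplified say.
sub my-say(|c) {
    $*OUT.print: gist-line(|c)
}

my-say(Bool::True, 2, 3);    # True23
my-say('a', [1, 2], Int);    # a[1 2](Int)
```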

 

🔬15. Variants of ‘say’ in Perl 6

Yesterday, we saw four different variants of the multi sub called say. Today, let’s look at them more precisely. The functions are located in the src/core/io_operators.pm file.

Start with the first and the simplest one:

multi sub say() { $*OUT.print-nl }

It just prints the newline to the $*OUT stream. It is probably worth mentioning that parentheses are required in the call:

$ ./perl6 -e'say'
===SORRY!===
Argument to "say" seems to be malformed
at -e:1
------> say⏏

The following code is correct:

$ ./perl6 -e'say()'

Move on to the sub that expects a defined string:

multi sub say(Str:D \x) {    
    my $out := $*OUT;
    $out.print(nqp::concat(nqp::unbox_s(x),$out.nl-out));
}

Even if not everything is clear here, the general idea can be seen: this function passes its argument to the print method of $*OUT (which is connected to STDOUT by default) and adds a newline at the end.

The next variant is suitable for the variables of other types:

multi sub say(\x) {
    my $out := $*OUT;
    $out.print(nqp::concat(nqp::unbox_s(x.gist),$out.nl-out));
}

Can you spot the difference with the previous sub?

It is x.gist instead of x. In the case of a string, there is no need to stringify it. In all other cases, say, for integers, the gist method is called. We already talked about the gist method of the Bool class. That’s how the call of say with a Boolean argument gets a string representation of it: its gist method just returns a string, either ‘True’ or ‘False’.
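The difference between gist and plain stringification is easy to observe with a user-defined class. The following is a minimal sketch; the Point class is hypothetical and defines both methods differently on purpose.

```raku
# Point is a hypothetical class with distinct gist and Str methods.
class Point {
    has $.x;
    has $.y;
    method gist() { "Point($!x, $!y)" }
    method Str()  { "$!x,$!y" }
}

my $p = Point.new(x => 1, y => 2);

say $p;           # Point(1, 2)   say uses gist
put $p;           # 1,2           put uses Str
print $p, "\n";   # 1,2           print uses Str, too
```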

OK, one more variant for calls with multiple arguments:

multi sub say(**@args is raw) {
    my str $str;
    my $iter := @args.iterator;
    nqp::until(
        nqp::eqaddr(($_ := $iter.pull-one), IterationEnd),
        $str = nqp::concat($str, nqp::unbox_s(.gist)));
    my $out := $*OUT;
    $out.print(nqp::concat($str,$out.nl-out));
}

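Well, it looks complex but again, the main idea is visible to the naked eye: iterate over all arguments, concatenate their gists and print the resulting string with a newline after it. In the run below, the first output line is presumably a trace from a debugging nqp::say temporarily added to the sub: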

$ ./perl6 -e'say(1, 2, 3)'
say(**@args is raw)
123

I would avoid digging into the details of the NQP calls in this subroutine for now, especially if you compare the implementation of say with the similar functions print and put:

multi sub print(**@args is raw) { $*OUT.print: @args.join }

multi sub put(**@args is raw) {
    my $out := $*OUT;
    $out.print: @args.join ~ $out.nl-out
}

Finally, the variant of say for junctions:

multi sub say(Junction:D \j) {
    j.THREAD(&say)
}

In this implementation, printing a junction means creating a junction, each branch of which is a call of say with the corresponding value. So, say(1|2) is roughly equivalent to say(1) | say(2), and I assume that the output that you see in the console may differ from run to run.

$ ./perl6 -e'say 1|2'
say(Junction:D \j)
1
2

Notice that say 1|2 is not the same as say 1 ~~ 1|2. In the first case, the sub gets a junction, while in the second case it is called with a single Boolean value:

$ ./perl6 -e'say 1 ~~ 1|2'
True
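A few more one-liners make the point clearer: a smartmatch against a junction collapses to a single Boolean value before say ever sees it. This is only a small sketch of the behaviour described above.

```raku
say 1 ~~ 1|2;           # True
say 3 ~~ 1|2;           # False
say (1 ~~ 1|2).WHAT;    # (Bool)

# so collapses a junction of comparison results to a single Bool.
say so 2 == 1|2;        # True
```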

🔬14. Tracking down the ‘say’ calls in Perl 6

Welcome back! Today, we’ll try to do a simple thing using some knowledge from the previous days.

Compare the two lines:

say 'Hello, World';
'Hello, World'.say;

Is there any difference between them? Well, of course. Although the result is the same in both cases, syntactically they differ a lot.

In the first case, say is a stand-alone function that gets a string argument. In the second case, the say method is called on a string.

Compare the two lines on the parse level. First, as a function call:

- statementlist: say 'Hello, World'
  - statement: 1 matches
    - EXPR: say 'Hello, World'
      - args:  'Hello, World'
        - arglist: 'Hello, World'
          - EXPR: 'Hello, World'
            - value: 'Hello, World'
              - quote: 'Hello, World'
                - nibble: Hello, World
      - longname: say
        - name: say
          - identifier: say
          - morename:  isa NQPArray
        - colonpair:  isa NQPArray

Second, as a method:

- statementlist: 'Hello, World'.say
  - statement: 1 matches
    - EXPR: .say
      - 0: 'Hello, World'
        - value: 'Hello, World'
          - quote: 'Hello, World'
            - nibble: Hello, World
      - dotty: .say
        - sym: .
        - dottyop: say
          - methodop: say
            - longname: say
              - name: say
                - identifier: say
                - morename:  isa NQPArray
              - colonpair:  isa NQPArray
        - O: 
      - postfix_prefix_meta_operator:  isa NQPArray
      - OPER: .say
        - sym: .
        - dottyop: say
          - methodop: say
            - longname: say
              - name: say
                - identifier: say
                - morename:  isa NQPArray
              - colonpair:  isa NQPArray
        - O:

Although the result of the two lines is the same, the parse trees look different, which is understandable. Instead of examining the parse trees, let us try locating the place where Perl 6 prints the string.

The say sub

This function is a multi-sub, which is defined in the src/core/io_operators.pm file in four different variants:

proto sub say(|) {*}
multi sub say() { . . . }
multi sub say(Junction:D \j) { . . . }
multi sub say(Str:D \x) { . . . }
multi sub say(\x) { . . . }

It is quite logical that say 'Hello, World' uses the say(Str:D) function. To prove it, add a printing instruction as usual:

multi sub say(Str:D \x) {
    nqp::say('say(Str:D \x)');
    my $out := $*OUT;
    $out.print(nqp::concat(nqp::unbox_s(x),$out.nl-out));
}

Be very careful here not to type it like this:

say('say(Str:D \x)');

I made that mistake and faced an infinite recursion that consumed all CPU and memory resources because our additional instruction used the same say(Str:D) variant for a defined string. Moreover, the real printing never happened, as the $out.print method is called a bit later and is never reached.

Using the nqp:: namespace easily bypasses the problem.

$ ./perl6 -e'say "Hello, World"'
say(Str:D \x)
Hello, World
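The way the dispatcher chooses between the candidates can be reproduced without touching the Rakudo sources. Below is a minimal sketch with a hypothetical multi sub describe, whose signatures mirror three of the say variants; it returns a string instead of printing, so there is no danger of recursion.

```raku
# describe is a hypothetical multi sub mirroring the signatures of say.
proto sub describe(|) {*}
multi sub describe()          { 'no arguments' }
multi sub describe(Str:D \x)  { 'defined string: ' ~ x }
multi sub describe(\x)        { 'something else: ' ~ x.gist }

say describe();           # no arguments
say describe('Hello');    # defined string: Hello
say describe(42);         # something else: 42
say describe(Str);        # something else: (Str)
```

Notice that the Str type object is not a defined string, so it does not match Str:D and goes to the most generic candidate.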

The say method

Now, let’s try guessing where the say method can be located. I am talking about our second one-liner, 'Hello, World'.say. The first idea is to look for it in src/core/Str.pm, although you will not see it there.

The method is located in the great-grandparent class Mu (Str←Cool←Any←Mu). You may be surprised to see what it looks like:

proto method say(|) {*}
multi method say() { say(self) }

The fact that it has a prototype and that it is a multi method, although there is only one implementation, is not that important now. What is interesting is that the method merely calls the say sub, which we examined in the previous section.

Add another nqp::say to the method of Mu:

multi method say() { nqp::say('Mu.say()'); say(self) }

Now, run the second program:

$ ./perl6 -e'"Hello, World".say'
Mu.say()
say(Str:D \x)
Hello, World

As you see, we ended up in the same function. Although the difference between the two parse trees was quite big, the actual work was done by the same function in the end.

That’s all for today. Tomorrow, let’s examine other variants of the say sub.

🔬13. Let 1 + 2 * 3 = 9

Is it easy to break the behaviour of Perl 6? Well, the answer probably depends on what exactly you want to break.

Playing with operator precedence, I wanted to change the rules for the arithmetical operators + and * so that they are executed in a different order, namely, addition first, multiplication second.

Sounds like an easy task. Go to src/Perl6/Grammar.nqp and change a couple of lines that set the precedence of the + and * infixes:

- token infix:sym<*>    { <sym> <O(|%multiplicative)> }
+ token infix:sym<*>    { <sym> <O(|%additive)> }
. . .
- token infix:sym<+>    { <sym> <O(|%additive)> }
+ token infix:sym<+>    { <sym> <O(|%multiplicative)> }

Ready? Compile!

Recompiling the grammar takes a long time, so at first it looks promising, but after a few seconds, the compilation stops with an error:

Month out of range. Is: -935111296, should be in 1..12

Makefile:517: recipe for target 'perl6-m' failed
make: *** [perl6-m] Error 1

Month out of range?? Oh, we changed the rules of the Universe and before Perl 6 is even compiled, the new rules of arithmetic are already applied.

OK, let’s add some anaesthesia and suppress the error message. The code that checks for the correct month value is located in src/core/DateTime.pm, namely, inside the DateTime constructor. Comment that line out:

method !new-from-positional(DateTime:
    Int() $year,
    Int() $month,
    Int() $day,
    Int() $hour,
    Int() $minute,
        $second,
        %extra,
    :$timezone = 0,
    :&formatter,
) {
    # (1..12).in-range($month,'Month');
    (1 .. self.DAYS-IN-MONTH($year,$month)).in-range($day,'Day');
    (0..23).in-range($hour,'Hour');
    (0..59).in-range($minute,'Minute');
    (^61).in-range($second,'Second');
    . . .

This time, the month range check doesn’t stop us from going further but another error breaks in:

MVMArray: Index out of bounds

Makefile:517: recipe for target 'perl6-m' failed
make: *** [perl6-m] Error 1

Looks cryptic. MVMArray is a MoarVM array, obviously. So, we not only broke Perl 6 but MoarVM, too. Let’s go fix it.

The sources of MoarVM are located in a separate git repository at nqp/MoarVM. The message we saw can be found in nqp/MoarVM/src/6model/reprs/VMArray.c:

if (index < 0)
    MVM_exception_throw_adhoc(tc, "MVMArray: Index out of bounds");

There are two places like that, so let’s not guess which of them we need and preventatively change both of them to the following:

if (index < 0)
    index = 0;
    // MVM_exception_throw_adhoc(tc, "MVMArray: Index out of bounds");

(This is C, not Perl.)

From nqp/MoarVM, compile and re-install MoarVM and later try compiling Rakudo:

~/rakudo/nqp/MoarVM$ make
~/rakudo/nqp/MoarVM$ make install

~/rakudo/nqp/MoarVM$ cd ../..
~/rakudo$ make

This time, the error pops up immediately (as no NQP files are compiled):

Use of Nil in numeric context

Use of Nil in numeric context

Day out of range. Is: -51, should be in 1..0

Makefile:517: recipe for target 'perl6-m' failed
make: *** [perl6-m] Error 1

It looks like we can ignore Nils at the moment, but the DateTime hurts us again. We know the remedy:

# (1..12).in-range($month,'Month');
# (1 .. self.DAYS-IN-MONTH($year,$month)).in-range($day,'Day');

Yahoo! This time, the compilation process was calm and we got a new perl6 executable, which works as we wanted:

$ ./perl6 -e'say 1+2*3'
9

Don’t forget to restore the files before further experiments with Perl 6 🙂
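For comparison, a similar effect can be obtained in user code without patching the grammar at all. The sketch below defines a hypothetical operator ⊕ that adds its operands but binds tighter than multiplication, using the is tighter trait. It is not what the experiment above does; it only shows the language-level way of playing with precedence.

```raku
# ⊕ is a hypothetical addition operator that binds tighter than *.
sub infix:<⊕>($a, $b) is tighter(&infix:<*>) { $a + $b }

say 1 ⊕ 2 * 3;   # 9, parsed as (1 ⊕ 2) * 3
say 1 + 2 * 3;   # 7, the built-in + is left untouched
```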

Update

In the comment to this blog post, you can see a reference to the commit, which changes the way Rakudo checks the validity of the DateTime object. Instead of using the in-range method, simpler checks are used now, for example:

1 <= $month <= 12
    || X::OutOfRange.new(:what<Month>,:got($month),:range<1..12>).throw;

Here are the time measures of the two runs of a loop creating DateTime objects before and after the update:

$ time ./perl6 -e'DateTime.new(2018,1,5,12,30,0) for ^500000'
real 0m7.261s
user 0m7.276s
sys 0m0.020s

. . .

$ time ./perl6 -e'DateTime.new(2018,1,5,12,30,0) for ^500000'
real 0m4.457s
user 0m4.476s
sys 0m0.012s
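The new style of the check is easy to try in isolation. Here is a minimal sketch that wraps the line quoted above into a hypothetical sub check-month; the sub itself and the variable $error are made up for this example.

```raku
# check-month is a hypothetical wrapper around the chained comparison.
sub check-month(Int $month) {
    1 <= $month <= 12
        || X::OutOfRange.new(:what<Month>, :got($month), :range<1..12>).throw;
    $month
}

say check-month(5);     # 5

# An invalid month throws; try stores the exception in $!.
try check-month(13);
my $error = $!;
say $error.^name;       # X::OutOfRange
say $error.message;
```

The last line should print a message in the same format as the one that stopped the compilation earlier.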

🔬12. The beginning of the Grammar of Perl 6

Yesterday, we talked about the stages of the compiling process of a Perl 6 program and saw the parse tree of a simple ‘Hello, World!’ program. Today, our journey begins at the starting point of the Grammar.

So, here is the program:

say 'Hello, World!'

The grammar of Perl 6 is written in Not Quite Perl 6 and is located in Grammar.nqp 🙂 And that is amazing: if you know how to work with grammars, you will be able to read the heart of the language.

The Perl 6 Grammar is defined as follows:

grammar Perl6::Grammar is HLL::Grammar does STD {
    . . .
}

It is a class derived from HLL::Grammar (HLL stands for High-Level Language) and implements the STD (Standard) role. Let’s not focus on the hierarchy for now, though.

The Grammar has the TOP method. Notice that this is a method, not a rule or a token. The main feature of a method is that its body is expected to contain regular code (NQP, in this case), not regexes.

As we did earlier, let’s use our beloved method of reverse engineering by adding our own printing instructions to different places of Rakudo sources, recompiling it and watching how it works. The first target is the TOP method:

grammar Perl6::Grammar is HLL::Grammar does STD {
    my $sc_id := 0;
    method TOP() {
        nqp::say('At the TOP');
        . . .

As this is NQP, you need to call functions in the nqp:: namespace (although say is available without the namespace prefix, too). One of the notable differences between Perl 6 and NQP is the need to always have parentheses in function calls: if you omit them, the code won’t compile.

Perl inside regexes inside Perl

For training purposes, let’s try adding a similar instruction to the comp_unit token (compilation unit). This token is a part of the Grammar and is one of the first to be called when parsing a Perl 6 program.

The body of the TOP method shown above is written in NQP. The body of a token is another language, and you should use regexes instead. Thus, to embed an instruction in Perl (or NQP), you need to switch the language.

There are two options: use a code block in curly braces or the colon-prefixed syntax that is very widely used in Rakudo sources to declare variables.

token comp_unit {
    {
        nqp::say('comp_unit');
    }
    :my $x := nqp::say('Var in grammar');
    . . .

Notice that in NQP, the binding operator := has to be used in place of the assignment =.
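Both ways of switching the language also work in ordinary Perl 6 grammars, so you can experiment without recompiling Rakudo. The following is a minimal sketch with a hypothetical grammar Numbers; as this is Perl 6 and not NQP, the regular assignment is used.

```raku
my $total;

# Numbers is a hypothetical grammar for a comma-separated list of integers.
grammar Numbers {
    token TOP {
        :my $count = 0;              # a variable declared inside the token
        { say 'entering TOP' }       # a code block embedded in the regex
        [ \d+ { $count++ } ]+ % ','
        { $total = $count; say "numbers seen: $count" }
    }
}

Numbers.parse('1,22,333');
# entering TOP
# numbers seen: 3
```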

Statement list

So, back to the grammar. In the output that the --target=parse command-line option produces, we can see a statementlist node at the top of the parse tree. Let us look at its implementation in the Grammar. With some simplifications, it looks very lightweight:

rule statementlist($*statement_level = 0) {
    . . .
    <.ws>
    [
    | $
    | <?before <.[\)\]\}]>>
    | [ <statement> <.eat_terminator> ]*
    ]
    . . .
}

Basically, it says that a statement list is a list of zero or more statements. Square brackets in Perl 6 grammars create a non-capturing group, and we see three alternatives inside. One of the alternatives is just the end of data, another one is the end of the block (e.g., a closing curly brace). For the sake of art, an additional vertical bar is added before the first alternative too.

The top-level rule is simple but the rest is becoming more and more complex. For example, let’s have a quick look at the eat terminator:

token eat_terminator {
    || ';'
    || <?MARKED('endstmt')> <.ws>
    || <?before ')' | ']' | '}' >
    || $
    || <?stopper>
    || <?before [if|while|for|loop|repeat|given|when] » > {
       $/.'!clear_highwater'(); self.typed_panic(
          'X::Syntax::Confused', reason => "Missing semicolon" ) }
    || { $/.typed_panic( 'X::Syntax::Confused', reason => "Confused" ) }
}

And this is just a small separator between the statements 🙂
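To get a feeling for the overall shape of these two rules, here is a heavily reduced sketch in plain Perl 6. The grammar Tiny and its tokens are hypothetical; only the idea of a statement followed by a terminator, which is either a semicolon or the end of data, is borrowed from the real Grammar.

```raku
# Tiny is a hypothetical grammar: statements are bare words here.
grammar Tiny {
    token TOP        { <.ws> [ <statement> <.terminator> ]* }
    token statement  { \w+ }
    token terminator {
        || ';' <.ws>
        || $
    }
}

my $match = Tiny.parse('foo; bar; baz');
say $match<statement>.join(', ');   # foo, bar, baz

# Without a semicolon between the words, the parse fails.
say so Tiny.parse('foo bar');       # False
```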

The grammar file is more than 5500 lines of code; it is not possible to discuss and understand it all in a single blog post. Let us stop here for today and continue with easier stuff tomorrow.

🦋11. Compiler stages and targets in Perl 6

Welcome to the new year! Today, let us switch for a while from the discussion about obsolete messages to something different.

Stages

If you followed the exercises in the previous posts, you might have noticed that some statistics were printed to the console when compiling Rakudo:

Stage start      :   0.000
Stage parse      :  44.914
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   4.245
Stage mast       :   9.476
Stage mbc        :   0.200

You could have also noticed that the bigger the file you changed, the slower it compiles, up to dozens of seconds when you modify Grammar.nqp.

It is also possible to see the statistics for your own programs. The --stagestats command-line option does the job:

$ ./perl6 --stagestats -e'say 42'
Stage start      :   0.000
Stage parse      :   0.065
Stage syntaxcheck:   0.000
Stage ast        :   0.000
Stage optimize   :   0.001
Stage mast       :   0.003
Stage mbc        :   0.000
Stage moar       :   0.000
42

So, let’s look at these stages. Roughly, half of them are about Perl 6, and half are about MoarVM. If Rakudo is configured to work with the JVM backend, the output will differ in the second half.

The Perl 6 part is clearly visible in the src/main.nqp file:

# Create and configure compiler object.
my $comp := Perl6::Compiler.new();
$comp.language('perl6');
$comp.parsegrammar(Perl6::Grammar);
$comp.parseactions(Perl6::Actions);
$comp.addstage('syntaxcheck', :before);
$comp.addstage('optimize', :after);
hll-config($comp.config);
nqp::bindhllsym('perl6', '$COMPILER_CONFIG', $comp.config);

Look at the parsegrammar and parseactions lines. If you have played with Perl 6 Grammars, you know that big grammars are usually split into two parts: the grammar itself and the actions. The Perl 6 compiler does exactly the same thing for the Perl 6 grammar. There are two files: src/Perl6/Grammar.nqp and src/Perl6/Actions.nqp.
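If you have not met this split before, here is a minimal sketch of it in user code. The grammar Sum and the class SumActions are hypothetical; they only illustrate how the parsing part and the actions part are kept separate and then connected via the actions argument.

```raku
# Sum is a hypothetical grammar for expressions like 1 + 2 + 3.
grammar Sum {
    token TOP    { <number>+ % [ \s* '+' \s* ] }
    token number { \d+ }
}

# The actions class has a method for each token it wants to handle.
class SumActions {
    method number($/) { make +$/ }
    method TOP($/)    { make [+] $<number>.map(*.made) }
}

say Sum.parse('1 + 2 + 3', actions => SumActions).made;   # 6
```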

When looking at src/main.nqp, it is not quite clear that there are eight stages. Add the following line to the file:

for ($comp.stages()) { nqp::say($_) }

Now, recompile Rakudo and run any program:

$ ./perl6 -e'say 42'
start
parse
syntaxcheck
ast
optimize
mast
mbc
moar
42

Here they are.

The names of the first three stages—start, parse, and syntaxcheck—are quite self-explanatory. The ast stage is the stage of building an abstract syntax tree, which is then optimized in the optimize stage.

At this point, your Perl 6 program has been transformed into the abstract syntax tree and is about to be passed to the backend, the MoarVM virtual machine in our case. The names of the remaining stages start with m. The mast stage is the stage of the MoarVM assembly (not abstract) syntax tree, mbc stands for MoarVM bytecode, and moar is when the VM executes the code.

Targets

Now that we know the stages of the Perl 6 program workflow, let’s make use of them. The --target option tells the compiler to stop at the given stage and display its result. This option supports the following values: parse, syntaxcheck, ast, optimize, and mast. With those options, Rakudo prints the output as a tree, and you can see how the program changes at different stages.

Even for small programs, the output, especially the abstract syntax tree or the assembly tree of the VM, is quite verbose. Let’s look at the parse tree of the ‘Hello, World!’ program, for example:

$ ./perl6 --target=parse -e'say "Hello, World!"'
- statementlist: say "Hello, World!"
  - statement: 1 matches
    - EXPR: say "Hello, World!"
      - args:  "Hello, World!"
        - arglist: "Hello, World!"
          - EXPR: "Hello, World!"
            - value: "Hello, World!"
              - quote: "Hello, World!"
                - nibble: Hello, World!
      - longname: say
        - name: say
          - identifier: say
          - morename:  isa NQPArray
        - colonpair:  isa NQPArray

All the names here correspond to rules, tokens, or methods of the Grammar. You can find them in src/Perl6/Grammar.nqp. As an exercise, try predicting if the name is a method, or a rule, or a token. Say, a value should be a token, as it is supposed to be a compact string, while a statementlist is a rule.