🔬10. Obsolete syntax error messages, part 2

Today, we continue exploring the error messages that Rakudo developers embedded to detect old Perl 5 constructs in Perl 6 code.

The obs method

But first, let’s run a small experiment and add a call to the obs method in the rule parsing the for keyword.

rule statement_control:sym<for> {
    <sym><.kok> {}
    [ <?before 'my'? '$'\w+\s+'(' >
        <.obs('Hello', 'World!')> <.typed_panic: 'X::Syntax::P5'> ]?
    [ <?before '(' <.EXPR>? ';' <.EXPR>? ';' <.EXPR>? ')' >
        <.obs('C-style "for (;;)" loop', '"loop (;;)"')> ]?
    <xblock(1)>
}

The dot before the name of the method prevents creating a named element in the Match object. Here, that is not very important, because the obs call throws an exception anyway. In many other cases, the dot is very useful, of course.
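The effect of the dot is easy to observe outside of Rakudo's sources. The following is a minimal sketch; the TwoWords grammar and its token names are hypothetical and exist only for this illustration:

```raku
# A toy grammar (hypothetical, not part of Rakudo).
grammar TwoWords {
    token TOP  { <word> <.sep> <word> }   # <.sep> matches but is not captured
    token word { \w+ }
    token sep  { \s+ }
}

my $m = TwoWords.parse('hello world');
say $m<word>.elems;    # 2
say $m<sep>:exists;    # False
```

Both words end up in the Match object, while the separator is matched silently.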

Compile Rakudo and feed it with the erroneous Perl 6 code:

$ ./perl6 -e'for my $x (@a) {}'
===SORRY!=== Error while compiling -e
Unsupported use of Hello; in Perl 6 please use World!
at -e:1
------> for ⏏my $x (@a) {}

As you see, we’ve generated a rubbish message, but the X::Syntax::P5 exception did not have a chance to appear, as the parsing stopped at the place where the obs method was called.

No foreach anymore

Another error message appears when you try using the foreach keyword:

$ ./perl6 -e'foreach @a {}'
===SORRY!=== Error while compiling -e
Unsupported use of 'foreach'; in Perl 6 please use 'for'
at -e:1
------> foreach⏏ @a {}

Notice that the compiler stopped even before figuring out that the @a variable is not defined.

Here is the rule that finds the outdated keyword:

rule statement_control:sym<foreach> {
    <sym><.end_keyword> <.obs("'foreach'", "'for'")>
}

The end_keyword method is a token that matches the right edge of the keyword; this is not a method to report about the end of support of the keyword 🙂 You can see this method in many other rules in the grammar.

token end_keyword {
    » <!before <.[ \( \\ ' \- ]> || \h* '=>'>
}
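The first element of this token, the right word boundary », is what prevents a keyword from matching when it is only a prefix of a longer identifier. A simplified sketch using just the boundary, without the lookahead part of the real token:

```raku
# Only the word-boundary part of end_keyword is imitated here.
say so 'for @a'     ~~ / ^ 'for' » /;   # True
say so 'foreach @a' ~~ / ^ 'for' » /;   # False
```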

No do anymore

Another potential mistake is writing do blocks instead of the new repeat/while or repeat/until.

$ ./perl6 -e'do {} while 1'
===SORRY!=== Error while compiling -e
Unsupported use of do...while;
in Perl 6 please use repeat...while or repeat...until
at -e:1

This time, the logic for detecting the error is hidden deeply inside the statement token:

token statement($*LABEL = '') {
    . . .
    my $sp := $<EXPR><statement_prefix>;
    if $sp && $sp<sym> eq 'do' {
         my $s := $<statement_mod_loop><sym>;
         $/.obs("do..." ~ $s, "repeat...while or repeat...until");
    }
    . . .
}

The second part of the old construct is taken from the $<statement_mod_loop><sym> value, so the error message names the construct that was actually used, for example, do...until or do...for.
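For reference, this is how the suggested replacements look in practice. A small sketch with made-up counters; in both loops the body runs before the condition is checked:

```raku
my $i = 0;
repeat { $i++ } while $i < 3;    # runs until the condition becomes false

my $j = 0;
repeat { $j++ } until $j >= 3;   # runs until the condition becomes true

say "$i $j";    # 3 3
```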

Let’s stop here for today. We’ll examine more obsolete syntax next year. Meanwhile, I wish you all the best and success with using Perl 6 in 2018!

🔬9. Obsolete syntax error messages in Perl 6, part 1

Yesterday, we saw an error message about the improper syntax of the ternary operator. Let’s look at other similar things that the Rakudo designers have implemented for us to make the transition from Perl 5 smoother.

First of all, the Perl 6 grammar file (src/Perl6/Grammar.nqp) contains four different methods for reacting to obsolete syntax:

method obs($old, $new, $when = 'in Perl 6') {
    $*W.throw(self.MATCH(), ['X', 'Obsolete'],
        old         => $old,
        replacement => $new,
        when        => $when,
    );
}
method obsvar($name) {
    $*W.throw(self.MATCH(), ['X', 'Syntax', 'Perl5Var'], :$name);
}

method sorryobs($old, $new, $when = 'in Perl 6') {
    $*W.throw(self.MATCH(), ['X', 'Obsolete'],
        old         => $old,
        replacement => $new,
        when        => $when,
    );
}

method worryobs($old, $new, $when = 'in Perl 6') {
    self.typed_worry('X::Obsolete',
        old         => $old,
        replacement => $new,
        when        => $when,
    );
}

Three of these methods throw exceptions; the fourth one prints a warning. The final text of the error message uses the information from the arguments of the methods. For example, this is what we saw yesterday:

<.obs('? and : for the ternary conditional operator', '?? and !!')>

This part of the token regex is transformed into the following error message, where both arguments of the method appear verbatim:

Unsupported use of ? and : for the ternary conditional operator;
in Perl 6 please use ?? and !!

Obsolete syntax

Let us see what other messages we have in the current Rakudo Perl 6 compiler.

Negative indices

The first example is very likely one of the most common mistakes that a Perl 5 programmer makes when programming in Perl 6.

$ perl6 -e'my @a; say @a[-1]'
===SORRY!=== Error while compiling -e
Unsupported use of a negative -1 subscript to index from the end;
in Perl 6 please use a function such as *-1
at -e:1
------> my @a; say @a[-1]⏏

To count from the end of the array, you should use a WhateverCode instead of negative integers. This is how the error message is encoded in the src/Perl6/Actions.nqp file (notice that this is an NQP module, not a Perl 6 one, although the syntax is quite readable):

method postcircumfix:sym<[ ]>($/) {
    . . .
    my $ix := $_ ~~ / [ ^ | '..' ] \s* <( '-' \d+ )> \s* $ /;
    if $ix {
        $c.obs("a negative " ~ $ix ~ " subscript to index from the end", 
               "a function such as *" ~ $ix);
    }
    . . .
}

The $c variable is the current symbol in the syntax tree, and the $ix is a negative index taken from the square brackets (notice the position of the capturing parentheses inside the regex). If there is a negative index, an error message is generated for your pleasure.
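The form suggested by the error message works as follows. A short sketch with a made-up array:

```raku
my @a = 10, 20, 30;

say @a[*-1];    # 30, the last element
say @a[*-2];    # 20, the one before it
say @a.tail;    # 30, an alternative without indexing
```

The star stands for the number of elements, so *-1 is a small function that computes the index of the last one.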

The rest of the .obs calls happen in the src/Perl6/Grammar.nqp file.

Perl 6 loop, not C-style for

The for loop in Perl 6 is designed to work with lists or arrays, so using it in the C-style, which is allowed in Perl 5, is prohibited:

$ perl6 -e'for (my $i = 1; $i != 10; $i++) {}'
===SORRY!=== Error while compiling -e
Unsupported use of C-style "for (;;)" loop;
in Perl 6 please use "loop (;;)"
at -e:1
------> for ⏏(my $i = 1; $i != 10; $i++) {}

Locate that error message in the grammar:

rule statement_control:sym<for> {
    <sym><.kok> {}
    [ <?before 'my'? '$'\w+\s+'(' >
        <.typed_panic: 'X::Syntax::P5'> ]?
    [ <?before '(' <.EXPR>? ';' <.EXPR>? ';' <.EXPR>? ')' >
        <.obs('C-style "for (;;)" loop', '"loop (;;)"')> ]?
    <xblock(1)>
}

Here, you can also see another type of error message regarding the Perl 5 syntax (see where the typed_panic method matches):

$ ./perl6 -e'for my $x (@a) {}'
===SORRY!=== Error while compiling -e
This appears to be Perl 5 code
at -e:1
------> for ⏏my $x (@a) {}

Interestingly, this is the only place where the X::Syntax::P5 exception is used.
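For comparison, here is how the two rejected Perl 5 forms can be written in Perl 6. This is only a sketch; the variable names are made up for the example:

```raku
# C-style loop: use the loop keyword instead of for.
my $sum = 0;
loop (my $i = 1; $i != 10; $i++) { $sum += $i }
say $sum;     # 45

# Iterating with a loop variable: use a pointy block instead of "for my $x (@a)".
my @a = <a b c>;
my @seen;
for @a -> $x { @seen.push($x) }
say @seen;    # [a b c]
```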

That’s all for today, stay tuned for more error messages tomorrow! 🙂

 

🔬8. Digging into operator precedence in Perl 6, part 2

Yesterday, we took a look at how the ? and so operators are dispatched depending on the type of the variable. We did it to understand the difference between them.

Here is once again an excerpt from the src/core/Bool.pm file, where the bodies of the subs look alike:

proto sub prefix:<?>(Mu $) is pure {*}
multi sub prefix:<?>(Bool:D \a) { a }
multi sub prefix:<?>(Bool:U \a) { Bool::False }
multi sub prefix:<?>(Mu \a) { a.Bool }

proto sub prefix:<so>(Mu $) is pure {*}
multi sub prefix:<so>(Bool:D \a) { a }
multi sub prefix:<so>(Bool:U \a) { Bool::False }
multi sub prefix:<so>(Mu \a) { a.Bool }

Both of them coerce the arguments to a Bool value. The difference is in their operator precedence. You cannot say for sure what the precedence is if you only look at the Bool.pm file. You will find more details in the src/Perl6/Grammar.nqp file describing the Perl 6 language grammar. Here are the fragments we need:

token prefix:sym<so> { <sym><.end_prefix> <O(|%loose_unary)> }
. . .
token prefix:sym<?> { <sym> <!before '??'> <O(|%symbolic_unary)> }

These look complex, but let’s first concentrate only on the last part of the token definitions: <O(|%loose_unary)> and <O(|%symbolic_unary)>. Obviously, these are what define the rules for precedence. You can find a list of about 30 different kinds of precedence in the same file:

## Operators

. . .
my %symbolic_unary := nqp::hash('prec', 'v=', 'assoc', 'unary', 'dba', 'symbolic unary');
. . .
my %list_assignment := nqp::hash('prec', 'i=', 'assoc', 'right', 'dba', 'list assignment', 'sub', 'e=', 'fiddly', 1);
my %loose_unary := nqp::hash('prec', 'h=', 'assoc', 'unary', 'dba', 'loose unary');
my %comma := nqp::hash('prec', 'g=', 'assoc', 'list', 'dba', 'comma', 'nextterm', 'nulltermish', 'fiddly', 1);
. . .

Let’s avoid digging deeper into how it works at the moment. Looking at the list, you can guess that the letters k, j, h, and g define the relative order of the different precedence levels. Likewise, right or left dictates the associativity of the operators.

So, the so operator has the loose unary precedence level and the ? operator has a higher symbolic unary precedence.
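The practical consequence shows up as soon as another operator follows the operand. In this small sketch, the tighter ? grabs only the number, while the looser so applies to the whole comparison:

```raku
say ?2 == 2;      # False, parsed as (?2) == 2, that is, True == 2
say so 2 == 2;    # True,  parsed as so (2 == 2)
```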

The old conditional operator

Before we wrap up for today, let’s look at another interesting place where the single question mark can be caught in the Perl 6 program. I am talking about the following token in the grammar (notice that this time this is for an infix, not for a prefix):

token infix:sym<?> {
    <sym> {} <![?]> <?before <.-[;]>*?':'>
    <.obs('? and : for the ternary conditional operator', '?? and !!')>
    <O(|%conditional)>
}

This code catches the usage of a single ?, which was part of the ternary operator in Perl 5, unlike the double ?? of the ternary operator in Perl 6.

The <.obs...> part of the token regex raises a compile-time error about the obsolete syntax:

$ ./perl6 -e'say 1 ? True : False'
===SORRY!=== Error while compiling -e
Unsupported use of ? and : for the ternary conditional operator;
in Perl 6 please use ?? and !!
at -e:1
------> say 1 ?⏏ True : False

So, if you use the old syntax, you’ll get not only an error message but also a hint on how to fix the issue.
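The same error can also be observed from within a program by compiling the offending code with EVAL. This is only a sketch: it assumes that the exception reaches the caller as an X::Obsolete object, which matches the obs method shown in the grammar but may vary between Rakudo versions, and the exact wording of the message depends on the version too.

```raku
use MONKEY-SEE-NO-EVAL;

# Compile the Perl 5 style ternary at run time and keep the exception in $!.
try EVAL 'say 1 ? True : False';
say $!.^name;      # the exception class (assumed to be X::Obsolete)
say $!.message;    # the text with the hint

# The Perl 6 way:
say 1 ?? 'yes' !! 'no';    # yes
```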

🔬7. Digging into operator precedence in Perl 6, part 1

Today, we’ll once again look at the src/core/Bool.pm file. This is a good example of a full-fledged Perl 6 class, which is still not very difficult to examine.

Look at the definitions of the ? and so operators:

proto sub prefix:<?>(Mu $) is pure {*}
multi sub prefix:<?>(Bool:D \a) { a }
multi sub prefix:<?>(Bool:U \a) { Bool::False }
multi sub prefix:<?>(Mu \a) { a.Bool }

proto sub prefix:<so>(Mu $) is pure {*}
multi sub prefix:<so>(Bool:D \a) { a }
multi sub prefix:<so>(Bool:U \a) { Bool::False }
multi sub prefix:<so>(Mu \a) { a.Bool }

There’s no visual difference between the two implementations, but it would be a mistake to conclude that there is no difference between the two of them. Both ? and so cast a value to the Bool type.

When am I called?

Before we discuss the precedence, let us first examine when the above subs are called. To simplify the task, add a few printing instructions into their bodies:

proto sub prefix:<?>(Mu $) is pure {*}
multi sub prefix:<?>(Bool:D \a) { say 1; a }
multi sub prefix:<?>(Bool:U \a) { say 2; Bool::False }
multi sub prefix:<?>(Mu \a) { say 3; a.Bool }

proto sub prefix:<so>(Mu $) is pure {*}
multi sub prefix:<so>(Bool:D \a) { say 4; a }
multi sub prefix:<so>(Bool:U \a) { say 5; Bool::False }
multi sub prefix:<so>(Mu \a) { say 6; a.Bool }

Re-compile Rakudo and make a few tests with both ? and so (you’ll get some numbers printed before the prompt appears):

$ ./perl6
> my Bool $b;
(Bool)
> ?$b;
2
> so $b;
5
>

At the moment, there are no surprises. For an undefined Boolean variable, the subs with the (Bool:U) signature are called.

Now, try an integer:

> my Int $i;
(Int)
> ?$i;
3
> so $i;
6

Although the variable is of the Int type, the compiler calls the subs from Bool.pm (notice that those functions are regular subs, not methods of the Bool class). This time, the subs with the (Mu) signature are called, as Int is a great-grandchild of Mu (via Cool and Any). For the undefined variable, the subs call the Bool method from the Mu class.

proto method Bool() {*}
multi method Bool(Mu:U: --> False) { }
multi method Bool(Mu:D:) { self.defined }

For a defined integer, the Bool method of the Int class is used instead:

multi method Bool(Int:D:) {
    nqp::p6bool(nqp::bool_I(self));
}

To visualise the routes, add more printing commands to the files. In src/core/Mu.pm:

proto method Bool() {*}
multi method Bool(Mu:U:) { nqp::say('7'); False }
multi method Bool(Mu:D:) { nqp::say('8'); self.defined }

And in src/core/Int.pm:

multi method Bool(Int:D:) {
    say 9;
    nqp::p6bool(nqp::bool_I(self));
}

During the compilation, a lot of 7s and 8s flood the screen, which means that the changes we’ve just made are already used even during the compilation process.

It’s time to play with defined and undefined integers now:

> my Int $i;
(Int)
> say $i.Bool;
7
False
> $i = 42;
42
> say $i.Bool;
9
True

For the undefined variable, the method from the Mu class (printing 7) is triggered; for the defined variable, the one from Int (9).
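The dispatch rules observed above can be reproduced without recompiling Rakudo. Below is a minimal sketch with a hypothetical multi sub named kind whose candidates mirror the signatures from Bool.pm:

```raku
# kind is a made-up name; the three signatures copy those of prefix:<?>.
multi sub kind(Bool:D $) { 'Bool:D' }
multi sub kind(Bool:U $) { 'Bool:U' }
multi sub kind(Mu $)     { 'Mu' }

my Bool $b;
my Int  $i;

say kind($b);      # Bool:U
say kind(True);    # Bool:D
say kind($i);      # Mu
say kind(42);      # Mu
```

Anything that is not a Bool, defined or not, falls through to the most general candidate.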

🔬6. The dd routine of Rakudo Perl 6

In Rakudo, there is a useful routine dd, which is not a part of Perl 6 itself. It dumps its argument(s) in a way that you immediately see the type and content of a variable. For example:

$ ./perl6 -e'my Bool $b = True; dd($b)'
Bool $b = Bool::True

It works well with data of other types, for example, with arrays:

$ ./perl6 -e'my @a = < a b c >; dd(@a)'
Array @a = ["a", "b", "c"]

Today, we will look at the definition of the dd routine.

It is located in the src/core/Any.pm module as part of the Any class. The code is quite small, so let us show it here:

sub dd(|) {
    my Mu $args := nqp::p6argvmarray();
    if nqp::elems($args) {
        while $args {
            my $var  := nqp::shift($args);
            my $name := try $var.VAR.?name;
            my $type := $var.WHAT.^name;
            my $what := $var.?is-lazy
              ?? $var[^10].perl.chop ~ "... lazy list)"
              !! $var.perl;
            note $name ?? "$type $name = $what" !! $what;
        }
    }
    else { # tell where we are
        note .name
          ?? "{lc .^name} {.name}{.signature.gist}"
          !! "{lc .^name} {.signature.gist}"
          with callframe(1).code;
    }
    return
}

Call with arguments

The vertical bar, which we have already seen earlier, is a signature that captures argument lists with no type checking. It is not possible to omit it and leave empty parentheses, as in that case the routine can only be called without arguments.
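The difference between the two signatures can be sketched with two hypothetical subs. Here the capture is given a name, c, so that the example can count the arguments; dd itself leaves it anonymous and reads the arguments via NQP.

```raku
sub any-args(|c) { c.elems }     # accepts any argument list
sub no-args()    { 'called' }    # accepts only an empty argument list

say any-args();            # 0
say any-args(1, 'two');    # 2
say no-args();             # called
```

A call such as no-args(1) would be rejected by the compiler.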

Inside, some NQP-magic happens but that is quite readable for us. If there are arguments, the routine loops over them, shifting the next argument in each cycle.

Then, there is an attempt to get the name, type and content:

my $name := try $var.VAR.?name;
my $type := $var.WHAT.^name;

Notice the presence of try and ? in the method call. We already saw the pattern when we were talking about string interpolation. The ?name is only called on an object if the method exists there, and does not generate an error if not.
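The behaviour of the .? call is easy to check on any object. A tiny sketch, where no-such-method is a deliberately non-existent method name:

```raku
say 'abc'.?uc;                          # ABC, the method exists and is called
say ('abc'.?no-such-method).defined;    # False, the call quietly returns Nil
```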

The content is a bit trickier:

my $what := $var.?is-lazy
    ?? $var[^10].perl.chop ~ "... lazy list)"
    !! $var.perl;

The result depends on whether an object is a lazy list or not. For example, try dumping an infinite range:

$ ./perl6 -e'dd 1..∞'
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10... lazy list)

Only the first ten items are listed. For a non-lazy object, the perl method is called.
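The two building blocks of that expression, the is-lazy check and the ^10 slice, can be tried separately. A short sketch:

```raku
say (1..Inf).is-lazy;    # True
say (1..10).is-lazy;     # False
say (1..Inf)[^10];       # (1 2 3 4 5 6 7 8 9 10)
```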

Finally, the result is printed to STDERR:

note $name ?? "$type $name = $what" !! $what;

Call with no arguments

The second branch of the dd routine is triggered when there are no arguments. In that case, the routine tries to give some information about the place where it is called. Look at the following example:

sub f() { dd }
f;

The result of running this program shows the name and the signature of the function:

sub f()

A good use case is thus to use dd in multi-functions instead of printing manual text messages.

multi sub f(Int) { dd }
multi sub f(Str) { dd }

f(42);
f('42');

Run the program, and it prints useful debugging information:

sub f(Int)
sub f(Str)

That’s all for today. See you tomorrow!

🔬5. Lurking behind interpolation in Perl 6

In the previous articles, we’ve seen that the undefined value cannot be easily interpolated in a string, as an exception occurs. Today, our goal is to see where exactly that happens in the source code of Rakudo.

Since we’ve already looked at the Boolean values, let’s continue with them. Open perl6 in the REPL mode and create a variable:

$ perl6
To exit type 'exit' or '^D'
> my $b
(Any)

The variable is undefined, so be ready to get an exception when interpolating it:

> "$b"
Use of uninitialized value $b of type Any in string context.
Methods .^name, .perl, .gist, or .say can be used to stringify it to something meaningful.
 in block  at  line 1

Interpolation uses the Str method. For undefined values, this method is absent in the Bool class. So we have to trace back to the Mu class, where we can see the following collection of base methods:

proto method Str(|) {*}

multi method Str(Mu:U \v:) {
   my $name = (defined($*VAR_NAME) ?? $*VAR_NAME !! try v.VAR.?name) // '';
   $name ~= ' ' if $name ne '';
 
   warn "Use of uninitialized value {$name}of type {self.^name} in string"
      ~ " context.\nMethods .^name, .perl, .gist, or .say can be"
      ~ " used to stringify it to something meaningful.";
   ''
}

multi method Str(Mu:D:) {
    nqp::if(
        nqp::eqaddr(self,IterationEnd),
        "IterationEnd",
        self.^name ~ '<' ~ nqp::tostr_I(nqp::objectid(self)) ~ '>'
    )
}

The proto-definition gives the pattern for the Str methods. The vertical bar in the signature indicates that the proto does not validate the type of the argument and can also capture more arguments.

In the Str(Mu:U) method you can easily see the text of the error message. This method is called for the undefined variable. In our case, with the Boolean variable, there’s no Str(Bool:U) method in the Bool class, so the call is dispatched to the method of the Mu class.

Notice how the variable name is obtained:

my $name = (defined($*VAR_NAME) ?? $*VAR_NAME !! try v.VAR.?name) // '';

It tries either the dynamic variable $*VAR_NAME or the name method of the VAR object.

You can easily see which branch is used: just add a couple of printing instructions to the Mu class and recompile Rakudo:

proto method Str(|) {*}
multi method Str(Mu:U \v:) {
    warn "VAR_NAME=$*VAR_NAME" if defined $*VAR_NAME;
    warn "v.VAR.name=" ~ v.VAR.name if v.VAR.?name;
    . . .

Now execute the same interpolation:

> my $b ;
(Any)
> "$b"
VAR_NAME=$b
  in block  at  line 1

So, the name was taken from the $*VAR_NAME variable.

What about the second multi-method Str(Mu:D:)? It is important to understand that it will not be called for a defined Boolean object because the Bool class has a proper variant already.
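The difference can be sketched with a hypothetical empty class Foo, which has no Str method of its own and therefore falls back to the variant from Mu:

```raku
class Foo { }    # made-up class without its own Str method

say True.Str;                           # True, from the Bool class
say Foo.new.Str.starts-with('Foo<');    # True, the Mu variant: class name plus object id
```

The object id inside the angle brackets differs from run to run, which is why the example only checks the prefix.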

🔬4. Exploring the Bool type in Perl 6, part 2

Today, we continue reading the source code of the Bool class, src/core/Bool.pm, and will look at the methods that calculate the next or the previous values, or increment and decrement the values. For the Boolean type, it sounds simple, but you still have to determine the behaviour of the edge cases.

pred and succ

In Perl 6, there are two complementary methods: pred and succ that should return, correspondingly, the preceding and the succeeding values. This is how they are defined for the Bool type:

Bool.^add_method('pred', my method pred() { Bool::False });
Bool.^add_method('succ', my method succ() { Bool::True });

As you see, these methods are regular (not multi) methods and do not distinguish between defined or undefined arguments. Nor does the result depend on the value!

If you take two Boolean variables, one set to False and another to True, the pred method returns False for both variables:

my Bool $f = False;
my Bool $t = True;
my Bool $u;

say $f.pred;    # False
say $t.pred;    # False
say $u.pred;    # False
say False.pred; # False
say True.pred;  # False

Similarly, the succ method always returns True:

say $f.succ;    # True
say $t.succ;    # True
say $u.succ;    # True
say False.succ; # True
say True.succ;  # True

Increment and decrement

The variety of the ++ and -- operations is even greater, as another dimension—prefix or postfix—is added.

First, the two prefixal forms:

multi sub prefix:<++>(Bool $a is rw) { $a = True; }
multi sub prefix:<-->(Bool $a is rw) { $a = False; }

When you read the sources, you slowly start to understand that many strangely behaving bits of the language may be well explained, because the developers have to think about huge combinations of arguments, variables, positions, etc., about which you may not even think when using the language.

The prefix forms simply set the value of the variable to either True or False, and it happens for both defined and undefined variables. The is rw trait allows modifying the argument.

Now, the postfix forms. This time, the state of the variable matters.

multi sub postfix:<++>(Bool:U $a is rw --> False) { $a = True }
multi sub postfix:<-->(Bool:U $a is rw) { $a = False; }

We see a new element of syntax—the return value is mentioned after an arrow in the sub signature:

(Bool:U $a is rw --> False)

The bodies of the operators that work on defined variables are wordier. If you look at the code carefully, you can see that it avoids assigning the new value to a variable if, for example, a variable containing True is incremented.

multi sub postfix:<++>(Bool:D $a is rw) {
    if $a {
        True
    }
    else {
        $a = True;
        False
    }
}


multi sub postfix:<-->(Bool:D $a is rw) {
    if $a {
        $a = False;
        True
    }
    else {
        False
    }
}

As you see, the changed value of the variable after the operation may be different from what the operator returns.
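A short sketch that follows the definitions above makes this visible; the variable names are chosen for the example:

```raku
my Bool $t = True;
my Bool $f = False;
my Bool $u;

say $f++;    # False, the old value is returned
say $f;      # True
say $t--;    # True
say $t;      # False
say $u++;    # False, the Bool:U variant returns False
say $u;      # True
```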