27. Obsolete syntax error messages in Perl 6, part 4

So far, we covered a lot of different error messages that Rakudo Perl 6 generates when you accidentally use the Perl 5 syntax. This is a really nice feature for easy migration to the new language.

Let us continue and cover another couple of errors.

new X

It was one of the hottest topics in Perl 5 to forbid indirect method calls. Personally, I always preferred to use an arrow for method calls while still feeling better with new X(...) when creating objects. Now, Perl 6 prevents that and it looks like it knows something about my first language:

$ perl6 -e'say new Int;'
===SORRY!=== Error while compiling -e
Unsupported use of C++ constructor syntax;
in Perl 6 please use method call syntax
at -e:1
------> say new Int⏏;

The attempt to use a C++ constructor call is blocked by the following rule in the Grammar:

token term:sym<new> {
    'new' \h+ <longname> \h* <![:]>
    <.obs("C++ constructor syntax", "method call syntax")>
}

It allows the following code, though:

my $c = new Int:;
$c++;
say $c; # 1
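
For comparison, here is a minimal sketch that puts the recommended method call syntax next to the indirect form with a colon. The Counter class is hypothetical and exists only for this illustration.

```perl6
# Hypothetical class, used only for illustration.
class Counter {
    has $.count = 0;
}

# Method call syntax: the form the error message recommends.
my $a = Counter.new;

# Indirect object syntax: the colon after the class name is what
# keeps the term:sym<new> rule from matching.
my $b = new Counter:;

say $a.count; # 0
say $b.count; # 0
```

Both variables hold an ordinary Counter object; only the spelling of the constructor call differs.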

-> vs .

Another aspect of object-oriented programming is the way methods are called. In Perl 5, it used to be an arrow while in Perl 6 methods are called with a dot.

So, neither $x->meth nor $x->() should work. The rules that catch that are defined as the following:

# TODO: report the correct bracket in error message
token postfix:sym«->» {
    <sym>
    [
    | ['[' | '{' | '(' ] <.obs('->(), ->{} or ->[] as postfix dereferencer', '.(), .[] or .{} to deref, or whitespace to delimit a pointy block')>
    | <.obs('-> as postfix', 'either . to call a method, or whitespace to delimit a pointy block')>
    ]
}

The token extracts an arrow and prints one of the two messages depending on the next character.
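
The replacements that the two messages suggest can be tried in a few lines. This is only a sketch with made-up variable names; it shows the dot forms for method calls and dereferencing, and a pointy block delimited by whitespace.

```perl6
my @a = 10, 20, 30;
my %h = answer => 42;

# Whitespace after -> makes this a pointy block, not a postfix arrow.
my $double = -> $x { $x * 2 };

say @a.elems;        # 3   (method call with a dot)
say @a.[1];          # 20  (.[] instead of ->[])
say %h.{'answer'};   # 42  (.{} instead of ->{})
say $double.(21);    # 42  (.() instead of ->())
```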

If the next character is an opening bracket, it would be nice to produce a less generic message, and the TODO comment agrees that this is the desired behaviour. Let us try doing that at home.

method bracket_pair($s) {
    $s eq '{' ?? '}' !! $s eq '[' ?? ']' !! ')'
}

token postfix:sym«->» {
    <sym>
    [
    | $<openingbracket>=['[' | '{' | '(' ] {
        my $pair := $<openingbracket> ~ self.bracket_pair(~$<openingbracket>);
        self.obs("->$pair as postfix dereferencer",
                 ".$pair to deref, or whitespace to delimit a pointy block")
    }
    | <.obs('-> as postfix', 'either . to call a method, or whitespace to delimit a pointy block')>
    ]
}

The changes are in the first alternative of the token. First, I save the opening bracket in $<openingbracket>; then a simple method finds its matching pair; finally, the $pair variable gets both parts, so it is either {}, or [], or ().

The goal has been achieved:

$ ./perl6 -e'say Int->{}'
===SORRY!=== Error while compiling -e
Unsupported use of ->{} as postfix dereferencer;
in Perl 6 please use .{} to deref, or whitespace
to delimit a pointy block
at -e:1
------> say Int->⏏{}
    expecting any of:
        postfix

Maybe it is also worth not mentioning pointy blocks for [] and ().

As homework, try using the method that Rakudo is using itself for detecting the closing bracket-pair instead of our function above.

26. Native integers and UInt in Perl 6

Since we touched on some native integers yesterday, let us look a bit closer at them. The topic is deep, so we limit ourselves to a brief look at how the different integer data types in Perl 6 are connected.

UInt and UInt64

The simplest is UInt. This data type is defined in src/core/Int.pm and is literally a one-liner:

my subset UInt of Int where {not .defined or $_ >= 0};

The where clause restricts the values to be non-negative. Thus, the range of a UInt variable is 0..^Inf:

say UInt.Range; # 0..^Inf
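
A short sketch of how the where clause behaves in practice. The variable name is mine; the point is that negative values are rejected while an undefined type object still passes because of the not .defined part.

```perl6
my UInt $n = 42;

say 42  ~~ UInt;   # True
say -1  ~~ UInt;   # False
say Int ~~ UInt;   # True: the type object is not defined, so it passes

# Assigning a negative value fails the type check.
try { $n = -1 };
say $!.^name;      # X::TypeCheck::Assignment
say $n;            # 42 (the failed assignment did not change the value)
```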

There’s another type, UInt64. It is defined similarly but puts an additional restriction on the value: it must be less than 2⁶⁴.

my Int $UINT64_UPPER = nqp::pow_I(2, 64, Num, Int);
subset UInt64 of Int where { 0 <= $_ < $UINT64_UPPER }

By the way, don’t forget that you can use superscripts in Perl 6 directly:

say 2⁶⁴; # 18446744073709551616

OK, let us confirm the bounds of the UInt64 type by calling its Range method:

$ perl6 -e'say UInt64.Range'
-Inf^..^Inf

This result is wrong (U stands for unsigned), but I am sure that if you follow my blog you know where to fix it. Of course, the problem sits in the Range method that we were examining yesterday. Here is the fragment in src/core/Int.pm we need:

default { # some other kind of Int
    .^name eq 'UInt'
        ?? Range.new( 0, Inf, :excludes-max )
        !! Range.new( -Inf, Inf, :excludes-min, :excludes-max )
}

Both UInt and UInt64 are subsets of Int but only one of them is handled properly. Let us add the missing check like this, for example:

when .^name eq 'UInt'   { Range.new(0, Inf, :excludes-max) }
when .^name eq 'UInt64' { Range.new(0, 2⁶⁴ - 1) }
default                 { Range.new( -Inf, Inf, :excludes-min, :excludes-max) }

Compile and enjoy the result:

say Int.Range;    # -Inf^..^Inf
say UInt.Range;   # 0..^Inf
say UInt64.Range; # 0..18446744073709551615

This is fine but I am still not satisfied with a big chain of when tests. If UInt and UInt64 were classes, not subsets, you could add individual Range methods to each of them.
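
To illustrate that idea, here is a sketch with two hypothetical classes, MyUInt and MyUInt64. They are not part of Rakudo and they do not enforce any value restrictions; they only show how each type could carry its own Range method instead of relying on a chain of name checks.

```perl6
# Hypothetical classes, not part of Rakudo. Each one answers
# the Range question itself, so no chain of when tests is needed.
class MyUInt is Int {
    method Range(::?CLASS:U:) { Range.new(0, Inf, :excludes-max) }
}

class MyUInt64 is Int {
    method Range(::?CLASS:U:) { Range.new(0, 2⁶⁴ - 1) }
}

say MyUInt.Range;    # 0..^Inf
say MyUInt64.Range;  # 0..18446744073709551615
```

The dispatch to the right method happens by type, which is the part that a subset cannot offer.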

Native ints

Another big cluster of integer type definitions can be found in src/core/natives.pm. Let me quote a big part of that file here:

my native   int is repr('P6int') is Int { }
my native  int8 is repr('P6int') is Int is nativesize( 8) { }
my native int16 is repr('P6int') is Int is nativesize(16) { }
my native int32 is repr('P6int') is Int is nativesize(32) { }
my native int64 is repr('P6int') is Int is nativesize(64) { }

my native   uint is repr('P6int') is Int is unsigned { }
my native  uint8 is repr('P6int') is Int is nativesize( 8) is unsigned { }
my native   byte is repr('P6int') is Int is nativesize( 8) is unsigned { }
my native uint16 is repr('P6int') is Int is nativesize(16) is unsigned { }
my native uint32 is repr('P6int') is Int is nativesize(32) is unsigned { }
my native uint64 is repr('P6int') is Int is nativesize(64) is unsigned { }

These native types are all Ints and use the P6int representation. If you dig into MoarVM, you will find a directory nqp/MoarVM/src/6model/reprs that contains many C source and header files, including P6int.c and P6int.h. A brief look tells us that this representation is used for all the native types listed above:

/* Representation used by P6 native ints. */
struct MVMP6intBody {
    /* Integer storage slot. */
    union {
        MVMint64 i64;
        MVMint32 i32;
        MVMint16 i16;
        MVMint8 i8;
        MVMuint64 u64;
        MVMuint32 u32;
        MVMuint16 u16;
        MVMuint8 u8;
    } value;
};

Here, is nativesize and is unsigned are traits (src/core/traits.pm) that set some attributes of the NativeHOW object:

multi sub trait_mod:<is>(Mu:U $type, :$nativesize!) {
    $type.^set_nativesize($nativesize);
}
multi sub trait_mod:<is>(Mu:U $type, :$unsigned!) {
    $type.^set_unsigned($unsigned);
}

And let’s make a break for today.

25. The Range method

Today, I started looking into the internals of the Int class (src/core/Int.pm) and came across a strange-looking method, Range.

The Range method returns an object of the Range type showing the minimum and the maximum values that the object can hold. For example, this is how the method is defined for the Num class:

method Range(Num:U:) { Range.new(-Inf,Inf) }

Or, for the Rational type:

method Range(::?CLASS:U:) { Range.new(-Inf, Inf) }

It is possible to call the method directly on the type name or on a variable of that type. For example, let’s display the range of Int:

say Int.Range; # -Inf^..^Inf

my Int $x;
say $x.Range;  # -Inf^..^Inf

Finally, let us look at the body of the method in src/core/Int.pm:

method Range(Int:U:) {
    given self {
        when int  { $?BITS == 64 ??  int64.Range !!  int32.Range }
        when uint { $?BITS == 64 ?? uint64.Range !! uint32.Range }

        when int64  { Range.new(-9223372036854775808, 9223372036854775807) }
        when int32  { Range.new(         -2147483648, 2147483647         ) }
        when int16  { Range.new(              -32768, 32767              ) }
        when int8   { Range.new(                -128, 127                ) }
        # Bring back in a future Perl 6 version, or just put on the type object
        #when int4   { Range.new(                  -8, 7                  ) }
        #when int2   { Range.new(                  -2, 1                  ) }
        #when int1   { Range.new(                  -1, 0                  ) }

        when uint64 { Range.new( 0, 18446744073709551615 ) }
        when uint32 { Range.new( 0, 4294967295           ) }
        when uint16 { Range.new( 0, 65535                ) }
        when uint8  { Range.new( 0, 255                  ) }
        when byte   { Range.new( 0, 255                  ) }
        # Bring back in a future Perl 6 version, or just put on the type object
        #when uint4  { Range.new( 0, 15                   ) }
        #when uint2  { Range.new( 0, 3                    ) }
        #when uint1  { Range.new( 0, 1                    ) }

        default {  # some other kind of Int
            .^name eq 'UInt'
                ?? Range.new(    0, Inf, :excludes-max )
                !! Range.new( -Inf, Inf, :excludes-min, :excludes-max )
        }
    }
}

Indeed, a bit more than expected. Some of the checks are commented out but still, for a bare Int, all the checks for the different native types have to be passed first.
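
A quick way to exercise those when clauses is to call Range on a few native type objects. This sketch assumes that the native types can be used as invocants here, which is what the clauses above suggest; the expected values in the comments are simply the arguments of the corresponding Range.new calls.

```perl6
# Each call should land in one of the when clauses quoted above.
say int8.Range;    # expected: -128..127
say uint16.Range;  # expected: 0..65535
say byte.Range;    # expected: 0..255

# A bare Int falls through to the default branch.
say Int.Range;     # -Inf^..^Inf
```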

I assume that most Perl users either never or very seldom use native data types (such as int64 or uint32), so for my local instance of Rakudo I removed all the when clauses to see how it affects the speed of this particular method:

method Range(Int:U:) {
    Range.new( -Inf, Inf, :excludes-min, :excludes-max );
}

Compare the speed by calling the method many times. With the original code:

$ time ./perl6 -e'Int.Range for ^100_000'
real 0m3.262s
user 0m3.264s
sys 0m0.043s

With a reduced Range method:

$ time ./perl6 -e'Int.Range for ^100_000'
real 0m0.268s
user 0m0.271s
sys 0m0.034s
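
The time command also counts the start-up of the compiler. If you want to measure only the loop, a small helper based on now can be used instead. The time-calls name is my own; treat it as a rough sketch, not as a proper benchmarking tool.

```perl6
# Hypothetical helper: runs a block the given number of times
# and returns the elapsed wall-clock time in seconds.
sub time-calls(&code, Int $times) {
    my $start = now;
    code() for ^$times;
    now - $start;
}

say time-calls({ Int.Range }, 100_000);
```

The absolute numbers depend on the machine, so only compare results taken on the same computer and the same build.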

24. Obsolete syntax error messages in Perl 6, part 3

A couple of weeks ago, we looked at some error messages that Perl 6 generates when it sees the Perl 5 constructions. Let us continue and go through another portion of the messages that are there in today’s Rakudo.

\x[]

We start with a simple error message that tells you to use the new syntax when embedding a character by its code. In Perl 5, you could use \x{23} to get a hash character, while in Perl 6 it is an error:

$ perl6 -e'say "\x{23}"'
===SORRY!=== Error while compiling -e
Unsupported use of curlies around escape argument;
in Perl 6 please use square brackets
at -e:1
------> say "\x{⏏23}"

Nor does it work in regexes, for example:

say "###" ~~ /\x{23}/

Replacing braces with square brackets helps:

$ perl6 -e'say "\x[23]"'
#

Similarly, Perl 6 expects the brackets for octal numbers:

$ perl6 -e'say "\o[123]"'
S
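
Here are the bracketed forms side by side, including a comma-separated list of codes and the same escape inside a regex. It is a small sketch for the REPL or a script file rather than for the one-liners above.

```perl6
say "\x[23]";            # a single hash character
say "\x[41,42,43]";      # ABC
say "\o[123]";           # S
say "###" ~~ /\x[23]/;   # 「#」
```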

In the Grammar, this situation is caught by the following tokens.

For quoted strings:

role b1 {
    token backslash:sym<x> {
        :dba('hex character') <sym> [ <hexint> | 
        '[' ~ ']' <hexints> | '{' <.obsbrace1> ] }
    . . .
}

For regexes:

token backslash:sym<x> { 
    :i :dba('hex character') <sym> [ <hexint> | 
    '[' ~ ']' <hexints> | '{' <.obsbrace> ] }

. . .

token metachar:sym<{}> { \\<[xo]>'{' <.obsbrace> }

The obsbrace method itself is just a simple error message call:

token obsbrace { <.obs('curlies around escape argument',
                       'square brackets')> }

Old regex modifiers

Since we are talking about regexes, here’s another set of error catchers complaining about the Perl 5 syntax of the regex modifiers:

token old_rx_mods {
    (<[ i g s m x c e ]>)
    {
        my $m := $/[0].Str;
        if $m eq 'i' { $/.obs('/i',':i'); }
        elsif $m eq 'g' { $/.obs('/g',':g'); }
        elsif $m eq 'm' { $/.obs('/m','^^ and $$ anchors'); }
        elsif $m eq 's' { $/.obs('/s','. or \N'); }
        elsif $m eq 'x' { $/.obs('/x','normal default whitespace'); }
        elsif $m eq 'c' { $/.obs('/c',':c or :p'); }
        elsif $m eq 'e' { $/.obs('/e','interpolated {...} or s{} = ... form'); }
        else { $/.obs('suffix regex modifiers','prefix adverbs'); }
    }
}

This code is quite self-explanatory, so a simple example would be enough:

$ ./perl6 -e'"abc" ~~ /a/i'
===SORRY!=== Error while compiling -e
Unsupported use of /i; in Perl 6 please use :i
at -e:1
------> "abc" ~~ /a/i⏏<EOL>

One of the following correct forms is expected:

$ ./perl6 -e'say "abc" ~~ m:i/A/'
「a」

$ ./perl6 -e'say "abc" ~~ /[:i A]/'
「a」
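
A few more of the suggested replacements, as a sketch: the :i and :g adverbs placed before the regex, and the ^^ anchor instead of /m. The strings are arbitrary examples.

```perl6
# :i instead of /i
say "abc" ~~ m:i/B/;             # 「b」

# :g instead of /g: collect all matches
my @digits = "a1b2" ~~ m:g/\d/;
say @digits.elems;               # 2

# ^^ instead of /m: match at the start of any line
say "x\ny" ~~ / ^^ y /;          # 「y」
```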

As an exercise, write some incorrect Perl 6 code that generates the last error message, Unsupported use of suffix regex modifiers; in Perl 6 please use prefix adverbs.

tr///

Another regex-related construct, y///, does not exist in Perl 6; only the tr/// form is supported now:

token quote:sym<y> {
    <sym>
    <?before \h*\W>
    {} <.qok($/)>
    <.obs('y///','tr///')>
}

Here is an example of the correct program:

my $x = "abc";
$x ~~ tr/b/p/;
say $x; # apc

That’s it for today. We will continue with more obsolete errors in a few days.

23. The internals of the ternary operator in Perl 6

Yesterday, we saw that the ternary operator is treated as an infix in the Perl 6 Grammar. The code between the two parts of the operator is caught by the <EXPR> method:

token infix:sym<?? !!> {
    :my $*GOAL := '!!';
    $<sym>='??'
    <.ws>
    <EXPR('i=')>
    [ '!!'
    . . .
    ]
    <O(|%conditional, :reducecheck<ternary>, :pasttype<if>)>
}

Now, our attention turns to O. Namely, to its reducecheck named argument. It passes the information that this is a ternary operator.

Now, move to the next level of the compiler, to NQP, and examine the nqp/src/HLL/Grammar.nqp file, and specifically, the following method of the HLL::Grammar class there:

method EXPR_reduce(@termstack, @opstack) {
    . . .
    
    else { # infix op assoc: left|right|ternary|...
        $op[1] := nqp::pop(@termstack); # right
        $op[0] := nqp::pop(@termstack); # left

        $reducecheck := nqp::atkey(%opO, 'reducecheck');
        self."$reducecheck"($op) unless nqp::isnull($reducecheck);
        $key := 'INFIX';
    }

    self.'!reduce_with_match'('EXPR', $key, $op);
}

This is only a fragment but even this tiny part contains a few interesting details.

First, we see that the else branch handles not only the ternary operator but also some others. The left and the right operands are taken from some stack and saved in $op.

Another interesting thing is the method call:

self."$reducecheck"($op)

The name of the method is stored in the $reducecheck variable and for the ternary operator, it should contain ternary.

Here is the method:

method ternary($match) {
    $match[2] := $match[1];
    $match[1] := $match{'infix'}{'EXPR'};
}

Some swap magic here that we can ignore for now, but what is important is that the infix’s EXPR match is read here. Finally, we spotted all the three operands of the ?? !! operator.
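
The swap is easier to follow on a plain array. In this sketch the array stands in for the match object and the $middle variable stands in for the infix’s EXPR; both are my simplifications, not the real data structures.

```perl6
# Before the swap: index 0 is the left operand, index 1 is the right one.
my @op = 'condition', 'else-value';

# The part matched between ?? and !! (the infix's EXPR in the real code).
my $middle = 'then-value';

# The same two steps as in the ternary method.
@op[2] = @op[1];
@op[1] = $middle;

say @op;   # [condition then-value else-value]
```

After the two assignments the three operands sit in their natural order: the condition, the value for the true case and the value for the false case.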

Return to the last line of the EXPR_reduce method:

self.'!reduce_with_match'('EXPR', $key, $op);

Again, a method is called here; this time the name starts with the exclamation mark. The $op parameter contains the left and the right operands; the value of $key is INFIX.

At this point you should recall that Perl 6 is using a virtual machine, so to see where the actual comparison happens, you have to dig further into the MoarVM assembly tree, which we will not do today. Meanwhile, take a brief look into nqp/src/QRegex/Cursor.nqp to trace the above call further:

role NQPMatchRole is export {
    . . .
    method !reduce_with_match(str $name, str $key, $match) { 
        my $actions := self.actions;
        nqp::findmethod($actions, $name)($actions, $match, $key)
        if !nqp::isnull($actions) && nqp::can($actions, $name);
    }
    . . .

The line with two pairs of parentheses in a row is a call of the routine that is returned by nqp::findmethod.
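
Both tricks, calling a method whose name is stored in a variable and calling a routine that was looked up first, are available in Perl 6 itself. The sketch below uses a hypothetical Checker class and find_method from the metaobject protocol as a rough counterpart of nqp::findmethod.

```perl6
# Hypothetical class, used only for illustration.
class Checker {
    method ternary($what) { "checking $what" }
}

my $name = 'ternary';

# A method call by name, like self."$reducecheck"($op).
say Checker."$name"('op');          # checking op

# Look the method up first, then call what was returned,
# passing the invocant explicitly as the first argument.
my $method = Checker.^find_method($name);
say $method(Checker, 'op');         # checking op
```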

Let us return to the higher level of the compiler. If you want to visualise the data flow and print the variables, make sure each printed line starts with a hash character. This is needed because the build generates files in the gen/moar directory, and whatever you print during compilation ends up there and is compiled again. The hash turns your printouts into comments and hides them from the compiler.

else { # infix op assoc: left|right|ternary|...
    $op[1] := nqp::pop(@termstack); # right
    $op[0] := nqp::pop(@termstack); # left

    nqp::say("#left =" ~ $op[0]);
    nqp::say("#right=" ~ $op[1]);

    $reducecheck := nqp::atkey(%opO, 'reducecheck');
    self."$reducecheck"($op) unless nqp::isnull($reducecheck);
    $key := 'INFIX';
}

. . .

method ternary($match) {
    nqp::say('#match=' ~ $match);
    nqp::say('#before 1='~ $match[1]);
    nqp::say('#before 2='~ $match[2]);
    $match[2] := $match[1];
    $match[1] := $match{'infix'}{'EXPR'};
    nqp::say('#after 1='~ $match[1]);
    nqp::say('#after 2='~ $match[2]);
}

Recompile everything:

$ cd nqp
$ make clean
$ make
$ make install
$ cd ..
$ make clean
$ make

And run a program with a trivial ternary condition:

$ ./perl6 -e'say 2 ?? 3 !! 4'
#left =2
#right=4
#match=?? 3 !!
#before 1=4
#before 2=
#after 1=3 
#after 2=4
3

Great! Today, it was a deep dive into the compiler, and I hope it gave you an understanding of how the ternary operator treats its three operands in Perl 6.

22. The infix nature of the ternary operator in Perl 6

The ternary operator ?? !! takes three operands, obviously. The documentation, however, says that the operator is an infix. Let us figure out why.

Here is the fragment from the Grammar that handles the ternary operator:

token infix:sym<?? !!> {
    :my $*GOAL := '!!';
    $<sym>='??'
    <.ws>
    <EXPR('i=')>
    [ '!!'
    || <?before '::' <.-[=]>> { self.typed_panic: "X::Syntax::ConditionalOperator::SecondPartInvalid", second-part => "::" }
    || <?before ':' <.-[=\w]>> { self.typed_panic: "X::Syntax::ConditionalOperator::SecondPartInvalid", second-part => ":" }
    || <infixish> { self.typed_panic: "X::Syntax::ConditionalOperator::PrecedenceTooLoose", operator => ~$<infixish> }
    || <?{ ~$<EXPR> ~~ / '!!' / }> { self.typed_panic: "X::Syntax::ConditionalOperator::SecondPartGobbled" }
    || <?before \N*? [\n\N*?]? '!!'> { self.typed_panic: "X::Syntax::Confused", reason => "Confused: Bogus code found before the !! of conditional operator" }
    || { self.typed_panic: "X::Syntax::Confused", reason => "Confused: Found ?? but no !!" }
    ]
    <O(|%conditional, :reducecheck<ternary>, :pasttype<if>)>
}

Most of the body is filled with different error reporting additions that pop up when something is wrong with the second part of the operator. Let us reduce the noise and implement the simplest form of our own artificial operator ¿¿ ¡¡ that does no such error checking:

token infix:sym<¿¿ ¡¡> {
    '¿¿'
    <.ws>
    <EXPR('i=')>
    '¡¡'
    <O(|%conditional, :reducecheck<ternary>, :pasttype<if>)>
}

Now you can clearly see the structure of the token. It matches the following components: a literal string '¿¿', an expression, and another string '¡¡'. We’ve already seen the use of the O token when we were talking about precedence.

If you consider ¿¿ ¡¡ as an infix operator ($left ¿¿ $mid ¡¡ $right), then the first and the third operands of the ternary operator are, respectively, the left and the right operands of the combined operator. The code between ¿¿ and ¡¡ is caught by the <EXPR> rule.

Before wrapping up for today, let us print what the Grammar finds at that position:

token infix:sym<¿¿ ¡¡> {
    '¿¿'
    <.ws>
    <EXPR('i=')> {
        nqp::say('Inside we see: ' ~ $<EXPR>)
    }
    '¡¡'
    <O(|%conditional, :reducecheck<ternary>, :pasttype<if>)>
}

Compile and run:

$ ./perl6 -e'say 3 ¿¿ 4 ¡¡ 5'
Inside we see: 4 
4

It is quite difficult, though, to find the place where the actual logic of the ternary operator is defined. I would not even recommend doing that as homework 🙂 Nevertheless, come back tomorrow to see more details of the internals of the ternary operators.

21. The tolerance operator in Perl 6

In Perl 6, there is a so-called approximately-equal operator =~=. It compares two numbers approximately.

If both values are non-zero, the operator calculates their relative difference; the tolerance is defined by the $*TOLERANCE variable, which equals 1E-15 by default. So, for two numbers $a and $b, the result (in pseudo-code) is:

|$a - $b| / max(|$a|, |$b|) < $*TOLERANCE

(As an exercise, try implementing the absolute value operator so that it looks like the mathematical notation above.)
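
The pseudo-code can be turned into a tiny standalone function. The approx-equal name is mine, and the sketch assumes that at least one argument is non-zero, otherwise the division does not make sense. The last lines show the built-in operator with a locally changed $*TOLERANCE.

```perl6
# Hypothetical helper that follows the pseudo-code formula.
# Assumes at least one of the arguments is non-zero.
sub approx-equal($a, $b, $tolerance = 1e-15) {
    abs($a - $b) / max(abs($a), abs($b)) < $tolerance
}

say approx-equal(100, 101);         # False
say approx-equal(100, 101, 0.01);   # True: 1/101 is below 0.01

# The built-in operator with the default and a local tolerance.
say 100 =~= 101;                    # False
{
    my $*TOLERANCE = 0.01;
    say 100 =~= 101;                # True
}
```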

Let us look at the implementation of the operator. It is located in src/core/Numeric.pm.

First of all, you will notice that the ASCII variant is directly converted to the call of the Unicode version:

sub infix:<=~=>(|c) { infix:<≅>(|c) }

The actual code is placed just above that line.

proto sub infix:<≅>(Mu $?, Mu $?, *%) {*} # note, can't be pure due to dynvar
multi sub infix:<≅>($?) { Bool::True }
multi sub infix:<≅>(\a, \b, :$tolerance = $*TOLERANCE) {
    # If operands are non-0, scale the tolerance to the larger of the abs values.
    # We test b first since $value ≅ 0 is the usual idiom and falsifies faster.
    if b && a && $tolerance {
        abs(a - b) < (a.abs max b.abs) * $tolerance;
    }
    else { # interpret tolerance as absolute
        abs(a.Num - b.Num) < $tolerance;
    }
}

As you see here, the routine checks if both operands are non-zero, and in this case uses the formula. If at least one of the operands is zero, the check is simpler and basically means whether the non-zero value is small enough. (Ignore the presence of the tolerance adverb for simplicity.)

Compare the speed of the two branches by making thousands of comparisons:

$ time ./perl6 -e'0.1 =~= 0 for ^100_000'
$ time ./perl6 -e'0.1 =~= 0.2 for ^100_000'

On my computer, the times were approximately 2.5 and 4.3 seconds. So, indeed, the check is faster if one of the values is zero.

But now think about the algorithm. The subroutine tests its arguments and decides which of the two ways to go. Does it ring a bell for you?

This is exactly what multi-subs are meant for!
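
As a reminder of how dispatch on a literal works, here is a minimal sketch with a made-up describe function. As far as I understand, a literal in a signature constrains both the type and the value of the argument, so the literal 0 here is an integer zero.

```perl6
# The candidate with the literal is narrower, so it wins for 0.
multi sub describe(0)  { 'zero' }
multi sub describe($n) { 'non-zero' }

say describe(0);     # zero
say describe(0.1);   # non-zero
```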

So, let us rewrite the code to have all variants in separate multi-subs:

multi sub infix:<≅>(0, 0, :$tolerance = $*TOLERANCE) {
    Bool::True
}

multi sub infix:<≅>(\a, 0, :$tolerance = $*TOLERANCE) {
    a.abs < $tolerance
}

multi sub infix:<≅>(0, \b, :$tolerance = $*TOLERANCE) {
    b.abs < $tolerance
}

multi sub infix:<≅>(\a, \b, :$tolerance = $*TOLERANCE) {
    abs(a - b) < (a.abs max b.abs) * $tolerance;
}

Recompile and run the same time measurements. This time, it was 2.8 and 3.8 seconds. So, for non-zero arguments it became 10-15% faster, and a bit slower in the other case.

Is there more room for improvement? What I don’t really like is the additional named argument that is present everywhere. As we can still change the $*TOLERANCE variable locally, why always pass it? Create more multi-subs:

multi sub infix:<≅>(0, 0) {
    Bool::True
}

multi sub infix:<≅>(\a, 0) {
    a.abs < $*TOLERANCE
}

multi sub infix:<≅>(0, \b) {
    b.abs < $*TOLERANCE
}

multi sub infix:<≅>(\a, \b) {
    abs(a - b) < (a.abs max b.abs) * $*TOLERANCE;
}


multi sub infix:<≅>(0, 0, :$tolerance) {
    Bool::True
}

multi sub infix:<≅>(\a, 0, :$tolerance) {
    a.abs < $tolerance
}

multi sub infix:<≅>(0, \b, :$tolerance) {
    b.abs < $tolerance
}

# multi sub infix:<≅>(\a, \b, :$tolerance) {
#     abs(a - b) < (a.abs max b.abs) * $tolerance;
# }

At this point, there are two sets of multi-subs: pure functions for two arguments, and functions that take the custom tolerance value.

Compile. Run. Measure.

Perl 6 shows its fantastic multiple dispatch ability. This time, the average time for both cases (0.1 =~= 0 and 0.1 =~= 0.2) was approximately the same: 2.5 seconds. For the non-zero case, that makes the operator about 70% faster than the original (4.3 versus 2.5 seconds)!

(The last sub is commented out as it leads to an infinite error message that one of the variables is undefined ¯\_(ツ)_/¯. I tried to fix it by adding Mu:D before the adverb but it decreased the speed back to 3.8 seconds, which is still better than the original result, though.)