18. Implementing negative array subscripts in Perl 6

A few days ago, we saw how Perl 6 checks the syntax if you are trying to index an array with negative indices. Since then, I was thinking about implementing the support of @a[-1]. It was not that easy, that’s why I did not demonstrate this last time 🙂

Before going further, a small disclaimer. Negative indices are not allowed in Perl 6 for a reason. Unfortunately, this is not very clear from the documentation why. It is said that you should use @a[* - 1], which is equivalent to @a[@a.elems - 1], but no further explanation follows. There is the following phrase in the old design document: ‘Negative subscripts are never allowed for standard subscripts unless the subscript is declared modular.’ So, today, I will ignore this restriction assuming that I don’t understand the limitation completely (actually, at the end of the post you will see what kind of additional problems arise if you allow negative indices). My goal here is just to see how it can be done internally. (Update: also, see the comment to the post for some thoughts about forbidding negative indices.)

First, once again look at the place in src/Perl6/Actions.nqp (the file tightly connected with the Perl 6 Grammar), that implements the behaviour of the [] postcircumfix brackets. If it detects a negative index, a compile-time exception is thrown.

method postcircumfix:sym<[ ]>($/) {
    my $past := QAST::Op.new( :name('&postcircumfix:<[ ]>'), :op('call'), :node($/) );
    if $<semilist> {
        my $c := $/;
        my $ast := $<semilist>.ast;
        $past.push($ast) if nqp::istype($ast, QAST::Stmts);
        if $ast.ann('multislice') {
            $past.name('&postcircumfix:<[; ]>');
        # for nqp::split(';', ~$<semilist>) {
        #     my $ix := $_ ~~ / [ ^ | '..' ] \s* <( '-' \d+ )> \s* $ /;
        #     if $ix {
        #         $c.obs("a negative " ~ $ix ~ " subscript to index from the end", "a function such as *" ~ $ix);
        #     }
        # }
    make WANTED($past, '.[]');

Here, I already commented out the code to remove the check. Of course, it would be too naïve to think that this solves the task.

First, let us try to de-parse what is happening here. The if check looks at the presence of $<semilist>. What is it? Refer to the Grammar (in src/Perl6/Grammar.nqp):

token postcircumfix:sym<[ ]> {
    :my $*QSIGIL := '';
    '[' ~ ']' [ <.ws> <semilist> ]

So,  <semilist> contains everything in-between the brackets. In the method above, we work with its content as with a string:

for nqp::split(';', ~$<semilist>) {
    . . .

That looks promising but it is not possible just to assign a new string value there, so that when the program sees a negative index N, it replaces it with the * - N string, letting Perl parse it further.

The $<semilist> object is not a string, as you can see from, for example, the following usage of it: $<semilist>.ast. My next idea was to build a piece of AST to replace the negative index. I rejected this idea as soon as I saw the AST output of a simple @a[*-1] call:

$ perl6 --target=ast -e'my @a; say @a[*-1]'
. . .
 - QAST::Stmts 
 - QAST::WVal(Array) 
 - QAST::Stmts <sunk> my @a; say @a[*-1]
     - QAST::Stmt <sunk> my @a
     - QAST::Var(lexical @a) <sinkok> :statement_id<?> @a
     - QAST::Stmt <sunk final> say @a[*-1]
     - QAST::Want <sunk>
         - QAST::Op(call &say) <sunk> :statement_id<?> say @a[*-1]
         - QAST::Op(call &postcircumfix:<[ ]>) <wanted> [*-1]
             - QAST::Var(lexical @a) <wanted> @a
             - QAST::Stmts <wanted> *-1
             - QAST::Op(p6capturelex) <wanted> :statement_id<?> :past_block<?> :code_object<?>
                 - QAST::Op(callmethod clone) 
                 - QAST::WVal(WhateverCode) :past_block<?> :code_object<?>
         - v
         - QAST::Op(p6sink) 
         - QAST::Op(call &say) <sunk> :statement_id<?> say @a[*-1]
             - QAST::Op(call &postcircumfix:<[ ]>) <wanted> [*-1]
             - QAST::Var(lexical @a) <wanted> @a
             - QAST::Stmts <wanted> *-1
                 - QAST::Op(p6capturelex) <wanted> :statement_id<?> :past_block<?> :code_object<?>
                 - QAST::Op(callmethod clone) 
                     - QAST::WVal(WhateverCode) :past_block<?> :code_object<?>
 - QAST::WVal(Nil)

It looks to scary to reproduce. A different approach is needed.

Meanwhile, take the second look at the regex that extracts a negative index:

/ [ ^ | '..' ] \s* <( '-' \d+ )> \s* $ /

It accepts only two alternatives: negative integers and something that ends with a range, for example, .. -3. Aha, should I also handle ranges? But in the case of a range, the regex only contains the end of the potentially incorrect string. Again, not clear what to do.

OK, let us then look at that mysterious <semilist>. Here is its definition in the Grammar:

rule semilist {
    :dba('list composer')
    | <?before <.[)\]}]> >
    | [<statement><.eat_terminator> ]*

OMG, it can contain statements inside! Indeed, Perl 6 allows, for example, having a function call or a math operation between the brackets:

say @a[f() + 1];

Yahoo! What does Rakudo say when the calculated index is negative?

$ perl6 -e'my @a = <a b c>; say @a[2-3]'
Index out of range. Is: -1, should be in 0..^Inf
  in block <unit> at -e line 1

This time, an error happens at runtime (ignore the fact the 2-3 expression can be optimised) and the compiler did not catch that (if the case of a function, no optimisation can do that).

The text of the error message leads us to src/core/Array.pm, where among the rest, the AT-POS method is located:

multi method AT-POS(Array:D: Int:D $pos) is raw {
        && nqp::isconcrete(nqp::getattr(self,List,'$!reified')),

The logic here is to call nqp::atpos for non-negative indices, which can be accessed in the array—that call returns the required element. For all the rest (including negative subscripts), the AT-POS-SLOW method is called. The above-shown runtime error happens inside AT-POS-SLOW. So, let us try not to pass control to it.

Now, it is time to remember that our idea was to count from the end of the array if the index is negative. In other words, let us modify the $pos variable here. You may find it very useful to consult the nqp/docs/ops.markdown document that describes NQP operators. After some experimenting, the following lines were added to the method:

multi method AT-POS(Array:D: Int:D $pos) is raw {
      nqp::islt_i($pos, 0),
      $pos := nqp::add_i($pos, nqp::elems(nqp::getattr(self,List,'$!reified')))
    . . .

If $pos is negative (less than zero), add the length of the array to it. The rest of the method remains the same, as the index should be either zero or positive after the update.

Compile and test!

$ ./perl6 -e'my @a = <a b c d>; say @a[-1]'

Isn’t it what we wanted? What about slices?

$ ./perl6 -e'my @a = <a b c d>; say @a[-1,-2]'
(d c)

They also work!


$ ./perl6 -e'my @a = <a b c d>; say @a[-3..-2]'
(b c)

Here you are!

If you want to continue, you have to decide what happens when a negative index it too big for the given array:

$ ./perl6 -e'my @a = <a b c d>; say @a[10-20]'
Index out of range. Is: -6, should be in 0..^Inf
  in block <unit> at -e line 1

For positive indices, going out of the array returns (Any). Probably, this should also be the case for big negative indices. Alternatively, you divide an index by modulo and thus making an index ‘loop.’ I will leave this as an exercise for the reader.

4 thoughts on “18. Implementing negative array subscripts in Perl 6

  1. It’s difficult to nail down the semantics of an element that’s outside the bounds of the array in the negative: if you go outside of the array in the positive, you can actually assign to the result and the array will grow, i.e. `my @a; @a[4] = 3; say @a` giving `[(Any) (Any) (Any) (Any) 3]`. However, if your array is 3 elements big and you index at -4, what would assigning to that do? Would it resize the array by moving all existing elements to the side once, so your newly assigned value would be at the start of the array?

    The reason why Perl 6 refuses to accept negative indices in the first place is that it’s easy to accidentally calculate a negative value as the index to be used, for example with an off-by-one error, and then accidentally accessing an element from the end. Forcing the programmer to use a different syntax makes that a lot less likely to happen.

    Hope that helps!

    Liked by 1 person

  2. “However, if your array is 3 elements big and you index at -4, what would assigning to that do? ”
    Die. Array out-of-bounds access error. And then at declaration, you state if you want to play “negative ball.” I only say this in the context of this interesting example – I’m not suggesting the language actually do this, of course.
    Loving the series, Andrew.

    Liked by 1 person

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s