📘 Parallel file processing in Perl 6

Process the files from the current directory in a few parallel threads.

We have to do something with each file in the directory, and the files must be processed independently by a few workers. It is not possible to predict how long each individual file will take, which is why we need a common queue that supplies the next filename to whichever worker becomes available.

A good candidate for the queue is a channel.

my $channel = Channel.new();
$channel.send($_) for dir();
$channel.close;

All the file names are sent to the channel, which we close afterward. (For more details on reading directories, see Task 97, Reading directory contents.)
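To see what the queue holds at this point, here is a minimal sketch that fills a channel the same way and then drains it. It relies on the list method of Channel, which yields the queued items and stops once the channel is closed and empty. The @names array is introduced only for illustration.

```perl6
# Fill a channel with the entries of the current directory, then close it.
my $channel = Channel.new;
$channel.send($_) for dir();
$channel.close;

# Reading removes the items from the channel, so this is for inspection only:
# after this line the channel is empty and there is nothing left for workers.
my @names = $channel.list;

say "{@names.elems} entries were queued";
say .basename for @names;
```

Because reading drains the channel, do not run this before starting the workers in the real program.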

Channels are designed to be thread-safe. This means that several threads can read from the same channel, and each value is delivered to exactly one of them. Perl 6 cannot predict which thread gets which name, but it guarantees that each data item is read only once.
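The guarantee can be checked with a small experiment. The sketch below is not part of the task solution: it queues the numbers from 1 to 1000 instead of filenames, and the $queue and $results channels and the counters are names made up for the demonstration. Each worker forwards every item it reads to a second channel, so that the items can be counted afterward.

```perl6
my $queue   = Channel.new;
my $results = Channel.new;

# Queue 1000 items and close the channel before the workers start.
$queue.send($_) for 1..1000;
$queue.close;

# Four workers read from the same channel.
my @workers = do for 1..4 {
    start {
        my $count = 0;
        # .defined keeps the loop correct even for false values such as 0.
        while (my $item = $queue.poll).defined {
            $results.send($item);   # record what this worker has read
            $count++;
        }
        $count;                     # the promise is kept with this value
    }
};

my @counts = await @workers;
$results.close;
my @seen = $results.list;

say "Items per worker: @counts[]";
say "Total items read: {@seen.elems}";
say "All items distinct: {@seen.unique.elems == @seen.elems}";
```

The per-worker counts differ from run to run, while the total should always match the number of items sent, with no duplicates among them.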

my @workers;
for 1..4 {
    push @workers, start {
        while (my $file = $channel.poll) {
            do_something($file);
        }
    } 
}

The code above creates four independent workers using the start keyword. As they are executed independently not only from each other but also from the main program, it is important to wait until all of them are done:

await(@workers);

The elements of the @workers array are promises (objects of the Promise data type). The await routine waits until all the promises are kept.
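The following sketch illustrates how promises behave before and after await. The workers here are stand-ins: they sleep instead of processing files, and the value of the last statement in each start block becomes the result of its promise.

```perl6
# Each start block returns a Promise immediately and runs in the background.
my @workers = do for 1..4 -> $n {
    start {
        sleep $n / 10;   # stand-in for real work
        $n * $n;         # the promise is kept with this value
    }
};

# The workers are most likely still running at this point.
say @workers.map(*.status);

# await blocks until every promise is kept and returns their results.
my @results = await @workers;

say @workers.map(*.status);
say @results;
```

If the code inside a start block dies, its promise is broken rather than kept, and await rethrows the exception in the main program.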

Another practical way of creating workers and waiting for them is shown in Task 92, Sleep Sort: instead of collecting them in an array, you can use the gather and take keywords.

Examine the main loop:

while (my $file = $channel.poll) {
    do_something($file);
}

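On each iteration, a value is read from the channel. The poll method does not block: it returns Nil when no value is available, which ends the loop. Because all the filenames were sent and the channel was closed before the workers started, this only happens when the channel is exhausted.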

All four threads do similar work and poll the same channel. This approach distributes the filenames that were sent to the channel among the workers. Once a name has been read, it is removed from the channel, and the next read request returns the next name.
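Since poll never waits, a worker would quit early if it found the channel momentarily empty while names were still being sent. The sketch below shows one possible variation for that case, which is an assumption beyond the original task: the channel is filled by a separate producer while the workers are already running. It iterates over the list method of Channel, which blocks until a value arrives and stops when the channel is closed and empty. The $producer variable is a name introduced for this example.

```perl6
my $channel = Channel.new;

# The producer fills the channel while the workers are already running.
my $producer = start {
    $channel.send($_) for dir();
    $channel.close;   # without this, the workers would wait forever
};

my @workers = do for 1..4 {
    start {
        # Blocks while the channel is empty; ends once it is closed and drained.
        for $channel.list -> $file {
            do_something($file);
        }
    }
};

await $producer, |@workers;

sub do_something($file) {
    say $file.path;
}
```

For the original program, where the channel is filled and closed before any worker starts, the loop with poll is sufficient.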

Finally, write the do_something sub to suit your needs. In the simplest example below, it only prints filenames:

sub do_something($file) {
    say $file.path;
}
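As an illustration of a slightly more useful worker body, here is a hypothetical variant that reports the size of each regular file and skips everything else. The output format is an arbitrary choice; the f and s methods of IO::Path test for a regular file and return the size in bytes.

```perl6
# Hypothetical variant of do_something: print the size of each regular file.
sub do_something(IO::Path $file) {
    return unless $file.f;   # skip directories and other non-regular entries
    say "{$file.basename}: {$file.s} bytes";
}

# Called sequentially here for illustration; the workers would call it the same way.
do_something($_) for dir();
```

With several workers printing at once, the order of the output lines is not predictable.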
