"... helped me understand what I was doing wrong ..."
OK, that's a good start.
"Using your code, I get results in ~10 seconds, which is acceptable (though still a lot slower than the shell script that pipes into Perl, and I'm not sure why that is). "
I'm only guessing, but the overhead may come from the IO::Uncompress::Bunzip2 module.
You could avoid using that module by setting up the same pipe but from within the Perl script
(rather than piping to that script).
I put exactly the same data I used previously into a text file (just a copy and paste):
$ cat > pm_1196493_paragraph_mode_test_data.txt
Block1 Line1
...
Block4 Line3
^D
I then modified the start of my previous example code, so it now looks like this:
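#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

# Same effect as the -l switch here: print appends a newline.
# (Switches after "env perl" on the shebang line fail on Linux.)
$\ = "\n";

my $filename = 'pm_1196493_paragraph_mode_test_data.txt';

# List form of the piped open: the filename never passes through a shell.
open my $z, '-|', 'cat', $filename;

{
    local $/ = '';    # paragraph mode: one blank-line-separated block per read
    while (<$z>) {
        chomp;
        print '--- One Block ---';
        print;
    }
}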
This produces exactly the same output as before.
Obviously, you'll want to change 'cat' to '/usr/bin/bzcat'
(and, of course, use *.bz2 instead of *.txt files).
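With those two changes, the reading part might look something like the sketch below. It assumes bzcat really is at /usr/bin/bzcat on your system; the read_blocks helper and taking filenames from @ARGV are my own choices for illustration, not anything from your code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;

# Hypothetical helper: run a command and return its output as a list of
# paragraphs (blocks separated by one or more blank lines).
sub read_blocks {
    my @cmd = @_;
    open my $z, '-|', @cmd;    # list form: no shell involved
    local $/ = '';             # paragraph mode
    my @blocks;
    while (my $block = <$z>) {
        chomp $block;          # in paragraph mode, strips all trailing newlines
        push @blocks, $block;
    }
    close $z;                  # autodie dies here if the command failed
    return @blocks;
}

# Assumed path to bzcat; each command-line argument is a *.bz2 file.
for my $filename (@ARGV) {
    for my $block (read_blocks('/usr/bin/bzcat', $filename)) {
        print "--- One Block ---\n$block\n";
    }
}
```

This collects all the blocks in memory before printing. For large files you'd probably keep the while loop inline, as in the code above, and process each block as it arrives.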
This solution is not platform-independent, because it relies on an external command being available; that may not matter to you.
See the documentation for open for more on the '-|' mode
and the closely related '|-' mode.
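To show the difference between the two: '-|' reads from the command's STDOUT, while '|-' writes to its STDIN. Here's a minimal sketch of the latter. The write_lines helper and the choice of sort as the command are just for illustration.

```perl
use strict;
use warnings;
use autodie;

# Hypothetical helper: feed lines to an external command's STDIN.
sub write_lines {
    my ($cmd, @lines) = @_;        # $cmd is an array ref: program plus arguments
    open my $pipe, '|-', @$cmd;    # '|-': we write, the command reads
    print {$pipe} "$_\n" for @lines;
    close $pipe;                   # waits for the command to exit
    return;
}

# sort's output goes to this script's STDOUT.
write_lines(['sort'], qw(pear apple fig));
```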
Also, note that I used the autodie pragma.
If you want more control over handling I/O problems,
you can hand-craft messages (e.g. open ... or die "..."),
or use something like Try::Tiny.
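As a rough sketch of the hand-crafted approach (without autodie), something like the following would work. The slurp_command helper and the wording of the messages are mine. The close check follows the idiom in the close documentation: for a piped open, close returns false if the command exits with non-zero status, and $! is 0 when that was the only problem.

```perl
use strict;
use warnings;

# Hypothetical helper: read all output of a command, with hand-written errors.
sub slurp_command {
    my @cmd = @_;
    open my $z, '-|', @cmd
        or die "Can't start '@cmd': $!";
    my @lines = <$z>;
    # For a piped open, close also reflects the command's exit status in $?.
    close $z
        or die $! ? "Error closing pipe from '@cmd': $!"
                  : "'@cmd' exited with status " . ($? >> 8);
    return @lines;
}

# echo stands in for the real command here.
print slurp_command('echo', 'Block1 Line1');
```

I've left Try::Tiny out of the sketch because it's not a core module; wrapping the call in a plain eval block would be the core-only way to catch these exceptions.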