"... helped me understand what I was doing wrong ..."

OK, that's a good start.

"Using your code, I get results in ~10 seconds, which is acceptable (though still a lot slower than the shell script that pipes into Perl, and I'm not sure why that is)."

This is purely a guess, but the overhead may come from the IO::Uncompress::Bunzip2 module. You could avoid that module by opening the same pipe from within the Perl script, rather than piping into the script.

I put exactly the same data I used previously into a text file (just a copy and paste):

    $ cat > pm_1196493_paragraph_mode_test_data.txt
    Block1 Line1
    ...
    Block4 Line3
    ^D

I then modified the start of my previous example code, so it now looks like this:

    #!/usr/bin/env perl -l

    use strict;
    use warnings;
    use autodie;

    my $filename = 'pm_1196493_paragraph_mode_test_data.txt';

    open my $z, '-|', "cat $filename";

    {
        local $/ = '';

        while (<$z>) {
            chomp;
            print '--- One Block ---';
            print;
        }
    }

This produces exactly the same output as before. Obviously, you'll want to change 'cat' to '/usr/bin/bzcat' (and, of course, use *.bz2 instead of *.txt files). This solution will not be platform-independent: that may not matter to you. See open for more on the '-|', and closely related '|-', modes.
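For what it's worth, here is a rough sketch of how the bzcat version might look. It is untested against your data, and the helper name for_each_block() is just something I made up for illustration. It uses the list form of the piped open, so the filename never passes through a shell; the '/usr/bin/bzcat' path is an assumption about your system.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;

# for_each_block() is a hypothetical helper, not part of the earlier code.
# It runs an external command and hands each paragraph-mode block of its
# output to a callback, one block at a time (nothing is slurped).
sub for_each_block {
    my ($cmd, $callback) = @_;

    # List form of the piped open: the command and its arguments are
    # passed directly, so no shell quoting of the filename is needed.
    open my $fh, '-|', @$cmd;

    {
        local $/ = '';    # paragraph mode: records end at blank lines

        while (my $block = <$fh>) {
            chomp $block;
            $callback->($block);
        }
    }

    # Under autodie, this dies if the command exited with an error.
    close $fh;

    return;
}

# Only runs when a filename is given on the command line.
# '/usr/bin/bzcat' is an assumed location; adjust for your system.
if (@ARGV) {
    for_each_block(
        ['/usr/bin/bzcat', $ARGV[0]],
        sub {
            print "--- One Block ---\n";
            print "$_[0]\n";
        },
    );
}
```

You would run it as something like "perl script.pl your_file.bz2". The callback is where your search logic would go.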

Also, note that I used the autodie pragma. If you want more control over handling I/O problems, you can hand-craft messages (e.g. open ... or die "..."), or use something like Try::Tiny.
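If you go the hand-crafted route, a minimal sketch (without autodie) might look like the following. The helper name count_blocks() is hypothetical and the messages are only examples. The point worth noting is that, with a piped open, a failure of the external command generally shows up when you close the filehandle, so the close needs checking as well as the open.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# count_blocks() is a hypothetical helper for illustration only.
# It counts the paragraph-mode blocks in a command's output.
sub count_blocks {
    my @cmd = @_;

    open my $fh, '-|', @cmd
        or die "Can't start '@cmd': $!";

    my $count = 0;
    {
        local $/ = '';    # paragraph mode
        $count++ while <$fh>;
    }

    # For a piped open, close() waits for the child process and returns
    # false if it exited with a non-zero status. In that case $! is 0
    # and the exit status is in the high byte of $?.
    unless (close $fh) {
        die $!
            ? "Error closing pipe from '@cmd': $!"
            : "'@cmd' exited with status " . ($? >> 8);
    }

    return $count;
}

# Only runs when a filename is given on the command line.
# '/usr/bin/bzcat' is an assumed location; adjust for your system.
if (@ARGV) {
    my $count = count_blocks('/usr/bin/bzcat', $ARGV[0]);
    print "Found $count blocks\n";
}
```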

— Ken