in reply to Re^2: Searching large files a block at a time
in thread Searching large files a block at a time
"... helped me understand what I was doing wrong ..."
OK, that's a good start.
"Using your code, I get results in ~10 seconds, which is acceptable (though still a lot slower than the shell script that pipes into Perl, and I'm not sure why that is). "
I'm completely guessing, but the overhead may be due to the IO::Uncompress::Bunzip2 module. You could avoid that module by setting up the same pipe from within the Perl script (rather than having the shell pipe into that script).
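For contrast, here's a minimal sketch of the module-based approach I'm guessing at (the file name is just a placeholder; $Bunzip2Error is the error variable the module exports):

#!/usr/bin/env perl
use strict;
use warnings;
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

my $filename = 'data.bz2';    # placeholder name for illustration

# Decompression happens inside Perl here -- the suspected overhead.
my $z = IO::Uncompress::Bunzip2->new($filename)
    or die "bunzip2 failed: $Bunzip2Error\n";

{
    local $/ = '';            # paragraph mode; the module's getline() honours $/
    while (<$z>) {
        chomp;
        print "--- One Block ---\n";
        print "$_\n";
    }
}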
I put exactly the same data I used previously into a text file (just a copy and paste):
$ cat > pm_1196493_paragraph_mode_test_data.txt
Block1 Line1
...
Block4 Line3
^D
I then modified the start of my previous example code, so it now looks like this:
#!/usr/bin/env perl -l
use strict;
use warnings;
use autodie;

my $filename = 'pm_1196493_paragraph_mode_test_data.txt';

# Pipe from an external command instead of decompressing inside Perl.
open my $z, '-|', "cat $filename";

{
    local $/ = '';    # paragraph mode: read one blank-line-delimited block at a time
    while (<$z>) {
        chomp;
        print '--- One Block ---';
        print;
    }
}
This produces exactly the same output as before. Obviously, you'll want to change 'cat' to '/usr/bin/bzcat' (and, of course, use *.bz2 instead of *.txt files). This solution will not be platform-independent: that may not matter to you. See open for more on the '-|' and the closely related '|-' modes.
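For instance, a minimal sketch of that change (the *.bz2 file name is illustrative; the list form of the piped open runs bzcat directly, so $filename is never exposed to the shell):

my $filename = 'pm_1196493_paragraph_mode_test_data.txt.bz2';

# List form of the piped open: no shell involved, arguments passed directly.
open my $z, '-|', '/usr/bin/bzcat', $filename;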
Also, note that I used the autodie pragma. If you want more control over handling I/O problems, you can hand-craft messages (e.g. open ... or die "..."), or use something like Try::Tiny.
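A minimal sketch of the hand-crafted variant (without autodie; the message wording is just an example):

# Without autodie: report the command, file name and OS error yourself.
open my $z, '-|', '/usr/bin/bzcat', $filename
    or die "Cannot start bzcat on '$filename': $!\n";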
— Ken
Replies are listed 'Best First'.
Re^4: Searching large files a block at a time
by JediWombat (Novice) on Aug 03, 2017 at 23:56 UTC
by marioroy (Parson) on Aug 04, 2017 at 04:41 UTC
by marioroy (Parson) on Aug 04, 2017 at 15:46 UTC
by marioroy (Parson) on Aug 05, 2017 at 00:23 UTC