PerlMonks  

Re^2: grab 'n' lines from a file above and below a /match/

by Aristotle (Chancellor)
on Sep 17, 2004 at 06:23 UTC ( [id://391683]=note )


in reply to Re: grab 'n' lines from a file above and below a /match/
in thread grab 'n' lines from a file above and below a /match/

It's actually slower and more memory-intensive than any of the other solutions. Tie::File internally keeps a list of byte offsets for all the lines, and it carries a lot of additional overhead that is meant to optimize writes, which you never make use of.

Your code also doesn't get the edge cases right: if there's a match within fewer than $context lines of the previous one, it will be missed.

You gave me an idea with regards to memory consumption, though:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Fcntl qw( :seek );

    my $rx = qr/c9391b56-b174-441b-921c-7d63/;
    my $to_print = 0;                        # lines still to be printed
    my $context = 10;
    my @offs = ( 0 ) x ( 1 + $context );     # offsets of the most recent line starts

    while(<>) {
        my $context_start = shift @offs;     # where the before-context begins
        my $here = tell ARGV;                # offset just past the current line
        push @offs, $here;
        if( /$rx/ ) {
            if( not $to_print ) {
                # re-read the before-context plus the matching line in one chunk
                my $length = $here - $context_start;
                seek ARGV, $context_start, SEEK_SET;
                read ARGV, $_, $length;
            }
            $to_print = 1 + $context;
        }
        --$to_print, print if $to_print;
    }

This only needs to keep $context + 1 offsets in memory.
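
To poke at the offset window in isolation, here is a minimal sketch of the same idea wrapped in a function and run against an in-memory filehandle. The name grep_context and the sample data are made up for illustration, and it assumes a seekable handle, so it is not meant for pipes.

```perl
use strict;
use warnings;
use Fcntl qw( :seek );

# Hypothetical helper: returns the matching lines plus $context lines of
# context on either side, read from any seekable filehandle.
sub grep_context {
    my ( $fh, $rx, $context ) = @_;
    my @offs     = ( 0 ) x ( 1 + $context );   # recent line-start offsets
    my $to_print = 0;                          # lines still owed to the output
    my $out      = '';

    while ( my $line = <$fh> ) {
        my $context_start = shift @offs;       # where the before-context begins
        my $here          = tell $fh;          # offset just past the current line
        push @offs, $here;

        if ( $line =~ $rx ) {
            if ( not $to_print ) {
                # Re-read the before-context and the match in one chunk.
                seek $fh, $context_start, SEEK_SET;
                read $fh, $line, $here - $context_start;
            }
            $to_print = 1 + $context;
        }
        if ( $to_print ) {
            --$to_print;
            $out .= $line;
        }
    }
    return $out;
}

# Usage with an in-memory filehandle holding the lines "1\n" .. "9\n".
my $data = join '', map { "$_\n" } 1 .. 9;
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my $result = grep_context( $fh, qr/^5$/, 2 );
print $result;
```

With two lines of context around the line "5", this prints the lines 3 through 7. At any moment the function holds only the offsets array and the current line (or the re-read chunk) in memory.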

Update: fixed two bugs. The offsets array was initialized as ( 0 ) x $context, which gave one line too few of before-context, and the read length was $here - $context_start + length, which of course ate too much input; that wasn't obvious with my test data. Oopsie.
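
The off-by-one in the array size is easier to see with line numbers standing in for byte offsets. A rough sketch, with a hypothetical helper shifted_at that is not part of the script: the value recorded after line k marks the start of line k + 1, so a queue primed with $context entries starts the before-context one line too late.

```perl
use strict;
use warnings;

# Hypothetical helper: feed line numbers 1..$last through a queue primed with
# $size zeros and report which value is shifted out while reading line $last.
sub shifted_at {
    my ( $size, $last ) = @_;
    my @queue = ( 0 ) x $size;
    my $shifted;
    for my $n ( 1 .. $last ) {
        $shifted = shift @queue;   # oldest entry in the window
        push @queue, $n;           # stands in for "offset just past line $n"
    }
    return $shifted;
}

my $context = 3;
# A shifted value of v means the re-read would begin at line v + 1.
printf "primed with %d: %d\n", $context,     shifted_at( $context,     10 );
printf "primed with %d: %d\n", $context + 1, shifted_at( $context + 1, 10 );
```

For a match on line 10 with $context = 3, priming with 3 entries yields 7, so the re-read starts at line 8 and covers only two lines of before-context. Priming with 4 yields 6, so it starts at line 7 and covers all three.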

Makeshifts last the longest.

Replies are listed 'Best First'.
Re^3: grab 'n' lines from a file above and below a /match/
by mrpeabody (Friar) on Sep 20, 2004 at 03:07 UTC
    It's actually slower and more memory-intensive than any of the other solutions. Tie::File internally keeps a list of byte offsets for all the lines, and it carries a lot of additional overhead that is meant to optimize writes, which you never make use of.
    Oops. Guessed wrong, then.

    Your code also doesn't get the edge cases right: if there's a match within fewer than $context lines of the previous one, it will be missed.
    That was intentional, and it depends on your definition of "missed". That hit will be printed with the context of the previous hit. Changing the behavior would just require removing the line:
    $i += $context;
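
To make the two behaviours concrete, here is a small sketch. It is a hypothetical stand-in for an index-based loop, not the code from the parent node; the function hits and the sample data are invented for illustration. It only reports which line indices are treated as hits, with and without the skip-ahead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an index-based loop; not the code from the
# parent node. Returns the line indices that are reported as hits.
sub hits {
    my ( $lines, $rx, $context, $skip_ahead ) = @_;
    my @hits;
    for ( my $i = 0; $i < @$lines; $i++ ) {
        next unless $lines->[$i] =~ $rx;
        push @hits, $i;
        $i += $context if $skip_ahead;   # jump past the after-context just covered
    }
    return @hits;
}

my @lines   = qw( a HIT b HIT c d e HIT );
my @with    = hits( \@lines, qr/HIT/, 3, 1 );
my @without = hits( \@lines, qr/HIT/, 3, 0 );
print "with skip:    @with\n";
print "without skip: @without\n";
```

With the skip-ahead, the hit at index 3 lies inside the after-context of the hit at index 1, so it is never treated as a hit of its own and only indices 1 and 7 are reported. Without it, all three hits (1, 3 and 7) are reported.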
