How to print the lines immediately above and below a matching line?

http://www.perlmonks.org?node_id=1005485

Bio90 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I am trying to match a specific part of a line and print the line above and below it, in full.

Through various internet searches I have found this code which I believe is suitable to the task:

use strict;
use warnings;

open( my $fh, '<', 'input.txt') or die "Error opening file - $!\n";
open OUT, ">", "output.txt" or die "could not open output.txt $! \n";

my $this_line = "";
my $do_next = 0;

while(<$fh>) {
    my $last_line = $this_line;
    $this_line = $_;

    if ($this_line =~ /<DATA>/) {
        print OUT $last_line unless $do_next;
        print OUT $this_line;
        $do_next = 1;
    } else {
        print OUT $this_line if $do_next;
        $last_line = "";
        $do_next = 0;
    }
}
close ($fh);

__DATA__
4386_7#8
4350_7#6
4414_1#6
4465_5#1
etc...

The data are not a line by themselves, rather they are part of a line.

When I run this code, the output is just a blank text file. It does not produce any error messages, and I am sure that the matches I am searching for are in the file I am searching.

Any help as to what might be the problem would be much appreciated.

Thanks in advance,

Bio


Re: How to print the lines immediately above and below a matching line?
by toolic (Bishop) on Nov 25, 2012 at 13:23 UTC

if ($this_line =~ /<DATA>/) {

DATA

<DATA>

perl -Mre=debug mycode.pl
Compiling REx "<DATA>"
Final program:
   1: EXACT <<DATA>> (4)
   4: END (0)
anchored "<DATA>" at 0 (checking anchored isall) minlen 6 
Error opening file - No such file or directory
Freeing REx: "<DATA>"
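
The debug output above shows the pattern compiled as the exact string "<DATA>": inside a regex, the angle brackets are ordinary characters, not the readline operator, so nothing is read from the DATA handle. A minimal sketch of that behaviour follows; the sample strings and the helper name literal_matches are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample lines: one holds an ID from the question,
# the other holds the six literal characters "<DATA>".
my @samples = ( 'run 4386_7#8 lane 1', 'a line containing <DATA> literally' );

# Return the lines that match the pattern /<DATA>/.
# Inside a regex, <DATA> is plain text, not a read from a filehandle.
sub literal_matches {
    return grep { $_ =~ /<DATA>/ } @_;
}

print "$_\n" for literal_matches(@samples);
```

Only the second sample is printed, which is why the original loop never matched any line of input.txt.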

Re^2: How to print the lines immediately above and below a matching line?

by Bio90 (Initiate) on Nov 25, 2012 at 13:36 UTC

I am trying to match from the DATA handle.

Re^3: How to print the lines immediately above and below a matching line?

by toolic (Bishop) on Nov 25, 2012 at 13:41 UTC

my $data = <DATA>;
chomp $data;
if ($this_line =~ /\Q$data/)
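
The snippet above reads a single line from DATA. To search for every ID listed under __DATA__, one option is to read them all, for example with chomp( my @patterns = <DATA> ), and join them into one alternation. The sketch below assumes that approach; build_regex and context_lines are hypothetical helper names, and an in-memory string stands in for input.txt so the example is self-contained.

```perl
use strict;
use warnings;

# Build one regex from a list of literal patterns (e.g. the lines read from DATA).
sub build_regex {
    my @patterns = grep { length } @_;
    die "no patterns given\n" unless @patterns;   # an empty alternation would match every line
    my $alternation = join '|', map { quotemeta($_) } @patterns;
    return qr/$alternation/;
}

# Return every matching line plus the line before and after it,
# without emitting the same line twice.
sub context_lines {
    my ( $fh, $regex ) = @_;
    my @out;
    my $prev;                  # the previous line, if any
    my $prev_printed = 0;      # true when $prev is already in @out
    my $want_next    = 0;      # true when the previous line matched
    while ( my $line = <$fh> ) {
        my $matched = $line =~ $regex;
        if ($matched) {
            push @out, $prev if defined $prev && !$prev_printed;
            push @out, $line;
        }
        elsif ($want_next) {
            push @out, $line;
        }
        $prev_printed = $matched || $want_next;
        $want_next    = $matched;
        $prev         = $line;
    }
    return @out;
}

# Stand-in for the real input file (assumed content).
my $input = "header\nsample 4386_7#8 ok\nfooter\nunrelated\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!\n";
my $regex = build_regex( '4386_7#8', '4350_7#6' );
print context_lines( $fh, $regex );
close $fh;
```

quotemeta does the same job as \Q in the snippet above, so characters such as "#" in the IDs are matched literally.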

Re^4: How to print the lines immediately above and below a matching line?

by Bio90 (Initiate) on Nov 25, 2012 at 13:55 UTC

Re^5: How to print the lines immediately above and below a matching line?

by Anonymous Monk on Nov 25, 2012 at 14:00 UTC


Re^3: How to print the lines immediately above and below a matching line?

by Anonymous Monk on Nov 25, 2012 at 13:40 UTC

Re^4: How to print the lines immediately above and below a matching line?

by Bio90 (Initiate) on Nov 25, 2012 at 13:52 UTC

Re^5: How to print the lines immediately above and below a matching line?

by Anonymous Monk on Nov 25, 2012 at 14:01 UTC

Re: How to print the lines immediately above and below a matching line?
by karlgoethebier (Abbot) on Nov 25, 2012 at 20:35 UTC

As far as I understand, the basic task is: "...match a specific part of a line and print the line above and below it, in full." Please correct me if I'm wrong.

I would do it like this:

#!/usr/bin/perl

use strict;
use warnings;
use Tie::File;

# my $pattern = qr/(^4000.+)/;
# my $pattern = qr/(^4001.+)/;
my $pattern = qr/(^4002.+)/;
# my $pattern = qr/(4003.+)/;
# my $pattern = qr/(^4004.+)/;
# my $pattern = qr/(^4005.+)/;

tie my @lines, 'Tie::File', shift || die;

my $idx = 0;

for my $line(@lines){
   print qq($idx $line\n);
   if( $line =~ m/($pattern)/ ){
      if( $idx == 0){
         print qq(Heuraka: $1 next: $lines[ ( $idx + 1) ]\n);
      };
      if ( $idx == scalar( @lines - 1 ) ) {
          print qq(Heuraka: $1 previous: $lines[ ( $idx - 1 )] \n);
      };
      if ( $idx >= 1 && $idx <= $#lines - 1 ) {
          print qq(Heuraka: $1 previous: $lines[ ( $idx - 1 ) ] next: $lines[ ( $idx + 1 ) ]\n);
      };
    }
    ++$idx;
}

untie @lines || die;

__END__

Karls-Mac-mini:Desktop karl$ cat MyData.txt
4000_1#0
4001_1#1
4002_1#2
4003_1#3
4004_1#4
4005_1#5

Karls-Mac-mini:Desktop karl$ ./test.pl  MyData.txt 
0 4000_1#0
1 4001_1#1
2 4002_1#2
Heuraka: 4002_1#2 previous: 4001_1#1 next: 4003_1#3
3 4003_1#3
4 4004_1#4
5 4005_1#5

See also: Tie::File

Regards, Karl

«The Crux of the Biscuit is the Apostrophe»

Re: How to print the lines immediately above and below a matching line?
by afoken (Chancellor) on Nov 25, 2012 at 16:56 UTC

/tmp>grep -C1 halt /etc/passwd
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/:/bin/false
/tmp>ack -C1 halt /etc/passwd
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/:/bin/false
/tmp>

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Re^2: How to print the lines immediately above and below a matching line?

by karlgoethebier (Abbot) on Nov 25, 2012 at 21:54 UTC

Yes...

Karls-Mac-mini:Desktop karl$ grep -C1 4003 MyData.txt 
4002_1#2
4003_1#3
4004_1#4

...but do you really want to qx this on a 4 GByte file? Regards, Karl

«The Crux of the Biscuit is the Apostrophe»

Re^3: How to print the lines immediately above and below a matching line?

by afoken (Chancellor) on Nov 26, 2012 at 19:16 UTC

but do you really want to qx this on a 4 GByte file?

No. I would not use Perl at all just to call grep. My shell can start grep fine without needing Perl.

4 GByte should be no problem for grep, at least not for GNU grep. Actually, I expect grep to be at least as fast as a perl script, and I expect it to use less memory. Simply because grep is optimized for exactly that job.

By the way: grep has lots of other useful options, like showing line numbers and/or file names, again no need to write Perl code.

A quite useful alternative to grep is ack. It shares many features with GNU grep, and does some things better. ack ignores files and directories you typically do not want to search by default, it uses Perl regexp syntax instead of "basic" or "extended" regexp syntax, and it has a configuration file.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Re^4: How to print the lines immediately above and below a matching line?

by karlgoethebier (Abbot) on Nov 27, 2012 at 18:34 UTC

Re^2: How to print the lines immediately above and below a matching line?

by Anonymous Monk on Nov 25, 2012 at 18:30 UTC

shh! don't give away the secrets

Re: How to print the lines immediately above and below a matching line?
by Kenosis (Priest) on Nov 25, 2012 at 18:37 UTC

You mention wanting to print the lines above and below a matching line, but your code prints the matching line, too. In case you wanted to print all (two or) three lines, consider the following that you can adapt for files:

use strict;
use warnings;

my ( $prevLine, $nextLine );

for ( ; ; ) {
    last if eof DATA;
    chomp( my $currLine = defined $nextLine ? $nextLine : <DATA> );

    if ( $currLine =~ /match this/ ) {
        print '-' x 25, "\n";
        chomp( $nextLine = <DATA> ) if !eof DATA;

        print $prevLine, "\n" if defined $prevLine;
        print $currLine, "\n";
        print $nextLine, "\n" if defined $nextLine;
        print '-' x 25, "\n";
    }
    else {
        undef $nextLine;
    }

    $prevLine = $currLine;
}

__DATA__
The first line match this
Not this
abcdefg
The one above
Another match this 1
Another match this 2
the one below match this 2
zxcvbnn

Another match this blank above
Second to the last line
The last line match this

Output:

-------------------------
The first line match this
Not this
-------------------------
-------------------------
The one above
Another match this 1
Another match this 2
-------------------------
-------------------------
Another match this 1
Another match this 2
the one below match this 2
-------------------------
-------------------------
Another match this 2
the one below match this 2
zxcvbnn
-------------------------
-------------------------

Another match this blank above
Second to the last line
-------------------------
-------------------------
Second to the last line
The last line match this
-------------------------

The dashes are printed to show the desired output. If the first or last line is a match, only two lines are printed. If you only want the lines above and below a matching line, delete print $currLine, "\n";.

Hope this helps!

Addition: If you want to avoid printing the same line more than once--like in the example above--and have output that more closely resembles grepping the file, you can do the following:

Read more... (2 kB)


Re: How to print the lines immediately above and below a matching line?
by space_monk (Chaplain) on Nov 25, 2012 at 16:29 UTC

Instead of reading the file line by line as suggested in other answers above, you could also read the entire file into a scalar and use a multi line regexp to do it. Look up use of the /m option on regexp matching.

Any Monk who wishes to extend this thread with a complete answer using this method is more than welcome to do so (I'm a bit short of time)
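
One way the slurp-and-/m idea could look is sketched below. It is an illustration under assumptions, not space_monk's own code: the sub name neighbors is hypothetical, the search text is treated as a literal string, and the match offsets in @- and @+ are used to pick out the adjacent lines.

```perl
use strict;
use warnings;

# Return one [previous, matching, next] triple per line of $text that
# contains $needle; a missing neighbor (first or last line) is undef.
sub neighbors {
    my ( $text, $needle ) = @_;
    my @triples;
    # /m lets ^ and $ anchor at every line; "." never crosses a newline.
    while ( $text =~ /^.*\Q$needle\E.*$/mg ) {
        my ( $start, $end ) = ( $-[0], $+[0] );
        my $match = substr $text, $start, $end - $start;

        my $prev;
        if ( $start > 0 ) {
            # Newline that ends the line before the previous one, or -1.
            my $nl = $start >= 2 ? rindex( $text, "\n", $start - 2 ) : -1;
            $prev = substr $text, $nl + 1, $start - $nl - 2;
        }

        my $next;
        if ( $end + 1 < length($text) ) {
            my $nl = index $text, "\n", $end + 1;
            $nl = length($text) if $nl < 0;    # last line lacks a newline
            $next = substr $text, $end + 1, $nl - $end - 1;
        }

        push @triples, [ $prev, $match, $next ];
    }
    return @triples;
}

# In practice the text would be slurped: my $text = do { local $/; <$fh> };
my $text = "first\nhit one\nhit two\nlast\n";
for my $t ( neighbors( $text, 'hit' ) ) {
    print join( ' | ', map { defined($_) ? $_ : '(none)' } @$t ), "\n";
}
```

Because each match consumes only the matching line itself, two consecutive matching lines are both reported, each with the other as its neighbor.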

A Monk aims to give answers to those who have none, and to learn from those who know more.

Re^2: How to print the lines immediately above and below a matching line?

by Anonymous Monk on Nov 25, 2012 at 16:33 UTC

If at most three lines ever need to be printed at a time, reading the whole file is a waste of memory.

Re^3: How to print the lines immediately above and below a matching line?

by roboticus (Chancellor) on Nov 25, 2012 at 18:38 UTC

Waste of memory? Memory is there to be used ... what are you saving it for?

Sure, a long-lived application may want to be leery of using a large quantity of RAM. But if it solves the immediate problem at hand, then using memory isn't really a problem. Also, memory is so plentiful these days that you need to work with *big* files before memory usage becomes a problem. The ordinary file isn't really going to be one.

For example, here's a histogram of file sizes on a couple of my machines--my work laptop (LT0186) and my goofing off computer (Boink):

files smaller than	LT0186	Boink
1	2852	122103
10	701	5777
25	3920	30988
50	7793	31843
100	4501	41932
250	10112	128614
500	14385	119923
1k	31564	192614
2.5k	40133	275173
5k	33471	218245
10k	34710	233223
25k	27628	211316
50k	14394	100595
100k	12579	71556
250k	9674	61003
500k	4961	22754
1M	3508	13800
2.5M	2013	8325
5M	852	4279
10M	738	4586
25M	365	1958
50M	223	634
100M	54	372
250M	52	129
500M	22	63
1G	16	55
2.5G	9	32
5G	0	3
10G	0	1

I wouldn't take a second thought about just loading a file under 500M into RAM, and as you can see, I have *very few* files larger than that. And for a simple task like the one presented, I'd probably go ahead and try it on larger files (swap space permitting) and go take a break.

When your only tool is a hammer, all problems look like your thumb.

Re^4: How to print the lines immediately above and below a matching line?

by Anonymous Monk on Nov 25, 2012 at 18:42 UTC

Re^5: How to print the lines immediately above and below a matching line?

by roboticus (Chancellor) on Nov 25, 2012 at 19:16 UTC


Re^5: How to print the lines immediately above and below a matching line?

by Kenosis (Priest) on Nov 25, 2012 at 23:48 UTC
