http://www.perlmonks.org?node_id=1005485

Bio90 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I am trying to match a specific part of a line and print the line above and below it, in full.

Through various internet searches I have found this code which I believe is suitable to the task:
use strict;
use warnings;

open( my $fh, '<', 'input.txt') or die "Error opening file - $!\n";
open OUT, ">", "output.txt" or die "could not open output.txt $! \n";

my $this_line = "";
my $do_next = 0;

while(<$fh>) {
    my $last_line = $this_line;
    $this_line = $_;
    if ($this_line =~ /<DATA>/) {
        print OUT $last_line unless $do_next;
        print OUT $this_line;
        $do_next = 1;
    } else {
        print OUT $this_line if $do_next;
        $last_line = "";
        $do_next = 0;
    }
}
close ($fh);

__DATA__
4386_7#8
4350_7#6
4414_1#6
4465_5#1
etc...

The data are not a line by themselves, rather they are part of a line.

When I run this code, the output is just a blank text file. It does not produce any error messages, and I am sure that the strings I am searching for are in the input file.

Any help as to what might be the problem would be much appreciated.

Thanks in advance,

Bio


Replies are listed 'Best First'.
Re: How to print the lines immediately above and below a matching line?
by toolic (Bishop) on Nov 25, 2012 at 13:23 UTC
    if ($this_line =~ /<DATA>/) {
    Are you trying to read from the special DATA handle, or are you trying to match the exact string <DATA>? re indicates you are doing the latter:
    perl -Mre=debug mycode.pl
    Compiling REx "<DATA>"
    Final program:
       1: EXACT <<DATA>> (4)
       4: END (0)
    anchored "<DATA>" at 0 (checking anchored isall) minlen 6
    Error opening file - No such file or directory
    Freeing REx: "<DATA>"
      Hi, thanks for your reply.

      I am trying to match from the DATA handle.

        Then you need to change your code. Something like:
        my $data = <DATA>;
        chomp $data;
        if ($this_line =~ /\Q$data/)
        Maybe you should read a line from that handle instead ... don't skip the basics, read perlintro
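Putting those pieces together, here is a minimal sketch of one way to do it, not a definitive answer. It assumes the IDs are listed one per line and should match anywhere inside an input line. The sub names `build_id_regex` and `lines_with_context` and the sample text are made up for illustration, and in-memory filehandles stand in for the DATA section and for input.txt so that the sketch is self-contained.

```perl
use strict;
use warnings;

# Build one regex that matches any of the IDs read from $id_fh
# (one ID per line). quotemeta escapes characters such as '#'.
sub build_id_regex {
    my ($id_fh) = @_;
    chomp( my @ids = <$id_fh> );
    @ids = grep { length } @ids;    # skip blank lines
    die "no IDs given\n" unless @ids;
    my $alternation = join '|', map { quotemeta } @ids;
    return qr/$alternation/;
}

# Return each matching line together with the line before and the line
# after it, in file order, without emitting any line twice.
sub lines_with_context {
    my ( $in_fh, $regex ) = @_;
    my @out;
    my $prev;              # previous line, if it has not been emitted yet
    my $print_next = 0;    # true when the previous line matched

    while ( my $line = <$in_fh> ) {
        if ( $line =~ $regex ) {
            push @out, $prev if defined $prev;
            push @out, $line;
            undef $prev;
            $print_next = 1;
        }
        elsif ($print_next) {
            push @out, $line;
            undef $prev;
            $print_next = 0;
        }
        else {
            $prev = $line;
        }
    }
    return @out;
}

# Hypothetical stand-ins for the DATA section and for input.txt.
my $ids   = "4386_7#8\n4350_7#6\n";
my $input = "header\nrun 4386_7#8 lane 1\ntrailer\nunrelated\n";

open my $id_fh, '<', \$ids   or die "cannot open in-memory IDs: $!\n";
open my $in_fh, '<', \$input or die "cannot open in-memory input: $!\n";

print lines_with_context( $in_fh, build_id_regex($id_fh) );
```

In a real script the first handle would be DATA and the second the opened input file. Reading the IDs once and compiling a single regex avoids re-reading DATA for every input line.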
Re: How to print the lines immediately above and below a matching line?
by karlgoethebier (Abbot) on Nov 25, 2012 at 20:35 UTC

    As far as I understand, the basic theme is: "...match a specific part of a line and print the line above and below it, in full." Please correct me if I'm wrong.

    I would do it like this:

    #!/usr/bin/perl

    use strict;
    use warnings;
    use Tie::File;

    # my $pattern = qr/(^4000.+)/;
    # my $pattern = qr/(^4001.+)/;
    my $pattern = qr/(^4002.+)/;
    # my $pattern = qr/(4003.+)/;
    # my $pattern = qr/(^4004.+)/;
    # my $pattern = qr/(^4005.+)/;

    tie my @lines, 'Tie::File', shift || die;

    my $idx = 0;

    for my $line(@lines){
        print qq($idx $line\n);
        if( $line =~ m/($pattern)/ ){
            if( $idx == 0){
                print qq(Heuraka: $1 next: $lines[ ( $idx + 1) ]\n);
            };
            if ( $idx == scalar( @lines - 1 ) ) {
                print qq(Heuraka: $1 previous: $lines[ ( $idx - 1 )] \n);
            };
            if ( $idx ~~ [ 1..scalar( @lines - 2 ) ]) {
                print qq(Heuraka: $1 previous: $lines[ ( $idx - 1 ) ] next: $lines[ ( $idx + 1 ) ]\n);
            };
        }
        ++$idx;
    }

    untie @lines || die;

    __END__

    Karls-Mac-mini:Desktop karl$ cat MyData.txt
    4000_1#0
    4001_1#1
    4002_1#2
    4003_1#3
    4004_1#4
    4005_1#5

    Karls-Mac-mini:Desktop karl$ ./test.pl MyData.txt
    0 4000_1#0
    1 4001_1#1
    2 4002_1#2
    Heuraka: 4002_1#2 previous: 4001_1#1 next: 4003_1#3
    3 4003_1#3
    4 4004_1#4
    5 4005_1#5

    See also: Tie::File

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

Re: How to print the lines immediately above and below a matching line?
by afoken (Chancellor) on Nov 25, 2012 at 16:56 UTC
    /tmp>grep -C1 halt /etc/passwd
    shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
    halt:x:7:0:halt:/sbin:/sbin/halt
    mail:x:8:12:mail:/:/bin/false
    /tmp>ack -C1 halt /etc/passwd
    shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
    halt:x:7:0:halt:/sbin:/sbin/halt
    mail:x:8:12:mail:/:/bin/false
    /tmp>

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Yes...

      Karls-Mac-mini:Desktop karl$ grep -C1 4003 MyData.txt
      4002_1#2
      4003_1#3
      4004_1#4

      ...but do you really want to qx this on a 4 GByte file? Regards, Karl


        but do you really want to qx this on a 4 GByte file?

        No. I would not use Perl at all just to call grep. My shell can start grep fine without needing Perl.

        4 GByte should be no problem for grep, at least not for GNU grep. Actually, I expect grep to be at least as fast as a perl script, and I expect it to use less memory. Simply because grep is optimized for exactly that job.

        By the way: grep has lots of other useful options, like showing line numbers and/or file names, again no need to write Perl code.

        A quite useful alternative to grep is ack. It shares many features with GNU grep, and does some things better: by default it ignores files and directories you typically do not want to search, it uses Perl regexp syntax instead of "basic" or "extended" regexp syntax, and it has a configuration file.

        Alexander

      shh! don't give away the secrets
Re: How to print the lines immediately above and below a matching line?
by Kenosis (Priest) on Nov 25, 2012 at 18:37 UTC

    You mention wanting to print the lines above and below a matching line, but your code prints the matching line, too. In case you wanted to print all (two or) three lines, consider the following that you can adapt for files:

    use strict;
    use warnings;

    my ( $prevLine, $nextLine );

    for ( ; ; ) {
        last if eof DATA;
        chomp( my $currLine = defined $nextLine ? $nextLine : <DATA> );

        if ( $currLine =~ /match this/ ) {
            print '-' x 25, "\n";
            chomp( $nextLine = <DATA> ) if !eof DATA;
            print $prevLine, "\n" if defined $prevLine;
            print $currLine, "\n";
            print $nextLine, "\n" if defined $nextLine;
            print '-' x 25, "\n";
        }
        else {
            undef $nextLine;
        }

        $prevLine = $currLine;
    }

    __DATA__
    The first line match this
    Not this
    abcdefg
    The one above
    Another match this 1
    Another match this 2
    the one below match this 2
    zxcvbnn

    Another match this blank above
    Second to the last line
    The last line match this

    Output:

    -------------------------
    The first line match this
    Not this
    -------------------------
    -------------------------
    The one above
    Another match this 1
    Another match this 2
    -------------------------
    -------------------------
    Another match this 1
    Another match this 2
    the one below match this 2
    -------------------------
    -------------------------
    Another match this 2
    the one below match this 2
    zxcvbnn
    -------------------------
    -------------------------

    Another match this blank above
    Second to the last line
    -------------------------
    -------------------------
    Second to the last line
    The last line match this
    -------------------------

    The dashes are printed to show the desired output. If the first or last line is a match, only two lines are printed. If you only want the lines above and below a matching line, delete print $currLine, "\n";.

    Hope this helps!

    Addition: If you want to avoid printing the same line more than once--like in the example above--and have output that more closely resembles grepping the file, you can do the following:
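The code that went with this addition is not shown here. As a stand-in, the following is a minimal sketch of one way to get grep-like output. It is not Kenosis's original code. The sub name `grep_context`, its parameters, and the `--` separator between non-adjacent groups (modelled on what `grep -C` prints) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Return every line matching $regex plus up to $n lines of context on
# each side, each line at most once, with '--' between groups that are
# not adjacent in the input.
sub grep_context {
    my ( $fh, $regex, $n ) = @_;
    my @out;
    my @before;              # up to $n most recent lines not yet emitted
    my $after        = 0;    # trailing context lines still owed
    my $lineno       = 0;
    my $last_emitted = 0;    # line number of the last line pushed to @out

    while ( my $line = <$fh> ) {
        chomp $line;
        $lineno++;
        if ( $line =~ $regex ) {
            my $first = $lineno - @before;    # first line of this group
            push @out, '--' if $last_emitted && $first > $last_emitted + 1;
            push @out, @before, $line;
            @before       = ();
            $after        = $n;
            $last_emitted = $lineno;
        }
        elsif ( $after > 0 ) {
            push @out, $line;
            $after--;
            $last_emitted = $lineno;
        }
        else {
            push @before, $line;
            shift @before while @before > $n;
        }
    }
    return @out;
}

# Demo on a few lines of the sample data, read from an in-memory handle.
my $sample = join "\n",
    'The one above',
    'Another match this 1',
    'Another match this 2',
    'the one below match this 2',
    'zxcvbnn',
    '';
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!\n";
print "$_\n" for grep_context( $fh, qr/match this/, 1 );
```

With one line of context, each of the five sample lines is printed exactly once, because overlapping context is merged instead of repeated.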

Re: How to print the lines immediately above and below a matching line?
by space_monk (Chaplain) on Nov 25, 2012 at 16:29 UTC
    TMTOWTDI answer. :-)

    Instead of reading the file line by line as suggested in other answers above, you could also read the entire file into a scalar and use a multi line regexp to do it. Look up use of the /m option on regexp matching.

    Any Monk who wishes to extend this thread with a complete answer using this method is more than welcome to do so (I'm a bit short of time)

    A Monk aims to give answers to those who have none, and to learn from those who know more.
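For anyone who wants to try the slurp approach, here is a minimal sketch under stated assumptions, not a complete answer from the thread. It assumes the whole file fits in memory and uses plain "\n" line endings. The sub name `slurp_context` is made up for illustration. The /m regexp finds each matching line and peeks at the following line with a lookahead, while the previous line is located with `rindex` from the match offset in `@-`.

```perl
use strict;
use warnings;

# Return a list of [ $prev, $line, $next ] triples, one per line of $text
# that contains $needle. $prev or $next is undef at the edges of the text.
sub slurp_context {
    my ( $text, $needle ) = @_;

    # With the m flag, ^ and $ anchor at every line. The lookahead peeks at
    # the following line without consuming it, so adjacent matching lines
    # are all found. A trailing newline does not count as a next line.
    my $re = qr{
        ^ ( .* \Q$needle\E .* ) $
        (?= (?: \n (?!\z) ( .* ) )? )
    }mx;

    my @hits;
    while ( $text =~ /$re/g ) {
        my ( $line, $next, $start ) = ( $1, $2, $-[1] );

        my $prev;
        if ( $start > 0 ) {
            my $prev_end   = $start - 1;    # the "\n" ending the previous line
            my $prev_start = $prev_end > 0
                ? rindex( $text, "\n", $prev_end - 1 ) + 1
                : 0;
            $prev = substr( $text, $prev_start, $prev_end - $prev_start );
        }
        push @hits, [ $prev, $line, $next ];
    }
    return @hits;
}

# Demo on a small in-memory string. A real script would slurp the file,
# for example with: my $text = do { local $/; <$fh> };
my $text = "4001_1#1\n4002_1#2\n4003_1#3\n";
for my $hit ( slurp_context( $text, '4002' ) ) {
    my ( $prev, $line, $next ) = @$hit;
    print "previous: ", ( defined $prev ? $prev : '(none)' ), "\n";
    print "match:    $line\n";
    print "next:     ", ( defined $next ? $next : '(none)' ), "\n";
}
```

Every matching line gets its own triple, so a line can appear in more than one triple when matches are adjacent.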
      If at most three lines ever need to be printed, reading the whole file is a waste of memory.

        Waste of memory? Memory is there to be used ... what are you saving it for?

        Sure, a long-lived application may want to be leery of using a large quantity of RAM. But if it solves the immediate problem at hand, then using memory isn't really a problem. Also, memory is so plentiful nowadays that you need to work with *big* files before memory usage becomes a problem. The ordinary file isn't really going to be a problem.

        For example, here's a histogram of file sizes on a couple of my machines--my work laptop (LT0186) and my goofing off computer (Boink):

        Files smaller than   LT0186    Boink
        1                      2852   122103
        10                      701     5777
        25                     3920    30988
        50                     7793    31843
        100                    4501    41932
        250                   10112   128614
        500                   14385   119923
        1k                    31564   192614
        2.5k                  40133   275173
        5k                    33471   218245
        10k                   34710   233223
        25k                   27628   211316
        50k                   14394   100595
        100k                  12579    71556
        250k                   9674    61003
        500k                   4961    22754
        1M                     3508    13800
        2.5M                   2013     8325
        5M                      852     4279
        10M                     738     4586
        25M                     365     1958
        50M                     223      634
        100M                     54      372
        250M                     52      129
        500M                     22       63
        1G                       16       55
        2.5G                      9       32
        5G                        0        3
        10G                       0        1

        I wouldn't take a second thought about just loading a file under 500M into RAM, and as you can see, I have *very few* files larger than that. And for a simple task like the one presented, I'd probably go ahead and try it on larger files (swap space permitting) and go take a break.
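A file-size histogram of this kind could be produced along the following lines. This is a minimal sketch, not the script roboticus used. The sub name `size_histogram`, the bucket limits, and the choice of k = 1024 are all assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Count each size in the first bucket whose upper bound is larger than it.
# Returns a hashref (limit => count) and the number of sizes that did not
# fit into any bucket.
sub size_histogram {
    my ( $sizes, $limits ) = @_;
    my %count = map { $_ => 0 } @$limits;
    my $over  = 0;

    SIZE: for my $size (@$sizes) {
        for my $limit (@$limits) {
            if ( $size < $limit ) {
                $count{$limit}++;
                next SIZE;
            }
        }
        $over++;
    }
    return ( \%count, $over );
}

# Hypothetical bucket limits in bytes, in ascending order (k = 1024).
my @limits = ( 1, 10, 100, 1024, 10 * 1024, 100 * 1024, 1024**2, 1024**3 );

# Only walk the file system when directories are given on the command line.
if (@ARGV) {
    my @sizes;
    find( sub { push @sizes, ( stat _ )[7] if -f $_ }, @ARGV );

    my ( $count, $over ) = size_histogram( \@sizes, \@limits );
    printf "%-12s %d\n", "< $_", $count->{$_} for @limits;
    printf "%-12s %d\n", 'larger', $over;
}
```

Run with one or more directory names as arguments, it prints one line per bucket. Without arguments it only defines the sub and does nothing.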

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.