PerlMonks  

Getting the line numbers of a multi-line match

by Athanasius (Abbot)
on May 02, 2012 at 01:56 UTC (#968350)
Athanasius has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I wrote this simple script ‘findinfile.pl’ to search for arbitrary text in a file:

    use strict;
    use warnings;

    print "\n";
    scalar @ARGV == 2 or die "USAGE: perl $0 <filename> <regex>\n";

    open(my $fh, '<', $ARGV[0])
        or die "Unable to open file '$ARGV[0]' for reading: $!";

    my $match = 0;
    my $regex = qr/$ARGV[1]/;
    my @lines = <$fh>;    # read whole file

    foreach (0 .. $#lines) {
        if ($lines[$_] =~ /$regex/) {
            printf "Match found on line %d\n", ($_ + 1);
            $match = 1;
        }
    }

    print "No matches found\n" unless $match;

Example use: to find the main() function (if any) in file ‘run.c’, enter (I’m using a Windows command prompt):

>perl findinfile.pl run.c "int\s+main\s*\("

This works well (except that it doesn’t allow for embedded comments), provided the regex matches on a single line of text. However, some programmers code like this:

    int
    main(int argc, char** argv)

So, I can modify the script as follows:

    ...
    my $text;
    my $match = 0;
    my $regex = qr/$ARGV[1]/;

    {
        local $/;         # enable "slurp" mode
        $text = <$fh>;    # read whole file
    }

    while ($text =~ /$regex/gms) {
        print "Match found\n";
        $match = 1;
    }

    print "No matches found\n" unless $match;

but now I’ve lost track of the line numbers.

I read somewhere that I could count occurrences of "\n" to calculate the line number of each match, but how would I identify the start- and end-points of each substring between successive matches? Or, is there a more straightforward approach that will retain line numbers while searching across multiple lines?

Thanks,

Athanasius <°(((><contra mundum

Re: Getting the line numbers of a multi-line match
by ikegami (Pope) on May 02, 2012 at 02:33 UTC

    $-[0]

    Simple approach:

    my $line_num = 1 + ( () = substr($text, 0, $-[0]) =~ /\n/g );
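A minimal sketch of how that one-liner might be used inside the match loop, assuming the whole file has already been slurped into a string. `$-[0]` and `$+[0]` are Perl's built-in offsets of the start and end of the last successful match; the helper names `line_at` and `match_lines` and the sample text are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based line number of a character offset,
# i.e. one more than the number of newlines before that offset.
sub line_at {
    my ($text, $offset) = @_;
    return 1 + ( () = substr($text, 0, $offset) =~ /\n/g );
}

# Hypothetical helper: a [first_line, last_line] pair for every match.
sub match_lines {
    my ($text, $regex) = @_;
    my @found;
    while ($text =~ /$regex/g) {
        # Copy the offsets before line_at() runs a match of its own.
        my ($start, $end) = ($-[0], $+[0]);
        # Offset of the last matched character (guard against empty matches).
        my $last = $end > $start ? $end - 1 : $start;
        push @found, [ line_at($text, $start), line_at($text, $last) ];
    }
    return @found;
}

my $sample = "/* demo */\n\nint\nmain(int argc, char** argv)\n\nint main(void)\n";
for my $m ( match_lines($sample, qr/int\s+main\s*\(/) ) {
    my ($first, $last) = @$m;
    print "Match found on ",
        ( $first == $last ? "line $first" : "lines $first through $last" ),
        "\n";
}
```

Each call to `line_at` rescans the text from the beginning, which is fine for small files and keeps the code short.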
Re: Getting the line numbers of a multi-line match
by kcott (Abbot) on May 02, 2012 at 02:58 UTC

    Here's one way of doing it (Note the update at the end):

    #!/usr/bin/env perl

    use strict;
    use warnings;

    my $text;
    my $match = 0;
    my $regex = qr/int\s+main\s*\(/;

    {
        local $/;          # enable "slurp" mode
        $text = <DATA>;    # read whole file
    }

    # /p guarantees that ${^MATCH} is set on Perls older than 5.20
    while ($text =~ /$regex/gmsp) {
        print "Match found on line ",
            scalar ( split /\n/, substr $text, 0, pos($text) )
            - scalar ( split /\n/, ${^MATCH} ) + 1, "\n";
        $match = 1;
    }

    print "No matches found\n" unless $match;

    __DATA__
    /* Dummy multi-main C code */

    int
    main(int argc, char** argv)

    int main(int argc, char** argv)

    /* END */

    Output:

    $ pm_multi_line_match.pl
    Match found on line 3
    Match found on line 6

    Update: As the arithmetic operators already provide a scalar context, the two calls to scalar() are redundant; so you could just write:

    print "Match found on line ", split(/\n/, substr $text, 0, pos $text) - split(/\n/, ${^MATCH}) + 1, "\n";

    -- Ken
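Counting lines up to `pos($text)` for every match rescans the text from the start each time. A sketch of one possible variation, not taken from the replies here: keep a running line number and count only the newlines between the previous match start and the current one. The name `match_start_lines` is hypothetical; `tr/\n//` with an empty replacement counts newlines without modifying the string.

```perl
use strict;
use warnings;

# Hypothetical helper: start line of every match, counted incrementally.
sub match_start_lines {
    my ($text, $regex) = @_;
    my @lines;
    my $line = 1;    # line number at offset $seen
    my $seen = 0;    # offset up to which newlines have been counted
    while ($text =~ /$regex/g) {
        my $start = $-[0];
        # Count only the newlines between the last checkpoint and this match.
        my $gap = substr($text, $seen, $start - $seen);
        $line += ( $gap =~ tr/\n// );
        $seen = $start;
        push @lines, $line;
    }
    return @lines;
}

my $sample = "/* Dummy multi-main C code */\n\nint\nmain(int argc, char** argv)\n\n"
           . "int main(int argc, char** argv)\n\n/* END */\n";
print "Match found on line $_\n"
    for match_start_lines($sample, qr/int\s+main\s*\(/);
```

Because successive matches under `/g` never start earlier than the previous one, each newline in the text is counted at most once.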

Re: Getting the line numbers of a multi-line match
by jwkrahn (Monsignor) on May 02, 2012 at 05:34 UTC
    my @lines = <$fh>;    # read whole file

    foreach (0 .. $#lines) {
        if ($lines[$_] =~ /$regex/) {
            printf "Match found on line %d\n", ($_ + 1);
            $match = 1;
        }
    }

    No need to read the whole file at once:

    while ( <$fh> ) {
        if ( /$regex/ ) {
            print "Match found on line $.\n";
            $match = 1;
        }
    }


    This may solve your multi-line problem (UNTESTED):

    my $lines;
    while ( <$fh> ) {
        $lines .= $_;
        if ( $lines =~ /($regex)/sm ) {
            my $newlines = $1 =~ tr/\n//;
            if ( $newlines ) {
                print "Match found on lines ", $. - $newlines, " through $.\n";
            }
            else {
                print "Match found on line $.\n";
            }
            $match = 1;
            # Keep only the text after the match, so that the same
            # match is not reported again on the next line.
            $lines = substr $lines, $+[0];
        }
    }
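A self-contained sketch of the line-by-line idea above, for trying it out without a file on disk. It reads from an in-memory filehandle (opening a reference to a string, which is standard Perl); the sub name `scan_handle` and the sample text are made up. It assumes a match is detected on the line where it ends and that the match itself does not end with a newline, and the buffer keeps growing until something matches.

```perl
use strict;
use warnings;

# Hypothetical helper: reads $fh line by line and returns a
# [first_line, last_line] pair for each match of $regex.
sub scan_handle {
    my ($fh, $regex) = @_;
    my @found;
    my $buffer = '';
    while ( my $line = <$fh> ) {
        $buffer .= $line;
        if ( $buffer =~ /($regex)/ ) {
            my $hit = $1;
            my $end = $+[0];                       # offset just past the match
            my $newlines = ( $hit =~ tr/\n// );    # lines spanned, minus one
            push @found, [ $. - $newlines, $. ];
            # Drop everything up to the end of the match so it is not seen again.
            $buffer = substr $buffer, $end;
        }
    }
    return @found;
}

my $source = "/* demo */\n\nint\nmain(int argc, char** argv)\n\nint main(void)\n";
open my $in, '<', \$source or die "Cannot open in-memory handle: $!";
for my $m ( scan_handle($in, qr/int\s+main\s*\(/) ) {
    print "Match found on lines $m->[0] through $m->[1]\n";
}
close $in;
```

Only one match is reported per line read, so a second match that ends on the same line is picked up on a later read or not at all.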
Re: Getting the line numbers of a multi-line match
by Athanasius (Abbot) on May 04, 2012 at 01:57 UTC

    In what other monastery could a newly-initiated monk receive same-day answers from a curate, a prior -- and a pope? :-)

    Thanks ikegami, kcott, and jwkrahn -- I’m learning a lot!

    Athanasius <°(((><contra mundum

      Don't get cause and effect confused. They're curates, priors, and popes because they answer a lot of questions well. :-)

      PS - I take this opportunity to welcome brother Athanasius, who in only a few days has shown himself to be a valuable addition to our community. I hope he stays around for a long time.

      I reckon we are the only monastery ever to have a dungeon stuffed with 16,000 zombies.
