Finding Line numbers in a file

by sanPerl (Friar)
on Apr 04, 2007 at 14:02 UTC (#608280, perlquestion)
sanPerl has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I am developing a QC tool that scans text files and creates a log report. This seems to be a simple task with the help of regexes. However, the user wants the log file to give the line numbers where each doubtful occurrence is found. For example, if the user doesn't want the word "Cell" in the text file, the script should report the line on which the word "Cell" is present. (This is a simple case, but I need to use complex regexes to find the doubtful occurrences.) For example, if $ctfilebuf contains all the data read from a file:

while ($ctfilebuf =~ m/Cell/ig) {
    print "Word \"Cell\" found at line number ......\n";
}

How can I print the line number here? Is there a simple way?

Replies are listed 'Best First'.
Re: Finding Line numbers in a file
by davidrw (Prior) on Apr 04, 2007 at 14:12 UTC
    Do you have to have the whole file in $ctfilebuf? It will be a lot easier if you simply read and scan it line by line; you can keep a counter of the line numbers (or use the $. variable; see perlvar).
    Also, grep -n Cell log.txt from the command line accomplishes this as well.
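A minimal sketch of the line-by-line approach suggested above, assuming the pattern never needs to span a line break. The helper name find_matches and the sample text are made up for illustration; an in-memory filehandle stands in for a real file so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: reads the handle one line at a time and returns
# [line_number, matched_text] pairs for every match of $pattern.
sub find_matches {
    my ($fh, $pattern) = @_;
    my @hits;
    while ( my $line = <$fh> ) {
        # $. holds the number of the line just read from $fh.
        while ( $line =~ /($pattern)/g ) {
            push @hits, [ $., $1 ];
        }
    }
    return @hits;
}

# Sample data (an assumption); replace with open my $fh, '<', $path for a real file.
my $sample = "First line\nA Cell here\nnothing\ncell and Cell\n";
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
for my $hit ( find_matches( $fh, qr/Cell/i ) ) {
    printf qq{Word "%s" found at line %d\n}, $hit->[1], $hit->[0];
}
close $fh;
```

Because each line is matched separately, memory use stays small, but a regex that must match across a newline will not be found this way.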
Re: Finding Line numbers in a file
by ferreira (Chaplain) on Apr 04, 2007 at 14:13 UTC

    You're looking for the special variable $. (or its long-form variant, $INPUT_LINE_NUMBER, available when you use English). See perlvar.

    If you use IO::Handle, you will be able to use $io->input_line_number (on IO objects) or input_line_number( H ) (on filehandles).
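As a rough illustration of the IO::Handle route, the sketch below asks the handle for its line number through the input_line_number method instead of reading $. directly. The helper name matching_line_numbers and the sample text are assumptions made for the example.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: returns the line numbers on which $pattern matches.
sub matching_line_numbers {
    my ($fh, $pattern) = @_;
    my @numbers;
    while ( my $line = <$fh> ) {
        # Query the handle itself for the number of the line just read.
        push @numbers, $fh->input_line_number if $line =~ $pattern;
    }
    return @numbers;
}

# Sample data (an assumption) read through an in-memory filehandle.
my $sample = "alpha\nbeta Cell\ngamma\ndelta cell\n";
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
print "Match at line $_\n" for matching_line_numbers( $fh, qr/Cell/i );
close $fh;
```

The method form can be handy when several filehandles are being read in the same loop, since $. only refers to the handle that was read most recently.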

Re: Finding Line numbers in a file
by ahmad (Hermit) on Apr 04, 2007 at 14:38 UTC

    You can make your own counter, something like:

    my $counter = 0;
    while ( <$fh> ) {
        $counter++;
        print "Line number is : $counter\n";
    }

    Or you could just use $., which tracks the line number automatically when reading from a filehandle:

    while ( <$fh> ) {
        print "Line number is :" . $. . "\n";
    }


Re: Finding Line numbers in a file
by kyle (Abbot) on Apr 04, 2007 at 14:52 UTC

    According to my Camel, "each time a pattern successfully matches (including the pattern in a substitution), it sets the $`, $&, and $' variables to the text left of the match, the whole match, and the text right of the match."

    That sounds useful.

    my $text = <<'END_OF_TEXT';
    line 1 apple
    banana line 2
    line cherry 3
    END_OF_TEXT
    ;

    while ( $text =~ m/(apple|banana|cherry)/ig ) {
        my $word = $1;
        # $` is the text before the match; count the newlines in it
        my $prelines = ( $` =~ tr/\n// );
        printf qq{Word "%s" found on line %d\n}, $word, $prelines + 1;
    }

    __END__
    Word "apple" found on line 1
    Word "banana" found on line 2
    Word "cherry" found on line 3

    If you use English, the $` variable is called $PREMATCH (see perlvar, which notes that using this variable "imposes a considerable performance penalty on all regular expression matches").

      Nice, but inefficient, and gets worse the bigger the text file is.

      Do not do this; use the other approaches. Their running time grows linearly with the size of the text file, and they do not require the entire file to be loaded into memory.

      my name's not Keith, and I'm not reasonable.
        You are possibly right. You are just as possibly wrong. There are several things that we don't know, such as:
        • Average line length. Shorter lines mean more low-level iterations.
        • Average file length. Longer files will require more memory, but that is about all.
        • Average hit count. How often is the string found in the file?
        • Average hit placement. How often does the string end up at the beginning or the end?
        • Implementation issues. Is the string passed in already in one chunk, or do we have access to a filehandle?
        There are just too many unknowns to use blanket statements as to which algorithm is best.

        One thing that is a major issue, though, is that the special regex match variables ($`, $& and $') shouldn't be used, because they impose too great a penalty. Instead you can use @- and @+, which carry no such penalty, as in the following:

        my $str = "1 one\n2 two\n3 one\n4 four\n5 one\n6 five";
        my $last_pos = 0;
        my $newlines = 1;
        while ($str =~ /(one)/g) {
            # $-[0] is the start offset of the current match
            $newlines += substr($str, $last_pos, $-[0] - $last_pos) =~ tr/\n//;
            $last_pos = $-[0];
            print "Found on line $newlines\n";
        }
        # prints
        # Found on line 1
        # Found on line 3
        # Found on line 5

        Notice the optimization that only counts newlines from the previous match.
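The same idea can be wrapped in a reusable routine and applied to the original "Cell" case. This is only a sketch under assumptions: the helper name buffer_matches and the sample buffer are invented here, and the whole file is taken to be already loaded into a string, as in the original question. Because it works on the full buffer, the pattern may span line breaks; the reported line is the one on which the match starts.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [line_number, matched_text] for each match of
# $pattern in $buf, using @- and @+ rather than $`, $& and $'.
sub buffer_matches {
    my ($buf, $pattern) = @_;
    my @hits;
    my $last_pos = 0;    # start offset of the previous match
    my $line     = 1;    # line number of the previous match (1 before any match)
    while ( $buf =~ /$pattern/g ) {
        # Count only the newlines between the previous match start and this one.
        my $gap = substr( $buf, $last_pos, $-[0] - $last_pos );
        $line += ( $gap =~ tr/\n// );
        $last_pos = $-[0];
        # $-[0] and $+[0] delimit the whole match.
        push @hits, [ $line, substr( $buf, $-[0], $+[0] - $-[0] ) ];
    }
    return @hits;
}

# Sample buffer (an assumption) standing in for $ctfilebuf from the question.
my $ctfilebuf = "Header\nTable Cell one\nplain text\ncell\nCell\n";
for my $hit ( buffer_matches( $ctfilebuf, qr/Cell/i ) ) {
    printf qq{Word "%s" found at line number %d\n}, $hit->[1], $hit->[0];
}
# prints
# Word "Cell" found at line number 2
# Word "cell" found at line number 4
# Word "Cell" found at line number 5
```

Copying the gap into a lexical before counting costs a little extra, but it keeps the counting step clearly separate from the buffer being matched.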

        my @a=qw(random brilliant braindead); print $a[rand(@a)];
        Dear kyle and reasonablekeith,
        Thanks for the suggestion and also for the warning. This is making me think in new directions.
