http://www.perlmonks.org?node_id=552487

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How do I process lines in a file from the point where a line contains a certain word ?
I have a file that I wish to process. e.g.
    a line
    another line
    internal name
    another line
    a further line
    need this
    another line
    need this
I want to start processing the file where the line begins with "internal name". After this I only want to process the lines that have "need this" in them until end of file. Where "need this" always occurs at the beginning of the line.
I can't use seek because I don't know how many lines will be in the file and at what position the "internal name" first appears.
I tried doing something like this but it doesn't work
    open DATA, filename;
    while (<DATA>) {
        chomp;
        next unless (/^internal name/);
        next unless (/^need this/);
        # do something with data
    }
    close DATA;
This doesn't work because each time round the loop it searches for "internal name".
These are variable length records per line and the file is a variable number of lines long.
Any help appreciated

Re: seek and process from there on
by blazar (Canon) on May 30, 2006 at 13:38 UTC
    while (<$data>) {
        last if /^internal name/;
    }
    while (<$data>) {
        # do what you want!
    }

    PS: there's a predefined DATA filehandle. Do not clobber it. In any case I recommend that you use lexical handles. And do error checks, of course...
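As a rough illustration of the advice above, here is a minimal sketch of the two-loop approach using a lexical filehandle and an error check on open. The helper name lines_after_marker and the sample data are made up for this example, and an in-memory handle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the lines starting with
# "need this" that follow the first line starting with "internal name".
sub lines_after_marker {
    my ($fh) = @_;

    # First loop: consume lines up to and including the marker.
    my $found = 0;
    while (my $line = <$fh>) {
        if ($line =~ /^internal name/) {
            $found = 1;
            last;
        }
    }
    return unless $found;    # marker never seen: empty list

    # Second loop: the handle is now positioned just after the marker.
    my @wanted;
    while (my $line = <$fh>) {
        chomp $line;
        push @wanted, $line if $line =~ /^need this/;
    }
    return @wanted;
}

# Demo on an in-memory handle. For a real file one would write something like
#   open my $fh, '<', $filename or die "Can't open '$filename': $!";
my $sample = <<'END';
a line
need this (not really)
internal name
another line
need this
another line
need this too
END

open my $fh, '<', \$sample or die "Can't open in-memory handle: $!";
print "$_\n" for lines_after_marker($fh);
close $fh;
```

Because the handle is an ordinary lexical, it can be passed to a subroutine and is closed automatically when it goes out of scope; the explicit close is only there for clarity.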

Re: seek and process from there on
by davorg (Chancellor) on May 30, 2006 at 14:07 UTC

    Sounds like a perfect use for the flip-flop operator.

    use strict;
    use warnings;

    while (<DATA>) {
        next unless /^internal name/ .. 0;
        next unless /^need this/;
        # your code here...
        print $_;
    }
    __DATA__
    a line
    need this (not really)
    another line
    internal name
    another line
    a further line
    need this
    another line
    need this

    Note that by using a zero as the righthand operand, we get a flip-flop that turns on but then never turns off.
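To make the "turns on but never turns off" behaviour visible, the following sketch records the value of the flip-flop for each line. It relies on two documented properties of the scalar range operator: a constant operand is compared against $. (the current input line number), and the operator returns an empty string while off and a running sequence number while on. The sample data and the @report array are assumptions made for this illustration.

```perl
use strict;
use warnings;

my $sample = <<'END';
a line
need this (not really)
internal name
another line
need this
END

open my $fh, '<', \$sample or die "Can't open in-memory handle: $!";

my @report;
while (my $line = <$fh>) {
    chomp $line;
    # Scalar-context "..": false ('') until the left pattern matches, then a
    # running sequence number. The constant 0 on the right is compared with
    # $. (the current line number), which is never 0 while reading, so the
    # operator never switches off again.
    my $state = ($line =~ /^internal name/ .. 0);
    push @report, sprintf "%d: %-24s state=%s", $., $line, $state || 'off';
}
close $fh;

print "$_\n" for @report;
```

One thing to keep in mind: each flip-flop keeps its own state, so code like this inside a subroutine that is called repeatedly would still be "on" at the start of the next call.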

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: seek and process from there on
by holli (Abbot) on May 30, 2006 at 13:58 UTC
    You simply need a flag:
    my $hot;
    while (<DATA>) {
        $hot = 1 if /^internal name/;
        if ( /^need this/ && $hot ) {
            print;
        }
    }


    holli, /regexed monk/
Re: seek and process from there on
by Zaxo (Archbishop) on May 30, 2006 at 14:15 UTC

    Another way, with the scalar range operator,

    while (<$fh>) {
        next unless /^internal name/..0 and /^need this/;
        print "Got one in line ${.}\n";
    }
    Scalar range, once true, stays true until its right argument goes true.

    Update: Fixed thinko on when the op switches off.

    After Compline,
    Zaxo

      Thanks for all the answers, I just wish I knew which one was the best option !
        There are really only two ways of doing this. One is to use two loops. In the first loop, you look for "internal name", then go to your other loop to look for "need this". Be careful to account for not finding "internal name":
        open (MY_DATA, "$someFile") or die qq(aauugh!);
        while (<MY_DATA>) {
            # leave the first loop as soon as the marker line is found
            last if (/^internal name/);
        }
        while (<MY_DATA>) {
            next unless (/^need this/);
            # <do something>
        }
        The above was written mainly for clarity. You could have used a for loop for the first loop which some people will argue is better because the test is in the outside of the loop instead of being buried in the loop itself.

        I also didn't test in the second loop to see if I actually found "internal name" (like I said should be done). You could argue that it isn't necessary since the second loop won't execute if the first loop doesn't find "internal name" anyway.

        However, there may be a difference between not finding "internal name" and finding "internal name" but not finding "need this". I would probably set a flag in the first loop:

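        open (MY_DATA, "$someFile") or die qq(aauugh!);
        my $internalNameFlag = 0;
        while (<MY_DATA>) {
            next unless (/^internal name/);
            $internalNameFlag = 1;
            last;    # stop here so the second loop starts after the marker
        }
        # the explicit assignment to $_ is needed because the read is no
        # longer the only thing in the loop condition
        while ($internalNameFlag and defined($_ = <MY_DATA>)) {
            next unless (/^need this/);
            # <do something>
        }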
        Of course, if you're setting flags, you might want to do the second solution. It uses only a single loop and uses a flag to see whether you're looking for "internal name" or "need this".
        my $internalNameFlag = 0;
        while (<DATA>) {
            if ($internalNameFlag) {
                if (/^need this/) {
                    # <Do something>
                }
            }
            else {    # Internal Name Not Found Yet
                if (/^internal name/) {
                    $internalNameFlag = 1;
                }
            }
        }
        I personally like solution #1 because I think it flows better. You're looking for "internal name". Then, when you find it, you look for "need this".

        Others will prefer solution #2 because they like the fact that it's a single loop and not two loops. Many people feel that there should only be a single processing loop per "open".

        All of the replies to this message were really just different takes on one of these two methods. My recommendation is to use whichever of these two solutions makes the most sense to you, and to keep the code simple and easy for you and your coworkers to follow. Remember that someone down the line is going to have to maintain your code.

        Thanks for all the answers, I just wish I knew which one was the best option !

        I just wish I knew by which criteria you would consider a solution to be the best; then I might be able to give you a more helpful answer!

Re: seek and process from there on
by rminner (Chaplain) on May 30, 2006 at 13:52 UTC
    my $file = shift @ARGV;
    if (open my $FH , $file) {
        my $start_pattern_found = 0;
        while (my $line = <$FH>) {
            if ($start_pattern_found and $line =~ m{^need\s+this}) {
                print "found: $line";
            }
            elsif ($line =~ m{^internal\s+name}) {
                $start_pattern_found++;
            }
        }
        close $FH;
    }
    else {
        die "failed to open '$file' ($!)\n";
    }

      my $file = shift @ARGV;
      if (open my $FH , $file) {
          my $start_pattern_found = 0;
          while (my $line = <$FH>) {

      It's a side note, but I can't understand why you do this. You just want to show the use of a flag, which is a good point. Since the example may very well stay minimal, why not use good old <> instead?

      In any case, using the range operator in scalar context may be much easier than maintaining a flag yourself:

      while (<>) {
          chomp;
          next if 1 .. /^internal name/;
          print "Found pattern at line $.\n" if /^need this/;
      }
Re: seek and process from there on
by Tobin Cataldo (Monk) on May 30, 2006 at 13:37 UTC
    You could try something using the:
    $` ($PREMATCH) and $' ($POSTMATCH)
    operators.

    Then use your while structure looking for 'need this' running through the contents of the postmatch.
      You could try something using the:
      $` ($PREMATCH) and $' ($POSTMATCH)
      operators.

      So he should slurp the whole file into a string all at once and apply a match to it. Then it seems like you're suggesting he will have to

      open my $postmatch, '<', \$' or die $!;

      and search while(<$postmatch>) (or to use some trick along these lines). Very, very smart indeed...
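For completeness, the slurp-and-reopen idea discussed above could be sketched roughly as follows. This is only an illustration, not a recommendation over the streaming solutions: it holds the whole file in memory. It also swaps in substr with $+[0] (the end offset of the last match) in place of $', since mentioning $' historically slowed down every regex in the program. The heredoc stands in for file content slurped with something like do { local $/; <$in> }.

```perl
use strict;
use warnings;

# Stand-in for the slurped file content (an assumption for this example).
my $content = <<'END';
a line
need this (not really)
internal name
another line
need this
another line
need this too
END

my @wanted;
# /m lets ^ match at the start of any line; .* stops at the end of that line,
# and \n? swallows the newline so the remainder starts on the following line.
if ($content =~ /^internal name.*\n?/m) {
    my $rest = substr $content, $+[0];    # everything after the marker line

    # Read the remainder line by line through an in-memory filehandle.
    open my $fh, '<', \$rest or die "Can't open in-memory handle: $!";
    while (my $line = <$fh>) {
        chomp $line;
        push @wanted, $line if $line =~ /^need this/;
    }
    close $fh;
}

print "$_\n" for @wanted;
```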

Re: seek and process from there on
by zakame (Pilgrim) on May 30, 2006 at 15:56 UTC
    Perhaps you wanted something like this:
    open STUFF, "<", $file or die "Can't open $file: $!";
    while (<STUFF>) {
        chomp;
        $seen{internal_name}++, next if /^internal name/;
        if ($seen{internal_name}) {
            if (/^need this/) {
                # do your stuff here
            }
        }
    }
    At the start, nothing is done until /^internal name/ is seen. If we see one, we just take note of that in a hash key, which we test later on when we get to do the real work. Update: I suddenly remembered there was a predefined DATA filehandle...
Re: seek and process from there on
by leocharre (Priest) on May 30, 2006 at 16:02 UTC

    Is this a mammoth file? How does this file grow? Is it appended or perhaps.. prepended to? Is this a log?

    If this were a file that is going to grow, and some code creates or manages this file, maybe I would store where that place is (the line number) for later use elsewhere. That is, maybe you could keep metadata about where the place you want is. Like..

    Maybe your file is named textfile.log, and it's the only file in a dir. You could touch the line number in that dir, so you'd create an empty file named (for example) ./3456; then you can do an ls -I textfile.log, and what you get back would be the line number at which to start reading textfile.log
    Depending on what you're doing, this could prove slower- and then maybe faster.
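A variation on this idea, sketched below, records a byte offset (from tell) rather than a line number, so that a later pass can seek straight to the right place. This is an assumption-laden sketch: it only makes sense if the file is appended to and never rewritten or prepended to, and where the offset gets saved between runs is left open. An in-memory string stands in for the log file.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Stand-in for the log file (an assumption for this example).
my $log = <<'END';
a line
internal name
another line
need this
END

# Pass 1: find the marker once and remember where the following line starts.
open my $fh, '<', \$log or die "Can't open in-memory handle: $!";
my $offset;
while (my $line = <$fh>) {
    if ($line =~ /^internal name/) {
        $offset = tell $fh;    # byte position just after the marker line
        last;
    }
}
close $fh;

# Pass 2 (possibly a later run, with $offset loaded from wherever it was saved):
# jump directly past the marker instead of rescanning from the top.
my @wanted;
if (defined $offset) {
    open my $again, '<', \$log or die "Can't open in-memory handle: $!";
    seek $again, $offset, SEEK_SET or die "Can't seek: $!";
    while (my $line = <$again>) {
        chomp $line;
        push @wanted, $line if $line =~ /^need this/;
    }
    close $again;
}

print "$_\n" for @wanted;
```

Whether this pays off depends on how large the file is and how often it is read; for a small file the plain streaming loops are simpler.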