PerlMonks  

multi-line parsing

by amit223 (Initiate)
on Mar 31, 2009 at 14:16 UTC ( #754491=perlquestion )
amit223 has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I am parsing a bunch of files, looking for a number that occurs after two specific lines of text. I need to grab that number and save it to an array.

open (INF, "$file");
while (<INF>){
    if(/First line of text[\n]second line of text[\n]in the third line I have (\d*\.\d*)/) {
        push(@array,$1);
    }
}

But it is not working. I also tried adding the m modifier to the regex for multi-line matching.

Can you suggest anything? Thank you, I appreciate any help.

Replies are listed 'Best First'.
Re: multi-line parsing
by ELISHEVA (Prior) on Mar 31, 2009 at 14:31 UTC

    Your multi-line regex is failing because you are matching it against a single line, not multiple lines. By default, Perl treats a newline as the record delimiter, so each pass through the loop reads only one line, i.e. just "First line of text\n" or just "second line of text\n" and so on, and your regex never matches.

    For your code above to work, you will need to either (a) choose a different record delimiter by setting the $/ variable (see the section on $INPUT_RECORD_SEPARATOR in perlvar) or (b) define a variable in which to store and concatenate the lines you read in, and then match your regular expression against that variable.
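    Option (b) could look roughly like the sketch below. It keeps the last three lines in a small buffer and matches the regex against their concatenation. The sub name numbers_after_markers and the in-memory sample data are made up for illustration; a real script would open $file instead.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative): return every number that
# follows the two marker lines, using a three-line sliding window.
sub numbers_after_markers {
    my ($fh) = @_;
    my @numbers;
    my @window;                        # most recent lines, oldest first
    while (my $line = <$fh>) {
        push @window, $line;
        shift @window if @window > 3;  # keep only the last three lines
        my $chunk = join '', @window;  # concatenate them into one string
        if ($chunk =~ /\AFirst line of text\nsecond line of text\nin the third line I have (\d*\.\d*)/) {
            push @numbers, $1;
        }
    }
    return @numbers;
}

# Usage with an in-memory file (assumed sample data).
my $sample = join "\n",
    'noise',
    'First line of text',
    'second line of text',
    'in the third line I have 3.14159',
    'First line of text',
    'something else',
    'in the third line I have 2.71828',
    '';
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @found = numbers_after_markers($fh);
close $fh;
print "Captured $_\n" for @found;
```

    Only the first number is captured here, because the second group does not have the expected second line.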

    Does your data have a record delimiter other than a newline? Perhaps you could post a sample of the data you are trying to parse?

    Best, beth

Re: multi-line parsing
by Bloodnok (Vicar) on Mar 31, 2009 at 14:32 UTC
    Try using the range operator, something like this (untested):

    use warnings;
    use strict;
    use autodie;

    # $file and @array are assumed to be declared earlier in the script
    open INF, "<$file";
    while (<INF>){
        if(/First line of text$/ ... /in the third line I have (\d*\.\d*)$/) {
            push @array, $1 if $1;
        }
    }
    Note the use of both strictures and autodie - the latter causing the snippet to die if open() fails.
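    For anyone who wants to try the range operator in isolation, here is a minimal self-contained sketch. The sub name numbers_in_ranges and the sample data are invented for illustration. It captures with an explicit match inside the range, so a number on any other line of the range is ignored. Note that the range operator only tracks the opening and closing lines; it does not check what the second line says.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative) showing the range
# (flip-flop) operator on a filehandle.
sub numbers_in_ranges {
    my ($fh) = @_;
    my @numbers;
    while (my $line = <$fh>) {
        # True from a "First line of text" line up to and including the
        # next "in the third line I have ..." line.
        if ($line =~ /^First line of text$/ ... $line =~ /^in the third line I have \d*\.\d*$/) {
            # Capture explicitly, only on the closing line of the range.
            push @numbers, $1
                if $line =~ /^in the third line I have (\d*\.\d*)$/;
        }
    }
    return @numbers;
}

# Usage with an in-memory file (assumed sample data).
my $sample = join "\n",
    'First line of text',
    'second line of text with 42.0 in it',
    'in the third line I have 3.14159',
    'in the third line I have 9.99',      # outside any range, ignored
    'First line of text',
    'second line of text',
    'in the third line I have 1.41421',
    '';
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @found = numbers_in_ranges($fh);
close $fh;
print "Captured $_\n" for @found;
```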

    A user level that continues to overstate my experience :-))

      Thank you one and all.

      Bloodnok, I have a small issue there:

      What do I do if there is also a number in the second line, but I only want to grab the number in the third line? Sorry, I should have been clearer.

      Thank you again

Re: multi-line parsing
by jethro (Monsignor) on Mar 31, 2009 at 14:47 UTC

    Your problem is that you are reading the file line by line but expect three lines in your search pattern.

    There are a lot of ways you could do this; one would be to use a state machine. It uses a variable that records which state it is in and, depending on the next line, switches to the appropriate state.

    In this case state 1 means "I'm at the line after the first line I'm looking for" and state 2 means "I'm after the second line which followed the first line, expecting the number now".

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @array=();
    open (INF, "$file");
    my $state= 0;
    while (<INF>){
        if ($state==0) {
            if (/^First line of text$/) { $state=1; }
            next;
        }
        if ($state==1) {
            if (/^second line of text$/) { $state=2; }
            elsif (/^First line of text$/) { $state=1; }
            else { $state=0; }
            next;
        }
        if ($state==2) {
            if (/^in the third line I have (\d*\.\d*)/) {
                push(@array,$1);
                $state=0;
            }
            elsif (/^First line of text$/) { $state=1; }
            else { $state=0; }
            next;
        }
    }

    Note I used ^ and $ to denote line begin and end in the patterns so that the full lines must correspond to the pattern, not only a part of a line. Also used strict and warnings.
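    The same state machine can be sketched more compactly by treating the state as an index into a list of expected patterns. This is only one possible variant, not the code above; the sub name scan_with_states and the sample data are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative): the state is the index of
# the pattern expected on the next line.
sub scan_with_states {
    my ($fh) = @_;
    my @expected = (
        qr/^First line of text$/,
        qr/^second line of text$/,
        qr/^in the third line I have (\d*\.\d*)/,
    );
    my @numbers;
    my $state = 0;
    while (my $line = <$fh>) {
        if ($line =~ $expected[$state]) {
            if ($state == $#expected) {
                push @numbers, $1;   # last pattern matched: keep the number
                $state = 0;
            }
            else {
                $state++;            # move on to the next expected line
            }
        }
        elsif ($line =~ $expected[0]) {
            $state = 1;              # a fresh "First line" restarts the sequence
        }
        else {
            $state = 0;              # anything else resets the machine
        }
    }
    return @numbers;
}

# Usage with an in-memory file (assumed sample data).
my $sample = join "\n",
    'First line of text',
    'First line of text',
    'second line of text',
    'in the third line I have 3.14159',
    'First line of text',
    'something else',
    'in the third line I have 2.0',
    'First line of text',
    'second line of text',
    'in the third line I have 1.73205',
    '';
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @found = scan_with_states($fh);
close $fh;
print "Captured $_\n" for @found;
```

    Adding a fourth expected line would then only mean adding one more pattern to the list.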

Re: multi-line parsing
by CountZero (Bishop) on Mar 31, 2009 at 18:17 UTC
    Another solution, this one without any loops:
    use strict;
    use warnings;

    my $data;
    {
        local $/ = undef;
        $data = <DATA>;
    }
    # my @results = $data =~ m/First Line\nSecond Line\n(\d+)\n/mg;
    # The m modifier was not really necessary here
    my @results = $data =~ m/First Line\nSecond Line\n(\d+)\n/g;
    print join "\n", @results;
    __DATA__
    First Line
    Second Line
    123
    First Line
    Second Line
    456
    First Line
    Second Line
    789
    First Line
    Second Line
    101112
    First Line
    Second Line
    131415

    Output:

    123
    456
    789
    101112
    131415

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: multi-line parsing
by Nkuvu (Priest) on Mar 31, 2009 at 14:40 UTC

    The while (<INF>) only reads one line at a time (where a "line" is defined by the input record separator, $/, which is a newline by default). Since you're only pulling in one line at a time, you'll never match across multiple lines. You need to change the record separator. For example:

    {
        # Set the record separator to match the data.
        # In this example I have a blank line between
        # the lines I want to match, so set $/ to two
        # newlines.
        # Note that I'm setting $/ in its own data block (the empty
        # brackets) to localize the record separator change.
        $/ = "\n\n";
        while (<DATA>) {
            if(/First line of text[\n]second line of text[\n]in the third line I have (\d*\.\d*)/s) {
                print "Captured ($1)\n";
            }
        }
    }
    __DATA__
    First line of text
    second line of text
    in the third line I have 3.14159

    Here is another line of text
    second line of text
    third line has 1.41421356

    First line of text
    second line of text
    in the third line I have 1.73205081

    Note that due to the regex, only the first and third numbers will be output.

    You could also set the record separator to "First line of text", but keep in mind that you'll need to remove that bit from the regex:

    $/ = "First line of text";
    while (<DATA>) {
        if(/[\n]second line of text[\n]in the third line I have (\d*\.\d*)/s) {
            print "Captured ($1)\n";
        }
    }

    Added: You could also undef the separator and match globally:

    {
        $/ = undef;
        my $data = <DATA>;
        my @array = $data =~ /First line of text[\n]second line of text[\n]in the third line I have (\d*\.\d*)/g;
        for (@array) {
            print "Captured $_\n";
        }
    }

      "You could also undef the separator and match globally ..."

      Be aware that the code given will globally alter the $/ package variable (see perlvar).

      A more 'idiomatic' way to slurp the contents of a file without globally changing $/ (if you do not use the File::Slurp module or an equivalent) is with the statement

      my $file_contents = do { local $/; <$file_handle> };
      or
      my $file_contents = do { local $/; <FILEHANDLE> };
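      Combined with the global match, that idiom might be used as in the sketch below. The sub name slurp_numbers and the in-memory sample data are invented for the example. The final print is there only to show that $/ is back to a newline once the do block has finished.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative): slurp a filehandle without
# changing $/ for the caller, then collect every captured number.
sub slurp_numbers {
    my ($fh) = @_;
    my $contents = do { local $/; <$fh> };   # $/ is undef only inside the do block
    return $contents =~ /First line of text\nsecond line of text\nin the third line I have (\d*\.\d*)/g;
}

# Usage with an in-memory file (assumed sample data).
my $sample = <<'END';
First line of text
second line of text
in the third line I have 3.14159
Here is another line of text
second line of text
third line has 1.41421356
First line of text
second line of text
in the third line I have 1.73205081
END
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @array = slurp_numbers($fh);
close $fh;
print "Captured $_\n" for @array;
print "Record separator is still a newline\n" if $/ eq "\n";
```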

        That's what I get for writing up a note just before lunch. I missed the local (but normally include that when altering the record separator). Thanks for the correction.

Node Type: perlquestion [id://754491]
Approved by Corion