
Re: multi-line parsing

by Nkuvu (Priest)
on Mar 31, 2009 at 18:40 UTC (#754499)

in reply to multi-line parsing

The while (<INF>) loop reads only one line at a time (where a "line" is defined by the input record separator, $/, which is a newline by default). Since you're only pulling in one line at a time, the pattern can never match across multiple lines. You need to change the record separator. For example:

    {
        # Set the record separator to match the data.
        # In this example I have a blank line between
        # the lines I want to match, so set $/ to two
        # newlines.
        # Note that I'm setting $/ in its own data block (the empty
        # brackets) to localize the record separator change.
        $/ = "\n\n";
        while (<DATA>) {
            if(/First line of text[\n]second line of text[\n]in the third line I have (\d*\.\d*)/s) {
                print "Captured ($1)\n";
            }
        }
    }
    __DATA__
    First line of text
    second line of text
    in the third line I have 3.14159

    Here is another line of text
    second line of text
    third line has 1.41421356

    First line of text
    second line of text
    in the third line I have 1.73205081

Note that due to the regex, only the first and third numbers will be output.
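A related option, sketched here rather than taken from the code above, is Perl's paragraph mode: setting $/ to the empty string makes each read return one block of text ending at one or more blank lines. The helper name captured_numbers and the in-memory filehandle are assumptions made only to keep the sketch self-contained; the sample text mirrors the data above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every number captured from records that
# match the three-line pattern. $text stands in for the file contents.
sub captured_numbers {
    my ($text) = @_;

    # Read from an in-memory filehandle so the sketch needs no file on disk.
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

    # local restores the previous value of $/ when this sub returns.
    # The empty string selects paragraph mode: a record ends at one or
    # more blank lines.
    local $/ = "";

    my @found;
    while (my $record = <$fh>) {
        if ($record =~ /First line of text\nsecond line of text\nin the third line I have (\d*\.\d*)/) {
            push @found, $1;
        }
    }
    close $fh;
    return @found;
}

my $sample = <<'END';
First line of text
second line of text
in the third line I have 3.14159

Here is another line of text
second line of text
third line has 1.41421356

First line of text
second line of text
in the third line I have 1.73205081
END

print "Captured ($_)\n" for captured_numbers($sample);
```

Unlike "\n\n", paragraph mode also copes with records separated by more than one blank line, which may or may not matter for your data.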

You could also set the record separator to "First line of text", but keep in mind that you'll need to remove that bit from the regex:

    $/ = "First line of text";
    while (<DATA>) {
        if(/[\n]second line of text[\n]in the third line I have (\d*\.\d*)/s) {
            print "Captured ($1)\n";
        }
    }
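To see why the leading text has to come out of the regex, it can help to look at what each read actually returns with that separator. This is only a sketch: split_on_separator is a hypothetical helper, and it reads from an in-memory filehandle instead of DATA.

```perl
use strict;
use warnings;

# Hypothetical helper: splits $text into records using $separator as $/.
sub split_on_separator {
    my ($text, $separator) = @_;

    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = $separator;          # restored automatically on return

    my @records;
    while (my $record = <$fh>) {
        chomp $record;              # chomp strips a trailing $/, whatever it is set to
        next if $record eq '';      # the read that ends at the first separator is empty
        push @records, $record;
    }
    close $fh;
    return @records;
}

my $text = <<'END';
First line of text
second line of text
in the third line I have 3.14159

Here is another line of text
second line of text
third line has 1.41421356

First line of text
second line of text
in the third line I have 1.73205081
END

my @records = split_on_separator($text, "First line of text");
for my $record (@records) {
    print "Captured ($1)\n"
        if $record =~ /\nsecond line of text\nin the third line I have (\d*\.\d*)/;
}
```

Each record starts just after the separator text, so the part you match against begins with the newline that followed "First line of text".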

Added: You could also undef the separator and match globally:

    {
        $/ = undef;
        my $data = <DATA>;
        my @array = $data =~ /First line of text[\n]second line of text[\n]in the third line I have (\d*\.\d*)/g;
        for (@array) {
            print "Captured $_\n";
        }
    }

Re^2: multi-line parsing
by AnomalousMonk (Chancellor) on Mar 31, 2009 at 21:34 UTC
    You could also undef the separator and match globally ...
    Be aware that the code given will globally alter the $/ package variable (see perlvar).

    A more 'idiomatic' way to slurp the contents of a file without globally changing $/ (if you do not use the File::Slurp module or an equivalent) is with the statement

    my $file_contents = do { local $/; <$file_handle> };
    my $file_contents = do { local $/; <FILEHANDLE> };

      That's what I get for writing up a note just before lunch. I missed the local (but normally include that when altering the record separator). Thanks for the correction.
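Putting the two points together, a minimal sketch of the slurp-and-match-globally approach with the separator change localized might look like the following. The sub name all_matches and the in-memory filehandle are assumptions for illustration, not code from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: slurps everything from $fh and returns all numbers
# captured by the three-line pattern. Call it in list context; in scalar
# context the match would only report success or failure.
sub all_matches {
    my ($fh) = @_;

    # local $/ sets the separator to undef only inside this do block,
    # so the rest of the program still reads line by line.
    my $data = do { local $/; <$fh> };

    return $data =~ /First line of text\nsecond line of text\nin the third line I have (\d*\.\d*)/g;
}

my $sample = <<'END';
First line of text
second line of text
in the third line I have 3.14159

Here is another line of text
second line of text
third line has 1.41421356

First line of text
second line of text
in the third line I have 1.73205081
END

# An in-memory filehandle stands in for a real file here.
open my $fh, '<', \$sample or die "Cannot open in-memory handle: $!";
my @numbers = all_matches($fh);
close $fh;

print "Captured $_\n" for @numbers;
```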
