http://www.perlmonks.org?node_id=1057125


in reply to Iterator to parse multiline string with \\n terminator

G'day three18ti,

In the absence of seeing a context requiring anything more complex, I'd probably code something along these lines:

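#!/usr/bin/env perl

use strict;
use warnings;

# A physical line ending in a single backslash (not an escaped "\\")
# continues onto the next line; $1 captures the text before it.
my $re = qr{^(.*)(?<![\\])[\\]\n$};

my $line = '';    # accumulates the current logical line

while (<DATA>) {
    if (/$re/) {
        $line .= $1;    # continuation: keep the text, drop "\" and newline
        next;
    }
    $line .= $_;        # last physical line of this record
    print $line;
    $line = '';
}

__DATA__
Line 1 Part A \
Line 1 Part B \
Line 1 Part C
Line 2 ALL
Line 3 Part X \
Line 3 Part Y
Line 4 END WITH BACKSLASH \\
Line 5 LAST Z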

Output:

Line 1 Part A Line 1 Part B Line 1 Part C
Line 2 ALL
Line 3 Part X Line 3 Part Y
Line 4 END WITH BACKSLASH \\
Line 5 LAST Z

That code could easily be adapted for an iterator if one is required for your application.
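As a rough illustration of that adaptation, here is a minimal sketch of a closure-based iterator, assuming the input comes from a filehandle. The name make_line_iterator and the in-memory sample text are hypothetical, not from the original post. Unlike the loop above, it also returns a trailing partial record if the input happens to end on a continuation line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Same continuation pattern as the loop above.
my $re = qr{^(.*)(?<![\\])[\\]\n$};

# Hypothetical helper: returns a closure that yields one logical
# (joined) line per call, or undef once the input is exhausted.
sub make_line_iterator {
    my ($fh) = @_;
    return sub {
        my $line = '';
        while (defined(my $physical = <$fh>)) {
            if ($physical =~ $re) {
                $line .= $1;    # continuation: keep text, drop "\" and newline
                next;
            }
            return $line . $physical;    # record complete
        }
        # EOF: hand back any pending partial record, else signal the end.
        return length $line ? $line : undef;
    };
}

my $text = "Line 1 Part A \\\nLine 1 Part B \\\nLine 1 Part C\nLine 2 ALL\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $next_line = make_line_iterator($fh);
while (defined(my $logical = $next_line->())) {
    print $logical;
}
```

The caller pulls records one at a time, so the same closure can be handed to code that expects an iterator rather than a loop.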

If you're not familiar with negative look-behind assertions, (?<!pattern), they're documented under Look-Around Assertions in "perlre: Extended Patterns".
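To show what the look-behind contributes here, the sketch below tests three sample physical lines against the same continuation pattern. The helper name continues_next_line and the sample strings are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# (?<![\\]) is a zero-width negative look-behind: it succeeds only when
# the character just before the current position is NOT a backslash.
# Here it stops an escaped "\\" from counting as a continuation marker.
sub continues_next_line {
    my ($physical) = @_;
    return $physical =~ /(?<![\\])[\\]\n$/ ? 1 : 0;
}

for my $physical ("text \\\n", "text \\\\\n", "text\n") {
    (my $shown = $physical) =~ s/\n$//;    # display without the newline
    printf "%-8s => %s\n", $shown,
        continues_next_line($physical) ? 'continues' : 'complete';
}
```

Only the first sample (a single trailing backslash) is reported as continuing; the doubled backslash fails the look-behind.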

-- Ken

Re^2: Iterator to parse multiline string with \\n terminator
by three18ti (Monk) on Oct 06, 2013 at 09:16 UTC

    Neat! Thanks for the link.

    I've been reading Higher-Order Perl and just reached the chapter on lexers, where MJD makes use of look-behind assertions. Your example helps me make more sense of what I was reading.

    What is the difference between next and redo in this context? A user below had a similar solution but used redo instead of next.

      The difference is that redo restarts the loop body without re-evaluating the loop condition (in this case while (<DATA>), which reads the next line into $_), whereas next re-evaluates the condition first.

      This is why jwkrahn's solution fetches the next line manually before calling redo:

      $_ .= <$fh>;

      The advantage of jwkrahn's redo-based solution is that the implicit variable $_ can hold the complete multiline record.
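A sketch of the redo technique follows. jwkrahn's actual code is not reproduced in this thread, so this is only an assumption-based reconstruction of the approach; join_with_redo and the sample text are hypothetical. It reads from an in-memory filehandle named $fh to match the snippet above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: joins continuation lines with redo, keeping the
# whole logical record in $_ instead of in a separate variable.
sub join_with_redo {
    my ($fh) = @_;
    my @records;
    local $_;
    while (<$fh>) {
        # Strip a trailing single backslash + newline (the continuation marker).
        if (s/(?<![\\])[\\]\n$//) {
            my $following = <$fh>;
            if (defined $following) {
                $_ .= $following;    # pull the next physical line in by hand...
                redo;                # ...then rerun the body without re-reading
            }
        }
        push @records, $_;    # at EOF a partial record is kept as-is
    }
    return @records;
}

my $text = "Line 3 Part X \\\nLine 3 Part Y\nLine 4 END WITH BACKSLASH \\\\\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print for join_with_redo($fh);
```

The defined check is a small addition of this sketch: without it, appending undef at end of input would only trigger an "uninitialized value" warning.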

      The advantage of kcott's next-based solution is that the <> operator for fetching the next line appears in only one place (the loop condition). However, re-evaluating the loop condition also overwrites $_ with the newly read line, so a separate variable, declared above the loop, is needed to hold the current record.
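The control-flow difference between next and redo can be checked with a small trace. In this sketch (the trace_passes helper and its three-line input are hypothetical), the loop jumps once on its first pass, using either next or redo, and records which $_ the body sees each time.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: runs while (<$fh>) over three lines and records
# the $_ seen on each pass of the body. On the first pass it jumps with
# redo (if $use_redo is true) or next (otherwise).
sub trace_passes {
    my ($use_redo) = @_;
    my $text = "first\nsecond\nthird\n";
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $_;
    my (@seen, $jumped);
    while (<$fh>) {
        chomp;
        push @seen, $_;
        if (!$jumped) {
            $jumped = 1;
            if ($use_redo) { redo }    # body again, condition NOT re-run
            else           { next }    # condition re-run: reads next line
        }
    }
    return @seen;
}

print 'next: ', join(',', trace_passes(0)), "\n";    # next: first,second,third
print 'redo: ', join(',', trace_passes(1)), "\n";    # redo: first,first,second,third
```

With redo the body sees "first" twice because <$fh> is not called again, which is why a redo-based joiner has to read the following line itself.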