Re: Iterator to parse multiline string with \\n terminator
by kcott (Archbishop) on Oct 06, 2013 at 06:34 UTC
|
G'day three18ti,
In the absence of seeing a context requiring anything more complex, I'd probably code something along these lines:
#!/usr/bin/env perl
use strict;
use warnings;
my $re = qr{^(.*)(?<![\\])[\\]\n$};
my $line = '';
while (<DATA>) {
if (/$re/) {
$line .= $1;
next;
}
$line .= $_;
print $line;
$line = '';
}
__DATA__
Line 1 Part A \
Line 1 Part B \
Line 1 Part C
Line 2 ALL
Line 3 Part X \
Line 3 Part Y
Line 4 END WITH BACKSLASH \\
Line 5 LAST Z
Output:
Line 1 Part A Line 1 Part B Line 1 Part C
Line 2 ALL
Line 3 Part X Line 3 Part Y
Line 4 END WITH BACKSLASH \\
Line 5 LAST Z
That code could easily be adapted for an iterator if one is required for your application.
If you're not familiar with negative look-behind assertions ((?<!pattern)),
they're documented under Look-Around Assertions in
"perlre: Extended Patterns".
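One way the loop above might be adapted into an iterator is sketched below. This is only a sketch: make_record_iterator is a hypothetical name, the in-memory filehandle stands in for a real file, and returning a dangling partial record at end of input is an assumption rather than something the post specifies. The regex, including its negative look-behind, is the one from the post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not from the post above): wraps a filehandle in a
# closure that returns one logical record per call, or undef at end of input.
sub make_record_iterator {
    my ($fh) = @_;
    my $re = qr{^(.*)(?<![\\])[\\]\n$};    # same pattern as in the post
    return sub {
        my $record = '';
        while (defined(my $chunk = <$fh>)) {
            if ($chunk =~ $re) {
                $record .= $1;    # keep the text before the continuation backslash
                next;
            }
            return $record . $chunk;    # complete record
        }
        # End of input: return a dangling partial record if any, else undef.
        return length $record ? $record : undef;
    };
}

# Demonstrate on an in-memory filehandle.
my $input = "a \\\nb \\\nc\nsingle\nends with two \\\\\nlast\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my $next_record = make_record_iterator($fh);

my @records;
while (defined(my $record = $next_record->())) {
    push @records, $record;
}
print @records;
```

The calling loop tests defined rather than truth, so a record consisting only of "0" would not end the loop early.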
|
Neat! Thanks for the link.
I've been reading Higher Order Perl and was just reading the chapter on Lexers where MJD makes use of look-behind assertions. This actually helps make more sense of what I was reading.
What is the difference between next and redo in this context? A user below had a similar solution but used redo instead of next.
|
The difference is that redo does not re-evaluate the loop condition (in this case: "(<DATA>)", which fetches the next line) before evaluating the loop body again, whereas next does.
This is why in jwkrahn's solution, the next line is fetched manually before calling redo:
$_ .= <$fh>;
The advantage of jwkrahn's solution with redo is that the implicit variable $_ can be used to store the complete multiline record.
The advantage of kcott's solution with next is that there is only one place where the <> operator for fetching the next line is used (inside the loop condition). However, re-evaluating the loop condition also resets $_, so in this case a custom variable needs to be declared above the loop to store the current record.
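To make the trade-off concrete, here is a small side-by-side sketch of both loop styles run over the same in-memory input. Neither variant is the exact code from the posts in this thread: both strip the backslash with chomp and s/\\$//, and the end-of-input guard in the redo variant is an addition of mine.

```perl
use strict;
use warnings;

my $input = "foo \\\nbar \\\nbaz\nsingle line\n";

# Variant 1: "next" re-runs the loop condition, which overwrites $_,
# so the record is accumulated in a separate variable.
my @with_next;
{
    open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
    my $record = '';
    while (<$fh>) {
        chomp;
        if (s/\\$//) {        # continuation: strip the backslash, keep going
            $record .= $_;
            next;
        }
        push @with_next, $record . $_;
        $record = '';
    }
}

# Variant 2: "redo" restarts the loop body without re-running the condition,
# so $_ survives and the next line is appended to it by hand.
my @with_redo;
{
    open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
    while (<$fh>) {
        chomp;
        if (s/\\$//) {
            my $more = <$fh>;
            last unless defined $more;    # input ended after a continuation; partial record is dropped
            $_ .= $more;
            redo;
        }
        push @with_redo, $_;
    }
}

print "$_\n" for @with_next;
print "same records\n" if "@with_next" eq "@with_redo";
```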
Re: Iterator to parse multiline string with \\n terminator
by Athanasius (Archbishop) on Oct 06, 2013 at 05:56 UTC
|
my $fh_iterator = sub
{
my $fh = shift;
my $line = $fh->getline();
$line .= $fh->getline() while $line =~ m{\\$};
return $line;
};
Update: Here is a tested script which eliminates the Use of uninitialized value warning reported in the post below:
use strict;
use warnings;
use IO::File;
my $filename = shift @ARGV;
my $fh = IO::File->new($filename, 'r');
sub fh_iterator
{
my $fh = shift;
my $line = $fh->getline();
if (defined $line)
{
$line .= $fh->getline() while $line =~ m{\\$};
}
return $line;
}
while (my $line = fh_iterator($fh))
{
print $line;
}
Output:
16:29 >perl 738_SoPW.pl test.file
foo \
bar \
baz
single line
16:29 >
Hope that helps,
|
EDIT: Oh, I'm an idiot... I forgot to change:
$line .= $fh_iterator while $line =~ m{\\$};
To:
$line .= $fh->getline while $line =~ m{\\$};
However, the code does return the error below on the last line of my test file...
Use of uninitialized value $line in pattern match (m//) at parser3.pl line 17, <GEN0> line 5.
Begin Original Post
Hmm... well, I get a new error at least:
Use of uninitialized value $line in pattern match (m//) at parser3.pl line 17, <GEN0> line 5.
Here's the accompanying code and test file:
#!/usr/bin/perl
use strict;
use warnings;
use IO::File;
use 5.010;
my $filename = shift @ARGV;
my $fh = IO::File->new($filename, 'r');
sub fh_iterator {
my $fh = shift;
my $line = $fh->getline;
$line .= fh_iterator($fh) while $line =~ m{\\$};
}
while (my $line = fh_iterator $fh ) {
print $line;
}
__END__
test.file
foo \
bar \
baz
single line
|
sub fh_iterator {
my $fh = shift;
my $line = $fh->getline();
return $line unless $line;
$line .= $fh->getline() while $line =~ m{\\$};
return $line;
}
I don't think there's any functional difference, but one may be more readable than the other...
Thanks for your help!
|
return $line unless $line;
make that:
return $line unless defined $line;
Theoretically, the last line could be missing the newline and contain only "0". Then $line would be false and the line would be ignored by your code.
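A quick way to see this edge case is the sketch below. It reads an in-memory "file" whose last line is "0" with no trailing newline, and counts how many lines a loop sees under each kind of test. The count_lines helper and its flag are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# The final line is "0" with no trailing newline.
my $input = "first\n0";

# Count how many lines the loop sees when it tests either the truth
# or the definedness of the value returned by a reader sub.
sub count_lines {
    my ($test_defined) = @_;
    open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
    my $read = sub { return scalar <$fh> };    # plain sub call: no implicit defined()
    my $count = 0;
    while (1) {
        my $line = $read->();
        my $keep_going = $test_defined ? defined($line) : $line;
        last unless $keep_going;
        $count++;
    }
    return $count;
}

printf "truth test:   %d line(s)\n", count_lines(0);    # stops early at "0"
printf "defined test: %d line(s)\n", count_lines(1);
```

Perl adds the defined test implicitly only for forms such as while (my $line = <$fh>), not when the value comes from an ordinary sub call, which is why the explicit test matters here.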
|
++tinita for highlighting the important difference between testing for definedness and testing for truth (see perlsyn#Truth-and-Falsehood).
But even with the correction I prefer my version. Readability is in the eye of the programmer, but the first high-level programming subject I took at Uni (in Pascal!) emphasised structured programming, and this has remained with me. I prefer a function to have a single exit point (at the end) where possible. In Perl this is not always optimum, so I’ve had to learn to be flexible. But when — as in this case — the structured version is as straightforward as the non-structured one, I prefer the former. YMMV.
As always in Perl, TMTOWTDI.
Re: Iterator to parse multiline string with \\n terminator
by jwkrahn (Abbot) on Oct 06, 2013 at 08:26 UTC
|
open my $fh, '<', $filename or die "Cannot open '$filename' because: $!";
while ( <$fh> ) {
chomp;
if ( s/\\$// ) {
$_ .= <$fh>;
redo;
}
# now complete line in $_
}
|
What's the difference between using redo and using next as a similar example above does?
|
next goes back to the line:
while ( <$fh> ) {
and therefore reads the next line into $_, while redo goes back to the line after that, leaving $_ unchanged.
Re: Iterator to parse multiline string with \\n terminator
by CountZero (Bishop) on Oct 06, 2013 at 07:42 UTC
|
If your file is not terribly huge and/or you have enough memory, this is an alternative solution:
use Modern::Perl;
my $file;
{
local $/;   # undef slurps the whole file ('' would select paragraph mode)
$file = <DATA>;
}
$file =~ s/\\\n/ /gs;
my @lines = split /\n/, $file;
say for @lines;
__DATA__
First line
Second line (part1)\
Second line (first continuation)\
Second line (second continuation)
Third line (part1)\
Third line (first continuation)\
Third line (second continuation)
Fourth line (part1)\
Fourth line (first continuation)\
Fourth line (second continuation)
Output:
First line
Second line (part1) Second line (first continuation) Second line (second continuation)
Third line (part1) Third line (first continuation) Third line (second continuation)
Fourth line (part1) Fourth line (first continuation) Fourth line (second continuation)
CountZero "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics
Re: Iterator to parse multiline string with \\n terminator
by Laurent_R (Canon) on Oct 06, 2013 at 08:58 UTC
|
Hi,
as a side note, your anonymous function does not really act as a closure:
my $fh = IO::File->new($filename, 'r');
my $fh_iterator = sub {
my $fh = shift;
my $line = $fh->getline;
};
while (my $line = $fh_iterator->($fh)) {
# do stuff to $line
}
because you are passing $fh each time to the sub (and you have to, you actually get a fresh copy of $fh each time the anonymous subroutine is called).
If you want it to act as a closure, you may do something like this (untested):
sub create_iterator{
my $filename = shift;
my $fh = IO::File->new($filename, 'r');
return sub {
my $line = $fh->getline;
}
}
my $fh_iterator = create_iterator($file_name);
while (my $line = $fh_iterator->()) {
# do stuff to $line
}
Now, $fh is really a persistent variable within the sub's scope; this is a real closure.
|
What exactly makes it not a closure? Is it that I'm passing a variable each time? If it was a new copy of $fh, wouldn't $fh_iterator->($fh) always return the same line (since it's creating a copy of the $fh object, on next passing it would be a copy of the original)?
Thanks for setting me straight, I always like learning new things.
|
$fh is a file handle, i.e. it is actually an iterator on a file, so that each time you read from $fh, you get the next line. In your sub, my $fh = shift; creates a new copy of the $fh reference each time the sub is called. It still works because the underlying handle "knows" which is the next line to read from the file. But your anonymous sub is not a closure; the alternative code I wrote keeps its own copy of $fh, so my anonymous sub actually closes over $fh. Please note that an anonymous function is not necessarily a closure, and a closure does not necessarily have to be anonymous (although it is often the case).
You might want to have a look at this: Closure on Closures.
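Both points can be checked with a short sketch. The first part shows that copying a handle copies only the reference, so the read position is shared. The second part shows a closure holding its own private handle between calls. The helper name make_line_iterator and the in-memory filehandles are assumptions made for illustration.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

# Part 1: copying a handle copies the reference, not the read position.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $copy   = $fh;
my $first  = <$fh>;      # "one\n"
my $second = <$copy>;    # "two\n": both names refer to the same stream

# Part 2: a closure keeps its own private handle between calls.
# make_line_iterator is a hypothetical name used only for this sketch.
sub make_line_iterator {
    my ($source) = @_;
    open my $own_fh, '<', \$source or die "Cannot open in-memory file: $!";
    return sub { return scalar <$own_fh> };
}

my $it_a = make_line_iterator($text);
my $it_b = make_line_iterator($text);
my @seen = ($it_a->(), $it_a->(), $it_b->());    # $it_b starts from the top

print $first, $second, @seen;
```

Nothing outside the returned sub can reach $own_fh, which is what distinguishes the closure from a sub that receives the handle as an argument on every call.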
Re: Iterator to parse multiline string with \\n terminator
by Lennotoecom (Pilgrim) on Oct 06, 2013 at 12:55 UTC
|
while(<DATA>){s/[\\\n]//g; $line.=$_;}
print $line;
__DATA__
fist line\
second line\
third line\
fourth
no?
|
#! perl
use strict;
use warnings;
my $file = '';
while (<DATA>)
{
s{\\\n}{};
$file .= $_;
}
print $file;
__DATA__
first line
second line \
third line \
fourth line
fifth line
Output:
13:10 >perl 738a_SoPW.pl
first line
second line third line fourth line
fifth line
13:13 >
Note that the substitution operates on each line of input in turn, and a single input line can contain no more than one backslash-newline sequence. So, the /g modifier is not needed.
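The difference between the character class [\\\n] and the two-character sequence \\\n can be seen in the following sketch. The join_lines helper is a hypothetical name used only for this comparison. It applies /g in both cases, which is harmless for the sequence form.

```perl
use strict;
use warnings;

my @input = ("first line\n", "second line \\\n", "third line\n");

# Apply a substitution pattern to a copy of each line and join the results.
sub join_lines {
    my ($pattern, @lines) = @_;
    my $out = '';
    for my $line (@lines) {
        (my $copy = $line) =~ s/$pattern//g;
        $out .= $copy;
    }
    return $out;
}

# Character class: removes EVERY backslash and EVERY newline.
my $with_class = join_lines(qr/[\\\n]/, @input);

# Two-character sequence: removes a backslash only when a newline follows it.
my $with_sequence = join_lines(qr/\\\n/, @input);

print "class:    [$with_class]\n";
print "sequence: [$with_sequence]\n";
```

With the character class, ordinary line endings disappear as well, so unrelated lines run together. The sequence form joins only the continued lines.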
Hope that helps,
|
My brackets were just a typo.