PerlMonks  

Reading a text file collapsing line continuations

by skx (Parson)
on Mar 09, 2009 at 15:37 UTC ( [id://749322]=perlquestion )

skx has asked for the wisdom of the Perl Monks concerning the following question:

I've written some code for parsing input files which is working happily on one machine, but failing on another.

Although it isn't strictly necessary, I've added support for input that is broken across multiple lines, marked by a "\" character at the end of the line. The following is a cut-down version of the code (identical to recipe 8.1 from the Perl Cookbook):

#!/usr/bin/perl -w

use strict;
use warnings;

while (defined(my $line = <DATA>) ) {
    chomp $line;                 # line 7
    if ($line =~ s/\\$//)        # line 8
    {
        $line .= <DATA>;
        redo unless eof(DATA);
    }
    print "LINE: '$line'\n";
}
__DATA__
1 1
2 2
3 3
4 4 \
4 4 \
4 4
5 5

On one machine that works perfectly:

skx@gold:~$ ./bug.pl
LINE: '1 1'
LINE: '2 2'
LINE: '3 3'
LINE: '4 4 4 4 4 4'
LINE: '5 5'
LINE: ''
skx@gold:~$ perl -v|grep perl,
This is perl, v5.10.0 built for x86_64-linux-gnu-thread-multi

On another it gives errors:

steve@skx:~$ ./bug.pl
LINE: '1 1'
LINE: '2 2'
LINE: '3 3'
Use of uninitialized value in scalar chomp at ./bug.pl line 7, <DATA> line 5.
Use of uninitialized value in substitution (s///) at ./bug.pl line 8, <DATA> line 5.
Use of uninitialized value in concatenation (.) or string at ./bug.pl line 13, <DATA> line 5.
LINE: ''
LINE: ' 4 4'
LINE: '5 5'
LINE: ''
steve@skx:~$ perl -v | grep perl,
This is perl, v5.8.8 built for i486-linux-gnu-thread-multi

I would expect that the while loop wouldn't run if there was an undefined line - so I'm unclear on why the chomp is giving the uninitialized variable warning.

For the moment I've just reworked my input into non-split lines, but I'm confused as to what I've done wrong!

Steve
--

Replies are listed 'Best First'.
Re: Reading a text file collapsing line continuations
by kennethk (Abbot) on Mar 09, 2009 at 15:57 UTC

    I can reproduce your issue on my Linux box (and get good behavior with AS 5.8.8 under Windows). Based on what I see in my debugger, you are having a scoping issue. Essentially, every iteration of your while loop reinitializes $line as per your my statement. My guess is that the scoping behavior was modified in 5.10.0, which fixes this particular issue. You can avoid the problem with:

    #!/usr/bin/perl -w

    use strict;
    use warnings;

    my $line;
    while (defined($line = <DATA>) ) {
        chomp $line;                 # line 7
        if ($line =~ s/\\$//)        # line 8
        {
            $line .= <DATA>;
            redo unless eof(DATA);
        }
        print "LINE: '$line'\n";
    }
    __DATA__
    1 1
    2 2
    3 3
    4 4 \
    4 4 \
    4 4
    5 5

    Update: Regarding your comment

    I would expect that the while loop wouldn't run if there was an undefined line - so I'm unclear on why the chomp is giving the uninitialized variable warning.
    you should check out the documentation to see why the original developer used a redo in place of a next.
      I doubt if it is a scoping issue. The script works as expected on my Windows XP Pro ActiveState Perl v5.8.8 [MSWin32-x86-multi-thread]. The OP has a problem with the script under Linux Perl v5.8.8 but no problem under Linux Perl v5.10.0.

      If you are right, that means that the scoping rules would be different between Windows and Linux? Seems strange to me.

      But I see the same output (but not the warnings!) as the OP if I replace the redo with next and that is as expected.

      The difference between redo and next is that redo does not re-evaluate the condition, whereas next of course does. In other words, with redo you do not run the defined(my $line = <DATA>) again, so the accumulation of the "broken" lines works, and hence it cannot be a scoping issue, since you never leave nor re-enter the scope.
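The redo/next distinction described above can be seen in a minimal sketch. The sub name items_seen and the two-element queue are made up for illustration. The loop variable is deliberately declared outside the while, so the example does not depend on the my-in-condition behaviour that is in dispute in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: restart the loop once, on the first item, using
# either 'redo' or 'next', and record every item the body gets to see.
sub items_seen {
    my ($keyword) = @_;
    my @queue     = qw(a b);
    my @seen;
    my $restarted = 0;
    my $item;    # declared outside the loop on purpose

    while (defined($item = shift @queue)) {
        push @seen, $item;
        unless ($restarted++) {
            # redo re-runs the body without re-evaluating the condition,
            # so $item still holds the same value on the second pass.
            redo if $keyword eq 'redo';
            # next re-evaluates the condition and fetches a new item.
            next;
        }
    }
    return @seen;
}

print join(',', items_seen('redo')), "\n";    # a,a,b
print join(',', items_seen('next')), "\n";    # a,b
```

With redo the body sees the first item twice, because the shift in the condition is not executed again. With next each item is seen once.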

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        That my Linux box (v5.8.8 built for x86_64-linux-gnu-thread-multi under Ubuntu) replicates the issue and my Windows box (v5.8.9 built for MSWin32-x86-multi-thread, I forgot I upgraded) does not implies something differs between the implementations. Running the Linux variety through my debugger shows $line to be undef on line 7 on the continuation case, whereas it maintains continuity under AS. Since we are dealing with a variable which is scoped at the loop, I was reading it as the variable was getting a new instance upon redo even though the my was not being re-evaluated. This is supported by the following code, which prints an infinite series of 1's on my Windows box and prints one 1 and then an infinite series of "Use of uninitialized value in print at fluff.pl line 6, <DATA> line 1." on my Linux box:

        #!/usr/bin/perl -w
        use strict;
        use warnings;

        while (defined(my $line = <DATA>) ) {
            print $line;
            redo;
        }
        __DATA__
        1

        Update: Better example code (I think). Prints 210 under Windows and 2 under Linux:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $j = 2;
        while ( my $i = $j ) {
            print $i if defined $i;
            last unless $i--;
            redo;
        }

        Update 2: Filed a bug report, id [perl #63752]

      Thanks for the tip, and for the solution.

      The reason is obvious now you point it out, but I managed to fail to spot it myself so ++.

      Steve
      --
Re: Reading a text file collapsing line continuations
by repellent (Priest) on Mar 09, 2009 at 17:53 UTC
    If all you care about is printing it out, you don't have to accumulate the lines before printing. Just print out the partial lines (without line endings).

    If the input to your existing code may be broken up in multiple lines via backslash \, just filter the input first before it gets sucked in by your existing code:
    perl -pe 's=\s*\\\s*$/= ='

      Thanks for the suggestion - in my case I actually want to parse & process the lines. This example only prints them out to make a complete program which demonstrates the problem.

      Steve
      --
        I usually decouple the code that joins lines from the actual program which parses the lines. Hence, the program makes the assumption that lines are not broken, and thus, is able to concentrate on its sole purpose.

        This works out cleaner for me because I find that many programs require that lines be joined in the same way. Rather than duplicating code in many programs, I place the join-lines filter up front. It's also easier if the programs read from STDIN, so filters could be piped along in the *NIX environment.
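One way to sketch the decoupling described above is a small join-lines routine that the parsing code calls first. The name join_continuations is hypothetical. Unlike the one-liner earlier in this reply, it only strips the trailing backslash (as the original question's code does) and does not trim whitespace or insert a space. It uses next rather than redo, so the my in the loop condition is re-evaluated normally on every pass.

```perl
use strict;
use warnings;

# Hypothetical helper: read physical lines from a filehandle and return
# logical lines, joining any line that ends in a backslash with its successor.
sub join_continuations {
    my ($fh) = @_;
    my @logical;
    my $pending;    # undef means "no continuation in progress"

    while (defined(my $line = <$fh>)) {
        chomp $line;
        my $continues = ($line =~ s/\\$//);    # true if a trailing "\" was removed
        $pending = defined $pending ? $pending . $line : $line;
        next if $continues;                    # keep accumulating
        push @logical, $pending;
        undef $pending;
    }
    # Keep a dangling continuation at end of input rather than dropping it.
    push @logical, $pending if defined $pending;
    return @logical;
}

# Demo on an in-memory filehandle, using data shaped like the question's.
my $text = "1 1\n4 4 \\\n4 4 \\\n4 4\n5 5\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
print "LINE: '$_'\n" for join_continuations($fh);
```

To use it as a filter in a pipeline, the same sub could be handed \*STDIN instead of the in-memory handle. Note that it collects all logical lines in memory, so for very large inputs a streaming variant would be preferable.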
Re: Reading a text file collapsing line continuations
by Bloodnok (Vicar) on Mar 09, 2009 at 16:16 UTC
    Why not accumulate the lines into an array and then chomp the array?

    A user level that continues to overstate my experience :-))
      If you have a really huge file, your solution needs a lot of memory.

      Furthermore you will still need to loop over all the data to re-assemble the broken lines, so your code will not be much simpler.

      CountZero


        Obviously, my intentions weren't as clear as they could have been - my proposal was to maintain the broken line processing, merely accumulating assimilated lines into an array which ....... Oh bugger!! Your point heaves quite clearly into view - just off to see if there's a cure for the write-before-you-think malady - with which I have, self-evidently, been stricken.

        A user level that continues to overstate my experience :-))

Node Type: perlquestion [id://749322]
Approved by Corion