http://www.perlmonks.org?node_id=293998

vinforget has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, when trying to run this code snippet:
my $str = join ('',my @temp = <>);
while ($str =~ /^\#:lav\n(^(?!\#:eof\n).*\n)+/gm){
    my ($val) = $&;
    print $val;
}
I get different behavior depending on what system I run it on. It runs fine on FreeBSD (4.5) with perl 5.6.0, but fails on Gentoo with Perl 5.8. Also, the file being fed through the script has been encoded (using uuencode or mimencode or mpack.. I tried them all) and sent to an email recipient on the BSD box. Basically I have a script on the BSD box that I want to migrate to Gentoo. This script takes a text file from an email attachment (previously encoded to compensate for Windoze ^M) and processes it. But when I try to run the script on the Gentoo box I get a segmentation fault. I think the problem is in the encoding, so this may not be the forum to ask this question. If so, I apologize in advance. Thank you. Vince

Replies are listed 'Best First'.
Re: weird perl verion and/or OS problem with newlines
by graff (Chancellor) on Sep 25, 2003 at 03:28 UTC
    If you say that snippet works on BSD, I'll take your word for it, but it seems pretty far from optimal. You read an entire file into a @temp array, but never actually use that array elsewhere -- you just concatenate all the lines into a scalar. Then you use a looping regex where you could use index() and substr(), and on top of that, you use the infamous  $& instead of the more efficient paren-capture and "$1".
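
    To illustrate the last point, here is a minimal sketch of the question's loop using a capturing group and $1 in place of $&. The sample string and the @blocks array are assumptions made up for illustration, and this has not been run against the original data or on the Gentoo box:

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the slurped file.
my $str = "junk\n#:lav\nfirst\n#:eof\n#:lav\nsecond a\nsecond b\n#:eof\n";

my @blocks;
# Same pattern as in the question, but the whole match is wrapped in a
# capturing group so that $1 can be used instead of $&. The inner group
# is non-capturing because only the overall match is needed.
while ( $str =~ /^(\#:lav\n(?:(?!\#:eof\n).*\n)+)/gm ) {
    push @blocks, $1;
}

print for @blocks;
```

    Like the original, each captured block includes the "#:lav" line and stops just before the "#:eof" line.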

    And you tried this same snippet with the exact same data on a Gentoo box and got a seg fault? It's known that 5.8.0 has some minor "issues" with a handful of particular/arcane regex conditions, because its expanded support for unicode has "broken new ground" (so to speak); even though you have no hint of utf8-like data here, maybe trying a different (and more efficient) approach would get you past the roadblock.

    As for newline behavior, if your snippet actually works as intended on BSD, there should be no difference on Gentoo that involves just line-termination patterns (unless the data on the Gentoo box differs from the BSD data in this regard -- in which case, try fixing the data first because that would be easy).
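
    One way to check and fix the data, sketched under the assumption that the only difference would be DOS-style CRLF line endings (the sample string and variable names are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample with CRLF line endings, standing in for the
# decoded attachment.
my $str = "#:lav\r\nsome text\r\n#:eof\r\n";

# Count the CRLF pairs; the "= () =" idiom forces list context so the
# number of matches is returned.
my $cr_count = () = $str =~ /\r\n/g;
print "found $cr_count CRLF line endings\n";

# Normalize to unix-style line termination on a copy of the data.
( my $clean = $str ) =~ s/\r\n/\n/g;
print $clean;
```

    If the count is zero on both machines, line termination is probably not what differs between them.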

    Anyway, something like this might be worth a try (assuming the text really has unix-style line termination) -- I haven't tested it:

    {   # create a scoping block for slurp-mode reading
        local $/ = undef;
        my $str = <>;
        my $bgnstr = "#:lav\n";
        my $bgnlen = length( $bgnstr );
        my $endstr = "#:eof\n";
        # parenthesize the assignment so $startpos holds the position,
        # not the result of the ">=" comparison
        while ( ( my $startpos = index( $str, $bgnstr ) ) >= 0 ) {
            $startpos += $bgnlen;
            my $stoppos = index( $str, $endstr, $startpos );
            $stoppos = length( $str ) if ( $stoppos < 0 );
            my $val = substr( $str, $startpos, $stoppos - $startpos );
            print $val;
            $str = substr( $str, $stoppos );
        }
    }
    (update: fixed the ">=" operator in the while condition)
      Thanks for the quick response. I think a little more explanation is in order on my part. I have a file with multiple concatenated sections that begin with #:lav and end with #:eof and I want to process each of these one at a time e.g.
      --start of file--
      #:lav
      some text
      #:eof
      #:lav
      some text
      #:eof
      #:lav
      some text
      #:eof
      --end of file--
      This is why I used the while loop with the concatenated string... so I can do some processing of each subsection that satisfies the above criteria. Writing a while loop to process the file line by line would use less memory, but would take more development time. Vince

        <IN>;  # skip "--start of file--"
        while( <IN> ) {
            if( $_ eq "#:lav\n" ) {
                local $/= "#:eof\n";
                my $block= <IN>;
                # ... process $block here ...
            } elsif( $_ !~ /^--end of file--/ ) {
                warn "Unexpected line: $_";
            }
        }

        Updated: Thanks to graff for noting that I wrote $\ when I meant $/.
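
        For anyone who wants to try the record-separator approach without a file, here is a self-contained sketch that reads from an in-memory handle (available from perl 5.8). The sample data, the $in handle and the @blocks array are assumptions for illustration only:

```perl
use strict;
use warnings;

# Hypothetical sample data; the real input would be the decoded attachment.
my $data = "--start of file--\n"
         . "#:lav\nsome text\n#:eof\n"
         . "#:lav\nmore text\n#:eof\n"
         . "--end of file--\n";

open my $in, '<', \$data or die "cannot open in-memory handle: $!";

my @blocks;
while ( my $line = <$in> ) {
    if ( $line eq "#:lav\n" ) {
        # Inside this block one read returns everything up to and
        # including the end marker; the outer loop still reads by line.
        local $/ = "#:eof\n";
        my $block = <$in>;
        last unless defined $block;
        chomp $block;    # chomp strips the current $/, here "#:eof\n"
        push @blocks, $block;
    }
}
close $in;

print "[$_]" for @blocks;
```

        Because $/ is localized inside the if block, the line-by-line reading of the outer loop is unaffected.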

                        - tye
Re: weird perl verion and/or OS problem with newlines
by asarih (Hermit) on Sep 24, 2003 at 20:51 UTC
    How big are these attachments? It looks like you're reading the entire input into $str, and this might be causing the seg faults. I think the following (untested) code does what your code does, without consuming too much memory. Comments welcome.
    while (<>) {
        my $val = '';
        if (/^\#:lav$/) {
            $val = $_;
            # collect lines until the "#:eof" marker (or end of input)
            while ( defined( my $line = <> ) ) {
                last if $line =~ /^\#:eof$/;
                $val .= $line;
            }
        }
        print $val;
    }
    Update: fixed the bonehead error of using next inside the loops.