in reply to weird perl verion and/or OS problem with newlines

If you say that snippet works on BSD, I'll take your word for it, but it seems pretty far from optimal. You read an entire file into a @temp array but never actually use that array elsewhere -- you just concatenate all the lines into a scalar. Then you use a looping regex where you could use index() and substr(), and on top of that you use the infamous $& instead of the more efficient paren-capture and "$1".

And you tried this same snippet with the exact same data on a Gentoo box and got a seg fault? It's known that 5.8.0 has some minor "issues" with a handful of particular/arcane regex conditions, because its expanded support for unicode has "broken new ground" (so to speak); even though you have no hint of utf8-like data here, maybe trying a different (and more efficient) approach would get you past the roadblock.

As for newline behavior, if your snippet actually works as intended on BSD, there should be no difference on Gentoo that involves just line-termination patterns (unless the data on the Gentoo box differs from the BSD data in this regard -- in which case, try fixing the data first because that would be easy).

Anyway, something like this might be worth a try (assuming the text really has unix-style line termination) -- I haven't tested it:

    { # create a scoping block for slurp-mode reading
        local $/ = undef;
        my $str    = <>;
        my $bgnstr = "#:lav\n";
        my $bgnlen = length( $bgnstr );
        my $endstr = "#:eof\n";
        while ( ( my $startpos = index( $str, $bgnstr ) ) >= 0 ) {
            $startpos += $bgnlen;
            my $stoppos = index( $str, $endstr, $startpos );
            $stoppos = length( $str ) if ( $stoppos < 0 );
            my $val = substr( $str, $startpos, $stoppos - $startpos );
            print $val;
            $str = substr( $str, $stoppos );
        }
    }
(update: fixed the ">=" operator in the while condition)

Replies are listed 'Best First'.
Re: Re: weird perl verion and/or OS problem with newlines
by vinforget (Beadle) on Sep 25, 2003 at 15:20 UTC
    Thanks for the quick response. I think a little more explanation is in order on my part. I have a file with multiple concatenated sections that begin with #:lav and end with #:eof and I want to process each of these one at a time e.g.
    --start of file--
    #:lav
    some text
    #:eof
    #:lav
    some text
    #:eof
    --end of file--
    This is why I used the while loop with the concatenated string... so I can do some processing on each subsection that satisfies the above criteria. Writing a while loop to process the file line by line would be more memory-efficient, but would take more development time.
    Vince

      <IN>;    # skip "--start of file--"
      while( <IN> ) {
          if( $_ eq "#:lav\n" ) {
              local $/ = "#:eof\n";
              my $block = <IN>;
              # ... process $block here ...
          } elsif( $_ !~ /^--end of file--/ ) {
              warn "Unexpected line: $_";
          }
      }

      Updated: Thanks to graff for noting that I wrote $\ when I meant $/.

                      - tye