
Re: Quick and portable way to determine line-ending string?

by mdillon (Priest)
on Aug 09, 2001 at 03:34 UTC (#103284)

in reply to Quick and portable way to determine line-ending string?

can't you split on the value of $/? (its value is presumably tied to $^O in the Perl source code and hence will always be in sync):

my @lines = split m#$/#, $content;

or how about just splitting on any line ending?:

my @lines = split m#\x0d\x0a?|\x0a#, $content;

to get around the multiple 0x0d problem, you could add \x0d+\x0a to the alternation as the first alternative (though it will slow things down on a Unix file with a lot of blank lines). come to think of it, \x0d{2}\x0a might be a better idea.
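a minimal sketch of the "split on any line ending" idea, with \x0d+\x0a placed first in the alternation as suggested above. the sub name split_any_eol and the sample string are made up for illustration; treating a run of CRs followed by an LF as a single line ending is an assumption about what the input should mean, not something Perl decides for you.

```perl
use strict;
use warnings;

# Hypothetical helper: split on CR+LF (with one or more CRs), bare CR, or bare LF.
# The \x0d+\x0a alternative comes first so that a run of CRs followed by LF
# is consumed as one line ending instead of several.
sub split_any_eol {
    my ($content) = @_;
    return split m#\x0d+\x0a|\x0d|\x0a#, $content;
}

my @lines = split_any_eol("one\x0d\x0atwo\x0athree\x0dfour\x0d\x0d\x0afive");
print join('|', @lines), "\n";    # prints one|two|three|four|five
```

note that two bare CRs with no LF after them (a blank line in an old Mac file) still count as two separate line endings here, since \x0d+\x0a only matches when an LF follows.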

for EBCDIC, i think the first solution i mentioned should work.

for some reason, i feel like i'm missing something fundamental about your question, so if i'm just spouting crazy-talk, please ignore me.

Re: Re: Quick and portable way to determine line-ending string?
by bikeNomad (Priest) on Aug 09, 2001 at 04:15 UTC
    $/ is set to "\n" by default, which doesn't answer the question: what is the representation of "\n" in external text files?

    This is the kind of thing that'll probably work, but that I was trying to avoid because it's hard to track and get right:

    my $nativeSeparator = "\n";
    if ($^O =~ /MSWin32|dos|os2|cygwin/) {
        # not sure what to do about cygwin here.
        $nativeSeparator = "\x0d\x0a";
    }
    elsif ($^O eq 'MacOS') {
        $nativeSeparator = "\x0d";
    }
    elsif ($^O eq 'VMS') {
        # it depends on file type... what to do?
    }
    elsif (ord('A') == 193) {
        # what to do for EBCDIC? "\n" may be OK...
    }
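One possible alternative to a hand-maintained $^O table, offered only as a sketch: write "\n" to a scratch file through the default text-mode layers, then read the file back with binmode to see which bytes actually landed on disk. The helper name native_line_ending is hypothetical. This assumes the platform does its newline translation in the I/O layers and that the default layers haven't been overridden (for example by the PERLIO environment variable or the open pragma). It probably won't tell you anything useful for record-oriented files on VMS or mainframes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the thread): returns the bytes that a
# text-mode write of "\n" produces in a file on this platform.
sub native_line_ending {
    # Use tempfile only to get a safe scratch path; reopen it ourselves
    # so the write goes through the ordinary default layers.
    my ($tmp_fh, $path) = tempfile(UNLINK => 1);
    close $tmp_fh;

    open my $out, '>', $path or die "cannot write $path: $!";
    print {$out} "\n";
    close $out or die "cannot close $path: $!";

    # Read back the raw bytes, bypassing any newline translation.
    open my $in, '<', $path or die "cannot read $path: $!";
    binmode $in;
    my $bytes = do { local $/; <$in> };
    close $in;
    return $bytes;
}

# Show the result as hex so CR and LF are visible.
printf "native line ending: %s\n",
    join ' ', map { sprintf '0x%02x', ord } split //, native_line_ending();
```

This only reports what the running perl writes for "\n"; it says nothing about files that were created on another system, which still need something like the split-on-any-ending approach.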

      I think EBCDIC is going to be a pain. For one thing, many mainframe systems assume fixed-length records with no newline separator. I thought that different mainframes would use different characters to determine a newline. \x15 and \x25 are two that are used. Also, according to the documentation for Convert::EBCDIC, there is a standard EBCDIC and a version used for OS390 (which may account for the different line endings).

      One problem there is that the EBCDIC Newline doesn't really translate to the ASCII CR or LF. Further, since the 'newline' varies on ASCII systems, I can only imagine that it's going to vary on EBCDIC systems. Admittedly, it's been a while since my mainframe days (no, I wasn't a Y2K boy), but I doubt you'll find a truly universal solution without the user choosing how their newline gets translated.

      Here's an interesting chart of the EBCDIC characters. What the heck is a "Required newline" (\x06)? I sure as heck don't remember that.

      Good luck.


