
Your code will work just fine for preserving the original line boundaries no matter what system created the text. In that respect, I personally don't know of any other approach that would improve on yours.

But since it's HTML data that you're working with, line breaks are only meaningful as such within <pre> ... </pre> -- which leads to at least three points that might be interesting for your situation:

- original data may have strange variations in the placement of line breaks, though this does not affect browser behavior;
- you can often "revise" the distribution of line breaks without any noticeable effect on browser behavior;
- the previous two points do not apply in certain portions of some HTML data (i.e. within <pre> elements).

If all you're doing is taking HTML data that is already "okay" and replicating it with some particular wrapping around it, your suggested code will be fine.

If your process involves any sort of filtering, enhancement or other modification of the content, then you will be much better off looking through the various HTML modules (especially HTML::Parser or HTML::TokeParser) to read the input properly. I frankly don't know how these will handle the subtler details of input from different systems. At worst, you may need to keep something like the code you suggested when handling the contents of <pre> blocks.
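To make the parser suggestion a bit more concrete, here is a minimal sketch using HTML::TokeParser (from the HTML-Parser distribution on CPAN, so it has to be installed). The sub name normalize_breaks and the policy it applies are my own assumptions for illustration, not anything from your code: outside <pre> it folds each run of line breaks into a single space, and inside <pre> it keeps every break but rewrites it as "\n".

```perl
use strict;
use warnings;
use HTML::TokeParser;   # non-core; part of the HTML-Parser distribution

# Hypothetical helper: fold line breaks outside <pre>, keep them inside.
sub normalize_breaks {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "can't set up parser";
    my ($out, $in_pre) = ('', 0);
    while (my $tok = $p->get_token) {
        if ($tok->[0] eq 'T') {                  # text token: ["T", $text, $is_data]
            my $text = $tok->[1];
            if ($in_pre) {
                $text =~ s/\r\n|\r/\n/g;         # keep every break, unify its form
            } else {
                $text =~ s/\s*[\r\n]+\s*/ /g;    # breaks are just whitespace here
            }
            $out .= $text;
            next;
        }
        # Tag names arrive lowercased in element 1 of start and end tokens.
        $in_pre++ if $tok->[0] eq 'S' && $tok->[1] eq 'pre';
        $in_pre-- if $tok->[0] eq 'E' && $tok->[1] eq 'pre' && $in_pre;
        # For every non-text token the original source text is the last element.
        $out .= $tok->[-1];
    }
    return $out;
}

print normalize_breaks("<p>one\r\ntwo</p><pre>a\r\nb</pre>"), "\n";
```

Treat this as a starting point only: it does not try to be clever about <textarea>, <script> or other elements where whitespace might matter to you.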

Update: it sounds like you're producing all your output for just one system (the one running the Perl script), which means you want to eliminate the variations in line-break characters. But if you had to keep the line breaks as-is, so that the results could be read back nicely on the particular system that created each original, you'd want to modify your code just a little:

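$file='';
open(IN,"<$filename") || die "$filename can't be opened: $!";
binmode(IN);  # keep raw line endings (matters on Windows)
{ local $/=undef;  $file=<IN>; }  # slurp the whole file
close(IN);
($\) = ($file =~ /(\r\n|\r|\n)/);  # make output rec-separator same as input
@lines=split /[\r\n]+/, $file;  # note: a run of line breaks counts as one, so blank lines are dropped
foreach $line (@lines) {
     # do some processing here
}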
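If you want to try the idea without touching any files, the same detection step can be exercised on in-memory strings. This is only a sketch: rewrite_keeping_eol is a made-up name, uc() stands in for whatever processing you really do, and I split on /\r\n|\r|\n/ here instead of /[\r\n]+/ so that interior blank lines survive (an assumption about what you'd want).

```perl
use strict;
use warnings;

# Hypothetical helper: process $text line by line and rebuild it with the
# same line terminator the input used (the first one found; "\n" if none).
# Every output line gets a terminator, including the last one.
sub rewrite_keeping_eol {
    my ($text) = @_;
    my $eol   = $text =~ /(\r\n|\r|\n)/ ? $1 : "\n";
    my @lines = split /\r\n|\r|\n/, $text;          # keeps interior blank lines
    return join '', map { uc($_) . $eol } @lines;   # uc() is placeholder processing
}

my $dos = rewrite_keeping_eol("one\r\ntwo\r\n");
print $dos eq "ONE\r\nTWO\r\n" ? "CRLF preserved\n" : "CRLF lost\n";
```

Checking for \r\n before a bare \r or \n is what keeps a DOS file from being mistaken for a Mac one.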

In reply to Re: Handling Mac, Unix, Win/DOS newlines at readtime... by graff
in thread Handling Mac, Unix, Win/DOS newlines at readtime... by strredwolf
