Your code will work just fine for preserving the original line boundaries no matter what system created the text. In that respect, I personally don't know of any other approach that would improve on yours.

But since it's HTML data that you're working with, line breaks are only meaningful as such within <pre> ... </pre> -- which leads to at least three points that might be interesting for your situation:

  • original data may have strange variations in the placement of line breaks, though this does not affect browser behavior;
  • you can often "revise" the distribution of line breaks without any noticeable effect on browser behavior;
  • the previous two points do not apply in certain portions of some HTML data (i.e. within <pre> elements).

If all you're doing is taking HTML data that is already "okay" and replicating it with some particular wrapping around it, your suggested code will be fine.

If your process involves any sort of filtering, enhancement or other modification of the content, then you will be much better off looking through the various HTML modules (especially HTML::Parser or HTML::TokeParser) to read the input properly. I frankly don't know how these will handle the subtler details of input from different systems. At worst, you may need to keep something like the code you suggested when handling the contents of <pre> blocks.
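As a rough illustration of that approach, here is a minimal sketch using HTML::TokeParser (part of the HTML-Parser distribution on CPAN, so it has to be installed separately). The function name `tidy_html` and the sample input are made up for this example. It assumes the only goals are to collapse whitespace in ordinary text and to unify line endings inside <pre>, and it assumes each text token arrives in one piece, which should hold for small inputs but is not guaranteed for large ones.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: collapse whitespace runs in ordinary text, but pass
# <pre> contents through with only their line endings normalized to "\n".
sub tidy_html {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "cannot create parser: $!";
    my ($out, $pre_depth) = ('', 0);

    while (my $token = $parser->get_token) {
        my $type = $token->[0];
        if ($type eq 'S') {                      # start tag: [S, tag, attr, attrseq, text]
            $pre_depth++ if $token->[1] eq 'pre';
            $out .= $token->[4];                 # original tag text, unchanged
        }
        elsif ($type eq 'E') {                   # end tag: [E, tag, text]
            $pre_depth-- if $token->[1] eq 'pre' && $pre_depth > 0;
            $out .= $token->[2];
        }
        elsif ($type eq 'T') {                   # text: [T, text, is_data]
            my $text = $token->[1];
            if ($pre_depth) {
                $text =~ s/\r\n?/\n/g;           # keep the breaks, unify their form
            }
            else {
                $text =~ s/\s+/ /g;              # breaks are plain whitespace here
            }
            $out .= $text;
        }
        else {                                   # comment, declaration or PI
            $out .= $type eq 'PI' ? $token->[2] : $token->[1];
        }
    }
    return $out;
}

print tidy_html("<p>one\r\ntwo\rthree</p>\r\n<pre>a\r\nb\rc</pre>\n");
```

Tracking a depth counter rather than a simple flag is just a guard against stray or nested <pre> tags in messy input; whether that matters depends on your data.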

Update: It sounds like you're producing all your output for just one system (the one running the Perl script), which means you want to eliminate the variations in line-break characters. But if you had to keep the line breaks as-is, so that the results could be read back nicely on the particular system that created each original, you'd want to modify your code just a little:

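    $file = '';
    open(IN, "<$filename") || die "$filename can't be opened: $!";
    {
        local $/ = undef;    # slurp mode: read the whole file at once
        $file = <IN>;
    }
    close(IN);
    ($\) = ($file =~ /(\r\n|\r|\n)/);  # make output rec-separator same as input
    @lines = split /[\r\n]+/, $file;   # note: runs of line breaks collapse, so blank lines are dropped
    foreach $line (@lines) {
        # do some processing here
    }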
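For anyone who wants to try the idea without touching real files, here is a small self-contained sketch using only core Perl. The names `detect_eol` and `split_lines` and the sample data are invented for the example. Unlike a split on `/[\r\n]+/`, this variant splits on one line ending at a time, so blank lines survive; that is a deliberate change, not something the original code does.

```perl
use strict;
use warnings;

# Return the first line ending found in $text, or "\n" if there is none.
sub detect_eol {
    my ($text) = @_;
    return $text =~ /(\r\n|\r|\n)/ ? $1 : "\n";
}

# Split on a single line ending at a time so that blank lines are kept.
sub split_lines {
    my ($text) = @_;
    return split /\r\n|\r|\n/, $text;
}

my $input = "first\r\n\r\nthird\r\n";       # DOS-style sample data
my $eol   = detect_eol($input);
my @lines = split_lines($input);

my $output = '';
open(my $out, '>', \$output) or die "cannot open in-memory handle: $!";
binmode($out);                               # no newline translation on output
{
    local $\ = $eol;                         # appended to every print in this block
    print {$out} uc($_) for @lines;          # uc() is a stand-in for real processing
}
close($out);
```

Localizing `$\` inside a block keeps the changed output separator from leaking into unrelated print statements elsewhere in the script.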

In reply to Re: Handling Mac, Unix, Win/DOS newlines at readtime... by graff
in thread Handling Mac, Unix, Win/DOS newlines at readtime... by strredwolf

