Realistically, how could that be “a bug report”? Think about it: many multi-byte character encoding schemes involve significant bytes that precede the data they modify. If you are reading the file from stern to stem (that is, backwards), well, “either you read them or you didn’t.”

It stands to reason, therefore, that you must be the one to read “a few more bytes than you need” and, having read them, figure out whether (unlucky you ...) you started reading smack-dab in the middle of a multi-byte (MBCS) sequence. There is no bright-line rule for this. The only reliable strategy I can think of is to rely upon some contextual knowledge about the data stream itself. Find some (non-MBCS) sequence that you know will occur somewhere within the last n characters of the data. Then read enough bytes from the tail of the file to cover those n characters plus some margin x, and use a regex to search that data for the known sequence. Advance forward from there, suspiciously, checking as you go.
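Here is a minimal sketch of that strategy, written in Python purely for illustration (the thread itself is about Perl and File::ReadBackwards). It assumes UTF-32LE, as in the original question, so a character is at most 4 bytes wide; the function name `tail_after_anchor`, its parameters, and the default margin are all hypothetical. It reads roughly n×4 + x bytes from the end of the file, uses a regex to find every occurrence of the encoded anchor, and returns the text after the first occurrence whose remainder decodes strictly. If no occurrence passes, it raises instead of guessing.

```python
import os
import re


def tail_after_anchor(path, n_chars, anchor, encoding="utf-32-le",
                      bytes_per_char=4, margin=64):
    """Return the text following the first verifiable `anchor` near the end of `path`.

    Hypothetical helper. Assumes `anchor` is known to occur within the last
    `n_chars` characters, that `bytes_per_char` bounds the encoded width of a
    character (4 for UTF-32), and that `encoding` names an endian-specific
    codec such as "utf-32-le" (plain "utf-32" would prepend a BOM to the needle).
    """
    with open(path, "rb") as fh:
        size = fh.seek(0, os.SEEK_END)            # file size in bytes
        want = n_chars * bytes_per_char + margin   # the "n + x" bytes
        fh.seek(max(0, size - want))
        raw = fh.read()

    needle = anchor.encode(encoding)
    # Zero-width lookahead so overlapping candidate positions are all tried.
    candidates = re.compile(b"(?=" + re.escape(needle) + b")")

    for m in candidates.finditer(raw):
        rest = raw[m.start() + len(needle):]
        try:
            return rest.decode(encoding)           # strict: any bad sequence raises
        except UnicodeDecodeError:
            continue                               # probably a misaligned false match
    raise ValueError(f"no decodable {anchor!r} in the last {len(raw)} bytes")
```

For example, `tail_after_anchor(path, n_chars=100, anchor="\n")` returns everything after the first newline found in that window. In this UTF-32 case the tail always runs to end of file, so a match that landed off a 4-byte boundary leaves a remainder whose length is not a multiple of 4, and strict decoding rejects it.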

Bear in mind that the onus is upon your application not merely to come up with the right answers when it can, but to fail reliably when it cannot. Your application is the only player with the capability to do this. The fact that the algorithm produces answers at all must itself be a positive indication that those answers are in fact worthy of trust.
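One concrete way to honor this when decoding is to insist on strict error handling. The sketch below is in Python purely for illustration; `decode_or_fail` and the sample bytes are hypothetical. A lenient handler such as `errors="replace"` turns a bad read into text that merely looks like an answer, while strict handling turns it into an exception the application can act on.

```python
def decode_or_fail(raw, encoding="utf-32-le"):
    """Decode `raw` strictly: raise rather than return text with substituted characters."""
    return raw.decode(encoding, errors="strict")


# Hypothetical sample: "abc" in UTF-32LE with its first byte lost, as if a read began mid-character.
misaligned = "abc".encode("utf-32-le")[1:]

# A lenient decoder still "produces an answer", padded with U+FFFD replacement characters.
lenient = misaligned.decode("utf-32-le", errors="replace")
print(repr(lenient))

# The strict decoder refuses, which is the behavior the application can act on.
try:
    decode_or_fail(misaligned)
except UnicodeDecodeError as exc:
    print("refused:", exc.reason)
```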

In reply to Re: How do I use the "File::ReadBackwards" and open in "Unicode text, UTF-32, little-endian" mode by sundialsvc4
in thread How do I use the "File::ReadBackwards" and open in "Unicode text, UTF-32, little-endian" mode by hashperl