in reply to How do I use the "File::ReadBackwards" and open in "Unicode text, UTF-32, little-endian" mode
Realistically, how could that be “a bug report”? Think about it ... multi-byte character encoding schemes generally rely on significant bytes that precede the data that they modify. If you are reading the file from stern to stem, well, “either you read them or you didn’t.”
It stands to reason, therefore, that you must be the one to read “a few more bytes than you need,” and, having read those bytes, you have to figure out whether (unlucky you ...) you started reading smack-dab in the middle of a multi-byte (MBCS) sequence or not. There is no bright-line rule for this. The only reliable strategy that I can think of is to rely upon some contextual knowledge about the data stream itself. Find some (non-MBCS) sequence that you know will occur somewhere within the last n characters of the data. Then read n+x bytes (for some x ...) from the tail of the file, and use a regex to search within that data for that reliable sequence. Advance suspiciously forward from there.
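The strategy above can be sketched roughly as follows. This is only a sketch under stated assumptions, not a tested solution for the original poster's file: it assumes the data is UTF-32LE (as in the thread title), that the "reliable sequence" is a newline encoded as the four bytes `0A 00 00 00`, and that any BOM is itself 4 bytes wide, so that character boundaries fall on multiples of 4 from the start of the file. The helper name `text_after_last_anchor` and its parameters are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: read the last $tail_len bytes of $path, find the last
# occurrence of $anchor (raw bytes) that sits on a 4-byte boundary, and
# return the decoded text that follows it. Dies if it cannot do so reliably.
sub text_after_last_anchor {
    my ($path, $anchor, $tail_len) = @_;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $size  = -s $fh;
    my $start = $size > $tail_len ? $size - $tail_len : 0;
    seek $fh, $start, 0 or die "Cannot seek in $path: $!";
    my $want = $size - $start;
    my $got  = read $fh, my $buf, $want;
    die "Short read on $path" unless defined $got && $got == $want;
    close $fh;

    # The buffer may begin in the middle of a character, so a raw byte match
    # is not trusted until its absolute file offset is a multiple of 4.
    my $tail_pos;
    while ($buf =~ /(?=\Q$anchor\E)/g) {
        $tail_pos = $-[0] + length $anchor if ($start + $-[0]) % 4 == 0;
    }
    die "No aligned anchor in the last $tail_len bytes of $path"
        unless defined $tail_pos;

    my $tail = substr $buf, $tail_pos;
    die "Tail of $path is not a whole number of 4-byte characters"
        if length($tail) % 4;

    # FB_CROAK asks decode to die on malformed input instead of substituting
    return decode('UTF-32LE', $tail, FB_CROAK);
}
```

A call such as `text_after_last_anchor('data.txt', "\x0A\x00\x00\x00", 4096)` (file name and tail length are made up) would return whatever follows the last newline, or die. The alignment test is the "suspicious" part: the bytes `0A 00 00 00` can also turn up straddling two neighbouring characters, for example U+0A00 followed by U+0100, and such a match should not be trusted.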
Bear in mind that the onus is upon your application, not merely to come up with the right answers if it can, but to reliably fail if it cannot. Your application is the only player with the capability to do this. The fact that the algorithm does “produce answers at all” must, itself, be a positive indication that those answers are in fact worthy to be trusted.