PerlMonks  

Re^2: Best Way To Parse Concordance DAT File Using Modern Perl?

by Jim (Curate)
on Dec 10, 2012 at 22:09 UTC (#1008172=note)


in reply to Re: Best Way To Parse Concordance DAT File Using Modern Perl?
in thread Best Way To Parse Concordance DAT File Using Modern Perl?

If it's a UTF-8 file, isn't it meant to have a 3 byte BOM? Your BOM indicates that it's a UTF-16 file, not UTF-8.

In the file, it is a Unicode BOM encoded as three bytes (EF BB BF) in the UTF-8 character encoding scheme. But once decoded, it's just one character (one Unicode code point, U+FEFF), represented in Perl as \x{FEFF} or \N{BYTE ORDER MARK}. In a decoded, abstract Unicode string, distinctions between the various encodings (serializations) of the string don't exist.
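To illustrate the point, here is a minimal sketch using the core Encode module. The helper name hex_bytes is mine, purely for illustration; it is not from the original thread. It serializes the same single character under three encoding schemes and then decodes the UTF-8 and UTF-16BE byte sequences back to the identical one-character string.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper (not from the thread): encode a character string
# with the named encoding and return its bytes as space-separated hex.
sub hex_bytes {
    my ($encoding, $string) = @_;
    my $bytes = encode($encoding, $string);
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# One decoded character: U+FEFF, the byte order mark.
my $bom = "\x{FEFF}";

# The same code point, serialized three different ways.
for my $encoding (qw(UTF-8 UTF-16BE UTF-16LE)) {
    printf "%-8s %s\n", $encoding, hex_bytes($encoding, $bom);
}

# Decoding different byte sequences gives back the same
# single-character string; the encoding distinction is gone.
my $from_utf8  = decode('UTF-8',    "\xEF\xBB\xBF");
my $from_utf16 = decode('UTF-16BE', "\xFE\xFF");
print "same character\n" if $from_utf8 eq $from_utf16;
```

So seeing \x{FEFF} at the start of a decoded string says nothing about whether the file on disk was UTF-8 or UTF-16; that is determined by the bytes that were read and the layer used to decode them.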

Jim


Re^3: Best Way To Parse Concordance DAT File Using Modern Perl?
by Anonymous Monk on Jan 15, 2013 at 23:24 UTC
    I realize it is probably impossible because the file contains evidence and attorney work product, but can you isolate and anonymize a few exemplar records from a properly formatted file that would cause the CSV or CSV_XS modules to fail?
