in reply to Best Way To Parse Concordance DAT File Using Modern Perl?
I gather that the "CRLF" pairs that serve to terminate records are not enclosed in any kind of quotes, whereas data fields that include "CRLF" as content must be quoted (using the U+00FE string delimiter). If that's not true, then parsing the input would be pretty tough.
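If that assumption holds, the record splitting can be sketched roughly as below. This is only an illustration, not a tested parser for real exports: the sub name `parse_dat` is made up, U+0014 as the field separator is my assumption about the export settings (the thread only mentions U+00FE), and the text is assumed to be decoded to characters already. It also assumes the U+00FE delimiter never occurs inside field content.

```perl
use strict;
use warnings;

# Assumptions: U+00FE quotes each field (as discussed in the thread);
# U+0014 separates fields (assumed, check your export settings);
# CRLF ends a record only when it appears outside a quoted field.
sub parse_dat {
    my ($text) = @_;
    my (@records, @fields);
    my $field     = '';
    my $in_quotes = 0;

    pos($text) = 0;
    while (pos($text) < length $text) {
        if ($text =~ /\G\x{FE}/gc) {            # quote: toggle state
            $in_quotes = !$in_quotes;
        }
        elsif ($in_quotes) {                    # content, CRLF included
            $text =~ /\G([^\x{FE}]+)/gc;
            $field .= $1;
        }
        elsif ($text =~ /\G\x{14}/gc) {         # field separator
            push @fields, $field;
            $field = '';
        }
        elsif ($text =~ /\G\r\n/gc) {           # unquoted CRLF: end of record
            push @fields, $field;
            push @records, [@fields];
            @fields = ();
            $field  = '';
        }
        else {                                  # stray character outside quotes
            $text =~ /\G./gcs;
        }
    }
    # Keep a final record that lacks a terminating CRLF.
    if (@fields || length $field) {
        push @fields, $field;
        push @records, [@fields];
    }
    return \@records;
}

# Two records; the second has a CRLF inside its quoted second field.
my $sample = "\x{FE}ID\x{FE}\x{14}\x{FE}Note\x{FE}\r\n"
           . "\x{FE}1\x{FE}\x{14}\x{FE}line one\r\nline two\x{FE}\r\n";
my $records = parse_dat($sample);
printf "%d records, %d fields in the first\n",
    scalar @$records, scalar @{ $records->[0] };
```

The embedded CRLF stays inside the second record's last field because the scanner only treats CRLF as a terminator while it is outside the U+00FE quotes.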
Apart from that, I'm not sure I understand what you're saying about the BOM (U+FEFF)... What in particular needs to be done to "handle it properly"? (In UTF-8 data, it's sufficient to just ignore/delete it without further ado, or perhaps include it at the beginning of one's output, if one expects that a downstream process will be looking for it.)
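For what it's worth, here is a minimal sketch of the "ignore/delete it, or put it back on output" idea. The sub names `strip_bom` and `with_bom` are hypothetical, and the text is assumed to be decoded to characters already (for example, read through an `:encoding(UTF-8)` layer).

```perl
use strict;
use warnings;

# Remove a leading U+FEFF from decoded text.
# Returns the text plus a flag saying whether a BOM was present.
sub strip_bom {
    my ($text) = @_;
    my $had_bom = ($text =~ s/\A\x{FEFF}//) ? 1 : 0;
    return ($text, $had_bom);
}

# Prepend U+FEFF for a downstream consumer that expects one.
sub with_bom {
    my ($text) = @_;
    return "\x{FEFF}" . $text;
}

my ($body, $had_bom) = strip_bom("\x{FEFF}first line\r\n");
# ... parse $body here ...
my $output = $had_bom ? with_bom($body) : $body;
```

Remembering whether the input had a BOM lets the output mirror the input, which is usually the least surprising choice for whatever reads the file next.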
Anyway, I'd go with the suggestion in the first reply.
Replies are listed 'Best First'.
Re^2: Best Way To Parse Concordance DAT File Using Modern Perl?
by Jim (Curate) on Dec 09, 2012 at 18:43 UTC
by graff (Chancellor) on Dec 10, 2012 at 09:48 UTC
by Jim (Curate) on Dec 10, 2012 at 22:52 UTC
by graff (Chancellor) on Dec 11, 2012 at 07:22 UTC
by Anonymous Monk on Dec 11, 2012 at 07:40 UTC