
Re: Peculiar Reference To U+00FE In Text::CSV_XS Documentation

by Anonymous Monk
on Dec 10, 2012 at 03:02 UTC (#1008020)


in reply to Peculiar Reference To U+00FE In Text::CSV_XS Documentation

Why is this particular Unicode character, LATIN SMALL LETTER THORN, singled out for special mention in the documentation?

Because of its ordinal value (the number that it is)

When might they work?

:) When they're not on vacation?

:) When the source allows it?

Seriously though, the docs you're quoting say it: multibyte characters are not allowed, and use perl-5.8.2 or higher.

 

If this is the case, then I want to learn how to do this

What are you waiting for?
examples/csv-check Script to check a CSV file/stream
examples/csvdiff Script to show diff between sorted CSV files
examples/parser-xs.pl Parse CSV stream, be forgiving on bad lines
examples/speed.pl Small benchmark script


Re^2: Peculiar Reference To U+00FE In Text::CSV_XS Documentation
by Jim (Curate) on Dec 10, 2012 at 04:33 UTC

    Thanks. I tested parsing a UTF-8 Concordance DAT file using csv-check. It doesn't work.

    I don't understand your explanation of why there is a reference to the Unicode character with code point U+00FE in the Text::CSV_XS documentation. Why that character? I suspect only Tux, the maintainer of the module, can explain the mysterious reference to it.

    The documentation is ambiguous with regard to whether or not Text::CSV_XS can parse CSV records that use multi-byte metacharacters. It says it "may or may not work as expected," and it also explicitly states that U+00FE, which is a multi-byte character (the two bytes \xC3\xBE in UTF-8), "will be allowed as a quote character." It's this very ambiguity that is the basis of my inquiry here.
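To make the byte count concrete, here is a minimal sketch in C of how a code point maps to UTF-8 bytes. The function name `utf8_encode` is made up for illustration and is not part of Text::CSV_XS. The sketch does not reject surrogate code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from Text::CSV_XS: encode one Unicode code
   point as UTF-8 into out and return the number of bytes written.
   Surrogates (U+D800..U+DFFF) are not rejected in this sketch. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF);             /* highest valid code point */

    if (cp < 0x80) {                    /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

Calling `utf8_encode(0xFE, buf)` writes the two bytes \xC3 and \xBE, while an ASCII quote such as U+0022 stays a single byte.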

    Jim

      I don't understand ... the documentation is ambiguous

      But did you understand what I said? What number is it?

      In the source

      #define byte unsigned char
      typedef struct {
          byte quote_char;
          byte escape_char;
          byte sep_char;
          ...

      255 is the biggest a byte gets, right?

      :D

        \xFE is 254, not 255. It's the second biggest a byte gets. \xFF is the biggest byte.
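The arithmetic can be checked with a small C sketch. The struct `csv_chars` and the function `set_quote_char` are hypothetical stand-ins modelled on the quoted fragment, not the module's real definitions, and the sketch assumes the usual 8-bit byte, where `UCHAR_MAX` is 255.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned char byte;

/* Hypothetical stand-in for the struct fragment quoted above;
   the real struct in the module has more fields. */
typedef struct {
    byte quote_char;
    byte escape_char;
    byte sep_char;
} csv_chars;

/* Store ordinal as the quote character if it fits in one byte.
   Returns 1 on success, 0 if the ordinal is too large for a byte. */
int set_quote_char(csv_chars *c, uint32_t ordinal)
{
    assert(c != NULL);
    if (ordinal > UCHAR_MAX)   /* 255 when bytes are 8 bits wide */
        return 0;
    c->quote_char = (byte)ordinal;
    return 1;
}
```

With this sketch, the ordinal 0xFE (254) is accepted and 0x100 (256) is rejected. Whether the module then matches that byte correctly against UTF-8 input is a separate question.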

        In any case, the character with code point U+00FE isn't a single-byte character in any Unicode character encoding scheme.

        I suspect the peculiar reference to U+00FE in the documentation has something to do with the Concordance DAT file. I hope it does, because it would then imply that Text::CSV_XS can be used to parse Concordance DAT records, which is precisely what I need to do.

        Jim
