Re^2: Peculiar Reference To U+00FE In Text::CSV_XS Documentation

Thanks. I tested parsing a UTF-8 Concordance DAT file using csv-check. It doesn't work.

I don't understand your explanation of why there is a reference to the Unicode character with code point U+00FE in the Text::CSV_XS documentation. Why that character? I suspect only Tux, the maintainer of the module, can explain the mysterious reference to it.

The documentation is ambiguous with regard to whether or not Text::CSV_XS can parse CSV records that use multi-byte metacharacters. It says it "may or may not work as expected," and it also explicitly states that U+00FE, which is a multi-byte character (the two bytes \xC3\xBE in UTF-8), "will be allowed as a quote character." It's this very ambiguity that is the basis of my inquiry here.

Jim

Re^3: Peculiar Reference To U+00FE In Text::CSV_XS Documentation by Anonymous Monk on Dec 10, 2012 at 05:42 UTC
"I don't understand ... the documentation is ambiguous"

But did you understand what I said? What number is it? In the source

    #define byte unsigned char
    typedef struct {
        byte quote_char;
        byte escape_char;
        byte sep_char;

255 is the biggest a byte gets, right? :D
Re^4: Peculiar Reference To U+00FE In Text::CSV_XS Documentation by Jim (Curate) on Dec 10, 2012 at 06:20 UTC
`\xFE` is 254, not 255. It's the second biggest a byte gets. `\xFF` is the biggest byte. In any case, the character with code point `U+00FE` isn't a single-byte character in any Unicode character encoding scheme. I suspect the peculiar reference to `U+00FE` in the documentation has something to do with the Concordance DAT file. I hope it does, because it would then imply that Text::CSV_XS can be used to parse Concordance DAT records, which is precisely what I need to do.

Jim
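The byte counts discussed above can be checked with the core Encode module. This is only a minimal sketch: the helper name `hex_bytes` is made up for illustration and is not part of Text::CSV_XS or Encode.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $thorn = "\x{FE}";    # U+00FE LATIN SMALL LETTER THORN

# Hypothetical helper: encode one character and return its bytes
# as space-separated uppercase hex.
sub hex_bytes {
    my ($encoding, $char) = @_;
    my $bytes = encode($encoding, $char);
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

printf "code point: %d\n", ord $thorn;
printf "%-10s %s\n", $_, hex_bytes($_, $thorn)
    for qw(UTF-8 UTF-16BE UTF-32BE ISO-8859-1);
```

The code point prints as 254. The encoded forms are `C3 BE` for UTF-8, `00 FE` for UTF-16BE and `00 00 00 FE` for UTF-32BE. Only ISO-8859-1, which is not a Unicode encoding scheme, yields the single byte `FE`.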
Re^5: Peculiar Reference To U+00FE In Text::CSV_XS Documentation by Tux (Canon) on Dec 10, 2012 at 07:56 UTC
I do not recall exactly why the docs were written the way they were, and as I am not familiar with the DAT format, I cannot confirm that U+00FE was mentioned because of this format. Knowing my own way of thinking, it most likely is not 0xFF because that would be -1 as a signed byte, which could be used as a guard marker or something similar (currently it isn't). I might have used 0xFE as it is the next highest byte.

I have just read your post, and the only thing I see that Text::CSV_XS cannot handle is the optional line endings. The optional <CR> before the <NL> is dealt with automatically (just do not specify `eol`), but you cannot have an extra U+00AE that also ends records. If, on the other hand, 0xAE is just a placeholder for embedded newlines, that is easy to do (see below). Another point of care is that Text::CSV_XS does not deal with BOMs, so you'll need File::BOM or other means to deal with that.

    use Text::CSV_XS;

    # $fh is an already opened input handle
    my $csv = Text::CSV_XS->new ({
        sep_char    => "\x{14}",
        quote_char  => "\x{fe}",
        escape_char => undef,
        binary      => 1,
        auto_diag   => 1,
        });
    while (my $row = $csv->getline ($fh)) {
        # turn the U+00AE placeholders back into newlines
        tr/\x{ae}/\n/ for @$row;
        # continue as usual
        }

If it doesn't work, I'd like to see some data. Note that U+00FE encoded as UTF-8 is 0xC3 0xBE, which is two bytes, and two bytes cannot be used as a `quote_char` or `sep_char` in Text::CSV_XS, which parses the data as bytes, so the stream has to be properly coded before parsing.

Enjoy, Have FUN! H.Merijn
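As one possible reading of "other means" for the BOM, here is a minimal core-Perl sketch that skips a leading U+FEFF on a handle opened with a decoding layer. The helper name `skip_bom` and the sample record are assumptions made for illustration. The sketch also assumes the handle is seekable and positioned at the start of the data. It does not show what Text::CSV_XS itself does with the decoded stream.

```perl
use strict;
use warnings;

# Hypothetical helper: consume a leading U+FEFF if present.
# Assumes $fh is seekable, at offset 0, and has a decoding layer,
# so that read() returns characters rather than bytes.
sub skip_bom {
    my ($fh) = @_;
    my $n = read $fh, my $char, 1;
    return 1 if defined $n && $n == 1 && $char eq "\x{FEFF}";
    seek $fh, 0, 0 or die "seek failed: $!";    # no BOM: rewind
    return 0;
}

# Demo on an in-memory "file": UTF-8 BOM, then a made-up record
# quoted with U+00FE and separated with U+0014.
my $bytes = "\xEF\xBB\xBF\xC3\xBEA\xC3\xBE\x14\xC3\xBEB\xC3\xBE\n";
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open failed: $!";

my $had_bom = skip_bom($fh);
my $line    = <$fh>;

printf "BOM skipped: %s, first character: U+%04X\n",
    ($had_bom ? 'yes' : 'no'), ord $line;
```

After the call, the first character read from the handle is the U+00FE quote rather than the BOM. The handle can then be passed on to whatever parses the records.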
Re^6: Peculiar Reference To U+00FE In Text::CSV_XS Documentation by Jim (Curate) on Dec 10, 2012 at 17:33 UTC
Re^7: Peculiar Reference To U+00FE In Text::CSV_XS Documentation by Tux (Canon) on Dec 10, 2012 at 18:02 UTC

