in reply to
Re^3: CSV_XS and UTF8 strings
in thread CSV_XS and UTF8 strings
No, I am not asking at all for an option to disable the need for quotation on characters with code-points > 127. I expect consistent functionality regardless of the characters used. As illustrated if I have an input file such as
this is field 1,this is field 2, this is field 3
My script will write a new file that is exactly the same as the input file. Now if I change the ASCII letter 'e' to e with acute (U+00E9),
this is fiéld 1,this is fiéld 2, this is fiéld 3
guess what, it still works the same as with just ASCII. That is the output file is exactly the same as the input file. The output file created with my script has no double quotes around any field. It looks the same as the input file. But, if I add one Japanese character to any one of the items, that item with the Japanese character will have double quotes around it in the new file. So the inconsistency is even worse then I expected as the command "quote_space => 0 does work for some characters above 0x7F but not for all characters. My data file is UTF8 so the e acute is two bytes in UTF8 where as the Japanese character is 3 bytes but again, I'd like to think that all UTF8 data is treated the same. In conclusion, I want properly formatted CSV, I expect double quotes around strings when needed but my testing shows that there is a lot of inconsistency with the use of quote_space => 0 depending on the type of characters in the string.
Did some more testing and it appears that the characters that "quote_space => 0" works properly are printable characters in the range 0x00 - 0xFF (Basic Latin and Latin 1 Supplement). Fields with a character above 0x0100 (starts with Latin Extended A) will always get double quotes around them regardless if needed or not. So the function quote_space => 0 stops working as expected with characters starting at 0x0100.