You should pay attention to the two sides of the coin:
- Parsing: Text::CSV_XS was created to do safe, reliable, and fast parsing of CSV data. The constructor supports many attributes to control the parsing of CSV data that is formatted outside the small default definition. The most commonly used attribute will be sep_char, to allow for all the different non-standard separation characters used by M$-Excel, which uses the "list separation character" from the locale setting instead of the default comma when exporting to CSV. "The string is marked UTF8" only applies to this side of the coin: when reading CSV.
- Writing: many of the attributes apply only to parsing; some apply only to writing. quote_space is one of the latter and has no influence whatsoever on parsing data.
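A small sketch of the two sides, using a made-up line as M$-Excel might export it under a locale whose list separator is ";". The same line is parsed once with quote_space => 1 and once with quote_space => 0: the parsed fields are identical, and only the written output differs. It assumes a Text::CSV_XS version recent enough to know quote_space.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample line with ";" as the separator.
my $line = qq{1;"a b";c d};

my (@parsed, @written);
for my $quote_space (1, 0) {
    my $csv = Text::CSV_XS->new({
        sep_char    => ";",           # used by both parsing and writing
        binary      => 1,
        quote_space => $quote_space,  # writing only: quote fields with spaces?
    }) or die "" . Text::CSV_XS->error_diag;

    # Parsing side: quote_space plays no role here.
    $csv->parse($line) or die "parse: " . $csv->error_input;
    my @fields = $csv->fields;
    push @parsed, join "|", @fields;

    # Writing side: quote_space decides whether "a b" and "c d" get quoted.
    $csv->combine(@fields) or die "combine: " . $csv->error_input;
    push @written, $csv->string;
}

print "parsed:  $_\n" for @parsed;    # 1|a b|c d  (both times)
print "written: $_\n" for @written;   # 1;"a b";"c d"  then  1;a b;c d
```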
Text::CSV_XS parses and writes bytes, not characters or letters. The "upgrade" to Unicode/UTF-8 only happens at the moment a field has been parsed correctly and "binary" bytes were detected inside that field. When dealing with Unicode (in whatever encoding), you can be absolutely sure that the text will contain "binary" bytes, both when parsing and when writing, so you should always set the binary attribute. That it is not the default stems from the distant past: changing it to a sane default of 1 could break backward compatibility.
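A minimal round-trip sketch, assuming you let PerlIO do the decoding and encoding (an :encoding(UTF-8) layer on both handles) instead of doing it by hand. In-memory filehandles on hypothetical sample data stand in for real files, and binary => 1 is set explicitly as advised.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# UTF-8 encoded bytes, as they would sit in a file on disk (sample data).
my $in_bytes = "id,name\n1,caf\xc3\xa9\n";

# Always binary => 1 for Unicode; eol => "\n" so print() ends each record.
my $csv = Text::CSV_XS->new({ binary => 1, eol => "\n" })
    or die "" . Text::CSV_XS->error_diag;

# Reading: the :encoding layer turns the bytes into characters.
open my $in, "<:encoding(UTF-8)", \$in_bytes or die "open: $!";
my @rows;
while (my $row = $csv->getline($in)) {
    push @rows, $row;
}
$csv->eof or $csv->error_diag;    # report a parse error, if any
close $in;

# Writing: the :encoding layer turns characters back into UTF-8 bytes.
my $out_bytes = "";
open my $out, ">:encoding(UTF-8)", \$out_bytes or die "open: $!";
$csv->print($out, $_) for @rows;
close $out;

print length($rows[1][1]), "\n";  # prints 4: "caf" plus one e-acute character
```

Decoding at the handle keeps the rest of the program working on characters, while Text::CSV_XS itself still only sees (and quotes) bytes.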
In writing, both whitespace and "binary" bytes will trigger quotation. Please don't confuse quote_space (controlling quotation on whitespace) with quote_binary (controlling quotation on binary bytes; the new attribute). What you perceive as "strange" is just a misunderstanding of the quote_space attribute: setting it to 0 stops whitespace from triggering quotation, but a field containing binary bytes is still quoted as long as quote_binary is on.
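To see the two triggers separately, here is a sketch that combines the same two fields under each combination of quote_space and quote_binary. It assumes a Text::CSV_XS version that knows quote_binary; the sample fields are made up, the second one holding the UTF-8 bytes of "5€" (0x35 0xE2 0x82 0xAC), which Text::CSV_XS treats as binary.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Text::CSV_XS;

# One field with a space, one with UTF-8 encoded (binary) bytes.
my @fields = ("a b", encode("UTF-8", "5\x{20ac}"));

# Combine @fields with the given extra attributes and return the CSV string.
sub combined {
    my (%attr) = @_;
    my $csv = Text::CSV_XS->new({ binary => 1, %attr })
        or die "" . Text::CSV_XS->error_diag;
    $csv->combine(@fields) or die "combine: " . $csv->error_input;
    return $csv->string;
}

my %out = (
    defaults        => combined(),
    no_space_quotes => combined(quote_space  => 0),
    no_bin_quotes   => combined(quote_binary => 0),
    neither         => combined(quote_space  => 0, quote_binary => 0),
);
print "$_: $out{$_}\n" for sort keys %out;
```

Turning off quote_space only unquotes "a b"; the binary field stays quoted until quote_binary is turned off as well.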
Enjoy, Have FUN! H.Merijn