http://www.perlmonks.org?node_id=1061973


in reply to Text::CSV and Unicode

This has come up before. As of Text::CSV_XS version 1.00 the behavior is consistent, but it does not meet your current needs. I uploaded version 1.02 a minute ago; it adds a new attribute, decode_utf8, that lets you disable the default behavior (the default being what has proven to be what most people want and expect).

decode_utf8

This attribute defaults to TRUE.

While parsing, fields that are valid UTF-8 are automatically set to be UTF-8, so that

    $csv->parse ("\xC4\xA8\n");

results in

    PV("\304\250"\0) [UTF8 "\x{128}"]

Sometimes it might not be a desired action. To prevent those upgrades, set this attribute to false, and the result will be

    PV("\304\250"\0)
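To see the difference for yourself, something along these lines should do. It is only a minimal sketch: it assumes Text::CSV_XS 1.02 or later is installed, and the variable names and the use of binary => 1 are my own choices for illustration, not something the documentation excerpt prescribes.

```perl
use strict;
use warnings;
use Text::CSV_XS;    # decode_utf8 needs version 1.02 or later

# The two bytes C4 A8 are the UTF-8 encoding of U+0128.
my $line = "\xC4\xA8\n";

# Default (decode_utf8 is true): fields that are valid UTF-8 get decoded.
my $csv_default = Text::CSV_XS->new ({ binary => 1 })
    or die "cannot create parser: " . Text::CSV_XS->error_diag;
$csv_default->parse ($line) or die "parse failed";
my ($decoded) = $csv_default->fields;

# decode_utf8 => 0: the field is left as a plain byte string.
my $csv_bytes = Text::CSV_XS->new ({ binary => 1, decode_utf8 => 0 })
    or die "cannot create parser: " . Text::CSV_XS->error_diag;
$csv_bytes->parse ($line) or die "parse failed";
my ($raw) = $csv_bytes->fields;

# Report the UTF8 flag and the length of each field.
printf "default:          utf8 flag %d, length %d\n",
    (utf8::is_utf8 ($decoded) ? 1 : 0), length $decoded;
printf "decode_utf8 => 0: utf8 flag %d, length %d\n",
    (utf8::is_utf8 ($raw) ? 1 : 0), length $raw;
```

Going by the documentation excerpt above, the first field should come back as a single character with the UTF8 flag set, and the second as two bytes without it.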

I realize that "most people" is not "all people", and I cannot pick a default that makes everyone happy. That is also why I waited with 1.02. I asked many users what the default should be, checked the historical entries in RT and in my mail, and came to the conclusion that nowadays the majority works with UTF-8 CSV rather than with binary CSV. The change in 1.00 was not meant to enable or disable UTF-8; it was meant to make the behavior more consistent.


Enjoy, Have FUN! H.Merijn