UTF8 with YAML or JSON
by SBECK (Friar) on Jun 29, 2012 at 16:08 UTC
SBECK has asked for the
wisdom of the Perl Monks concerning the following question:
In one of my modules (Date::Manip) I store a bunch of UTF8 data in a YAML file which I then load into a perl data structure. The basic form looks like this:
Note: I entered the ă in the question as a literal UTF8 character, but inside the code block it is displayed differently. There's probably some markup I could use to get it to display properly, but I didn't want to spend too much time getting sidetracked from the problem, so just treat the two renderings as the same character.
YAML::Syck has one property that I haven't found in any of the other YAML (or JSON) modules: it doesn't do any handling of UTF8 (that is, it doesn't decode the bytes into Perl's internal character strings). What you put in is what you get out, so if you run the above script in the debugger and dump the value of $dat, you get:
Unfortunately, YAML::Syck is perhaps the least supported of the YAML modules and I'd like to switch to one of the more recent modules. If I change the above script to use YAML or YAML::XS (my preferred module), and then run it in the debugger, I get:
I.e. it displays the string as a decoded Perl character string rather than as the original UTF8 bytes. I'm completely open to the option of converting the YAML to JSON, but the JSON and JSON::XS modules do the same thing. I've tried the following script with similar results:
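To make the distinction concrete without depending on any YAML or JSON module, here is a minimal sketch using only the core Encode module. It is not one of the scripts referred to above, and the variable names are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The two UTF-8 bytes that encode U+0103 (LATIN SMALL LETTER A WITH BREVE).
my $bytes = "\xC4\x83";

# Decoding turns those two bytes into a single Perl character.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d\n", length $bytes;                      # bytes: length 2
printf "chars: length %d, U+%04X\n", length $chars, ord $chars;  # chars: length 1, U+0103

# Encoding the character string gives the original bytes back.
my $round_trip = encode('UTF-8', $chars);
print "round trip ok\n" if $round_trip eq $bytes;
```

The byte form is what I get from YAML::Syck; the decoded character form is what the other modules hand back.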
Obviously, once the data structure is produced, I could recurse through it and change the perl encodings back to UTF8, but rather than do that, I'll probably just stick with YAML::Syck.
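For reference, the recursion I have in mind would look roughly like the sketch below. The helper name to_utf8_bytes is hypothetical (it is not part of Date::Manip or of any YAML/JSON module), and it assumes every string in the structure is a decoded character string; a string that is already UTF8 bytes would get double-encoded.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: return a deep copy of $data in which every
# string (hash keys included) has been encoded to UTF-8 bytes.
sub to_utf8_bytes {
    my ($data) = @_;
    my $type = ref $data;

    if ($type eq 'HASH') {
        my %copy;
        for my $key (keys %$data) {
            $copy{ encode_utf8($key) } = to_utf8_bytes($data->{$key});
        }
        return \%copy;
    }
    if ($type eq 'ARRAY') {
        return [ map { to_utf8_bytes($_) } @$data ];
    }

    # Leave undef and any other kind of reference untouched.
    return $data if $type || !defined $data;

    return encode_utf8($data);
}

# Example: a structure as a decoding loader might return it.
my $decoded = { "\x{103}" => [ "\x{103}b", { plain => 'abc' } ] };
my $raw     = to_utf8_bytes($decoded);

printf "%d\n", length $raw->{"\xC4\x83"}[0];   # 3 (two bytes for U+0103, plus "b")
```

It works, but it means an extra pass over the whole structure every time the data is loaded.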
Any suggestions, or do I just stick to YAML::Syck?