Re^4: UTF8 with YAML or JSON

by SBECK (Friar)
on Jun 29, 2012 at 18:06 UTC (#979164)

in reply to Re^3: UTF8 with YAML or JSON
in thread UTF8 with YAML or JSON

I want the characters included in the data structure to be EXACTLY what I included in the text that got parsed. So, if I send in a string containing UTF-8 values, then I should see UTF-8 values in the data structure. YAML::Syck does this. YAML, YAML::XS, JSON, and JSON::XS all take scalars with UTF-8 values in them and produce data structures containing Perl encodings.

Re^5: UTF8 with YAML or JSON
by zwon (Abbot) on Jun 30, 2012 at 04:35 UTC
    I should see UTF8 values ... YAML::Syck does this

    I don't see this from your example. YAML::Syck returns two latin1 characters instead of the single \x{103} that the file contains, which is exactly the opposite of what you say you want. YAML::XS expects UTF-8 octets on input, checks that they are correct UTF-8, and returns UTF-8 characters. I have the impression that you don't realise what you are getting from the modules. Maybe you should use Dump from Devel::Peek to inspect the values instead of Dumper. Also, if you add

    use open ":utf8"; use open ":std";
    to your script, it will be clear to you that YAML::Syck doesn't return ă, but the two characters \xC4 and \x83 (the UTF-8 octets of ă, each treated as a latin1 character).
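To make the Devel::Peek suggestion concrete, here is a minimal sketch that is not part of the original thread. It assumes the character in question is U+0103 (ă), whose UTF-8 encoding is the two octets C4 83. The variable names are only illustrative.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

# The two UTF-8 octets that encode U+0103, as they would look when
# read from a file or __DATA__ section with no decoding applied.
my $octets = "\xC4\x83";

# The single character those two octets stand for.
my $char = "\x{103}";

printf "octets: length %d\n", length $octets;   # prints "octets: length 2"
printf "char:   length %d\n", length $char;     # prints "char:   length 1"

# Dump writes to STDERR. Compare the FLAGS and PV lines of the two
# dumps: only the second scalar carries the UTF8 flag.
Dump($octets);
Dump($char);
```

Data::Dumper can make these two strings look alike on a terminal, which is why Dump is the more reliable tool here.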

      Unfortunately, what you're seeing in my post isn't exactly what I'm seeing in real life. When using the <code> markup in my post, it changed the UTF8 character, and I couldn't figure out how to make the markup show exactly what I wanted it to show (i.e. what I'm seeing in the script and debugger). That has unfortunately served to confuse the question.

      Regardless of what's showing up in my post, my script contains actual UTF8 characters, and when I use the debugger to show exactly what is in the data structure after the Load function, YAML::Syck shows the exact same UTF8 characters that are contained in the __DATA__ section. All other modules produce strings containing perl encodings.

      Hope that clarifies things a bit.

      EDIT: I need to retract my above statement. I'm still trying to understand perl and unicode (which I definitely don't have a great understanding of). Although the posting markup issue is there, the comments you make are correct, and I was wrongly interpreting things. What I've been calling the "correct" behavior does indeed involve a string of length two (instead of one), so I've got to look into this in more detail to determine exactly what it is that I'm seeing and what I WANT to see.
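The length-two versus length-one difference mentioned in the edit can be reproduced with the core Encode module. This is only a sketch under the assumption that the input arrives as raw UTF-8 octets for U+0103. It does not involve any of the YAML or JSON modules, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes, e.g. read from a handle without an encoding layer.
my $octets = "\xC4\x83";

# Decoding turns the two octets into one Perl character (U+0103).
my $chars = decode('UTF-8', $octets);

# Encoding goes the other way and yields the two octets again.
my $back = encode('UTF-8', $chars);

printf "%d %d %d\n", length $octets, length $chars, length $back;  # prints "2 1 2"
printf "U+%04X\n", ord $chars;                                      # prints "U+0103"
```

Which of the two forms is wanted depends on what the data is used for next: character strings for text processing inside the program, octets for writing to a file or socket that has no encoding layer.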
