PerlMonks  

Re^3: Can someone please write a *working* JSON module (Send money)

by Corion (Pope)
on Oct 24, 2021 at 10:23 UTC (#11137957)


in reply to Re^2: Can someone please write a *working* JSON module (Send money)
in thread Can someone please write a *working* JSON module

Since JSON is (supposed to be) UTF-8, you merely need to mark the resulting scalars as UTF-8 rather than decode them a second time. You could even do it for all string data, assuming that all your input has been verified as UTF-8. See for example Re: Bypass utf-8 encoding/decoding?. The function/macro you want is newSVpvn_utf8.

Obviously, this implies that you're trusting your input data to actually be valid UTF-8...
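
If you would rather not trust the input, a check along the following lines could run before the buffer is handed to newSVpvn_utf8. This is only a minimal sketch in plain C, not code from any existing JSON module; the name utf8_is_valid is made up for illustration. It rejects truncated sequences, stray continuation bytes, overlong forms, encoded surrogates and anything above U+10FFFF. Inside XS, Perl's own is_utf8_string() may already be enough, but check perlapi for how strict it is on your Perl version.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the len bytes at s are well-formed
 * UTF-8, 0 otherwise. */
static int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;            /* number of continuation bytes expected */
        unsigned long cp, min;  /* decoded code point, smallest legal value */

        if (c < 0x80) {         /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;           /* stray continuation or invalid lead byte */
        }

        if (len - i - 1 < need)
            return 0;           /* sequence truncated at end of buffer */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;       /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;           /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;           /* UTF-16 surrogate encoded as UTF-8 */
        if (cp > 0x10FFFF)
            return 0;           /* beyond the Unicode range */

        i += need + 1;
    }
    return 1;
}
```

A caller would then do something like `if (utf8_is_valid(buf, len)) { ... newSVpvn_utf8(...) ... }` and fall back to a plain byte string (or an error) otherwise.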


Replies are listed 'Best First'.
Re^4: Can someone please write a *working* JSON module (Send money)
by cnd (Acolyte) on Oct 24, 2021 at 11:06 UTC
    newSVpvn_utf8 sounds awesome! Is there some simple way to detect invalid UTF-8?

    I guess something, somewhere, knows this - since croak() is the bane of my existence right now: email subject lines which may or may not have been truncated somewhere are 100% guaranteed to spew invalid UTF-8 at *some* point.

    Is there some way perl can auto-magically handle UTF-16 as well? e.g. (from the RFC): "... UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E"." (those 4 bytes (and/or 12 characters) are also an example of why truncated text breaks everything I expect)

      See Encode::Unicode for the translations between the various Unicode encodings.

      I think converting from UTF-16 to UTF-8 is merely a mathematical transformation between two encodings of the same code point, so you can easily model that. As far as I know, a backslash-escaped \uXXXX sequence in JSON is always a UTF-16 code unit (RFC 8259), never UTF-8, so a high surrogate (\uD800 to \uDBFF) followed by a low surrogate (\uDC00 to \uDFFF) has to be combined into a single code point before it is written out as UTF-8.
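
The "mathematical transformation" can be sketched in a few lines of C. This is an illustration only: the names combine_surrogates and encode_utf8 are hypothetical, and the code assumes the two \uXXXX escapes have already been parsed into 16-bit values. For the G clef example from the RFC, 0xD834 and 0xDD1E combine to U+1D11E, which encodes as four UTF-8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
 * Returns 0 on success, -1 if hi/lo are not a valid high/low pair
 * (for example because the text was truncated between the two escapes). */
static int combine_surrogates(uint16_t hi, uint16_t lo, uint32_t *cp)
{
    if (hi < 0xD800 || hi > 0xDBFF)
        return -1;
    if (lo < 0xDC00 || lo > 0xDFFF)
        return -1;
    *cp = 0x10000u + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
    return 0;
}

/* Hypothetical helper: encode one code point as UTF-8 into out.
 * Returns the number of bytes written (1 to 4), or 0 if cp is a lone
 * surrogate or lies beyond U+10FFFF. */
static size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;           /* unpaired surrogate */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Returning an error for an unpaired surrogate, instead of croaking, leaves it to the caller to decide whether to substitute U+FFFD or reject the input, which matters for the truncated subject lines mentioned above.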

