in reply to Re: Can someone please write a *working* JSON module (Send money)
in thread Can someone please write a *working* JSON module

Re^3: Can someone please write a *working* JSON module (Send money)
by Corion (Patriarch) on Oct 24, 2021 at 10:23 UTC

    Since JSON is (supposed to be) UTF-8, you merely need to mark the resulting data as being UTF-8 decoded. You could even do it for all string data, assuming that all your input has been verified as UTF-8. See, for example, Re: Bypass utf-8 encoding/decoding?; the function/macro you want is newSVpvn_utf8.

    Obviously, this implies that you're trusting your input data to actually be valid UTF-8...
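If you don't want to trust the input, a check along these lines is one option. This is a minimal standalone sketch in plain C, not code from any existing JSON module; the names utf8_seq_len and utf8_is_valid are made up for illustration. It rejects truncated sequences, overlong forms, encoded surrogates and anything above U+10FFFF. Inside XS you would more likely reach for is_utf8_string from perlapi instead of rolling your own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: length in bytes of the well-formed UTF-8 sequence
 * starting at p (with avail >= 1 bytes readable), or 0 if it is malformed. */
static size_t utf8_seq_len(const unsigned char *p, size_t avail)
{
    unsigned long cp, min;
    size_t need; /* number of continuation bytes expected */

    if (p[0] < 0x80) return 1;                                  /* ASCII */
    if ((p[0] & 0xE0) == 0xC0)      { need = 1; cp = p[0] & 0x1F; min = 0x80; }
    else if ((p[0] & 0xF0) == 0xE0) { need = 2; cp = p[0] & 0x0F; min = 0x800; }
    else if ((p[0] & 0xF8) == 0xF0) { need = 3; cp = p[0] & 0x07; min = 0x10000; }
    else return 0;                   /* stray continuation or invalid lead byte */

    if (avail < need + 1) return 0;  /* truncated sequence */
    for (size_t k = 1; k <= need; k++) {
        if ((p[k] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (p[k] & 0x3F);
    }
    if (cp < min) return 0;                       /* overlong encoding */
    if (cp > 0x10FFFF) return 0;                  /* beyond Unicode range */
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* encoded surrogate */
    return need + 1;
}

/* Returns true if buf[0..len) is well-formed UTF-8. On failure, *bad_offset
 * (if non-NULL) receives the byte offset of the first bad sequence. */
bool utf8_is_valid(const unsigned char *buf, size_t len, size_t *bad_offset)
{
    assert(buf != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        size_t n = utf8_seq_len(buf + i, len - i);
        if (n == 0) {
            if (bad_offset) *bad_offset = i;
            return false;
        }
        i += n;
    }
    return true;
}
```

The reported offset is what makes this useful for truncated data such as cut-off subject lines: everything before it is known to be good, so you can keep that part and decide separately what to do with the tail.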

      newSVpvn_utf8 sounds awesome! Is there some simple way to detect invalid UTF-8?

      I guess something, somewhere, knows this - since croak() is the bane of my existence right now: email subject lines which may or may not have been truncated somewhere are 100% guaranteed to spew invalid UTF-8 at *some* point.

      Is there some way perl can auto-magically handle UTF-16 as well? e.g. (from the RFC): "... UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E"." (those 4 bytes (and/or 12 characters) are also an example of why truncated text breaks everything I expect)

        See Encode::Unicode for the translations between the various Unicode encodings.

        I think converting from UTF-16 to UTF-8 is merely a mathematical transformation between two encodings of the same code point, so you can easily model that. I'm not sure how easy it is to determine whether a backslash-escaped sequence is UTF-8 or UTF-16, but maybe if it's just two characters, it's UTF-8.
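For what it's worth, RFC 8259 defines each \uXXXX escape as a UTF-16 code unit, so a character outside the Basic Multilingual Plane always arrives as a high/low surrogate pair. The arithmetic can be sketched like this in standalone C. The function names surrogates_to_codepoint and codepoint_to_utf8 are invented for this example and are not taken from Encode or any JSON module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine a high surrogate (0xD800..0xDBFF) and a low surrogate
 * (0xDC00..0xDFFF) into one code point. Returns 0 for a malformed pair;
 * a valid pair always yields at least 0x10000, so 0 is unambiguous. */
uint32_t surrogates_to_codepoint(uint16_t hi, uint16_t lo)
{
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF)
        return 0;
    return 0x10000u + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}

/* Encode one code point as UTF-8 into out[0..3]. Returns the number of
 * bytes written, or 0 if cp is a lone surrogate or above U+10FFFF. */
size_t codepoint_to_utf8(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* a lone surrogate is not a character */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

With the G clef example from the RFC, the pair 0xD834, 0xDD1E combines to U+1D11E, which encodes as the four bytes F0 9D 84 9E. If truncation cuts the text between the two escapes, the first function returns 0 rather than a bogus character, so the caller can decide how to report it.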

Re^3: Can someone please write a *working* JSON module (Send money)
by NERDVANA (Hermit) on Oct 24, 2021 at 16:14 UTC
    While I was sort of rolling my eyes at the request to write a new JSON module just to avoid 'croak', you got me thinking, and it actually would be rather useful to have a parser function that starts from pos($scalar) and converts valid JSON into Perl SV* until the first parse error, then returns what it built so far along with flags for how it ended. It would be especially useful if it returned partial results that could be resumed on additional input, allowing you to feed the parser with buffer segments. Or like you suggested, ignore certain types of decoder errors.

    The C function might look like

    bool json_parse_more(pTHX_
        struct json_parse_state *state, // configuration and error messages
        SV *input,                      // any scalar
        int input_pos,                  // byte offset within the scalar
        SV *output                      // empty SV, destination for data
    );
    and you could call that recursively to assign the output SV with the progress-so-far of whatever it found on input. As long as the state was unique to the thread, it would be thread-safe. It's probably easiest to store all the error info into the struct.
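To make the "parse until the first error, keep what you have, resume later" idea concrete, here is a deliberately tiny standalone sketch. It is plain C with no Perl API, so the signature differs from the prototype above: it takes a byte buffer instead of SVs, and it only understands a flat JSON array of non-negative integers (leading zeros and overflow are not checked). Apart from json_parse_more and json_parse_state, every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define JP_MAX_VALUES 16

enum jp_status { JP_NEED_MORE, JP_DONE, JP_ERROR };

enum jp_stage { ST_OPEN, ST_VALUE_OR_CLOSE, ST_VALUE, ST_DIGITS, ST_COMMA_OR_CLOSE, ST_DONE };

struct json_parse_state {
    enum jp_stage stage;        /* where the parser stopped */
    long values[JP_MAX_VALUES]; /* everything parsed so far */
    size_t count;
    long current;               /* number still being accumulated */
    size_t error_pos;           /* offset of the bad byte within the last chunk */
};

void json_parse_init(struct json_parse_state *st)
{
    st->stage = ST_OPEN;
    st->count = 0;
    st->current = 0;
    st->error_pos = 0;
}

/* Feed one chunk. All progress lives in *st, so a JP_NEED_MORE result can be
 * followed by another call with the next chunk, and after JP_ERROR the values
 * parsed before the error are still available in st->values. */
enum jp_status json_parse_more(struct json_parse_state *st, const char *buf, size_t len)
{
    assert(st != NULL && (buf != NULL || len == 0));
    if (st->stage == ST_DONE) return JP_DONE;

    for (size_t i = 0; i < len; i++) {
        char c = buf[i];
        int is_ws = (c == ' ' || c == '\t' || c == '\n' || c == '\r');
        int is_digit = (c >= '0' && c <= '9');

        switch (st->stage) {
        case ST_OPEN:
            if (is_ws) continue;
            if (c == '[') { st->stage = ST_VALUE_OR_CLOSE; continue; }
            break;
        case ST_VALUE_OR_CLOSE:
        case ST_VALUE:
            if (is_ws) continue;
            if (is_digit && st->count < JP_MAX_VALUES) {
                st->current = c - '0';
                st->stage = ST_DIGITS;
                continue;
            }
            /* ']' is only legal here if no comma came before it */
            if (c == ']' && st->stage == ST_VALUE_OR_CLOSE) { st->stage = ST_DONE; return JP_DONE; }
            break;
        case ST_DIGITS:
            if (is_digit) { st->current = st->current * 10 + (c - '0'); continue; }
            st->values[st->count++] = st->current; /* number finished: commit it */
            st->stage = ST_COMMA_OR_CLOSE;
            /* fall through: c itself still has to be examined */
        case ST_COMMA_OR_CLOSE:
            if (is_ws) continue;
            if (c == ',') { st->stage = ST_VALUE; continue; }
            if (c == ']') { st->stage = ST_DONE; return JP_DONE; }
            break;
        case ST_DONE:
            return JP_DONE;
        }
        st->error_pos = i; /* reached only via break: unexpected byte */
        return JP_ERROR;
    }
    return JP_NEED_MORE;
}
```

Feeding "[12, 3" and then "4, 5]" returns JP_NEED_MORE followed by JP_DONE, with the values 12, 34 and 5 collected; the number split across the two buffers is stitched together because the partial value is kept in the state. Since no globals are involved, one state per thread is enough for thread safety, which is the property described above.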

    You could probably read the implementations of all the other JSON modules to flesh out the implementation of that one function, then you could wrap that one function in XS, along with some XS methods to construct/read/write the state struct, and you'd be on your way.

    When you get to the part about decoding Unicode, you'll see the solutions in all the other JSON modules, but you need to fully understand what they're solving. A Perl SV can hold either raw bytes or characters, and Perl's is_utf8 flag is *not* a proper indication of which. The flag only tells the back-end whether it needs to use the utf8 functions to read the characters or whether there is one character per byte. There can be cases where a byte > 127 is stored as a utf8 sequence even though the application didn't intend it to be a character yet. So, you need to let the user specify whether they think their string contains bytes or characters when calling your API, then do the decoding in your module if they say the input needs to be decoded. Again, the solutions for these problems will all be found in the other existing JSON modules. As it happens, the UTF8 rant by MLEHMANN in the JSON::XS manual is the explanation that finally showed me the right way to think about Perl's utf8 flag.
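A toy model may help show why the flag is about storage rather than meaning. The struct below is a hypothetical stand-in for a scalar's string buffer, not Perl's real SV layout, and toy_length and toy_char_at are made-up names. It assumes any buffer marked as utf8 storage is already well-formed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scalar's string slot. */
struct toy_string {
    const unsigned char *bytes;
    size_t len;          /* length in bytes */
    bool utf8_storage;   /* plays the role of the internal flag */
};

/* Number of logical characters in the string. */
size_t toy_length(const struct toy_string *s)
{
    if (!s->utf8_storage) return s->len;       /* one character per byte */
    size_t n = 0;
    for (size_t i = 0; i < s->len; i++)
        if ((s->bytes[i] & 0xC0) != 0x80) n++; /* count non-continuation bytes */
    return n;
}

/* Code point of the character at the given character index. */
unsigned long toy_char_at(const struct toy_string *s, size_t index)
{
    assert(index < toy_length(s));
    if (!s->utf8_storage) return s->bytes[index];

    size_t i = 0;
    while (index > 0) {  /* skip whole sequences to reach the character */
        i++;
        while (i < s->len && (s->bytes[i] & 0xC0) == 0x80) i++;
        index--;
    }
    unsigned char b = s->bytes[i];
    unsigned long cp;
    size_t extra;        /* continuation bytes that follow the lead byte */
    if (b < 0x80) return b;
    else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
    else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
    else                         { cp = b & 0x07; extra = 3; }
    for (size_t k = 1; k <= extra; k++)
        cp = (cp << 6) | (s->bytes[i + k] & 0x3F);
    return cp;
}
```

Store "a" followed by 0xE9 with the flag off, or "a" followed by 0xC3 0xA9 with the flag on, and both report two characters with 0xE9 at index 1. Nothing in either representation says whether the application meant 0xE9 as a raw byte or as the character é, which is why the caller of your API has to tell you.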

Re^3: Can someone please write a *working* JSON module
by Anonymous Monk on Oct 24, 2021 at 10:20 UTC
    Get technical?