While I was sort of rolling my eyes at the request to write a new JSON module just to avoid 'croak', you got me thinking: it actually would be rather useful to have a parser function that starts from pos($scalar) and converts valid JSON into Perl SV* until the first parse error, then returns what it built so far along with flags for how it ended. It would be especially useful if it returned partial results that could be resumed on additional input, allowing you to feed the parser with buffer segments. Or, like you suggested, it could ignore certain types of decoder errors.
The C function's parameter list might look like
    struct json_parse_state *state, // configuration and error messages
    SV *input,                      // any scalar
    int input_pos,                  // byte offset within the scalar
    SV *output                      // empty SV, destination for data
and you could call that recursively, filling the output SV with the progress so far of whatever it found on input. As long as each thread has its own state struct, it would be thread-safe. It's probably easiest to store all the error info in the struct.
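To make the "parse until the first error, then report how it ended" idea concrete, here is a minimal sketch in plain C. It is only a stand-in: it handles whitespace-separated integers instead of real JSON, writes to a `long` array instead of an SV, and every name in it (`parse_numbers`, the `json_end` values, the struct fields) is my own assumption, not anything from an existing module.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* How a parse attempt ended (hypothetical names). */
enum json_end {
    JSON_END_OF_INPUT,  /* consumed everything cleanly */
    JSON_NEED_MORE,     /* buffer ended in the middle of a token */
    JSON_OUTPUT_FULL,   /* caller's output array has no room left */
    JSON_SYNTAX_ERROR   /* hit a byte that can never be valid here */
};

/* All status lives here, so one struct per thread keeps this thread-safe. */
struct json_parse_state {
    enum json_end how_ended;
    size_t        pos;        /* byte offset to resume from, or of the bad byte */
    char          error[64];  /* message, empty string if no error */
};

/* Parse whitespace-separated integers from input[start..len).
 * Stores complete values in out and returns how many were stored.
 * Simplification: integer overflow is not checked. */
size_t parse_numbers(struct json_parse_state *st, const char *input,
                     size_t len, size_t start, long *out, size_t out_cap)
{
    size_t i = start, count = 0;
    st->error[0] = '\0';
    for (;;) {
        while (i < len && isspace((unsigned char)input[i])) i++;
        if (i == len) { st->how_ended = JSON_END_OF_INPUT; break; }
        if (count == out_cap) { st->how_ended = JSON_OUTPUT_FULL; break; }

        size_t j = i;          /* i stays at the start of this token */
        int negative = 0;
        long value = 0;
        if (input[j] == '-') { negative = 1; j++; }
        size_t digits_start = j;
        while (j < len && isdigit((unsigned char)input[j])) {
            value = value * 10 + (input[j] - '0');
            j++;
        }
        if (j == len) {
            /* Token touches the end of the buffer, so more digits might
             * follow. Leave i at the token start so the caller can resume. */
            st->how_ended = JSON_NEED_MORE;
            break;
        }
        if (j == digits_start || !isspace((unsigned char)input[j])) {
            st->how_ended = JSON_SYNTAX_ERROR;
            snprintf(st->error, sizeof st->error,
                     "unexpected byte 0x%02x at offset %zu",
                     (unsigned char)input[j], j);
            i = j;
            break;
        }
        out[count++] = negative ? -value : value;
        i = j;
    }
    st->pos = i;
    return count;
}
```

The caller keeps whatever was stored, looks at `how_ended`, and on `JSON_NEED_MORE` appends the next buffer segment and calls again starting at `pos`. A real version would recurse for arrays and hashes and would need to remember its nesting depth in the state struct as well.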
You could probably read the implementations of all the other JSON modules to flesh out the implementation of that one function. Then you could wrap it in XS, along with some XS methods to construct/read/write the state struct, and you'd be on your way.
When you get to the part about decoding Unicode, you'll see the solutions in all the other JSON modules, but you need to fully understand what they're solving. A Perl SV can hold either raw bytes or characters, and Perl's is_utf8 flag is *not* a proper indication of which. The flag only tells the back-end whether it needs to use utf8 functions to read the characters or whether there is one character per byte. There can be cases where a byte > 127 is stored as a utf8 sequence even though the application didn't intend it to be a character yet. So, you need to let the user specify whether they think their string contains bytes or characters when calling your API, then do the decoding in your module if they say the input needs to be decoded. Again, the solutions for these problems can all be found in the existing JSON modules. As it happens, the UTF8 rant by MLEHMANN in the JSON::XS manual is the explanation that finally showed me the right way to think about Perl's utf8 flag.
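Here is a small sketch of why the caller has to say what the input is. It is plain C, not the Perl API: the `input_kind` enum and `input_to_codepoints` are names I made up, and the buffer models only the one-byte-per-character storage case. The point is that the same two stored bytes come out as one character or two depending on what the caller declares.

```c
#include <stddef.h>
#include <stdint.h>

/* What the caller says the buffer holds (hypothetical names). */
enum input_kind {
    INPUT_IS_CHARACTERS,  /* each stored byte is already one character */
    INPUT_IS_UTF8_BYTES   /* raw bytes that still need UTF-8 decoding */
};

/* Convert buf into code points according to the caller's declaration.
 * Returns the number of code points written, or (size_t)-1 if the UTF-8
 * is malformed or out is too small.
 * Simplification: overlong forms and surrogates are not rejected. */
size_t input_to_codepoints(const unsigned char *buf, size_t len,
                           enum input_kind kind,
                           uint32_t *out, size_t out_cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = buf[i++];
        uint32_t cp;
        size_t extra = 0;  /* continuation bytes still expected */

        if (kind == INPUT_IS_CHARACTERS || b < 0x80) {
            cp = b;                        /* taken as-is */
        } else if ((b & 0xE0) == 0xC0) {
            cp = b & 0x1F; extra = 1;      /* 2-byte sequence */
        } else if ((b & 0xF0) == 0xE0) {
            cp = b & 0x0F; extra = 2;      /* 3-byte sequence */
        } else if ((b & 0xF8) == 0xF0) {
            cp = b & 0x07; extra = 3;      /* 4-byte sequence */
        } else {
            return (size_t)-1;             /* stray continuation or bad lead */
        }
        for (; extra > 0; extra--) {
            if (i >= len || (buf[i] & 0xC0) != 0x80)
                return (size_t)-1;         /* truncated or malformed */
            cp = (cp << 6) | (uint32_t)(buf[i++] & 0x3F);
        }
        if (n == out_cap)
            return (size_t)-1;
        out[n++] = cp;
    }
    return n;
}
```

Given the bytes 0xC3 0xA9, declaring them UTF-8 bytes yields the single code point U+00E9, while declaring them characters yields U+00C3 followed by U+00A9. Nothing in the bytes themselves tells you which the application meant, which is why the API has to ask.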