http://www.perlmonks.org?node_id=1229969

Sec has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I'm looking for a simple way to parse JSON objects out of a stream.

I am calling an HTTP service that returns a stream of JSON objects over time.

To read it, I registered a simple callback handler in LWP::UserAgent:

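```perl
$ua->add_handler( response_data => sub {
    my ($response, $ua, $h, $data) = @_;
    return "true" if $data =~ /^\s*$/;    # skip whitespace-only chunks
    print "Callback triggered\n";
    print "data: .${data}.\n";
    # croaks unless $data is exactly one complete JSON text
    my $result = eval { $json->decode( $data ) };
    if ($@) {
        print "error: $@\n";
        return "true";    # a true value keeps the handler called for later chunks
    }
    print Dumper $result;
    return "true";
} );
```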

As expected, I sometimes get partial JSON objects in $data.
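To see why decoding each chunk on its own cannot work, here is a minimal sketch that uses the core JSON::PP module (an assumption; the handler above uses a `$json` object whose backend is not shown). `decode` expects exactly one complete JSON text, so a chunk holding one object plus the start of another is rejected, and so is the chunk holding the tail.

```perl
use strict;
use warnings;
use JSON::PP;

my $json = JSON::PP->new;

# Two chunks as they might arrive from the network: the first holds one
# complete object plus the start of a second, the second holds the rest.
my @chunks = ( '{ "foo": "bar" } { "foo" : "q', 'ux" }' );

my @errors;
for my $chunk (@chunks) {
    # decode() wants exactly one complete JSON text per call.
    eval { $json->decode($chunk); 1 } or push @errors, $@;
}

print scalar(@errors), " of ", scalar(@chunks), " chunks failed to decode\n";
```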

My first idea would be to do something like m/{.*?}/ against $data, or rather some more elaborate version that deals with balanced braces and quoted strings.
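A short sketch of why the plain non-greedy pattern is not enough (written here with escaped braces, `\{.*?\}`, which matches the same thing): it stops at the first closing brace, which may belong to a nested object or sit inside a string. The sample buffers are made up for illustration.

```perl
use strict;
use warnings;

# One complete object containing a nested object.
my $nested = '{"outer": {"inner": 1}, "n": 2}';
# One complete object whose string value contains a closing brace.
my $quoted = '{"text": "a } b"}';

# The non-greedy match ends at the FIRST "}" it sees.
my ($naive)     = $nested =~ /(\{.*?\})/;
my ($in_string) = $quoted =~ /(\{.*?\})/;

print "$naive\n";        # truncated: not valid JSON
print "$in_string\n";    # truncated inside the string
```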

My question is, is there an easier / simpler / more straightforward way to deal with something like this that I am missing?

-- Sec
EDIT: to clarify a bit.

The problem is that in the callback $data may just be

```
{ "foo": "bar" } { "foo" : "q
```

which means I have to grab one object, handle it, keep the rest and wait for more data.

I know how to write code for that "the long way". I was asking if there is something clever or an existing module that can help with this.
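For comparison, "the long way" might look roughly like the sketch below. The helper name `extract_json_texts` is hypothetical, and it assumes the stream is a well-formed sequence of top-level objects or arrays (no error recovery). It tracks nesting depth and whether it is inside a string, honouring backslash escapes, so braces inside strings are ignored; an unfinished tail stays in the buffer for the next chunk.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: remove every complete top-level {...} or [...] text
# from the front of $$buf_ref and return them; an incomplete tail is kept.
sub extract_json_texts {
    my ($buf_ref) = @_;
    my @texts;
    my ($depth, $in_string, $escaped, $start) = (0, 0, 0, undef);
    my $consumed = 0;    # offset just past the last complete text

    for my $i (0 .. length($$buf_ref) - 1) {
        my $c = substr($$buf_ref, $i, 1);
        if ($in_string) {
            if    ($escaped)   { $escaped = 0 }      # char after a backslash
            elsif ($c eq '\\') { $escaped = 1 }
            elsif ($c eq '"')  { $in_string = 0 }
        }
        elsif ($c eq '"') {
            $in_string = 1;
        }
        elsif ($c eq '{' or $c eq '[') {
            $start = $i if $depth == 0;              # a new top-level text begins
            $depth++;
        }
        elsif ($c eq '}' or $c eq ']') {
            $depth--;
            if ($depth == 0) {                       # top-level text is complete
                push @texts, substr($$buf_ref, $start, $i - $start + 1);
                $consumed = $i + 1;
            }
        }
    }
    substr($$buf_ref, 0, $consumed, '');             # keep only the unfinished tail
    return @texts;
}

# Usage: what a response_data callback would do with each $data chunk.
my $json   = JSON::PP->new;
my $buffer = '';
my @seen;
for my $chunk ('{ "foo": "bar" } { "foo" : "q', 'ux" }') {
    $buffer .= $chunk;
    push @seen, map { $json->decode($_) } extract_json_texts(\$buffer);
}
print join(',', map { $_->{foo} } @seen), "\n";
```

The scanner restarts from the beginning of the leftover tail on each call, which keeps it simple at the cost of rescanning a long partial object several times.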

Re: Parsing JSON out of an incremental stream
by huck (Prior) on Feb 15, 2019 at 17:40 UTC

    Although I haven't used it, I remembered seeing something like this, so I went looking for it again: the INCREMENTAL PARSING section of the JSON documentation. It seems to do what you want.
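A minimal sketch of that incremental API applied to the chunks from the question. It calls JSON::PP directly (an assumption; the JSON wrapper module forwards `incr_parse` to whichever backend it loads) and uses list context, so each call returns every object the new chunk completes. As the replies below report, the JSON::PP of the time died on a chunk ending inside a string; the sketch assumes a JSON::PP release that copes with that.

```perl
use strict;
use warnings;
use JSON::PP;

# utf8: network chunks are bytes, so let the decoder handle UTF-8.
my $json = JSON::PP->new->utf8;

my @objects;
for my $chunk ('{ "foo": "bar" } { "foo" : "q', 'ux" }') {
    # List context: append the chunk to the internal buffer and return
    # every JSON text that is now complete; the rest waits for more data.
    push @objects, $json->incr_parse($chunk);
}

print join(',', map { $_->{foo} } @objects), "\n";
```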

      I went and had a look at that.

      Unfortunately, it looks like incr_parse is not capable of accepting arbitrary snippets.

      >perl -MJSON -e 'my $json = JSON->new;$foo=$json->incr_parse(q!{"check_result!);'
      unexpected end of string while parsing JSON string, at character offset 14 (before "(end of string)") at -e line 1.
      This makes it unusable in my scenario.

        I can reproduce this, but only with JSON::PP. With JSON::XS, it works like a charm:

        PERL_JSON_BACKEND=JSON::PP perl -MData::Dump=dump -MJSON -E 'my $json = JSON->new;@foo=$json->incr_parse(q!{"check_result!); $bar=$json->incr_parse(q!" : "ok"}!); dump $bar'

        Output: unexpected end of string while parsing JSON string, at character offset 14 (before "(end of string)") at -e line 1.

        PERL_JSON_BACKEND=JSON::XS perl -MData::Dump=dump -MJSON -E 'my $json = JSON->new;@foo=$json->incr_parse(q!{"check_result!); $bar=$json->incr_parse(q!" : "ok"}!); dump $bar'

        Output: { check_result => "ok" }

Re: Parsing JSON out of an incremental stream
by Sec (Monk) on Feb 15, 2019 at 17:52 UTC
    I have now cobbled together my own solution. It's less ugly than I expected.

    The trickiest part (for me) was coming up with a regexp that works well enough. (I don't correctly deal with hashes inside arrays.)

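```perl
our $store;
$ua->add_handler( response_data => sub {
    my ($response, $ua, $h, $data) = @_;
    $store .= $data;    # accumulate chunks until a complete object is present
    eval {
        # Strip one balanced {...} (quoted strings, runs without quotes or
        # braces, or nested objects via the recursive (?1)) off the front.
        # while, not if: one chunk may complete several objects.
        while ($store =~ s/^\s*(\{\s*(?:"[^"]*"|(?>[^"{}]*)|(?1))+\s*\})\s*//) {
            my $result = $json->decode( $1 );
            print Dumper $result;
        }
    };
    if ($@) {
        print STDERR "Error: $@\n";
        die;
    }
    1;    # true: keep the handler called for later chunks
} );
```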
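A sketch of the same regex idea as a standalone sub, so it can be tried without LWP. Two changes are assumptions beyond the code above: the string alternative is widened to `"(?:[^"\\]|\\.)*"` so an escaped quote inside a string does not end it early, and the sub name `take_objects` is hypothetical. The sample buffer holds an escaped quote next to a brace, a hash inside an array, and an unfinished object that should stay behind.

```perl
use strict;
use warnings;
use JSON::PP;

my $json = JSON::PP->new;

# Hypothetical helper: strip complete top-level objects off the front of
# $$store_ref with a recursive regex and return them decoded.
sub take_objects {
    my ($store_ref) = @_;
    my @objects;
    while ($$store_ref =~ s/^\s*(
            \{
              (?: "(?:[^"\\]|\\.)*"   # a string, allowing backslash escapes
                | (?>[^"{}]+)         # a run with no quote or brace, atomic
                | (?1)                # a nested object, recursing into group 1
              )*
            \}
        )\s*//x)
    {
        push @objects, $json->decode($1);
    }
    return @objects;
}

my $store = '{"msg": "a \"}\" b"} {"n": {"m": [1, {"k": 2}]}} {"p": "unfin';
my @got   = take_objects(\$store);
print scalar(@got), " objects decoded; left over: $store\n";
```

The atomic group keeps a failed match on an unfinished object from backtracking through every way of splitting a long run of characters.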

      I strongly encourage you to look at huck's link instead, and to use the incremental parsing in concert with some higher-level knowledge of which objects/arrays to expect, so you can decide what to do with the data as it becomes complete.

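      For anyone following along:

```perl
$ua->add_handler( response_data => sub {
    my ($response, $ua, $h, $data) = @_;
    eval {
        # List context: take every object this chunk completes; scalar
        # context would return at most one per call.
        for my $result ( $json->incr_parse( $data ) ) {
            print Dumper $result;
        }
    };
    if ($@) {
        print STDERR "Error: $@\n";    # report before re-throwing
        die;
    }
    1;    # true: keep the handler called for later chunks
} );
```

      is a pretty nice solution in my eyes.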
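One detail worth knowing when calling incr_parse from a handler like this: in scalar context it returns at most one object per call, so if a single chunk completes two objects, the second one waits in the internal buffer until the next call (and after the final chunk it may never be returned unless the buffer is drained). A minimal sketch of the difference, using JSON::PP directly as an assumed stand-in for `$json`:

```perl
use strict;
use warnings;
use JSON::PP;

my $chunk = '{"id": 1} {"id": 2}';    # one chunk completing two objects

# Scalar context: only the first complete object comes back.
my $scalar_json = JSON::PP->new;
my $first = $scalar_json->incr_parse($chunk);

# List context: every complete object comes back at once.
my $list_json = JSON::PP->new;
my @all = $list_json->incr_parse($chunk);

printf "scalar: %d, list: %s\n", $first->{id}, join(',', map { $_->{id} } @all);
```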

      The eval/catch is a bit of an eyesore, but LWP seems to silently eat all errors in handlers and then simply stop calling them, which makes debugging quite difficult.

      Also, make sure you install JSON::XS: the "core" JSON::PP backend may decide to die on you.