PerlMonks  

JSON parser as a single Perl Regex

by merlyn (Sage)
on Sep 26, 2012 at 19:25 UTC [id://995856] (CUFP)

$client has a script that needs to run with minimal module dependencies but wants to parse JSON. I couldn't find anything like YAML::Tiny (which I borrowed to remove the YAML dependency), so I hacked up this regex to parse and extract JSON. It doesn't handle Unicode (\uXXXX escapes) yet, but that wasn't a client requirement.
#!/usr/bin/env perl
use Data::Dumper qw(Dumper);

# The (?{ }) blocks build a linked stack in $^R: most return
# [previous-stack, value] (KV returns [previous-stack, key, value]),
# and the final block copies the parsed value into $_.
my $FROM_JSON = qr{

(?&VALUE) (?{ $_ = $^R->[1] })

(?(DEFINE)

(?<OBJECT>
  (?{ [$^R, {}] })
  \{
    (?: (?&KV) # [[$^R, {}], $k, $v]
      (?{ # warn Dumper { obj1 => $^R };
         [$^R->[0][0], {$^R->[1] => $^R->[2]}] })
      (?:
        , (?&KV) # [[$^R, {...}], $k, $v]
        (?{ # warn Dumper { obj2 => $^R };
           [$^R->[0][0], {%{$^R->[0][1]}, $^R->[1] => $^R->[2]}] })
      )*
    )?
  \}
)

(?<KV>
  (?&STRING) # [$^R, "string"]
  : (?&VALUE) # [[$^R, "string"], $value]
  (?{ # warn Dumper { kv => $^R };
     [$^R->[0][0], $^R->[0][1], $^R->[1]] })
)

(?<ARRAY>
  (?{ [$^R, []] })
  \[
    (?: (?&VALUE) (?{ [$^R->[0][0], [$^R->[1]]] })
      (?: , (?&VALUE) (?{ # warn Dumper { atwo => $^R };
                         [$^R->[0][0], [@{$^R->[0][1]}, $^R->[1]]] })
      )*
    )?
  \]
)

(?<VALUE>
  \s*
  (
      (?&STRING)
    |
      (?&NUMBER)
    |
      (?&OBJECT)
    |
      (?&ARRAY)
    |
      true (?{ [$^R, 1] })
    |
      false (?{ [$^R, 0] })
    |
      null (?{ [$^R, undef] })
  )
  \s*
)

(?<STRING>
  (
    "
    (?:
      [^\\"]+
    |
      \\ ["\\/bfnrt]
#   |
#     \\ u [0-9a-fA-F]{4}
    )*
    "
  )
  # The literal is decoded with a string eval, which parses it as a Perl
  # double-quoted string: sigils in untrusted input will be interpolated.
  (?{ [$^R, eval $^N] })
)

(?<NUMBER>
  (
    -?
    (?: 0 | [1-9]\d* )
    (?: \. \d+ )?
    (?: [eE] [-+]? \d+ )?
  )
  (?{ [$^R, eval $^N] })
)

) }xms;

sub from_json {
  local $_ = shift;
  local $^R;
  eval { m{\A$FROM_JSON\z}; } and return $_;
  die $@ if $@;
  return 'no match';
}

while (<>) {
  chomp;
  print Dumper from_json($_);
}

-- Randal L. Schwartz, Perl hacker

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
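The core trick is that $^R holds the result of the most recently executed (?{ }) block, and the engine restores its old value whenever it backtracks past that block. Below is a much smaller sketch of the same idea, limited to a flat list of non-negative integers. It is only an illustration. The names $INT_LIST and parse_int_list are hypothetical and are not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical mini-parser: "[1,2,3]" -> [1, 2, 3].
# Every (?{ }) block returns a brand-new array ref instead of pushing
# onto a shared one, so a backtracked path never leaves stale elements.
my $INT_LIST = qr{
    \A
    (?{ [] })                                     # start with an empty list
    \[
    (?:
        (\d+) (?{ [ @{ $^R }, $^N ] })            # first element
        (?: , (\d+) (?{ [ @{ $^R }, $^N ] }) )*   # remaining elements
    )?
    \]
    \z
}x;

sub parse_int_list {
    my ($text) = @_;
    local $^R;
    # After a successful match, $^R holds the value of the last code block
    # run on the winning path, which is the completed list.
    return $text =~ $INT_LIST ? $^R : undef;
}
```

For example, parse_int_list("[1,2,3]") returns an array ref holding the strings 1, 2 and 3, and parse_int_list("[1,,2]") returns undef. merlyn's version avoids copying the whole list on every step by keeping a linked [previous, value] stack instead.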

Re: JSON parser as a single Perl Regex
by Anonymous Monk on Sep 27, 2012 at 01:34 UTC

    It's quicker to fail on buggy JSON than "RFC: A walkthrough from JSON ABNF to Regexp::Grammars".

    sub fa {
        my $time = time;
        my $ref  = from_json(@_);
        $time = time - $time;
        print "T$time ", Dumper($ref), "\n";
    }

    fa(q{["double extra comma",,]});    ## THE BUGGY
    fa(q{[1,[2,[3],[]]]});              ## THE REGULAR
    fa( q{[{"k":"v"},{"v":"k"}] } );
    fa( q{{"ro":["sham","bo"],"t":{"i":{"c":{"t":{"o":"c"}}}}}} );

    __END__
    T5 $VAR1 = 'no match';
    T0 $VAR1 = [ 1, [ 2, [ 3 ], [] ] ];
    T0 $VAR1 = [ { 'k' => 'v' }, { 'v' => 'k' } ];
    T0 $VAR1 = { 'ro' => [ 'sham', 'bo' ], 't' => { 'i' => { 'c' => { 't' => { 'o' => 'c' } } } } };
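The T values above come from the built-in time, which only has one-second resolution. T0 therefore just means "under a second". The sketch below shows a finer-grained timer built on the core Time::HiRes module. The helper name timed is hypothetical.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # replaces time() with a fractional-second version

# Hypothetical helper: run $code->(@args) and return
# (elapsed seconds as a float, the code's scalar return value).
sub timed {
    my ($code, @args) = @_;
    my $start  = time();
    my $result = $code->(@args);
    return (time() - $start, $result);
}
```

With the parser above loaded, my ($secs, $ref) = timed(\&from_json, q{[1,[2,[3],[]]]}); would give a sub-second figure in place of T0.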
      It'd probably fail much faster if I used ratcheting in VALUE.

      -- Randal L. Schwartz, Perl hacker

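"Ratcheting" here refers to atomic groups (?>...) and possessive quantifiers (*+, ++, ?+), both available since Perl 5.10. Once one of these has matched, the engine will not backtrack into it to try other ways of matching, and that is what lets a doomed match fail sooner. The sketch below illustrates the semantics on a toy pattern. It is not a modified version of the JSON grammar.

```perl
use strict;
use warnings;

my $s = "aaaa";

# Ordinary quantifier: a+ first grabs all four a's, then gives one back
# so the trailing 'a' can match.
my $backtracking = $s =~ /\A a+ a \z/x ? 1 : 0;

# Atomic group: once (?>a+) has grabbed all four a's it never gives any
# back, so the trailing 'a' has nothing left and the match fails.
my $atomic = $s =~ /\A (?>a+) a \z/x ? 1 : 0;

# Possessive quantifier: shorthand for the same atomic behaviour.
my $possessive = $s =~ /\A a++ a \z/x ? 1 : 0;

print "backtracking=$backtracking atomic=$atomic possessive=$possessive\n";
# prints: backtracking=1 atomic=0 possessive=0
```

In the grammar, this would mean wrapping the alternation inside VALUE in (?> ... ). Whether that change keeps the $^R bookkeeping intact is untested here.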
Re: JSON parser as a single Perl Regex
by spazm (Monk) on Sep 27, 2012 at 23:26 UTC
Re: JSON parser as a single Perl Regex
by Anonymous Monk on Sep 26, 2012 at 21:06 UTC
    As an alternative, you can use fatpack (App::FatPacker) to bundle the modules a script needs into the script itself: http://search.cpan.org/~ether/App-FatPacker-0.009010/bin/fatpack
Re: JSON parser as a single Perl Regex
by merlyn (Sage) on Oct 10, 2012 at 03:41 UTC
    Looking back on this a few weeks later, I think the most amazing thing is that if the match succeeds, the thing it was matched against is replaced with the actual data structure (the final (?{ $_ = $^R->[1] }) block assigns the parsed result to $_). Yes, a match turns into a substitute!

    -- Randal L. Schwartz, Perl hacker

Re: JSON parser as a single Perl Regex
by Tommy (Chaplain) on Sep 30, 2012 at 01:12 UTC

    *picks jaw back up off the floor*

    That's amazing. Seriously, that's really awesome, Randal. Wow. Just wow.

    --
    Tommy
    $ perl -MMIME::Base64 -e 'print decode_base64 "YWNlQHRvbW15YnV0bGVyLm1lCg=="'
      Deepest apologies for misspelling your name. It has been corrected.
      --
      Tommy
Re: JSON parser as a single Perl Regex
by sedusedan (Monk) on Oct 18, 2013 at 03:04 UTC
    Also, would you mind if I packaged this as a CPAN module? I think I want to maintain a collection of modules that parse things using a single regex.
      Feel free to take it. I hereby claim it is licensed "just like Perl". :)

      -- Randal L. Schwartz, Perl hacker

        Neat! I'll package it as soon as I can. All credit will go to you, obviously.
Re: JSON parser as a single Perl Regex
by sedusedan (Monk) on Oct 18, 2013 at 02:32 UTC

    With due respect, I admit that this hack is cool and a great learning experience for others.

    However, I have to ask: what exactly is the client's requirement? Prior to the time of this post, there had already been JSON::PP and Mojo::JSON (which became JSON::Tiny in Oct 2012) on CPAN. Both are faster than this regex version, already support Unicode, and, I suspect, have a smaller footprint too. I also much prefer JSON::PP because it gives helpful error messages instead of just returning 'no match' or dying. You can just stick either one in the project's source repo and be done with it in a couple of minutes.
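For comparison, JSON::PP (bundled with Perl since 5.14) reports where parsing went wrong instead of giving a bare 'no match'. The short sketch below feeds it the doubled-comma input from the timing test earlier in the thread. The exact wording of the error message comes from JSON::PP and may vary between versions.

```perl
use strict;
use warnings;
use JSON::PP ();

my $json = JSON::PP->new;

# Well-formed input decodes to an ordinary Perl data structure.
my $data = $json->decode('{"ro":["sham","bo"],"t":{"i":"c"}}');
print "second element: $data->{ro}[1]\n";    # prints: second element: bo

# Malformed input dies; $@ then describes what went wrong and where.
my $ok = eval { $json->decode('["double extra comma",,]'); 1 };
my $error = $ok ? '' : $@;
print "error: $error" unless $ok;
```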

      Prior to the time of this post, there had already been JSON::PP and Mojo::JSON (which became JSON::Tiny in Oct 2012)...

      I just want to clarify this: If you skim to the bottom of the POD for JSON::Tiny you will find this:

      ACKNOWLEDGEMENTS: ...to Randal Schwartz for showing the Los Angeles Perl Mongers (Sept 2012) his embeddable pure-regexp JSON parser, and explaining it on PerlMonks (995856). He wasn't involved in JSON::Tiny, but it was the exploration of alternatives to his solution that led to this fork of Mojolicious's JSON parser.

      JSON::Tiny wouldn't exist if it weren't for two things: First, for the work that Sebastian and his team did to bring us Mojo::JSON as a component of the Mojolicious framework. And second, if merlyn hadn't elevated my interest in the topic by presenting his pure-regexp solution at Los Angeles Perl Mongers in September 2012.

      I won't try to put words in his mouth or explain the rationale that he presented at LA-PM for the regexp monstrosity (or thing of beauty). My understanding was that he needed a light-weight JSON parser that he could embed in a $project. He was justifiably proud of the regular expression solution. And while it's certainly fewer lines of code than JSON::Tiny, I believed that there might be a more robust way to embed a pure-Perl JSON parser in a project.

      A few days later I found myself thinking about it again, and remembered that one of Mojolicious's philosophies is to minimize external dependencies. That meant Mojo::JSON should be fairly easy to adapt into a stand-alone module. After working around its use of Mojo::Base and Mojo::Util, the conversion to a standalone module that could (if absolutely necessary) be copied and pasted right into a code base was straightforward, and JSON::Tiny came into existence.

      Mojo::JSON on its own wouldn't have worked for merlyn, because it relies on Mojo::Base and Mojo::Util. I had to embed the functionality that Mojo::JSON needed from those two modules before JSON::Tiny could be made stand-alone. The test suite also required a bit of adaptation: the tests were almost all identical to their Mojo::JSON versions, but in JSON::Tiny the test suite had to emulate some of the support functionality that the Mojolicious framework would have provided.


      Dave

        Thanks for posting this.
      Strangely enough, this project needed both YAML parsing and JSON parsing (don't ask!). I was able to cannibalize YAML::Tiny, but I didn't find a similar module for JSON. I started writing a traditional recursive-descent parser, and then I remembered that modern regexes can do it all internally, so I took a whack at it. You see the result.

      -- Randal L. Schwartz, Perl hacker

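"Modern regex can do it all internally" refers to features added in Perl 5.10: named groups, recursion into them with (?&NAME), and (?(DEFINE)...) blocks that declare rules without matching anything. If the (?{ }) data-building blocks are removed, the same recursive structure gives a plain recognizer. The sketch below is a minimal one for nested arrays of integers. The pattern name $NESTED is hypothetical.

```perl
use strict;
use warnings;

# Recognizer only: answers "is this a well-formed nested integer array?"
# without building any data. LIST and ITEM refer to each other, which is
# what makes the pattern recursive.
my $NESTED = qr{
    \A (?&LIST) \z
    (?(DEFINE)
        (?<LIST> \[ (?: (?&ITEM) (?: , (?&ITEM) )* )? \] )
        (?<ITEM> \d+ | (?&LIST) )
    )
}x;

for my $text ('[1,[2,[3],[]]]', '[1,,2]', '[[1]') {
    printf "%-16s %s\n", $text, $text =~ $NESTED ? 'ok' : 'no match';
}
```

Only the first input is reported as ok. The parser above follows the same layout, with a (?{ }) block after each piece that turns the matched text into Perl data.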
        My point is, JSON::PP was perfectly cannibalizable at that point. But I understand that you probably thought it was a chance to do something cool.
Re: JSON parser as a single Perl Regex
by fanasy (Sexton) on Nov 10, 2019 at 03:24 UTC

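    I think (?<STRING>...) may also need \s*, since some JSON files have whitespace outside the quotes. VALUE already allows whitespace around values, but object keys are matched by STRING directly inside KV, so input like { "a" : 1 } fails without it: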

    (?<STRING> \s* ( " (?: [^\\"]+ | \\ ["\\/bfnrt] )* " ) \s* (?{ [$^R, eval $^N] }) )
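The STRING rule in the original post decodes its literal with a string eval. As a result, a JSON string containing Perl sigils such as @{[ ... ]} is interpolated as Perl code. The sketch below decodes the escapes that STRING accepts without using eval. The helper name is hypothetical. \uXXXX is deliberately not handled, which matches the original's lack of Unicode support.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original regex. It maps each
# escape the STRING rule accepts to the character it stands for.
my %JSON_ESCAPE = (
    '"' => '"',  '\\' => '\\', '/' => '/',
    'b' => "\b", 'f'  => "\f", 'n' => "\n",
    'r' => "\r", 't'  => "\t",
);

# Decode a JSON string literal, surrounding quotes included, without eval.
sub decode_json_string_literal {
    my ($literal) = @_;
    my $s = substr $literal, 1, -1;    # drop the surrounding quotes
    # Single left-to-right pass: an escaped backslash is consumed before
    # the character that follows it, so it is never re-interpreted.
    $s =~ s{\\(["\\/bfnrt])}{$JSON_ESCAPE{$1}}g;
    return $s;
}
```

Inside the grammar it could be called as (?{ [$^R, decode_json_string_literal($^N)] }). That relies on running a substitution from inside a code block, which I would verify on the target perl before trusting it.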
Re: JSON parser as a single Perl Regex
by Anonymous Monk on Oct 01, 2012 at 10:30 UTC

    How do you use this script?

    I tried feeding it various .json files, and it always prints 'no match'; it also frequently consumes all 8 GB of my memory first.

      Worked for me. Give me an example of a one-line file that broke.

      -- Randal L. Schwartz, Perl hacker

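One plausible reason for 'no match' on ordinary .json files (my reading; the thread doesn't confirm it) is the driver loop, while (<>) { chomp; ... }. It hands from_json one line at a time, so a pretty-printed file arrives as fragments such as a lone {. To read the whole input at once, undefine $/, the input record separator. The self-contained demonstration below uses an in-memory filehandle.

```perl
use strict;
use warnings;

my $json = "{\n  \"a\": [1, 2]\n}\n";

# Line mode (the default, $/ eq "\n"): three separate records.
open my $fh, '<', \$json or die "open: $!";
my @lines = <$fh>;
close $fh;

# Slurp mode: with $/ undefined, one read returns the whole text.
open $fh, '<', \$json or die "open: $!";
my $whole = do { local $/; <$fh> };
close $fh;

printf "line mode: %d records; slurp mode: %d characters\n",
    scalar @lines, length $whole;
```

In the driver, that would be my $text = do { local $/; <> }; print Dumper from_json($text);. Even then, pretty-printed objects can still fail: the KV rule allows no whitespace before a key or before the colon, which is the gap fanasy's reply points out.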
Node Type: CUFP [id://995856]
Approved by Old_Gray_Bear
Front-paged by davido