PerlMonks  

Re^2: Parsing Emacs Lisp sexpr?

by perlancar (Hermit)
on Apr 09, 2020 at 03:00 UTC ( [id://11115256]=note )


in reply to Re: Parsing Emacs Lisp sexpr?
in thread Parsing Emacs Lisp sexpr?

Nice work! I wonder why you opt to parse this format specifically instead of the generic lisp format though.

As for the speed, it's actually roughly on par with Data::SExpression, which uses Parse::Yapp. I commented out the dumping and then:

% time perl 11115197.pl archive-contents

real    0m7.449s
user    0m7.036s
sys     0m0.413s

% time perl -MFile::Slurper=read_text -MData::SExpression -E'$ds=Data::SExpression->new; ($sexp, $text) = $ds->read(read_text "archive-contents.2");'
real    0m5.411s
user    0m5.386s
sys     0m0.025s

archive-contents.2 is just the original file with [ ] replaced with ( ), and then the problematic @ atom replaced by "@".
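
For anyone wanting to reproduce that preprocessing, here is a minimal sketch. It assumes the intent is to turn vector brackets into parentheses and to quote a bare @ atom, and the name simplify_sexp_text is made up for illustration; it is not the command that was actually used. String literals are copied through untouched, but Emacs Lisp character literals such as ?" are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): rewrite vector brackets
# as parentheses and quote a bare "at sign" atom, leaving strings alone.
sub simplify_sexp_text {
    my ($text) = @_;
    $text =~ s{
          ( " (?: [^"\\] | \\. )* " )              # group 1: a complete string literal
        | ( [\[\]] )                               # group 2: a vector bracket
        | (?<![^\s()\[\]]) \@ (?![^\s()\[\]])      # a standalone at-sign atom
    }{
          defined $1 ? $1                          # keep strings verbatim
        : defined $2 ? ( $2 eq '[' ? '(' : ')' )   # bracket becomes paren
        :              '"@"'                       # quote the bare atom
    }gexs;
    return $text;
}

print simplify_sexp_text('[a @ "x[y]" (b@c)]'), "\n";
```

The at-sign is only rewritten when it stands alone between whitespace, parentheses or brackets, so an identifier like b@c is left as it is.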

A parser built on plain Perl regexes or on Regexp::Grammars will probably be several times faster.

Replies are listed 'Best First'.
Re^3: Parsing Emacs Lisp sexpr?
by choroba (Cardinal) on Apr 10, 2020 at 22:14 UTC
    > I wonder why you opt to parse this format specifically instead of the generic lisp format though.

    As I said, I started from the wrong end. I'm kind of busy working from home and staying there with a wife and three children, so I didn't have time to fix it immediately. Here's a much simpler and faster version, which parses MELPA's archive-contents in less than 5 seconds on my machine:

    #! /usr/bin/perl
    use warnings;
    use strict;

    use Marpa::R2;

    my $dsl = << '__DSL__';

    :default ::= action => ::first
    lexeme default = latm => 1

    List     ::= ('(') Elements (')')
    Elements ::= Element+  action => [values]
    Element  ::= List | Vector | Atom | String | Pair
    Vector   ::= ('[') Elements (']')
    Atom     ::= identifier
    String   ::= ('"') Quoteds ('"')
    Quoteds  ::= Quoteds Quoted  action => concat
              |  Quoted
    Quoted   ::= backslash || qq || plain
    Pair     ::= Element (dot) Element  action => pair

    :discard   ~ whitespace
    whitespace ~ [\s]+
    dot        ~ '.'
    backslash  ~ '\\'
    qq         ~ '\"'
    identifier ~ [-\w@:+]+
    plain      ~ [^\\"]+

    __DSL__

    sub concat { $_[1] . $_[2] }
    sub pair { +{ $_[1] => $_[2] } }

    my $grammar = 'Marpa::R2::Scanless::G'->new({source => \$dsl});
    my $lisp = do { local $/; <> };
    my $value_ref = $grammar->parse(\$lisp, {semantics_package => 'main'});

    use Data::Dumper; print Dumper $value_ref;

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Thanks for this, choroba. It finishes in about 2 seconds on my computer, pretty impressive. I'll see what I can use to improve my SExpression::Decode::Marpa.
Re^3: Parsing Emacs Lisp sexpr?
by perlancar (Hermit) on Apr 10, 2020 at 02:31 UTC

    And here's my stab at creating a Marpa-based parser, based on JSON::Decode::Marpa: https://github.com/perlancar/perl-SExpression-Decode-Marpa/. It's unfinished (its number and string rules, particularly, are still not adjusted), but can already parse the original archive-contents file, a bit faster than Data::SExpression:

    % time perl -Ilib -MSExpression::Decode::Marpa=from_sexp -MFile::Slurper=read_text -E'from_sexp(read_text "archive-contents")'
    
    real    0m4.023s
    user    0m3.818s
    sys     0m0.204s
    
Re^3: Parsing Emacs Lisp sexpr?
by perlancar (Hermit) on Apr 09, 2020 at 10:44 UTC
    Anyhow, I tried hacking together a regex-based parser. It's "working", with some problems: 1) a segmentation fault on larger data, indicating a leak somewhere; 2) a parsing failure in cases where the NUMBER rule fails to match and ATOM should match instead, e.g. the sexp (1a) fails, although (1) and (a) both succeed.
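
One way to sidestep the NUMBER/ATOM problem is sketched below. This is not the parser from the post above, just a small illustration: it tokenizes with \G and /gc, and only accepts a NUMBER when it is followed by whitespace, a parenthesis, or end of input, so 1a falls through to the ATOM rule. The names tokenize, parse_tokens and read_sexp are made up, strings keep their escapes unprocessed, and vectors, dotted pairs and quoting are left out. It uses no recursive regex constructs; the only recursion is in plain Perl, one level per nested list.

```perl
use strict;
use warnings;

# A token ends at whitespace, a parenthesis, or end of input.
my $DELIM = qr/(?=[\s()]|\z)/;

# Split the text into a flat list of token hashes.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while (pos($text) < length $text) {
        if    ($text =~ /\G\s+/gc)                  { next }
        elsif ($text =~ /\G([()])/gc)               { push @tokens, { type => $1 } }
        elsif ($text =~ /\G"((?:[^"\\]|\\.)*)"/gcs) { push @tokens, { type => 'STRING', value => $1 } }
        # NUMBER is tried before ATOM, but must be followed by a delimiter
        elsif ($text =~ /\G([-+]?\d+(?:\.\d+)?)$DELIM/gc) {
            push @tokens, { type => 'NUMBER', value => $1 };
        }
        elsif ($text =~ /\G([^\s()"]+)/gc)          { push @tokens, { type => 'ATOM', value => $1 } }
        else  { die "Unexpected character at offset " . pos($text) . "\n" }
    }
    return \@tokens;
}

# Consume tokens for one expression: lists become array refs,
# everything else stays a token hash.
sub parse_tokens {
    my ($tokens) = @_;
    die "Unexpected end of input\n" unless @$tokens;
    my $tok = shift @$tokens;
    die "Unexpected )\n" if $tok->{type} eq ')';
    return $tok unless $tok->{type} eq '(';
    my @items;
    while (1) {
        die "Missing )\n" unless @$tokens;
        if ($tokens->[0]{type} eq ')') { shift @$tokens; return \@items }
        push @items, parse_tokens($tokens);
    }
}

sub read_sexp {
    my ($text) = @_;
    my $tokens = tokenize($text);
    my $tree   = parse_tokens($tokens);
    die "Trailing input after expression\n" if @$tokens;
    return $tree;
}
```

With this, read_sexp('(1a)') returns a one-element list holding an ATOM token with value 1a, while read_sexp('(1)') yields a NUMBER token.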
