PerlMonks  

Simple Parsing of JavaScript function names using Parse-RecDescent

by Incognito (Pilgrim)
on Mar 11, 2002 at 23:48 UTC (#151014, perlquestion)
Incognito has asked for the wisdom of the Perl Monks concerning the following question:

I've built a simple grammar using Parse-RecDescent to return only the function names found in a file. At this point all I care about is the function name (later on I will return the contents of the function rather than its name), but I am having some trouble, which is described in the Output section.

#!/usr/bin/perl
use strict;      # Enforces safer, clearer code.
use warnings;    # Detects common programming errors
use Time::HiRes qw(gettimeofday);
use Parse::RecDescent;

#----------------------------------------------------------------------
# Build the grammar.
#----------------------------------------------------------------------
my ($grammar);
my ($startCompile) = gettimeofday;
$grammar = q {
    statement: (
                 (   function_method
                   | brace_statement
                   | parenthesis_statement
                   | bracket_statement
                   | stuff_we_ignore
                 )
                 (';')(?)
                 { $return = $item[1]; }
               )(s?)

    function_method: 'function' identifier parenthesis_statement brace_statement
        { $return = $item[2]; }

    brace_statement: '\{' statement '\}'
        { $return = $item[2]; }

    parenthesis_statement: '(' statement ')'
        { $return = ""; }

    bracket_statement: '[' statement ']'
        { $return = ""; }

    stuff_we_ignore: ( identifier | punctuators )(s?)
        { $return = ""; }

    identifier: /\w+/

    punctuators: /[><=!~:&^%,\?\|\+\-\*\/\.]+/

    # stuff_we_ignore: /[\w><=!~:&^%,\?\|\+\-\*\/\.]+/
    # others: /[;(){}\[\]]+/
};

#----------------------------------------------------------------------
# Print the results.
#----------------------------------------------------------------------
my @localDeclaredVars = <DATA>;
my $localDeclaredVar = join ' ', @localDeclaredVars;
my $parser = new Parse::RecDescent ($grammar) or die "*** Bad grammar!\n";
my $i = 1;
my ($endCompile) = gettimeofday;
print "\nCompile Time: " . (sprintf "%2.1f", ($endCompile - $startCompile)) . " seconds\n";
print '-'x80 . "\n";
my $refParsedValues = $parser->statement($localDeclaredVar) || print "*** $localDeclaredVar\n";

# This is ugly - What can I do folks?
if (ref($refParsedValues) eq 'ARRAY') {
    foreach my $parsedValue (@$refParsedValues) {
        if (ref($parsedValue) eq 'ARRAY') {
            foreach my $parsedSubValue (@$parsedValue) {
                if (ref($parsedSubValue) eq 'ARRAY') {
                    chomp (@$parsedSubValue);
                    print sprintf ("%3d", $i++) . " %=> [@$parsedSubValue]\n";
                } elsif (defined $parsedSubValue && $parsedSubValue ne '1' && $parsedSubValue ne '') {
                    print sprintf ("%3d", $i++) . " ^-> [$parsedSubValue]\n";
                }
            }
        } elsif (defined $parsedValue && $parsedValue ne '1' && $parsedValue ne '') {
            print sprintf ("%3d", $i++) . " =-> [$parsedValue]\n";
        }
    }
} elsif (defined $refParsedValues && $refParsedValues ne '1' && $refParsedValues ne '') {
    print sprintf ("%3d", $i++) . " --> [$refParsedValues]\n";
}
my ($parseEnd) = gettimeofday;
print '-'x80 . "\n";
print "Parse Time: " . (sprintf "%2.1f", ($parseEnd - $endCompile)) . " seconds\n";
print "Total Time: " . (sprintf "%2.1f", ($parseEnd - $startCompile)) . " seconds\n";

__END__
var g1, g2 = __QUOTE__;
var g3 = 10000000;
if (g1) { var XXXXXXX = __QUOTE__; }
if ( ! defaultCookieCrumbNav ) { cookieCrumbNavBarHTML = __QUOTE__ ; } else { function funct1 () { }; var xxx = __QUOTE__ ; }
if (true == false) { alert(var1); }
if (1) { if (1) { if (1) { function funct2 (X) { fff = funct3 (1,2); } } } }
function funct3 (a,b) { alert (1,2,3,4); return (a + b); }
function funct4 () { var aaa = 1; }
function funct5 (var1) { if (1) { return true; } else if (1) { return true; } else { if (true) { alert(var1); } } }
var g4;
var g7 = __QUOTE__;
function funct6 () { var b = __REGEX__; c = __REGEX__; if (test333()) { return true; } }
function funct7 () { var a = 111; }
alert ( 3 );
funct5 ( funct6 ( funct2 () ) );
var done_parsing;

Output

Compile Time: 0.0 seconds
----------------------------------------------------------------------
  1 ^-> [funct1]
  2 %=> [ ARRAY(0x25ec9bc) ]
  3 =-> [funct3]
  4 =-> [funct4]
  5 =-> [funct5]
  6 =-> [funct6]
  7 =-> [funct7]
----------------------------------------------------------------------
Parse Time: 1.0 seconds
Total Time: 1.0 seconds

My Issues

Some issues I have:
  • I can't figure out how to return "funct2", which is matched several levels deep inside nested brace_statement rules. It shows up in the output as an array reference instead of a name.
  • Is there anything else I should do to reduce parse/compile times? I've read some good resources such as the POD and FAQ, and have gotten some hints there.
  • The displayed times never show anything after the decimal point (just whole seconds). Why is this? I'm running ActiveState's Perl 5.6.1 Build 631 on a W2K box.
  • My code to print the output is very ugly. Is there anything I can do to make it better?
    Re: Simple Parsing of JavaScript function names using Parse-RecDescent
    by Incognito (Pilgrim) on Mar 12, 2002 at 01:10 UTC

      Update

      Regarding my "code to print the output is very ugly" comment: I have found that Data::Dumper works quite well. Ultimately, though, I want just one flat array returned, with each element being a function name (no arrays within the array). The nesting most likely has to do with what I'm returning from the grammar actions.

    Re: Simple Parsing of JavaScript function names using Parse-RecDescent
    by qslack (Scribe) on Mar 12, 2002 at 01:33 UTC
      To reduce parse and compile times, check out http://www.engelschall.com/ar/perldoc/pages/module/Devel::DProf.html (Devel::DProf). Get it from CPAN. Then separate your code into subs (which you should do anyway) and see which parts are taking most of your time. Apply your optimization efforts there. As a rule of thumb, 10% of the code accounts for 90% of the run time; optimize that 10% and you can get huge results.

      Regarding the display times question: in Time::HiRes, gettimeofday returns a two-element list when called in list context. my ($seconds, $microseconds) = gettimeofday; You're only capturing the first element, the seconds. Capture $microseconds as well and include it when you compute the elapsed time.
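A minimal sketch of that suggestion follows. The helper name to_seconds and the dummy loop are assumptions made for illustration; gettimeofday and tv_interval are the documented Time::HiRes functions.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper (not from the thread): fold the
# (seconds, microseconds) pair into one fractional number.
sub to_seconds {
    my ($seconds, $microseconds) = @_;
    return $seconds + $microseconds / 1_000_000;
}

my $start = to_seconds(gettimeofday());   # list context: both elements are passed
my $t0    = [ gettimeofday() ];           # array ref, the form tv_interval expects

my $sum = 0;
$sum += $_ for 1 .. 100_000;              # stand-in for the work being timed

my $elapsed = to_seconds(gettimeofday()) - $start;
printf "Elapsed (manual):      %.6f seconds\n", $elapsed;
printf "Elapsed (tv_interval): %.6f seconds\n", tv_interval($t0);
```

Either form should give sub-second resolution; tv_interval simply does the same subtraction on two [seconds, microseconds] pairs.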

      Quinn Slack
      perl -e 's ssfhjok qupyrqs&&tr sfjkohpyuqrspitnrapj"hs&&eval&&s&&&'
    Re (tilly) 1: Simple Parsing of JavaScript function names using Parse-RecDescent
    by tilly (Archbishop) on Mar 12, 2002 at 03:08 UTC
      The display times issue is due to the parens you have on the left hand side. As I pointed out in Arrays are not lists, there is a huge (and often misunderstood) issue of context. Your parens turn the assignment into a list assignment, so gettimeofday is called in list context and you capture only the first element of a two-element list. You can either capture both elements and then combine them, or else just drop the parens like this example does:
      use Time::HiRes qw(gettimeofday);
      my $time = gettimeofday();
      print "The time is $time.\n";
      You are now asking for a scalar, and it gives you a useful one. :-)

      As for your performance problems, the cause of that is that TheDamian's module, while amazingly flexible, is very inefficient. His excuse for that is to be found at Re: advice with Parse::RecDescent. (In his defence the excuse is perfectly accurate.) And if you wish to help the rewrite he talked about happen, feel free to donate to the Perl development fund drive. (TheDamian is one of the core people supported by that fund.) Or feel free to write it yourself...

      If you wish to do what the module does yourself, you will need to play with pos and the /g RE flag liberally.
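As a rough illustration of the pos and /g approach, here is a minimal sketch of a hand-rolled scanner. It is not a replacement for the grammar above: the sub name function_names is made up for this example, and it only looks for the keyword function followed by a name, ignoring nesting, strings and comments. The \G anchor makes each match start where the previous one stopped, and the /c flag keeps pos in place when a match fails.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): walk the string token by
# token with \G and /gc, collecting the name after each "function".
sub function_names {
    my ($src) = @_;
    my @names;
    pos($src) = 0;                        # start scanning at the beginning
    while (pos($src) < length($src)) {
        if ($src =~ /\G\s+/gc) {
            next;                         # skip whitespace
        }
        elsif ($src =~ /\Gfunction\s+(\w+)/gc) {
            push @names, $1;              # keyword followed by a name
        }
        elsif ($src =~ /\G\w+/gc) {
            next;                         # any other whole word
        }
        else {
            $src =~ /\G./gcs;             # any other single character
        }
    }
    return @names;
}

my $js = 'if (1) { function funct2 (X) { fff = funct3 (1,2); } } '
       . 'function funct3 (a,b) { return (a + b); }';
print join(',', function_names($js)), "\n";   # prints funct2,funct3
```

Because every word is consumed whole, a call such as funct3 (1,2) is skipped and only declarations are reported, in a single flat list.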

    Re: Simple Parsing of JavaScript function names using Parse-RecDescent
    by Incognito (Pilgrim) on Mar 12, 2002 at 03:19 UTC
      I appreciate everyone's advice and suggestions on improving speed and on using gettimeofday(), but the real problem is that the 'funct2a' and 'funct2b' functions, which are embedded deep inside if statements (brace_statement), are being returned as an array. I'd like the output to concatenate the array elements into a comma-separated string, so that line 2 of the output would read (funct2a,funct2b) rather than (ARRAY(0x25ec9bc)).

      I could do this either by flattening the multi-dimensional array that gets returned into a single array, or by handling it in the grammar (which is probably more efficient). If someone knows how to do either (both would be great for learning), that would be awesome.

      I've changed the output code to do this

      foreach my $parsedValue (@$refParsedValues) {
          print Dumper($parsedValue) if ($parsedValue);
      }
      which produces these results on the array structure that is returned to me.
      Compile Time: 0.3 seconds
      ------------------------------------------------------------------------
      $VAR1 = [ '', '' ];
      $VAR1 = [ '', '' ];
      $VAR1 = [ 'funct1', '', '' ];
      $VAR1 = [ '', '' ];
      $VAR1 = [ '', [ '', [ 'funct2a', 'funct2b', '' ], '' ], '' ];
      $VAR1 = 'funct3';
      $VAR1 = 'funct4';
      $VAR1 = 'funct5';
      $VAR1 = 'funct6';
      $VAR1 = 'funct7';
      ------------------------------------------------------------------------
      Parse Time: 1.1 seconds
      Total Time: 1.4 seconds
      What we really wanted was an array that printed like this:
      $VAR1 = 'funct1';
      $VAR1 = 'funct2a';
      $VAR1 = 'funct2b';
      $VAR1 = 'funct3';
      $VAR1 = 'funct4';
      $VAR1 = 'funct5';
      $VAR1 = 'funct6';
      $VAR1 = 'funct7';
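One possible way to get that flat list outside the grammar is to flatten the returned structure recursively. This is only a sketch: the sub name flatten is made up here, the sample data is typed in by hand to mirror the Dumper output above, and the filter on empty strings and '1' copies the checks from the original printing loop.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): recursively expand nested
# array references into one flat list, leaving plain scalars as they are.
sub flatten {
    return map { ref($_) eq 'ARRAY' ? flatten(@$_) : $_ } @_;
}

# Hand-typed copy of the structure shown in the Dumper output.
my $refParsedValues = [
    [ '', '' ],
    [ '', '' ],
    [ 'funct1', '', '' ],
    [ '', '' ],
    [ '', [ '', [ 'funct2a', 'funct2b', '' ], '' ], '' ],
    'funct3', 'funct4', 'funct5', 'funct6', 'funct7',
];

# Drop the placeholders (undef, empty strings, and '1').
my @names = grep { defined($_) && length($_) && $_ ne '1' }
            flatten($refParsedValues);

print "$_\n" for @names;
```

If a comma-separated string is wanted instead, join(',', @names) on the same list would produce it.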
