PerlMonks  

regexp for finding all function calls

by marksman (Novice)
on Jul 07, 2009 at 21:21 UTC ( #778026=perlquestion )
marksman has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
I'm writing a Perl script to read in Perl code and extract all calls to Mod::func().
Mod::func() might take any number of arguments with strings and funny characters and newlines and such.

Right now, I have this regexp:
    use Regexp::Common qw(balanced);
    m/Mod::func\s*$RE{balanced}{-parens=>'()'}\s*;/g
It works well, except that calls like this break it:
Mod::func( ")" );
Namely, strings that have unbalanced parentheses.

Any advice on how to improve this regexp, or another course to pursue?
Thanks.

Re: regexp for finding all function calls
by Fletch (Chancellor) on Jul 07, 2009 at 21:47 UTC

    Sounds like an XY Problem. Step back and explain what you're attempting to accomplish and there may be a better approach (e.g. using B::Xref and post-processing its output).

    The cake is a lie.

      Ok.

      I have a large code base of, say, thousands of Perl files. I want to copy all calls to Mod::func() scattered across these files into a single file, say funcCalls.pl. Then I will run funcCalls.pl and verify the output of all of the calls to Mod::func().

      If the output is wrong, hopefully I will have recorded the file and line number that each call to Mod::func() came from. That way I will be able to fix any erroneous calls to Mod::func().
        I'd recommend using PPI, which does its best (and a very good job) at actually parsing Perl. You can search through the parse tree for sub calls, and check their names.

        That will be much more accurate than anything you can achieve with a regex.

        By the way, that was my post above as Anonymous Monk, I forgot that I wasn't logged in.
Re: regexp for finding all function calls
by planetscape (Canon) on Jul 08, 2009 at 02:00 UTC
      Thanks, I will have a look at those resources.
Re: regexp for finding all function calls
by merlyn (Sage) on Jul 08, 2009 at 02:01 UTC
    See B::Xref, which should be able to identify all calls.

    -- Randal L. Schwartz, Perl hacker

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

      Thanks,
      B::Xref looks very useful now that I have a second look at it. Actually Fletch mentioned that in his first reply, so thanks also to Fletch.

      I am able to extract the line numbers of calls to Mod::func(). I'm still wondering whether B::Xref can actually display the original code from that function call. Does anyone know if that's possible?

      Right now I still don't know how many lines each function call spans.
Re: regexp for finding all function calls
by ikegami (Pope) on Jul 08, 2009 at 14:16 UTC
    Can you add double quotes to the list of characters for -parens to heed? If not, there's Text::Balanced which does.
      I think I can add double quotes, but I don't think it will work for calls such as
      Mod::func("str1 \"); ");
      There are probably other corner cases as well.
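A minimal sketch of the Text::Balanced route, for what it's worth. It assumes the core Text::Balanced module and its extract_bracketed function; find_calls is a hypothetical helper name, not something from a module. Listing the quote characters in the delimiter string asks extract_bracketed to skip over quoted strings, backslash-escaped quotes included, so both problem calls from this thread should come out whole. It is still text matching and not a parser: quote characters in comments, q() constructs, heredocs and the like can still fool it.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: return [line_number, call_text] pairs for every
# parenthesized call to $name found in the string $source.
sub find_calls {
    my ($source, $name) = @_;
    my @calls;
    while ($source =~ /(?<![\w:])\Q$name\E\s*(?=\()/g) {
        # $start is where the name begins, $paren is where the "(" sits.
        my ($start, $paren) = ($-[0], $+[0]);
        # Work on a copy of the tail so pos() on $source is left alone.
        my $tail = substr($source, $paren);
        # The quote characters in the delimiter spec make Text::Balanced
        # ignore parentheses inside '...' and "..." strings.
        my ($args) = extract_bracketed($tail, q{()"'});
        next unless defined $args && length $args;
        # Line number = 1 + newlines before the start of the call.
        my $line = 1 + (substr($source, 0, $start) =~ tr/\n//);
        push @calls, [ $line, substr($source, $start, $paren - $start) . $args ];
    }
    return @calls;
}

# Demo on the two problem cases from this thread.
my $sample = <<'END_SAMPLE';
Mod::func( ")" );
Mod::func("str1 \"); ", 2);
END_SAMPLE

printf "line %d: %s\n", @$_ for find_calls($sample, 'Mod::func');
```

Since each result carries the full text of the call, counting the newlines in it also tells you how many lines the call spans.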
Re: regexp for finding all function calls
by Anonymous Monk on Jul 09, 2009 at 03:15 UTC
    Something along the lines of ...
    #!c:\Perl\bin\perl.exe
    #
    # This recursively searches .pl and .pm files for parenthesized function calls.
    # $ARGV[0] specifies the initial directory,
    # $ARGV[1] specifies the function name,
    # for example:
    #   c:\alpha\adx gettext
    #
    use File::Find;
    use Perl6::Slurp;
    use strict;
    use warnings;

    # Prowl the drive looking for files named .pl and .pm
    find({ wanted => \&FindFunctionCalls__, follow => 0 }, $ARGV[0]);
    exit;

    sub FindFunctionCalls__ { # Don't trace me
        # $File::Find::dir is the current directory name,
        # $_ is the current filename within that directory
        # $File::Find::name is the complete pathname to the file.
        return unless (m/\.p(l|m)$/i);
        return if (/^(p|s)\./);
        my($s_Function)=($ARGV[1]);
        # Slurp
        my $s_Code=slurp $File::Find::name, {};
        # ... and search for "$s_Function"
        if (my $Count=($s_Code=~s/(?:^|\W)$s_Function\W/$&/g)) { # $s_Function occurred at least once
            # Remove the "use statement;" if any
            print "\nFound $Count in $File::Find::name\n";
            # Balanced parentheses:
            my $np;
            # The initial balanced parentheses expression with an embedded set is:
            #   $np=qr/\( ([^()] | (??{$np})) *\)/x;
            # and quoted strings, allowing backslash escapes, are
            #   "(?:[^"\\]|\\.)*" and '(?:[^'\\]|\\.)*' so
            $np=qr/
                \(                        # The opening "("
                ((                        # We'll want this hence the capturing ()
                    '(?:[^'\\]|\\.)*'     # a single quote string
                |   "(?:[^"\\]|\\.)*"     # a double quote string
                |   [^()]                 # not a parenthesis
                |   (??{$np})             # a nested, balanced set
                )*)
                \)                        # and the closing ")"
            /xs;
            # Find the body of the function calls
            my($s_CompleteCall,$s_Before);
            while ($s_Code =~ m/(?:^|\W)($s_Function\s*$np)/gos) { # Have the entire call!
                # The complete call is
                $s_CompleteCall=$1;
                # and line number is (count newlines up to the start of the call;
                # $-[1] is the offset where the captured call begins)
                print 1+(($s_Before=substr($s_Code,0,$-[1]))=~ tr/\n/\n/).": $s_CompleteCall\n";
            };
        };
    };
    # __END__
    perhaps?
Re: regexp for finding all function calls
by marksman (Novice) on Jul 17, 2009 at 18:40 UTC
    I'm still working on this problem, and have decided to try PPI.

    I'm a little stuck on how to find all calls to Mod::func within PPI. I've successfully read in the document, and am calling the PPI::Node->find method.
    http://search.cpan.org/~adamk/PPI-1.203/lib/PPI/Node.pm#find_$class_|_\&wanted

    My first question is how to construct the anonymous subroutine to find all PPI::Statements that match Mod::func.

    My second question is whether it is possible to extract the line number of the statement from PPI. Otherwise I may also use B::Xref and hope that the Mod::func calls it finds match the ones returned by PPI.

    Update: here's the code I'm working on right now, but the 'and' condition in the anonymous sub is returning false on everything. Hopefully the intent is clear though
    # find all statements that have a PPI::Token::Word with "Mod::func"
    my $res = $Doc->find (
        sub {
            $_[1]->isa('PPI::Statement') and $_[1]->contains('PPI::Token::Word')
        }
    );
Re: regexp for finding all function calls
by marksman (Novice) on Jul 17, 2009 at 21:16 UTC
    I think I figured out one way to do it:

    my $res = $Doc->find (
        sub {
            if ($_[1]->class eq 'PPI::Statement') {
                # find all PPI::Token::Words within the statement that have
                # literal equal to Mod::func
                # non-zero return means it found something
                $_[1]->find(sub {
                    $_[1]->isa('PPI::Token::Word') and $_[1]->literal eq 'Mod::func';
                })
            }
            else {return 0;}
        }
    );
    Update: Figured out the line numbers too
    if ($res) {
        foreach (@$res) {
            my $line = @{$_->location}[0];
            print "# line number: $line \n";
            print "$_\n";
        }
    }
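
For anyone landing here later, the two snippets above can probably be folded into one pass. This is only a sketch, assuming PPI is installed from CPAN; ppi_calls is a hypothetical helper name. Instead of testing statements, it matches the PPI::Token::Word directly, treats it as a call only when a parenthesized argument list follows, and then walks up to the enclosing statement to get the full text. That means a call written without parentheses (Mod::func 1, 2;) would be missed by this variant.

```perl
use strict;
use warnings;
use PPI;

# Hypothetical helper: return [line_number, statement_text] pairs for each
# parenthesized call to $name in the Perl source referenced by $source_ref.
sub ppi_calls {
    my ($source_ref, $name) = @_;
    my $doc = PPI::Document->new($source_ref)
        or die "PPI could not parse the source";
    my $words = $doc->find(sub {
        my (undef, $elem) = @_;
        return 0 unless $elem->isa('PPI::Token::Word');
        return 0 unless $elem->content eq $name;
        # Count the word as a call only if an argument list follows it.
        my $next = $elem->snext_sibling;
        return ($next && $next->isa('PPI::Structure::List')) ? 1 : 0;
    }) || [];    # find() returns a false value when nothing matches
    # location->[0] is the line number of the word token.
    return map { [ $_->location->[0], $_->statement->content ] } @$words;
}

# Demo: the call that broke the original regexp.
my $code = qq{my \$x = 1;\nMod::func( ")" );\n};
printf "# line %d\n%s\n", @$_ for ppi_calls(\$code, 'Mod::func');
```

To build funcCalls.pl from this, one would call the helper once per file and print the file name next to each line number.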
