PerlMonks  

Matching first Perl statement.

by vladb (Vicar)
on Jan 15, 2002 at 01:00 UTC (#138702, perlquestion)

vladb has asked for the wisdom of the Perl Monks concerning the following question:

Is it possible to use a single pattern match to match a single Perl statement (delimited with a ;)?

Although it looks rather trivial, I struggled to come up with anything workable for cases like this:

    myfunc $x eq "foo;text";foobar';';$x++;
    # matching first Perl statement..
    # should return:
    #   myfunc $x eq "foo;text";
    #
    # not:
    #   myfunc $x eq "foo;";foobar';
    #
    # or something else...

My first (shameful) attempt was this pattern: m/([`'"])*[^\1]*;[^\1]*\1;/. However, it didn't work (for some obvious reasons ;).

I've also looked at Text::Balanced, but couldn't figure out which of its functions I'd need in order to accomplish what I want. Basically, the idea is to find the first statement delimited by a ';' while ignoring any occurrence of ';' inside a pair of quotes.
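For the narrow case described here (plain quoted strings only), a pattern along the following lines may be enough. This is only a sketch: `first_statement` is a hypothetical helper name, and it deliberately ignores q//, m//, s///, here-docs, comments and nested blocks. It also sidesteps the problem in the first attempt, where `\1` inside a character class is not a backreference.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first ';'-terminated chunk of the source,
# skipping semicolons inside "...", '...' and `...` strings.
# Assumption: only these three quote styles matter; no q//, regexes,
# here-docs, comments or blocks are recognized.
sub first_statement {
    my ($src) = @_;
    my $re = qr{
        \A
        (                               # capture the statement plus its ';'
          (?:
              [^;'"`\\]                 # any ordinary character
            | \\ .                      # a backslash-escaped character
            | " (?: [^"\\] | \\ . )* "  # double-quoted string
            | ' (?: [^'\\] | \\ . )* '  # single-quoted string
            | ` (?: [^`\\] | \\ . )* `  # backtick string
          )*
          ;
        )
    }xs;
    return $src =~ $re ? $1 : undef;
}

my $code = q{myfunc $x eq "foo;text";foobar';';$x++;};
print first_statement($code), "\n";
```

On the example line this prints `myfunc $x eq "foo;text";`. An unterminated quote makes the helper return undef rather than guess.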

Thanks in advance ;-)

"There is no system but GNU, and Linux is one of its kernels." -- Confession of Faith

Replies are listed 'Best First'.
Re: Matching first Perl statement.
by danger (Priest) on Jan 15, 2002 at 03:01 UTC

    Well, if you've got a particularly well constrained data set, then you might be able to achieve what you want without *too* much trouble. However, your example of using a string containing a semi-colon only touches on the complexity of the problem --- here's another:

    my $x = do{ print "here's an embedded statement\n"; 42; };

    And we can imagine all sorts of complications using other quote-like mechanisms like s//statement;statement;/e or s;;;ge ... But even if you can grok all the combinations of quote-like operators and embedded multi-statement terms, you still have a problem. As merlyn so aptly points out in this node, from which I'll extract (and modify) just one little tidbit:

    $x = sin / 25 ; # /; die "Bang! I'm dead!";
    $y = time / 25 ; # /; die "I'm only pretending!";

    Where does the first statement end in each of these two lines? It isn't just that you have to ignore semi-colons inside of a match operation, you have to know whether you are even in a match operation at all. Thus, while the question as you pose it may *seem* like something far less complex than actually "parsing Perl" (I just want to recognize one leading, semi-colon terminated, arbitrary statement) it really isn't at all.
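One way to see the difference is to let perl itself decide. The sketch below (the variable names and the string-eval harness are assumptions added for illustration) evaluates each line separately and records whether the `die` was actually reached.

```perl
use strict;

my @lines = (
    q{$x = sin / 25 ; # /; die "Bang! I'm dead!";},
    q{$y = time / 25 ; # /; die "I'm only pretending!";},
);

my @outcome;
for my $line (@lines) {
    my ($x, $y);        # lexicals visible to the string eval below
    local $_ = '';      # a bare /.../ match is applied to $_
    eval $line;         # compile and run the line as Perl code
    push @outcome, $@ ? 'died' : 'survived';
}
print join(', ', @outcome), "\n";
```

This prints `died, survived`. After `sin` perl expects an operand, so `/ 25 ; # /` is a pattern match and the `die` is live code. After `time` it expects an operator, so `/` is division and everything after `#` is a comment.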

    And we've completely ignored other things, such as the many statements that have no terminating semi-colon (in the same vein as Beatnik mentions):

    while(<>){
        if ($. == 1){
            print "Now processing $ARGV ...\n"
        }
        print if /something/ .. /something else/
    } continue {
        close ARGV if eof
    }

    Which of course is rather contrived ... but still something to consider depending on what you are *really* trying to accomplish (although you did explicitly mention terminating semi-colons, so perhaps this isn't an issue for you).

    On the other hand, I seem to recall that Simon Cozens (I think) was working on a Perl parser in Perl --- but I have no idea how far that went or what became of it.

Re: Matching first Perl statement.
by Beatnik (Parson) on Jan 15, 2002 at 02:45 UTC
    I hate to be a nitpicker... but for example
    while(1) { for(1..10) { print "foo" } print "\n" }
    is valid Perl code, yet it doesn't comply with the basic rules of what a statement should look like. Well OK, I cheated a bit, but it wasn't all THAT hard :)
    People have been praying for years that Damian Conway would write a Perl parser in Perl... It seems to me that this would be the only foolproof way to solve your problem.
    Of course I could be mistaken... after all, there is more than one way to do it :)

    Greetz
    Beatnik
    ... Quidquid perl dictum sit, altum viditur.
Re: Matching first Perl statement.
by talexb (Canon) on Jan 15, 2002 at 01:42 UTC
    I'd suggest working from the back of the statement, stripping off the trailing comment (if it exists) until you find a semi-colon. Munch the leading white space, then you're done -- for the trivial cases.

    For a full-featured, robust solution, you will most likely need to use Parse::RecDescent or something else.

    This is, of course, a dangerous question to answer because a problem is posed without much definition of how rigorous the answer should be. Caveat emptor.

    --t. alex

    "Of course, you realize that this means war." -- Bugs Bunny.

      I'd suggest working from the back of the statement, stripping off the trailing comment (if it exists) until you find a semi-colon. Munch the leading white space, then you're done -- for the trivial cases.
      If you knew where the end of the statement is, then it would have been a trivial question ;) The problem is that you can't guess where the statement ends. Even stripping comments is not a trivial task. Consider the following line:

      my $foo=';'; $foo =~ s#;##g; #comments start here

      As you can see, this would be tricky. I use '#' as a regex delimiter quite a lot. There are so many special cases in Perl that you can't easily isolate things like "strip comments" or "first statement" into simple parsers.

      For the original poster, I would suggest looking at Perltidy for some ideas.
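      To make that concrete, here is a small sketch of what a naive "strip from the first # onward" rule does to the line above (the `$naive` variable and the rule itself are just illustrative assumptions, not anyone's proposed solution):

```perl
use strict;
use warnings;

my $line = q{my $foo=';'; $foo =~ s#;##g; #comments start here};

# Naive rule: everything from the first '#' onward is a comment.
(my $naive = $line) =~ s/#.*//;

print "$naive\n";
```

      This prints `my $foo=';'; $foo =~ s`, so the substitution is cut in half at its own delimiter and the real comment is never identified.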



      Simon Flack ($code or die)
      $,=reverse'"ro_';s,$,\$,;s,$,lc ref sub{},e;$,
      =~y'_"' ';eval"die";print $_,lc substr$@,0,3;
Re: Matching first Perl statement.
by Juerd (Abbot) on Jan 15, 2002 at 17:30 UTC
    Larry Wall once said: "Only perl can parse Perl", which seems to be true.
    I think there is no regex solution to this problem, and it needs a strict definition of what a statement is. (with if (foo) { bar; }, "bar" is the only statement according to some (that's what I believe), but some people say the entire block or the entire if-construct including the block is the statement. Some call every expression a statement, but I don't like thinking of $foo = $bar + 3 as 5 statements.)

    However, a great program called Perltidy carefully tokenizes Perl source and reformats it into tidy output. If you have a look at its code, you'll see why it's so hard to parse Perl.

    2;0 juerd@ouranos:~$ perl -e'undef christmas'
    Segmentation fault
    2;139 juerd@ouranos:~$

Re: Matching first Perl statement.
by Albannach (Monsignor) on Jan 15, 2002 at 19:37 UTC
    Since it seems clear that a simple (or even highly complex) regex isn't going to appear and solve this (I think even japhy is a few months from building an entire Perl interpreter in a single regex ;-), I offer this trivial attempt for your amusement. At least it is an out of the box solution, but it will not work for all of the examples offered in this thread. I had some fun testing it if nothing else!
    my $firstline = (`perl -MO=Deparse $0 2>nul`)[2];
    print "\nThe first line of Perl code is:\n$firstline";
    # first test:
    $x = sin / 25 ; # /; die "Bang! I'm dead!";
    $y = time / 25 ; # /; die "I'm only pretending!";

    produces:

    The first line of Perl code is:
    $x = sin / 25 ; # /;
    Bang! I'm dead! at D:\Perl\dl\debug\testsource.pl line 19.

    --
    I'd like to be able to assign to an luser

Re: Matching first Perl statement.
by rdfield (Priest) on Jan 16, 2002 at 20:53 UTC
    See if the following meets your needs:

    ($match) = $file =~ /([^\"\'\`\;]*(?:[\"\'\`]+.*[\"\'\`]+)?);/;
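    A quick check of that pattern against the example from the original question, assuming `$file` holds just that one line. The second match is an assumed variant, not part of the post: the same pattern with `.*` made non-greedy.

```perl
use strict;
use warnings;

my $file = q{myfunc $x eq "foo;text";foobar';';$x++;};

# The pattern exactly as posted above.
my ($greedy) = $file =~ /([^\"\'\`\;]*(?:[\"\'\`]+.*[\"\'\`]+)?);/;

# Assumed variant: identical except that .* is replaced by .*?
my ($lazy) = $file =~ /([^\"\'\`\;]*(?:[\"\'\`]+.*?[\"\'\`]+)?);/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

    The posted pattern captures `myfunc $x eq "foo;text";foobar';'`, because the greedy `.*` runs to the last quote on the line. The non-greedy variant captures `myfunc $x eq "foo;text"` (without the trailing semi-colon), which is closer to what was asked for. It still does not pair opening and closing quote characters or handle escaped quotes, so treat it as a starting point only.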

    rdfield
