{from an alt.perl post I just made, reposted here to solicit feedback from fellow monks...}

>>>>> "Makhno" == Makhno <> writes:

Makhno> I'm thinking of writing a GUI Perl-syntax-aware editor, and
Makhno> wondering what's the best way to parse perl? Highlighting
Makhno> reserved words is easy (using, eg, index()) but identifying
Makhno> things like comments is a bit more difficult.

Makhno> A regex like /#.*\n/ will catch comments when they are used
Makhno> simply, ie:

Makhno> print "hello\n"; #print hello

Makhno> but will get it wrong when the '#' is used as part of a regex
Makhno> (or in a string)

Makhno> s#hello#goodbye#;
Makhno> print "will behave like a #comment";

Makhno> Does anybody have any ideas on how I go about parsing perl
Makhno> syntax in such a way, before I go to a lot of potentially
Makhno> unnecessary work?
Perl is extremely difficult to parse. In fact, some would say impossible.

One thing that makes it difficult is the dual nature of a half dozen characters like "/". If that / is being used in a place that's expecting an operator, it's divide. If it's being used in a place that's expecting an operand, it's the beginning of a regular expression. So you have to keep track at all times of whether you're looking for an operator or an operand.
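The operator-or-operand bookkeeping described above can be sketched as a tiny lexer. This is a minimal sketch under loud assumptions: `naive_slash_kinds` is a hypothetical helper, it understands only scalars, barewords, integers, parentheses and a handful of operators, and it treats every bareword as a finished term. That last assumption is exactly the one the rest of this post tears apart.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical naive lexer: label each "/" in $src as 'regex' or 'divide'
# by tracking only whether the next token should be an operand (a term)
# or an operator. It understands a tiny subset of Perl: scalars,
# barewords, integers, parentheses and a few operators, and it stops at
# the first character it does not recognize (such as "#").
sub naive_slash_kinds {
    my ($src) = @_;
    my @kinds;
    my $expect_operand = 1;    # a statement starts out expecting a term

    while ($src =~ /\G\s*(?:(\$?[A-Za-z_]\w*|\d+|\))|(\/)|[-+*=;(,])/gc) {
        if (defined $1) {
            # A term just ended, so an operator should come next.
            # Assumption: every bareword is a complete term.
            $expect_operand = 0;
        }
        elsif (defined $2) {
            if ($expect_operand) {
                # Operand position: "/" opens a pattern; skip past its end.
                push @kinds, 'regex';
                $src =~ /\G[^\/]*\//gc;
                $expect_operand = 0;
            }
            else {
                # Operator position: "/" divides, and a term comes next.
                push @kinds, 'divide';
                $expect_operand = 1;
            }
        }
        else {
            # Any other operator, or "(", leaves us expecting a term.
            $expect_operand = 1;
        }
    }
    return @kinds;
}

print join(' ', naive_slash_kinds('$x = 10 / 2')), "\n";     # divide
print join(' ', naive_slash_kinds('$ok = /foo/')), "\n";     # regex
print join(' ', naive_slash_kinds('sin / 25 ; # /')), "\n";  # divide, but perl says regex
```

The third call is the whole problem in miniature: the sketch sees `sin` as a finished term, while perl knows `sin` can still take an argument and so expects an operand after it.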

"No problem", you say? Quick... for the following, play the game of "regex or divide?"

    sin / ...
    time / ...
    localtime / ...
    caller / ...
    eof / ...
Got those right? How about these?
    use constant FOO => 35;
    FOO / ...
    use Fcntl qw(LOCK_SH);
    LOCK_SH / ...
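For the first two rounds, one rough way to guess is to ask perl for each name's prototype using the built-in `prototype` function (builtins are looked up as `CORE::name`). The rule inside `slash_hint`, a hypothetical helper, is an assumption made for this sketch: perl's tokenizer has its own hard-coded handling for each builtin, so treat the output as a hint, not an answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use constant FOO => 35;
use Fcntl qw(LOCK_SH);

# Heuristic (an assumption for this sketch, not perl's actual rule):
# an empty prototype means "takes no operand", so a following "/" can
# only divide; any other prototype means an operand may follow, so perl
# will try to start a pattern. undef means the argument syntax cannot
# be written as a prototype, so the heuristic gives up.
sub slash_hint {
    my ($proto) = @_;
    return 'unknown' unless defined $proto;
    return $proto eq '' ? 'divide' : 'regex';
}

my @rows;

# First round: builtins, looked up through the CORE:: namespace.
for my $name (qw(sin time localtime caller eof)) {
    push @rows, [ $name, prototype("CORE::$name") ];
}

# Second round: constants are ordinary subs that carry a prototype.
push @rows, [ 'FOO', prototype(\&FOO) ], [ 'LOCK_SH', prototype(\&LOCK_SH) ];

for my $row (@rows) {
    my ($name, $proto) = @$row;
    printf "%-10s %-8s %s\n", $name,
        defined $proto ? "'$proto'" : 'undef', slash_hint($proto);
}
```

Note that `LOCK_SH` only has a prototype because `use Fcntl` loaded and ran another module at compile time, something a parser that only reads this file as text never sees.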
OK, and now some of your own:
    sub no_args ();
    sub one_arg ($);
    sub normal (@);
    no_args / ...
    one_arg / ...
    normal / ...
Got those too? How about these (same problem, different file):
    use Random::Module qw(aaa bbb ccc);
    aaa / ...
    bbb / ...
    ccc / ...
A little harder, eh? So now you have to parse OUTSIDE the file to get your answer. And as if that wasn't enough, let's get weird:
    BEGIN { eval (time % 2 ? 'sub zany ();' : 'sub zany (@);'); }
    zany / ...
Quick, was that last one a divide or a regex start?
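The zany line has no fixed answer, but for any single compilation you can let perl itself report what it decided. This sketch swaps the coin flip for an explicit prototype, compiles `zany / 25 ; # / ;` (borrowing the `25 ; # /` trick from the sin/time example that follows) in a fresh package each time, and inspects the result with the core B::Deparse module. `how_perl_parses` and the `Probe` packages are hypothetical names, and spotting the pattern by searching the deparsed text for `25 ; #` is a shortcut that only works for this one probe line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use B::Deparse;

my $deparse = B::Deparse->new;
my $probe   = 0;

# Compile "zany / 25 ; # / ;" after declaring zany with the given
# prototype, then deparse what perl actually built. If "/" started a
# pattern, the text "25 ; #" survives inside it; if "/" was a divide,
# "# / ;" was a comment and is gone.
sub how_perl_parses {
    my ($proto) = @_;
    $probe++;    # a fresh package per call keeps declarations apart
    my $src = "package Probe$probe; sub zany ($proto);\n"
            . "sub { my \$r = zany / 25 ; # / ;\n}";
    my $code = eval $src or die "compile failed: $@";
    return $deparse->coderef2text($code) =~ /25 ; \\?#/ ? 'regex' : 'divide';
}

for my $proto ('', '$', '@') {
    printf "sub zany (%s);  zany / ...  => %s\n", $proto, how_perl_parses($proto);
}
```

Run as-is, this should report divide for the empty prototype and regex for `$` and `@`. With the original `time % 2` coin flip in the BEGIN block, the same source line lands on one side or the other depending on the second it was compiled.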

Why does it matter? Look at this:

    sin / 25 ; # / ; die "this dies!";
    time / 25 ; # / ; die "this doesn't die";
The first one is computing the sine of the true/false value produced by matching " 25 ; # " against $_, and then it dies. The second one is computing the current time (seconds since the epoch) divided by 25, and then ignoring the rest of the line as a comment.
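You can confirm this with perl itself. The sketch feeds each of the two lines to a string `eval` and reports whether the `die` ran as code or was swallowed by a comment. `line_dies` is a hypothetical helper; it prefixes each line with `my $x =` only to avoid void-context warnings, and it sets `$_` so the pattern in the `sin` line has something to match against.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile and run one line through perl's own parser; true if it died.
# "my $x =" only keeps perl from warning about an unused value, and the
# trailing "1" is what the eval returns when nothing died.
sub line_dies {
    my ($line) = @_;
    local $_ = 'some text';    # the sin line matches its pattern against $_
    return eval "my \$x = $line\n1" ? 0 : 1;
}

my $sin_line  = q{sin / 25 ; # / ; die "this dies!";};
my $time_line = q{time / 25 ; # / ; die "this doesn't die";};

for my $line ($sin_line, $time_line) {
    printf "%-45s => %s\n", $line, line_dies($line) ? 'died' : 'survived';
}
```

On a stock perl 5 the `sin` line reports `died` and the `time` line reports `survived`, even though the two lines differ only in the name of the builtin.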

Starting to see the trouble?

This leads people to say "the only thing which can parse Perl (the language) is perl (the binary)". Maybe not for Perl6. But for the Perl we know and can use today, certainly so.

-- Randal L. Schwartz, Perl hacker

In reply to On Parsing Perl by merlyn
