http://www.perlmonks.org?node_id=485933

suaveant has asked for the wisdom of the Perl Monks concerning the following question:

Argh, it's driving me nuts.

I am using Parse::RecDescent to parse a meta language, and it is pretty much working, but I am having trouble with quoted strings. I have searched all around and can't find any good documentation on it.

In the P::RD docs it shows a parser for parsing P::RD grammars, and in that it shows { extract_quotelike($text) } being used as a rule. This doesn't do exactly what I want, since I only want ' and ", but it was close enough. I put it in my grammar like so:

string: { $_ = extract_quotelike($text); chop; substr($_,0,1,''); $_; } { [@item] }
and it works. I realize it's probably not right, especially since I get lots of warnings about use of uninitialized values. When I tried to change my code to something more proper and kill the warnings, however, my grammar no longer worked. I tried:
string: { extract_quotelike($text) }
and it changes the way the whole grammar works

In fact if I change it to:

string: { $_ = extract_quotelike($text); if($_) { chop; substr($_,0,1,''); } $_; } { [@item] }
it breaks it as well... which really confuses me.

So, my real question is in two parts
1) how do you use the extract_quotelike in P::RD (I looked all through the docs for an explanation of what rule: { <code> } is supposed to do, but I can't find it. Is it undocumented or am I blind?)
2) Is there a better way to match a string with just single or double quotes?

                - Ant
                - Some of my best work - (1 2 3)

Re: Quoted Text rule for Parse::RecDescent?
by halley (Prior) on Aug 23, 2005 at 15:14 UTC
    If you don't want anything but ' and " strings, then just fail the rule if $item[...][1] ne "'" and $item[...][1] ne '"'.

    In my application, I allow qr// and ' and ", but not others like qq//, qx//, s/// or tr///. In qr//, I scan for any use of (?{...}) and fail for those. I just bring it up to show that you can choose your rules pretty flexibly.
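
    Outside a grammar, the same kind of delimiter check can be sketched with Text::Balanced directly. This is only an illustration: the helper name plain_quoted is hypothetical, and the field positions assume the list-context return of extract_quotelike as described in the Text::Balanced documentation (index 3 is the operator name, 4 the left delimiter, 5 the text between the delimiters).

```perl
use strict;
use warnings;
use Text::Balanced qw( extract_quotelike );

# Hypothetical helper: return the body of a leading '...' or "..." string,
# or undef if the input starts with anything else (q//, qr//, s///, `...`).
sub plain_quoted {
    my ($input) = @_;
    my @f = extract_quotelike($input);     # list context: $input is not consumed
    return unless defined $f[0] && length $f[0];       # nothing was extracted
    return if defined $f[3] && length $f[3];           # an operator such as q or s
    return unless $f[4] eq q{'} || $f[4] eq q{"};      # some other delimiter
    return $f[5];                                      # text between the quotes
}

my $double = plain_quoted(q{"hello world" rest});   # hello world
my $single = plain_quoted(q{'abc' rest});           # abc
my $op     = plain_quoted(q{qq{hello}});            # rejected: has an operator
my $tick   = plain_quoted(q{`ls`});                 # rejected: backticks
```

    Inside a grammar the same tests would go in an action and assign undef to $return on rejection.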

    (Expanded answer.)

    As for "how to use the code blocks in a production," just realize that there's no real difference between the parsing parts and the code parts: if you get through all of the subrules in the production, then the whole production succeeds and passes whatever is in $return up to the calling rule. The code blocks in YACC are very distinct, but code blocks in P::RD are more like embedded (?{}) code in a regexp. (That was my revelation thanks to another monk a few weeks ago.)

    So, do whatever you want in a code block. The items so far are in @item and %item. You can assign whatever you want to assign into $return, including undef to fail the rule.

    string: <perl_quotelike>
        {
            if ($item[1][0] =~ /^(qx|s|tr|y)$/ or   # no perl commands
                $item[1][1] eq '`' or               # no backticks
                $item[1][2] =~ /\(\?\??\{/)         # no qr'' with code
            { $return = undef }
            else
            { $return = eval "" . join('', @{$item[1]}) }
            (defined $return)
        }

    --
    [ e d @ h a l l e y . c c ]

      ahhh... that helps. Though it seems a little more efficient to check
      if (!$item[1][0] && ($item[1][1] eq '"' || $item[1][1] eq "'")) {

      Thanks!

                      - Ant

Re: Quoted Text rule for Parse::RecDescent?
by ikegami (Patriarch) on Aug 23, 2005 at 17:18 UTC

    Keep in mind the last item in the production is the one that determines the returned value of the entire production.

    Also keep in mind that extract_quotelike behaves differently in list context than in scalar context.
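
    The context difference can be seen outside a grammar with a small sketch. The variable names are illustrative only; the behavior shown (input left intact in list context, extracted text removed from the input in scalar context) is what the Text::Balanced documentation describes.

```perl
use strict;
use warnings;
use Text::Balanced qw( extract_quotelike );

# List context: the pieces come back as a list and the input is left intact.
my $list_input = q{"abc" tail};
my ($extracted, $remainder) = extract_quotelike($list_input);

# Scalar context: only the extracted string comes back, and it is removed
# from the input variable, which is what a grammar action needs for $text.
my $scalar_input = q{"abc" tail};
my $only = scalar extract_quotelike($scalar_input);

print "list:   [$extracted] [$remainder] input=[$list_input]\n";
print "scalar: [$only] input=[$scalar_input]\n";
```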

     

    The following will do the trick:

    {
        # Place this block at the top of the grammar, before any rules.
        use Text::Balanced qw( extract_quotelike );
    }

    string: { [ $item[0], scalar extract_quotelike($text) ] }

    Of course, that leaves the quotes in. If you want to remove the quotes, the following will work:

    {
        # Place this block at the top of the grammar, before any rules.

        sub dequote_double {
            local $_ = (@_ ? $_[0] : $_);
            $_ = substr($_, 1, -1);
            s/\\(.)/$1/sg;
            return $_;   # return the string, not the substitution count
        }

        # Like in Perl.
        # Unlike bash.
        sub dequote_single {
            local $_ = (@_ ? $_[0] : $_);
            $_ = substr($_, 1, -1);
            s/\\(['\\])/$1/sg;
            return $_;   # return the string, not the substitution count
        }
    }

    string : /"(?:[^\\"]|\\.)*"/
             {[ $item[0], dequote_double($item[1]) ]}
           | /'(?:[^\\']|\\.)*'/
             {[ $item[0], dequote_single($item[1]) ]}

    Fortunately and unfortunately, it doesn't parse `, q, qq, s, tr, y and here-docs the way extract_quotelike does. If you want it to parse any or all of those, reply to this node and I'll provide.

     

    By the way, don't use $_ without localizing *_ or at least $_ first. You risk clobbering something in the caller. (for/foreach localize this for you, but not while.)
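
    A minimal sketch of the clobbering problem, assuming two hypothetical helpers that differ only in whether they localize $_:

```perl
use strict;
use warnings;

# Hypothetical helpers that strip one pair of surrounding quotes.
sub unquote_clobbering { $_ = shift;       s/^(["'])(.*)\1$/$2/s; return $_ }
sub unquote_localized  { local $_ = shift; s/^(["'])(.*)\1$/$2/s; return $_ }

my @clobbered = ('alpha', 'beta');
for (@clobbered) {
    # $_ is aliased to the current element, so the helper's assignment
    # to the global $_ overwrites the element itself.
    unquote_clobbering(q{"x"});
}

my @intact = ('alpha', 'beta');
for (@intact) {
    unquote_localized(q{"x"});   # the caller's $_ is restored on return
}

print "clobbered: @clobbered\n";
print "intact:    @intact\n";
```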

    Update: I pretty much rewrote this node a few times, trying to make my thoughts coherent.

Re: Quoted Text rule for Parse::RecDescent?
by TheDamian (Vicar) on Aug 23, 2005 at 20:39 UTC
      Ahh... I saw that, but I had some trouble figuring out how to limit it to ' and ", thanks.

                      - Ant

        Presumably now you've worked out that, to restrict it to quoted texts, you need something like:
        string: <perl_quotelike>
            {
                my ($marker, $quote, $text) = @{$item[1]}[0..2];
                !$marker && $quote =~ /['"]/ ? $text : undef
            }