regex for strings with escaped quotes

by morgon (Priest)
on Feb 12, 2019 at 15:10 UTC (#1229800=perlquestion)

morgon has asked for the wisdom of the Perl Monks concerning the following question:


I am trying to construct a regex that extracts the string between two quotes, with the complication that the string may contain backslash-escaped quotes.

To illustrate:

my $regex = ???? # this is what I am after

my ($m1) = '"hubba bubba"' =~ $regex;
print "ok\n" if $m1 eq 'hubba bubba'; # should print "ok"

my ($m2) = '"hubba \"bubba\""' =~ $regex;
print "ok\n" if $m2 eq 'hubba "bubba"'; # should also print "ok"

I hope that is understandable...

I tried to do this with negative lookbehinds, but my attempt failed with "Variable length lookbehind not implemented", so I am looking for some help here.

Many thanks!

Replies are listed 'Best First'.
Re: regex for strings with escaped quotes
by haukex (Bishop) on Feb 12, 2019 at 15:13 UTC
    use Regexp::Common qw/delimited/;
    my $str = q{ x "foo \"bar\"" y };
    $str =~ /($RE{delimited}{-delim=>'"'})/;
    print $1, "\n";
    print $RE{delimited}{-delim=>'"'}, "\n";
    __END__
    "foo \"bar\""
    (?:(?|(?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\")))
      It somehow stops working when I use the regex directly and I cannot see why:
      my $str = q{ x "foo \"bar\"" y };
      $str =~ /(?:(?|(?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\")))/;
      print $1, "\n"; # prints nothing
        $str =~ /(?:(?|(?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\")))/;

        It's missing the capture group that I added: /($RE{delimited}{-delim=>'"'})/

      thanks, but I forgot to mention one detail:

      I do not need this for a perl-program but for a perl5-compatible regex engine in Go.

      Is there a way to print the regex that is used?

        Is there a way to print the regex that is used?

        That's what the second line of output above is. Simplification of that regex is left as an exercise to the reader :-) (Update: Nevermind.)
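
As a rough illustration of that simplification, here is a minimal sketch in Go, since that is the stated target. It assumes the standard library `regexp` package (RE2 syntax), which may not be the engine the asker has in mind. The generated pattern boils down to an opening quote, a run of characters that are neither quote nor backslash, any number of "backslash plus one character" pairs each followed by another such run, and a closing quote. That form needs no lookbehind. The names `quoted` and `extractQuoted` are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// quoted matches a double-quoted string whose body may contain
// backslash-escaped characters. Group 1 is the body without the
// surrounding quotes; escapes inside it are left untouched.
var quoted = regexp.MustCompile(`"([^"\\]*(?:\\.[^"\\]*)*)"`)

// extractQuoted returns the raw body of the first quoted string in s,
// and false if s contains no complete quoted string.
func extractQuoted(s string) (string, bool) {
	m := quoted.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	body, ok := extractQuoted(` x "foo \"bar\"" y `)
	fmt.Println(body, ok) // prints: foo \"bar\" true
}
```

Because `.` does not match a newline by default, a backslash directly followed by a newline would end the match early; adding the `(?s)` flag is one way to allow it if that case matters.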

        BTW, you can also use the -keep feature to get only the part between the quotes:

        use Regexp::Common qw/delimited/;
        q{ x "foo \"bar\"" y } =~ /$RE{delimited}{-delim=>'"'}{-keep}/;
        print $3, "\n"; # prints: foo \"bar\"
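
Note that the captured text still contains the backslashes, whereas the second test in the question expects `hubba "bubba"` with the escapes removed. A capture group can only return a substring of the input, so getting that result takes a second step. The following is a sketch of one way to do it in Go, again assuming the standard `regexp` package. The helper name `unquote` and both pattern variables are hypothetical, and treating every backslash pair as "keep the escaped character" is an assumption about the escaping rules wanted.

```go
package main

import (
	"fmt"
	"regexp"
)

// quoted captures the raw body of a double-quoted string in group 1.
var quoted = regexp.MustCompile(`"([^"\\]*(?:\\.[^"\\]*)*)"`)

// escaped matches a backslash followed by any single character and
// captures that character.
var escaped = regexp.MustCompile(`\\(.)`)

// unquote extracts the first quoted string in s and removes one level
// of backslash escaping from its body.
func unquote(s string) (string, bool) {
	m := quoted.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	// Replace each escape pair with just the escaped character.
	return escaped.ReplaceAllString(m[1], "${1}"), true
}

func main() {
	for _, in := range []string{`"hubba bubba"`, `"hubba \"bubba\""`} {
		out, ok := unquote(in)
		fmt.Println(out, ok)
	}
}
```

Matches are consumed left to right without overlap, so an escaped backslash (two backslashes in a row) comes out as a single backslash rather than being misread as escaping the character after it.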

Node Type: perlquestion [id://1229800]
Approved by haukex