regex for strings with escaped quotes

by morgon (Priest)
on Feb 12, 2019 at 15:10 UTC ( #1229800=perlquestion )

morgon has asked for the wisdom of the Perl Monks concerning the following question:


I am trying to construct a regex that extracts the string between two quotes, with the complication that the string may contain backslash-escaped quotes.

To illustrate:

    my $regex = ????;  # this is what I am after

    my ($m1) = '"hubba bubba"' =~ $regex;
    print "ok\n" if $m1 eq 'hubba bubba';  # should print "ok"

    my ($m2) = '"hubba \"bubba\""' =~ $regex;
    print "ok\n" if $m2 eq 'hubba "bubba"';  # should also print "ok"

I hope that is understandable...

I tried to do this with negative lookbehinds, but my attempt failed with "Variable length lookbehind not implemented", so I am looking for some help here.

Many thanks!

Replies are listed 'Best First'.
Re: regex for strings with escaped quotes
by haukex (Archbishop) on Feb 12, 2019 at 15:13 UTC
    use Regexp::Common qw/delimited/;

    my $str = q{ x "foo \"bar\"" y };
    $str =~ /($RE{delimited}{-delim=>'"'})/;
    print $1, "\n";
    print $RE{delimited}{-delim=>'"'}, "\n";

    __END__
    "foo \"bar\""
    (?:(?|(?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\")))

      It somehow stops working when I use the regex directly and I cannot see why:
      my $str = q{ x "foo \"bar\"" y };
      $str =~ /(?:(?|(?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\")))/;
      print $1, "\n"; # prints nothing

        $str =~ /(?:(?|(?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\")))/;

        It's missing the capture group that I added: /($RE{delimited}{-delim=>'"'})/

      thanks, but I forgot to mention one detail:

      I do not need this for a perl-program but for a perl5-compatible regex engine in Go.

      Is there a way to print the regex that is used?

        Is there a way to print the regex that is used?

        That's what the second line of output above is. Simplification of that regex is left as an exercise to the reader :-) (Update: Nevermind.)

        BTW, you can also use the -keep feature to get only the part between the quotes:

        use Regexp::Common qw/delimited/;
        q{ x "foo \"bar\"" y } =~ /$RE{delimited}{-delim=>'"'}{-keep}/;
        print $3, "\n"; # prints: foo \"bar\"
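
For the Go side, here is a minimal sketch of the same idea. It takes the core of the pattern printed above, `[^\\"]*(?:\\.[^\\"]*)*`, and wraps it in a capture group between the two quotes. No lookbehind is needed, so this version assumes Go's standard `regexp` package; a Perl5-compatible engine should accept the same pattern, but that is not verified here. The helper name `extractQuoted` is made up for illustration. Note that the capture still contains the backslashes (`hubba \"bubba\"`), so getting `hubba "bubba"` as in the question takes a separate unescaping step, done here with `strings.NewReplacer`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// quoted matches a double-quoted string and captures the text between the
// quotes. Inside the quotes it accepts runs of characters that are neither a
// quote nor a backslash, interleaved with backslash-escaped characters.
var quoted = regexp.MustCompile(`"([^"\\]*(?:\\.[^"\\]*)*)"`)

// extractQuoted (hypothetical helper) returns the content of the first quoted
// string in s with \" and \\ unescaped. It returns false if nothing matches.
func extractQuoted(s string) (string, bool) {
	m := quoted.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	// The regex only finds the boundaries; unescaping is a separate step.
	unescaper := strings.NewReplacer(`\"`, `"`, `\\`, `\`)
	return unescaper.Replace(m[1]), true
}

func main() {
	for _, in := range []string{`"hubba bubba"`, `"hubba \"bubba\""`} {
		if got, ok := extractQuoted(in); ok {
			fmt.Println(got)
		}
	}
}
```

With the two test strings from the question this should print `hubba bubba` and `hubba "bubba"`. By default `.` does not match a newline in this syntax, so an escaped newline inside the quotes would need the `(?s)` flag.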
