http://www.perlmonks.org?node_id=6879

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Given the string
QUERY = "SOME QUERY WITH "" (DOUBLE QUOTES)" YEEHA
What regular expression will match the entire string within the outer double quotes, plus the string after the double-quoted string? In other words, after it matches I want
SOME QUERY WITH "" (DOUBLE QUOTES)
in $1 and
YEEHA
in $2

Replies are listed 'Best First'.
Re: Double quotes within double quotes
by little_mistress (Monk) on Apr 05, 2000 at 02:40 UTC
    I'm starting to get a little timid about posting on this site, but I'm going to anyway.
    Here goes.

    One thing to remember about regular expressions in Perl is that the quantifiers "*" and "+" are "greedy". In other words, they gobble up as many characters as they can while still letting the rest of the pattern match. Hence the "*?" and "+?" constructions, which make them "non-greedy": they stop gobbling up characters as soon as a match is possible. So here is where greedy regular expressions come in handy:

    $text = "QUERY = \"SOME QUERY WITH \"\" (DOUBLE QUOTES)\" YEEHA";
    $text =~ m/^(.*)"+\s(\w+)$/g;
    print "dollar one =$1\n\ndollar two = $2\n\n";
    # $1 == QUERY = "SOME QUERY WITH "" (DOUBLE QUOTES)
    # $2 == YEEHA

    Hope that helped

    Just for fun, try it like this and see what you get:

    $text =~ m/^(.*)?"+\s(\w+)?$/g;

    little_mistress@mainhall.com

Re: Double quotes within double quotes
by btrott (Parson) on Apr 05, 2000 at 02:01 UTC
    my $query = qq("SOME QUERY WITH "" (DOUBLE QUOTES)" YEEHA);
    if ($query =~ /"(.*)"(.*)/) {
        my $inside  = $1;
        my $outside = $2;
        print $inside, "\n", $outside, "\n";
    }
    This works because the regex is greedy.
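
    To make the greedy point concrete, here is a minimal sketch that runs the same pattern twice on the sample query, once as posted and once with a non-greedy (.*?). The variable names are only for illustration.

```perl
use strict;
use warnings;

my $query = qq("SOME QUERY WITH "" (DOUBLE QUOTES)" YEEHA);

# Greedy: the first (.*) runs to the end of the string, then backtracks
# only as far as the LAST double quote.
my ($greedy_in, $greedy_out) = $query =~ /"(.*)"(.*)/;

# Non-greedy: (.*?) stops at the FIRST double quote it meets, which here
# is the first half of the doubled quote inside the string.
my ($lazy_in, $lazy_out) = $query =~ /"(.*?)"(.*)/;

print "greedy:     [$greedy_in] [$greedy_out]\n";
print "non-greedy: [$lazy_in] [$lazy_out]\n";
```

    The greedy version only works while there is a single quoted string per line. Given something like "a" b "c", it would capture everything between the first and the last quote.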

    But I'm not sure if this is really what you want-- was that query merely an example, or is that really what all your queries look like?

    If it's the former, and you're trying to do something more complicated than what you describe above--for example, match balanced text--you might take a look at perlfaq6, Can I use Perl regular expressions to match balanced text?.

    Also, see Text::Balanced on CPAN. -- Ed.

Re: How can I find nested delimiters?
by ignatz (Vicar) on Aug 26, 2002 at 16:43 UTC
    Using Text::Balanced:
    #!/usr/bin/perl -w
    use strict;
    use Text::Balanced "extract_delimited";

    my @queries = (
        qq(QUERY = "SOME QUERY WITH "" (DOUBLE QUOTES)" YEEHA),
        qq(Johnson looked up and said "Pie is tasty!" People in Clarkstown liked ""Pie"".),
        qq("You sir are an ""A\$\@hole""!" The chipmunks were naturally shocked.),
        qq("""No Way!""" """Way!""")
    );

    for (@queries) {
        my ($extracted, $remainder, $prefix) = extract_delimited(
            undef,      # defaults to $_
            '"',        # Our chosen delimiter
            '[^"]*',    # Allow for text before the delimiter
            '"');       # Escape delimiter when doubled

        # Strip the delimiter since Text::Balanced leaves it in
        $extracted =~ s/^\"(.*)\"$/$1/;

        print;
        print "\n\$prefix = '$prefix'\n";
        print "\$extracted = '$extracted'\n";
        print "\$remainder = '$remainder'\n\n";
    }
    RETURNS:
    QUERY = "SOME QUERY WITH "" (DOUBLE QUOTES)" YEEHA
    $prefix = 'QUERY = '
    $extracted = 'SOME QUERY WITH "" (DOUBLE QUOTES)'
    $remainder = ' YEEHA'

    Johnson looked up and said "Pie is tasty!" People in Clarkstown liked ""Pie"".
    $prefix = 'Johnson looked up and said '
    $extracted = 'Pie is tasty!'
    $remainder = ' People in Clarkstown liked ""Pie"".'

    "You sir are an ""A$@hole""!" The chipmunks were naturally shocked.
    $prefix = ''
    $extracted = 'You sir are an ""A$@hole""!'
    $remainder = ' The chipmunks were naturally shocked.'

    """No Way!""" """Way!"""
    $prefix = ''
    $extracted = '""No Way!""'
    $remainder = ' """Way!"""'
Re: How can I find nested delimiters?
by Anonymous Monk on Sep 02, 2000 at 23:01 UTC
    $query = qq("SOME QUERY WITH "" (DOUBLE QUOTES)" YEEHA);
    $query =~ /"((?:[^"]|"")*)"(.*)/;
    print "dollar one =$1\n\ndollar two = $2\n\n";
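
    For anyone puzzling over that pattern, here is a sketch of the same regex spelled out with /x and comments. The split_quoted helper is a made-up name, and the \s* after the closing quote is an addition not in the post above, there so that $2 comes back as YEEHA without the leading space.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (quoted body, text after it) in list
# context, or an empty list when no quoted string is found.
sub split_quoted {
    my ($text) = @_;
    return $text =~ /
        "               # opening quote
        (               # capture 1: body of the quoted string
            (?: [^"]    #   one character that is not a quote
              | ""      #   or a doubled quote, consumed as a pair
            )*
        )
        "               # closing quote
        \s*             # addition: skip whitespace after the closing quote
        (.*)            # capture 2: the rest of the line
    /x;
}

my ($inside, $outside) =
    split_quoted(qq(QUERY = "SOME QUERY WITH "" (DOUBLE QUOTES)" YEEHA));
print "inside  = [$inside]\n";
print "outside = [$outside]\n";
```

    Because a doubled quote is always consumed as a pair, the body cannot end in the middle of one, so this should keep working when a second quoted string follows on the same line.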
Re: How can I find nested delimiters?
by Anonymous Monk on Jan 27, 2004 at 13:01 UTC
    Either of these will match the longest possible quoted string with only an even number of quotes inside:

    m#"(?:"[^"]*"|[^"]*)"#;
    m#"(?:("?)[^"]*\1)+"#;