Re: Named capture backreferences cannot be used in character classes?
by moritz (Cardinal) on Sep 26, 2013 at 17:38 UTC
use strict;
use warnings;
use 5.010;
for (qw/'abc' "abc" 'abc"/) {
    if (/(?<FQ>['"])(?<content>(?:(?!\k<FQ>).)+)\k<FQ>/s) {
        say $+{content};
    }
    else {
        say "Not matched: $_";
    }
}
__END__
abc
abc
Not matched: 'abc"
Perfect. Thank you.
my $re = qr/
    (?<FQ>['"])             # Capture starting quote.
    (?<content>             # Capture content.
        (?:
            (?>[^'"]*)      # Match non-quotes, don't backtrack.
            (?(?!\k<FQ>).)  # Match opposite quote.
        )*                  # Repeat.
    )
    \k<FQ>                  # Match ending quote.
/x;
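A quick way to exercise the commented pattern above is to run it over the same inputs moritz used. This is only a sketch: the fourth test string (an apostrophe inside double quotes) is my own addition, chosen to show that the conditional lets the "opposite" quote through as content.

```perl
use strict;
use warnings;
use 5.010;

# The commented pattern from the post above, re-indented but otherwise unchanged.
my $re = qr/
    (?<FQ>['"])             # Capture starting quote.
    (?<content>             # Capture content.
        (?:
            (?>[^'"]*)      # Match non-quotes, don't backtrack.
            (?(?!\k<FQ>).)  # Match opposite quote.
        )*                  # Repeat.
    )
    \k<FQ>                  # Match ending quote.
/x;

# moritz's three inputs, plus one extra (my own) with an apostrophe inside double quotes.
my @results;
for my $str (qw/'abc' "abc" 'abc"/, q{"it's"}) {
    my $got = $str =~ $re ? $+{content} : undef;
    push @results, $got;
    say $str, ' => ', defined $got ? $got : 'no match';
}
```

The mismatched 'abc" fails just as it does with moritz's version, while "it's" yields it's as the content.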
Re: Named capture backreferences cannot be used in character classes?
by hdb (Monsignor) on Sep 26, 2013 at 17:17 UTC
qr[(?:"[^"]+"|'[^']+')]
For this example, that is shorter than your regex.
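UPDATE: And you can generate this automatically:
my @class = ( "'", '"' );
# Builds (?:'[^']+'|"[^"]+"); the [ ] of qr[...] above are delimiters, not part of the pattern.
my $re = "(?:" . join( "|", map { "${_}[^$_]+$_" } @class ) . ")";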
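The generator idea can be wrapped in a small helper that returns a compiled qr// object. This is a sketch, not hdb's code: the name quoted_re is hypothetical, and the quotemeta call is my addition so that a delimiter which happens to be a regex metacharacter (such as |) doesn't break the pattern.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper (the name quoted_re is mine): one alternative per delimiter,
# e.g. '[^']+' and "[^"]+", joined into a single non-capturing group.
sub quoted_re {
    my @delims = @_;
    # quotemeta (my addition) escapes delimiters that are regex metacharacters, e.g. |.
    my $alt = join '|', map { my $q = quotemeta; "$q\[^$q\]+$q" } @delims;
    return qr/(?:$alt)/;
}

my $re = quoted_re( "'", '"' );
my @hits = grep { /\A$re\z/ } ( q{'abc'}, q{"abc"}, q{'abc"}, q{''} );
say for @hits;

# A metacharacter used as the delimiter still works thanks to quotemeta.
my $pipe = quoted_re('|');
my $pipe_ok = '|a b|' =~ /\A$pipe\z/ ? 1 : 0;
say "pipe-delimited: ", ( $pipe_ok ? 'match' : 'no match' );
```

Note that, like the hand-written version, this requires at least one character between the quotes, so '' does not match.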
Re: Named capture backreferences cannot be used in character classes?
by davido (Cardinal) on Sep 26, 2013 at 17:05 UTC
If it is only ' and ", you don't necessarily have to capture:
m/(?(?=Z|Y)[Z]|[Y])/
That's unsightly enough that I would probably want to define it as a named subpattern using:
(?(DEFINE)
(?<NAME_PAT>....)
)
*sigh* I posted too quickly... each alternate pattern would need to be tested individually, I think:
print "Yes"
    if $string =~ m/(?(?=')'[^']|)
                    (?(?=")"[^"]|)/x;
Update:
And even that simplifies to:
m/(?:'[^']|"[^"])/
...so I'm sure the actual usage must be more complex. :) My apologies. heh.
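For what it's worth, here is roughly what the (?(DEFINE)...) idea mentioned above could look like once filled in with the simplified alternation. This is my own sketch: the name QUOTED and the comma-separated-list use case are assumptions, picked only to show that a DEFINE block declares a subpattern without matching anything itself, and (?&QUOTED) can then reuse it.

```perl
use strict;
use warnings;
use 5.010;   # (?(DEFINE)...) and (?&NAME) need Perl 5.10+

# Hypothetical name QUOTED; the DEFINE block only declares the subpattern.
my $re = qr{
    (?(DEFINE)
        (?<QUOTED> '[^']+' | "[^"]+" )
    )
    \A (?&QUOTED) (?: \s* , \s* (?&QUOTED) )* \z
}x;

my @inputs = ( q{'a', "b"}, q{'a", 'b'}, q{'it"s'} );
my @ok = map { /$re/ ? 1 : 0 } @inputs;
say "$inputs[$_] : ", ( $ok[$_] ? 'match' : 'no match' ) for 0 .. $#inputs;
```

The second input fails because its first quote is never closed by the same character.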
Re: Named capture backreferences cannot be used in character classes?
by kcott (Archbishop) on Sep 26, 2013 at 20:55 UTC
G'day BrowserUk,
You can use a postponed subexpression, replacing [^\k<FQ>]+ with (??{"[^$+{FQ}]+"}).
It's flagged as "experimental"; that may affect your choice to use it.
Here's my test:
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;
my @test_strings = qw{""" "'" '"' ''' ""' "'' '"" ''"};
my $re = qr{ (?<FQ>"|') (??{ "[^$+{FQ}]+" }) \k<FQ> }x;
for (@test_strings) {
    say "Testing [$_] : ", (/$re/ ? '' : 'no '), 'match.';
}
Output:
$ pm_re_kname_in_charclass.pl
Testing ["""] : no match.
Testing ["'"] : match.
Testing ['"'] : match.
Testing ['''] : no match.
Testing [""'] : no match.
Testing ["''] : no match.
Testing ['""] : no match.
Testing [''"] : no match.
"... I don't have a problem with it being "experimental"; as far as I have observed it hasn't changed in all the years it has been there ..."
When I read that this afternoon, that was my understanding too.
However, poking around in perl5180delta some hours later (on a completely unrelated matter), I found: "/(?{})/ and /(??{})/ have been heavily reworked".
That's really just an FYI, if you're interested.
It's still marked "experimental" and I saw nothing to indicate any specific performance enhancements.
They changed at one point; IIRC it used to be spelled something like (?p{...}). I think Ilya originally envisioned it as a limited-scope version of a hypothetical qr/foo$bar/p that would use the current value of $bar at match time.
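The "current value of $bar at match time" behaviour is essentially what (??{ ... }) provides today. A minimal sketch contrasting it with ordinary interpolation, which happens once when the qr// is compiled (the hypothetical /p form above is not real syntax for this, so it isn't used):

```perl
use strict;
use warnings;
use 5.010;

my $bar = 'x';
my $fixed    = qr/foo$bar/;          # $bar interpolated now, when qr// is compiled
my $deferred = qr/foo(??{ $bar })/;  # code block runs, reading $bar, on every match

$bar = 'y';
my $fixed_old    = 'foox' =~ $fixed    ? 1 : 0;  # compiled as foox, so this matches
my $fixed_hit    = 'fooy' =~ $fixed    ? 1 : 0;  # still looks for "foox"
my $deferred_hit = 'fooy' =~ $deferred ? 1 : 0;  # now looks for "fooy"
say "fixed: $fixed_hit, deferred: $deferred_hit";
```

This should print "fixed: 0, deferred: 1". Because the code block is literal rather than interpolated, no use re 'eval' is needed here.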