http://www.perlmonks.org?node_id=1055861

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

This qr[(?<FQ>"|')[^\k<FQ>]+\k<FQ>] fails with "Unrecognized escape \k in character class passed through in regex".

I can't use ("|')[^\1]+\1 because this regex will be embedded into bigger regexes that may have their own captures.

Workarounds?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re: Named capture backreferences cannot be used in character classes?
by moritz (Cardinal) on Sep 26, 2013 at 17:38 UTC

    One workaround is to use a negative look-ahead plus a dot instead of a negated character class:

    use strict;
    use warnings;
    use 5.010;

    for (qw/'abc' "abc" 'abc"/) {
        if (/(?<FQ>['"])(?<content>(?:(?!\k<FQ>).)+)\k<FQ>/s) {
            say $+{content};
        }
        else {
            say "Not matched: $_";
        }
    }
    __END__
    abc
    abc
    Not matched: 'abc"
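
    A minimal sketch of the embedding concern from the original question, assuming a hypothetical outer pattern with its own numbered capture. The names $quoted, $assign and parse_assignment are illustrative only and do not come from the thread. Because \k<FQ> refers to the group by name, the extra group in the outer regex should not disturb it:

    ```perl
    use strict;
    use warnings;
    use 5.010;

    # The look-ahead pattern from above, stored so it can be embedded elsewhere.
    my $quoted = qr/(?<FQ>['"])(?<content>(?:(?!\k<FQ>).)+)\k<FQ>/s;

    # Hypothetical outer regex with its own numbered capture group ($1).
    my $assign = qr/(\w+)\s*=\s*$quoted/;

    # Hypothetical helper: returns (key, content, quote) or an empty list.
    sub parse_assignment {
        my ($line) = @_;
        return $line =~ $assign ? ( $1, $+{content}, $+{FQ} ) : ();
    }

    for my $line ( q{name = "O'Brien"}, q{title='say "hi"'}, q{bad = 'oops"} ) {
        my @parts = parse_assignment($line);
        if (@parts) {
            say "$parts[0] => $parts[1] (quoted with $parts[2])";
        }
        else {
            say "Not matched: $line";
        }
    }
    ```

    With \1 in place of \k<FQ>, the backreference would point at the (\w+) group of the outer pattern instead of the opening quote.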

      Perfect. Thank you.



      Probably not worth it, but fewer negative look-aheads:

      my $re = qr/
          (?<FQ>['"])               # Capture starting quote.
          (?<content>               # Capture content.
              (?:
                  (?>[^'"]*)        # Match non-quotes, don't backtrack.
                  (?(?!\k<FQ>).)    # Match opposite quote.
              )*                    # Repeat.
          )
          \k<FQ>                    # Match ending quote.
      /x;
Re: Named capture backreferences cannot be used in character classes?
by hdb (Monsignor) on Sep 26, 2013 at 17:17 UTC
    qr[(?:"[^"]+"|'[^']+')]

    For this example, that is shorter than your regex.

    UPDATE: and you can automatically generate this:

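    my @class = ( "'", '"' );
    # The [ ] in qr[...] above are only delimiters, so they are not part of the generated pattern.
    my $re = "(?:" . join( "|", map { "${_}[^$_]+$_" } @class ) . ")";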
      "... that is shorter than your regex."

      Except I need to retain knowledge of the quote type, so: qr[(?<FQ>")[^"]+"|(?<FQ>')[^']+']; but that's fine.
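
      A short sketch of why reusing the name FQ in both alternatives still works: when several groups share a name, $+{FQ} refers to the leftmost one that is defined after the match. The helper quote_type is hypothetical, added only for illustration:

      ```perl
      use strict;
      use warnings;
      use 5.010;

      # Same name in both branches; only the branch that matched defines it.
      my $re = qr[(?<FQ>")[^"]+"|(?<FQ>')[^']+'];

      # Hypothetical helper: quote character of the first quoted run, or undef.
      sub quote_type {
          my ($text) = @_;
          return $text =~ $re ? $+{FQ} : undef;
      }

      say quote_type(q{say "hi"}) // 'none';
      say quote_type(q{say 'hi'}) // 'none';
      say quote_type(q{say hi})   // 'none';
      ```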


Re: Named capture backreferences cannot be used in character classes?
by davido (Cardinal) on Sep 26, 2013 at 17:05 UTC

    If it is only ' and ", you don't necessarily have to capture:

    m/(?(?=Z|Y)[Z]|[Y])/

    That's unsightly enough that I would probably want to define it as a named subpattern using:

    (?(DEFINE) (?<NAME_PAT>....) )

    *sigh* I posted too quickly... each alternate pattern would need to be tested individually, I think:

    print "Yes" if $string =~ m/(?(?=')'[^']|) (?(?=")"[^"]|)/x;

    Update:

    And even that simplifies to:

    m/(?:'[^']|"[^"])/

    ...so I'm sure the actual usage must be more complex. :) My apologies. heh.


    Dave
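
    The (?(DEFINE) ...) idea mentioned above can be sketched roughly as follows. The subpattern name QUOTED and the helper first_quoted are assumptions made for illustration; (?&QUOTED) calls the named subpattern much like a subroutine, and the DEFINE block itself never matches anything directly:

    ```perl
    use strict;
    use warnings;
    use 5.010;

    my $re = qr{
        ( (?&QUOTED) )                        # Capture one quoted run in $1.
        (?(DEFINE)
            (?<QUOTED> '[^']+' | "[^"]+" )    # Hypothetical named subpattern.
        )
    }x;

    # Hypothetical helper: first quoted run including its quotes, or undef.
    sub first_quoted {
        my ($string) = @_;
        return $string =~ $re ? $1 : undef;
    }

    for my $string ( q{say 'hello' now}, q{say "it's" now}, q{no quotes} ) {
        my $found = first_quoted($string);
        say defined $found ? "Matched: $found" : "Not matched: $string";
    }
    ```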

Re: Named capture backreferences cannot be used in character classes?
by kcott (Archbishop) on Sep 26, 2013 at 20:55 UTC

    G'day BrowserUk,

    You can use a postponed subexpression replacing [^\k<FQ>]+ with (??{"[^$+{FQ}]+"}). It's flagged as "experimental" - that may affect your choice to use it. Here's my test:

    #!/usr/bin/env perl
    use 5.010;
    use strict;
    use warnings;

    my @test_strings = qw{""" "'" '"' ''' ""' "'' '"" ''"};

    my $re = qr{
        (?<FQ>"|')
        (??{ "[^$+{FQ}]+" })
        \k<FQ>
    }x;

    for (@test_strings) {
        say "Testing [$_] : ", (/$re/ ? '' : 'no '), 'match.';
    }

    Output:

    $ pm_re_kname_in_charclass.pl
    Testing ["""] : no match.
    Testing ["'"] : match.
    Testing ['"'] : match.
    Testing ['''] : no match.
    Testing [""'] : no match.
    Testing ["''] : no match.
    Testing ['""] : no match.
    Testing [''"] : no match.

    -- Ken

      "You can use a postponed subexpression replacing [^\k<FQ>]+ with (??{"[^$+{FQ}]+"}). It's flagged as "experimental" - that may affect your choice to use it."

      That would certainly work -- and I don't have a problem with it being "experimental"; as far as I have observed it hasn't changed in all the years it has been there, and I'm already having to use it where I need recursive regex -- but I prefer to avoid it if there is a less performance sapping alternative.
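
      One way to check the performance concern rather than guess is the core Benchmark module. This is only a rough sketch: the sample text and the helper count_matches are made up for illustration, and the relative timings will depend on the perl version and the input.

      ```perl
      use strict;
      use warnings;
      use Benchmark qw(cmpthese);

      # The two candidate patterns from this thread.
      my $lookahead = qr/(?<FQ>['"])(?:(?!\k<FQ>).)+\k<FQ>/s;
      my $postponed = qr/(?<FQ>['"])(??{ "[^$+{FQ}]+" })\k<FQ>/;

      # Hypothetical sample input: 100 double-quoted strings containing single quotes.
      my $text = q{x = "some 'quoted' text"; } x 100;

      # Hypothetical helper: number of non-overlapping matches of $re in $str.
      sub count_matches {
          my ( $re, $str ) = @_;
          my $n = () = $str =~ /$re/g;
          return $n;
      }

      # Run each variant for at least one CPU second and print a comparison table.
      cmpthese( -1, {
          lookahead => sub { count_matches( $lookahead, $text ) },
          postponed => sub { count_matches( $postponed, $text ) },
      } );
      ```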


        "... I don't have a problem with it being "experimental"; as far as I have observed it hasn't changed in all the years it has been there ..."

        When I read that this afternoon, that was my understanding too. However, poking around in perl5180delta some hours later (on a completely unrelated matter), I found: "/(?{})/ and /(??{})/ have been heavily reworked".

        That's really just an FYI, if you're interested. It's still marked "experimental" and I saw nothing to indicate any specific performance enhancements.

        -- Ken

        They changed at one point; IIRC it used to be something like (?p{...}); I think Ilya originally envisioned it as a limited scope version of a hypothetical qr/foo$bar/p that would at match time use the current value of $bar.