PerlMonks  

qr/STRING/ fails with certain lookbehind expressions

by wiewa (Initiate)
on Jul 19, 2013 at 10:14 UTC
wiewa has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I just ran into quite an odd regex problem: I tried to compile a regex pattern with 'qr' that is case-insensitive and contains a lookbehind assertion. With certain character combinations in the lookbehind expression, the regex engine throws the error:

Variable length lookbehind not implemented in regex...

Here are some examples:

my $pattern = "(?<!ss)abc";     # 'ss' in lookbehind
qr/$pattern/i;                  # throws the error

my $pattern = "(?<!st)abc";     # 'st' in lookbehind
qr/$pattern/i;                  # throws the error

my $pattern = "(?<!st)abc";     # 'st' in lookbehind
qr/$pattern/;                   # this works fine since modifier 'i' is not used

my $pattern = "(?<!s)abc";      # 's' in lookbehind
qr/$pattern/i;                  # this works fine

my $pattern = "(?<!s[st])abc";  # 's[st]' in lookbehind
qr/$pattern/i;                  # this works, whereas these fail: (?<!s[s]), (?<!s[t])

All patterns do what I'd expect if I don't compile them.

It seems the regex engine is interpreting the 'st' or 'ss' string as some meta-command, since no quantifier is used that would cause a variable length.

I am using a new version of perl (v5.18.0). This problem did not occur in version v5.10.1.

Can anybody explain what is happening here? And is it maybe a known bug in perl version v5.18.0?

Thanks, WW

Re: qr/STRING/ fails with certain lookbehind expressions
by Corion (Pope) on Jul 19, 2013 at 10:28 UTC

    My interpretation of the behaviour is that you're running afoul of ß (sz-ligature) matching SS when doing a case-insensitive match. This means that Perl will try to match s "inside" of ß, which then results in a variable-length look-behind pattern.

    This theory easily explains the first behaviour, as Perl compiles

    /(?<!ss)/i

    to

    /(?<!ß|ss)/i

    This creates a variable-length lookbehind, which is what Perl does not like.
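The case folding behind this theory can be checked directly. The following is a minimal sketch, assuming Perl 5.14 or later (for the /u modifier); the helper name matches_ss_ci is made up for illustration. It shows that a single character (ß, U+00DF) and the two-character string 'ss' are both accepted by the same case-insensitive pattern, which is why a lookbehind on 'ss' under /i no longer has one fixed length.

```perl
use strict;
use warnings;

# Hypothetical helper: does $text, as a whole, match "ss" case-insensitively
# under Unicode rules (/u)?
sub matches_ss_ci {
    my ($text) = @_;
    return $text =~ /\Ass\z/iu ? 1 : 0;
}

my $sharp_s = "\x{DF}";    # LATIN SMALL LETTER SHARP S

# One character on the left, two on the right, same pattern.
printf "length of sharp s: %d\n", length $sharp_s;
printf "length of 'ss':    %d\n", length 'ss';
printf "'ss' matches:      %d\n", matches_ss_ci('ss');
printf "'SS' matches:      %d\n", matches_ss_ci('SS');
printf "sharp s matches:   %d\n", matches_ss_ci($sharp_s);
```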

    For your next examples, each "s" might start "inside" of an (expanded) ß:

    ßtabc SSTABC STABC

    I don't know why your third example compiles when the others do not, but I blame this on the inconsistency of matching ss. My suggestion would be to try to avoid /i and instead enumerate the alternatives if possible. That would mean to expand at least the critical words starting with s to character classes matching the upper- and lower-case variant:

    /(?<!s)tart/i ;
    /(?<![Ss])[Tt][Aa][Rr][Tt]/ ;

    Maybe you can fudge things by constructing your master regular expression from even more parts:

    my $not_s= qr/(?<![Ss])/; # no explicit /i here, will never match inside ß
    my $rest= qr/abc/i;
    my $pattern= "$not_s$rest";
    ...
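As a rough sketch of the same idea applied to the two-character 'ss' case from the question: the lookbehind is compiled without /i and spells out both cases by hand, so it always has a fixed length of two, while the rest of the pattern keeps /i. The names $not_ss, $regex and allowed are illustrative only and do not come from the thread.

```perl
use strict;
use warnings;

# Lookbehind without /i: [Ss][Ss] is always exactly two characters long.
my $not_ss = qr/(?<![Ss][Ss])/;

# Only this part is case-insensitive.
my $rest = qr/abc/i;

# Each qr// object keeps its own modifiers when interpolated.
my $regex = qr/$not_ss$rest/;

# Hypothetical helper for trying the combined pattern on a string.
sub allowed {
    my ($text) = @_;
    return $text =~ $regex ? 1 : 0;
}

printf "%-8s => %d\n", $_, allowed($_) for qw(xxABC ssabc SSabc sSabc);
```

One consequence of this design is that ß in front of 'abc' is no longer treated like 'ss', since the hand-written character classes know nothing about case folding.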

    I don't know why it doesn't happen for statically compiled regular expressions. Maybe this is the real bug here.

      I don't know why it doesn't happen for statically compiled regular expressions. Maybe this is the real bug here.
      But it does:
      $ perl5180 -e'/(?<!ss)a/i'
      Variable length lookbehind not implemented in regex m/(?<!ss)a/ at -e line 1.
      $

      Dave.

        And in Strawberry 5.16.3.1

        C:\strawberry-perl-5.16.3.1-64bit-portable\scripts>perl -e "/(?<!ss)a/i"
        Variable length lookbehind not implemented in regex m/(?<!ss)a/ at -e line 1.

      Thanks a lot for your quick explanations and advice. I wasn't aware that the regex engine would change my expression internally into something else (and I am actually not sure I want that). But I can kind of see the problem now (being German).

      The German sharp s 'ß' doesn't exist as a capital letter in German writing. Since no German word starts with a sharp s, there is no need for a capital counterpart. If it is nevertheless used in all-capital writing, it is written as 'SS'. So at least from a 'German point of view' it makes sense that the 'ss' expansion is only applied with the case-insensitive modifier.

      Still it is not quite clear to me what happens to the 'st' example:

      my $pattern = "(?<!st)abc";  # 'st' in lookbehind
      qr/$pattern/i;               # error: 'Variable length lookbehind not implemented in regex...

      Whereas this one works (just like any other letter after 's' besides the combination 'ss' and 'st'):

      my $pattern = "(?<!sz)abc";  # 'sz' in lookbehind
      qr/$pattern/i;               # this works fine

      Is there a way to display exactly the expression that the regexp engine is using? (So, according to Corion this would be: /(?<!ß|st)abc/i in the first example). That would be very helpful.
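One way to look at what the engine compiles is the core re pragma in debug mode, which dumps the compiled program to STDERR. The sketch below assumes a pattern that compiles on every version ('ss' on its own, without the lookbehind); the helper name compile_with_dump is made up. The dump shows the engine's internal nodes rather than a rewritten regex source, and its format differs between Perl versions, so no sample output is given here.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a pattern case-insensitively while regex
# debugging is switched on. The pragma is lexically scoped, so only
# patterns compiled inside this sub are dumped (to STDERR).
sub compile_with_dump {
    my ($pattern) = @_;
    use re 'debug';
    return qr/$pattern/i;
}

my $re = compile_with_dump('ss');
print "compiled: $re\n";
```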

        "st" ("ﬆ") is similar to "ss": in certain typefaces, it is replaced by a typographic ligature. See also its Unicode codepoint.
        لսႽć ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
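A small sketch of why 'st' can behave like 'ss' here: Unicode defines two single-character ligatures, U+FB05 (LATIN SMALL LIGATURE LONG S T) and U+FB06 (LATIN SMALL LIGATURE ST), whose full case folding is the two-character sequence "st". The helper name matches_st_ci is illustrative, and the /u modifier assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: does $text, as a whole, match "st" case-insensitively
# under Unicode rules?
sub matches_st_ci {
    my ($text) = @_;
    return $text =~ /\Ast\z/iu ? 1 : 0;
}

my %candidates = (
    'plain st'        => 'st',
    'upper ST'        => 'ST',
    'ligature U+FB05' => "\x{FB05}",   # long s + t
    'ligature U+FB06' => "\x{FB06}",   # s + t
    'plain sz'        => 'sz',
);

# Print labels only, to avoid wide-character output warnings.
printf "%-16s => %d\n", $_, matches_st_ci($candidates{$_})
    for sort keys %candidates;
```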
Re: qr/STRING/ fails with certain lookbehind expressions
by kcott (Abbot) on Jul 19, 2013 at 11:03 UTC
Re: qr/STRING/ fails with certain lookbehind expressions
by farang (Chaplain) on Jul 19, 2013 at 12:38 UTC

    The desired behavior can be forced by using //iaa

    use v5.18.0;
    use warnings;

    my $pattern = "(?<!ss)abc";
    my $regex = qr/$pattern/iaa;

    say 'ok' if 'ssqabc' =~ $regex;
    say 'ok' if 'ssabc' !~ $regex;
    say 'ok' if 'ssßabc' =~ $regex;

    From perlre:

    To forbid ASCII/non-ASCII matches (like "k" with "\N{KELVIN SIGN}"), specify the "a" twice, for example "/aai" or "/aia". (The first occurrence of "a" restricts the "\d", etc., and the second occurrence adds the "/i" restrictions.) But, note that code points outside the ASCII range will use Unicode rules for "/i" matching, so the modifier doesn't really restrict things to just ASCII; it just forbids the intermixing of ASCII and non-ASCII.
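The perlre passage can be illustrated with a short sketch, assuming Perl 5.14 or later (where /aa is available). The helper names ci_match and ci_match_aa are made up; they compare plain /i with /iaa on the Kelvin sign example from the documentation and on the sharp s from this thread.

```perl
use strict;
use warnings;

# Hypothetical helpers: whole-string, case-insensitive match of $text
# against the literal $word, without and with the /aa restriction.
sub ci_match {
    my ($text, $word) = @_;
    return $text =~ /\A\Q$word\E\z/i ? 1 : 0;
}

sub ci_match_aa {
    my ($text, $word) = @_;
    return $text =~ /\A\Q$word\E\z/iaa ? 1 : 0;
}

my $kelvin  = "\x{212A}";   # KELVIN SIGN, folds to "k"
my $sharp_s = "\x{DF}";     # LATIN SMALL LETTER SHARP S

printf "kelvin vs k,   /i:   %d\n", ci_match($kelvin, 'k');
printf "kelvin vs k,   /iaa: %d\n", ci_match_aa($kelvin, 'k');
printf "sharp s vs ss, /iaa: %d\n", ci_match_aa($sharp_s, 'ss');
printf "SS vs ss,      /iaa: %d\n", ci_match_aa('SS', 'ss');
```

With /aa, ASCII letters in the pattern can only match ASCII characters, so the 'ss' in the lookbehind stays exactly two characters long.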

Re: qr/STRING/ fails with certain lookbehind expressions
by Laurent_R (Monsignor) on Jul 19, 2013 at 18:46 UTC

    I do not know if this can help, but I do not get any error or warning message on Perl 5.14 under Cygwin.

    $ perl -e'/(?<!ss)a/i'
      me neither, under linux.

      I am getting the error under perls 5.16.0 - 5.19.1
