PerlMonks  

qr/STRING/ fails with certain lookbehind expressions

by wiewa (Initiate)
on Jul 19, 2013 at 10:14 UTC (#1045325=perlquestion)
wiewa has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I just ran into quite an odd regex problem: I tried to compile a regex pattern with 'qr' that is case-insensitive and contains a lookbehind assertion. If I have certain character combinations in my lookbehind expression, the regex engine throws the error:

Variable length lookbehind not implemented in regex...

Here are some examples:

my $pattern = "(?<!ss)abc";    # 'ss' in lookbehind
qr/$pattern/i;                 # throws the error

my $pattern = "(?<!st)abc";    # 'st' in lookbehind
qr/$pattern/i;                 # throws the error

my $pattern = "(?<!st)abc";    # 'st' in lookbehind
qr/$pattern/;                  # this works fine since modifier 'i' is not used

my $pattern = "(?<!s)abc";     # 's' in lookbehind
qr/$pattern/i;                 # this works fine

my $pattern = "(?<!s[st])abc"; # 's[st]' in lookbehind
qr/$pattern/i;                 # this works, whereas these fail: (?<!s[s]),(?<!s[t])

All patterns do what I'd expect if I don't compile them.

It seems the regex engine is interpreting the 'st' or 'ss' string as some kind of meta-command, since the pattern contains no quantifier that would cause a variable length.

I am using a new version of perl (v5.18.0). This problem did not occur in version v5.10.1.

Can anybody explain what is happening here? And is it maybe a known bug in perl version v5.18.0?

Thanks, WW

Re: qr/STRING/ fails with certain lookbehind expressions
by Corion (Pope) on Jul 19, 2013 at 10:28 UTC

    My interpretation of the behaviour is that you're running afoul of ß (sz-ligature) matching SS when doing a case-insensitive match. This means that Perl will try to match s "inside" of ß, which then results in a variable-length look-behind pattern.

    This theory easily explains the first behaviour, as Perl compiles

    /(?<!ss)/i

    to

    /(?<!ß|ss)/i

    This creates a variable-length lookbehind, which is what Perl does not like.
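The length mismatch behind this can be checked directly. The following is a minimal sketch, not taken from the thread: it assumes Perl 5.14 or later (for the /u and /aa modifiers), and the variable names are made up for illustration. It only shows that sharp s (U+00DF) is one character that matches the two characters "ss" under Unicode case-insensitive rules, which is why a lookbehind for "ss" no longer has a single fixed length.

```perl
use strict;
use warnings;

# Hypothetical names; U+00DF is LATIN SMALL LETTER SHARP S.
my $sharp_s = "\x{DF}";

# Under Unicode rules (/u), case-insensitive "ss" also matches sharp s.
my $fold_u  = ($sharp_s =~ /\Ass\z/iu)  ? 1 : 0;

# Under /aa, ASCII and non-ASCII characters may not match each other.
my $fold_aa = ($sharp_s =~ /\Ass\z/iaa) ? 1 : 0;

# One character on one side, two on the other.
my $len_sharp = length($sharp_s);
my $len_ss    = length("ss");

print "matches under /iu:  $fold_u\n";
print "matches under /iaa: $fold_aa\n";
print "lengths: $len_sharp vs $len_ss\n";
```

Using the code point escape instead of a literal character keeps the source file pure ASCII, so the result does not depend on the file's encoding.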

    For your next examples, each "s" might start "inside" of an (expanded) ß:

    ßtabc SSTABC STABC

    I don't know why your third example compiles when the others do not, but I blame this on the inconsistency of matching ss. My suggestion would be to try to avoid /i and instead enumerate the alternatives if possible. That would mean to expand at least the critical words starting with s to character classes matching the upper- and lower-case variant:

    /(?<!s)tart/i ;
    /(?<![Ss])[Tt][Aa][Rr][Tt]/ ;

    Maybe you can fudge things by constructing your master regular expression from even more parts:

    my $not_s= qr/(?<![Ss])/; # no explicit /i here, will never match inside ß
    my $rest= qr/abc/i;
    my $pattern= "$not_s$rest";
    ...

    I don't know why it doesn't happen for statically compiled regular expressions. Maybe this is the real bug here.

      I don't know why it doesn't happen for statically compiled regular expressions. Maybe this is the real bug here.
      But it does:
      $ perl5180 -e'/(?<!ss)a/i'
      Variable length lookbehind not implemented in regex m/(?<!ss)a/ at -e line 1.
      $

      Dave.

        And in Strawberry 5.16.3.1

        C:\strawberry-perl-5.16.3.1-64bit-portable\scripts>perl -e "/(?<!ss)a/i"
        Variable length lookbehind not implemented in regex m/(?<!ss)a/ at -e line 1.

      Thanks a lot for your quick explanations and advice. I wasn't aware that the regex engine would change my expression internally into something else (and I am actually not sure if I want that). But I kind of see the problem now (being German).

      The German sharp s 'ß' doesn't exist as a capital letter in German writing. Since no German word starts with a sharp s, there is no need for a capital counterpart. If it is nevertheless used in all-capital writing, it is written as 'SS'. So at least from a 'German point of view' it makes sense that the 'ss' expansion is only applied with the case-insensitive modifier.

      Still it is not quite clear to me what happens to the 'st' example:

      my $pattern = "(?<!st)abc"; # 'st' in lookbehind
      qr/$pattern/i;              # error: 'Variable length lookbehind not implemented in regex...

      Whereas this one works (just like any other letter after 's' besides the combination 'ss' and 'st'):

      my $pattern = "(?<!sz)abc"; # 'sz' in lookbehind
      qr/$pattern/i;              # this works fine

      Is there a way to display exactly the expression that the regex engine is using? (So, according to Corion, this would be /(?<!ß|st)abc/i in the first example.) That would be very helpful.
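One possible way to inspect what the engine builds is the core `re` pragma in debug mode. The sketch below is only an illustration and was not posted in the thread: the pattern and the variable name are made up, and the dump format differs between Perl versions, so no output is shown. A pattern that fails to compile (like the lookbehind cases above on an affected Perl) dies at compile time, so the dump is mainly useful for patterns that do compile.

```perl
use strict;
use warnings;

# Ask the regex engine to print its compiled program to STDERR.
# "Debug COMPILE" limits the output to the compilation phase.
use re qw(Debug COMPILE);

# Hypothetical example pattern: case-insensitive "ss" under Unicode rules.
my $re = qr/ss/iu;

# The stringified form shows the effective modifiers, not the internal nodes.
print "pattern: $re\n";
```

The same dump is available from the command line with `perl -Mre=debug -e '...'`, which also traces matching.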

        "st" is similar to "ss": in certain typefaces it is replaced by a typographic ligature, and that ligature has its own Unicode code point (U+FB06, LATIN SMALL LIGATURE ST).
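The ligature case can be checked the same way as sharp s. This is a small sketch with made-up variable names, assuming Perl 5.14 or later for the /u modifier. It uses the two Unicode "st" ligatures, U+FB06 (LATIN SMALL LIGATURE ST) and U+FB05 (LATIN SMALL LIGATURE LONG S T), both of which case-fold to the two characters "st".

```perl
use strict;
use warnings;

# Hypothetical names for the two single-character ligatures.
my $st_lig      = "\x{FB06}";   # LATIN SMALL LIGATURE ST
my $long_st_lig = "\x{FB05}";   # LATIN SMALL LIGATURE LONG S T

# Each one-character ligature matches the two-character "st" under /iu.
my $st_match      = ($st_lig      =~ /\Ast\z/iu) ? 1 : 0;
my $long_st_match = ($long_st_lig =~ /\Ast\z/iu) ? 1 : 0;

# Without /i there is no folding, so nothing matches.
my $st_exact = ($st_lig =~ /\Ast\z/u) ? 1 : 0;

print "U+FB06 matches /st/iu: $st_match\n";
print "U+FB05 matches /st/iu: $long_st_match\n";
print "U+FB06 matches /st/u:  $st_exact\n";
```

This lines up with the original report that 'st' fails only with the 'i' modifier.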
        لսႽć ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: qr/STRING/ fails with certain lookbehind expressions
by kcott (Abbot) on Jul 19, 2013 at 11:03 UTC
Re: qr/STRING/ fails with certain lookbehind expressions
by farang (Hermit) on Jul 19, 2013 at 12:38 UTC

    The desired behavior can be forced by using //iaa

    use v5.18.0;
    use warnings;
    my $pattern = "(?<!ss)abc";
    my $regex = qr/$pattern/iaa;
    say 'ok' if 'ssqabc' =~ $regex;
    say 'ok' if 'ssabc' !~ $regex;
    say 'ok' if 'ssßabc' =~ $regex;

    From perlre:

    To forbid ASCII/non-ASCII matches (like "k" with "\N{KELVIN SIGN}"), specify the "a" twice, for example "/aai" or "/aia". (The first occurrence of "a" restricts the "\d", etc., and the second occurrence adds the "/i" restrictions.) But, note that code points outside the ASCII range will use Unicode rules for "/i" matching, so the modifier doesn't really restrict things to just ASCII; it just forbids the intermixing of ASCII and non-ASCII.
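The perlre passage can be tried out with the KELVIN SIGN case it mentions. This is a minimal sketch with made-up variable names, assuming Perl 5.14 or later. It compares /iu, a single /a, and the doubled /aa on the same one-character string.

```perl
use strict;
use warnings;

# Hypothetical name; U+212A is KELVIN SIGN, which case-folds to "k".
my $kelvin = "\x{212A}";

# Unicode rules: ASCII "k" matches KELVIN SIGN case-insensitively.
my $match_iu  = ($kelvin =~ /\Ak\z/iu)  ? 1 : 0;

# A single /a restricts \d, \s, \w and friends, but not /i folding.
my $match_ia  = ($kelvin =~ /\Ak\z/ia)  ? 1 : 0;

# Doubling the "a" forbids ASCII/non-ASCII matches under /i.
my $match_iaa = ($kelvin =~ /\Ak\z/iaa) ? 1 : 0;

print "/iu:  $match_iu\n";
print "/ia:  $match_ia\n";
print "/iaa: $match_iaa\n";
```

Since 'ss' and 'st' are plain ASCII, /iaa stops them from matching their non-ASCII one-character equivalents, which keeps the lookbehind at a fixed length.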

Re: qr/STRING/ fails with certain lookbehind expressions
by Laurent_R (Parson) on Jul 19, 2013 at 18:46 UTC

    I do not know if this can help, but I do not get any error or warning message on Perl 5.14 under Cygwin.

    $ perl -e'/(?<!ss)a/i'
      Me neither, under Linux.

      I am getting the error under perls 5.16.0 - 5.19.1
