PerlMonks  

Re: qr/STRING/ fails with certain lookbehind expressions

by Corion (Pope)
on Jul 19, 2013 at 10:28 UTC (#1045327)


in reply to qr/STRING/ fails with certain lookbehind expressions

My interpretation of the behaviour is that you're running afoul of ß (sz-ligature) matching SS when doing a case-insensitive match. This means that Perl will try to match an s "inside" of ß, which then results in a variable-length look-behind pattern.

This theory easily explains the first behaviour, as Perl compiles

/(?<!ss)/i

to

/(?<!ß|ss)/i

This creates a variable-length lookbehind, which is what Perl does not like.
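
A small sketch to check the folding behind this theory (not part of the original post). It assumes Perl 5.16 or later for fc(), enables the unicode_strings feature so the fold uses Unicode rules, and uses /u on the match for the same reason. The variable names are only illustrative.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);   # fc() needs Perl 5.16 or later

my $sharp_s = "\x{DF}";   # U+00DF LATIN SMALL LETTER SHARP S

# Full Unicode case folding turns the single character into two characters.
my $folded = fc($sharp_s);
printf "fc(U+00DF) has length %d\n", length $folded;

# With /i under Unicode rules (/u), the two-character pattern "ss"
# can match the one-character string, anchors included.
my $matches = $sharp_s =~ /\Ass\z/iu ? 1 : 0;
print "U+00DF =~ /\\Ass\\z/iu: $matches\n";
```

Because one character in the text can stand for two characters in the pattern, a lookbehind containing ss under /i no longer has a single fixed width.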

For your next examples, each "s" might start "inside" of an (expanded) ß:

ßtabc SSTABC STABC

I don't know why your third example compiles when the others do not, but I blame this on the inconsistency of matching ss. My suggestion would be to try to avoid /i and instead enumerate the alternatives if possible. That would mean expanding at least the critical words starting with s into character classes matching the upper- and lower-case variants:

/(?<!s)tart/i ;
/(?<![Ss])[Tt][Aa][Rr][Tt]/ ;

Maybe you can fudge things by constructing your master regular expression from even more parts:

my $not_s= qr/(?<![Ss])/; # no explicit /i here, will never match inside ß
my $rest= qr/abc/i;
my $pattern= "$not_s$rest";
...
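
As a rough check of this idea (a sketch, not a tested fix for the original poster's master expression), the two parts can be combined and run against a few sample strings. The test strings are made up for illustration, and qr// is used for the combined pattern instead of a plain string.

```perl
use strict;
use warnings;

# Lookbehind spelled out as a character class and compiled WITHOUT /i,
# so no multi-character case folding is applied to it.
my $not_s = qr/(?<![Ss])/;

# The rest of the pattern stays case-insensitive.
my $rest = qr/abc/i;

# Interpolating qr// objects keeps each part's own flags.
my $pattern = qr/$not_s$rest/;

for my $text ('xabc', 'XABC', 'sabc', 'SABC') {
    printf "%-5s %s\n", $text, $text =~ $pattern ? 'match' : 'no match';
}
```

The price of this approach is that the lookbehind only knows about the letters listed in the class, so a preceding ß is not treated as an "s".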

I don't know why it doesn't happen for statically compiled regular expressions. Maybe this is the real bug here.


Re^2: qr/STRING/ fails with certain lookbehind expressions
by dave_the_m (Parson) on Jul 19, 2013 at 10:44 UTC
    I don't know why it doesn't happen for statically compiled regular expressions. Maybe this is the real bug here.
    But it does:
    $ perl5180 -e'/(?<!ss)a/i'
    Variable length lookbehind not implemented in regex m/(?<!ss)a/ at -e line 1.
    $

    Dave.

      And in Strawberry 5.16.3.1

      C:\strawberry-perl-5.16.3.1-64bit-portable\scripts>perl -e "/(?<!ss)a/i"
      Variable length lookbehind not implemented in regex m/(?<!ss)a/ at -e line 1.
Re^2: qr/STRING/ fails with certain lookbehind expressions
by Anonymous Monk on Jul 22, 2013 at 14:54 UTC

    Thanks a lot for your quick explanations and advice. I wasn't aware that the regexp engine would change my expression internally into something else (and I am actually not sure I want that). But I kind of see the problem now (being German myself).

    The German sharp s 'ß' doesn't exist as a capital letter in German writing. Since no German word starts with a sharp s, there is no need for a capital counterpart. If it is nevertheless used in all-capital writing, it is written as 'SS'. So at least from a 'German point of view' it makes sense that the 'ss' expansion is only applied with the case-insensitive modifier.

    Still it is not quite clear to me what happens to the 'st' example:

    my $pattern = "(?<!st)abc"; # 'st' in lookbehind
    qr/$pattern/i; # error: 'Variable length lookbehind not implemented in regex...

    Whereas this one works (just like any other letter after 's' besides the combination 'ss' and 'st'):

    my $pattern = "(?<!sz)abc"; # 'sz' in lookbehind
    qr/$pattern/i; # this works fine

    Is there a way to display exactly the expression that the regexp engine is using? (So, according to Corion, this would be /(?<!ß|st)abc/i in the first example.) That would be very helpful.

      "ﬆ" ("st") is similar to "ss" - in certain typefaces, it is replaced by a typographic ligature. See also its Unicode codepoint.
      لսႽć ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
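
To see why "st" behaves like "ss" while "sz" does not, one can look at the full case folds of the ligature characters. This is only a sketch, assuming Perl 5.16 or later for fc(); the hash names are illustrative.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);   # fc() needs Perl 5.16 or later

# Ligature characters whose full case fold is longer than one character.
my %name_of = (
    "\x{FB05}" => 'LATIN SMALL LIGATURE LONG S T',
    "\x{FB06}" => 'LATIN SMALL LIGATURE ST',
);

my %fold_of;
for my $char (sort keys %name_of) {
    $fold_of{$char} = fc($char);
    printf "U+%04X %-30s folds to \"%s\"\n",
        ord $char, $name_of{$char}, $fold_of{$char};
}
```

Both fold to the two-character sequence "st", so under /i a literal st in a lookbehind can also stand for a single character in the text.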
