PerlMonks  

Re: qr/STRING/ fails with certain lookbehind expressions

by Corion (Pope)
on Jul 19, 2013 at 10:28 UTC (#1045327)


in reply to qr/STRING/ fails with certain lookbehind expressions

My interpretation of the behaviour is that you're running afoul of ß (sz-ligature) matching SS when doing a case-insensitive match. This means that Perl will try to match s "inside" of ß, which then results in a variable-length look-behind pattern.

This theory easily explains the first behaviour, as Perl compiles

/(?<!ss)/i

to

/(?<!ß|ss)/i

This creates a variable-length lookbehind, which is what Perl does not like.
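The folding itself can be checked directly. This is a minimal sketch, not code from the original post: ß is written as \x{DF} and /u is added so that the result does not depend on the source encoding, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $sharp_s = "\x{DF}";    # LATIN SMALL LETTER SHARP S

# With /i (and Unicode rules via /u) a single character matches the
# two-character pattern "ss", and the sharp s pattern matches "SS".
my $one_matches_two = $sharp_s =~ /\Ass\z/iu      ? 1 : 0;
my $two_match_one   = "SS"     =~ /\A\x{DF}\z/iu ? 1 : 0;

# Without /i there is no case folding, so the lengths have to agree.
my $without_i = $sharp_s =~ /\Ass\z/u ? 1 : 0;

print "sharp s =~ /ss/iu : $one_matches_two\n";
print "SS =~ /sharp s/iu : $two_match_one\n";
print "sharp s =~ /ss/u  : $without_i\n";
```

Because "ss" under /i can consume either one or two characters, a lookbehind built from it no longer has a single fixed length.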

For your next examples, each "s" might start "inside" of an (expanded) ß:

ßtabc
SSTABC
 STABC

I don't know why your third example compiles when the others do not, but I blame this on the inconsistency of matching ss. My suggestion would be to avoid /i and instead enumerate the alternatives if possible. That would mean expanding at least the critical words starting with s into character classes matching the upper- and lower-case variants:

/(?<!s)tart/i ;
/(?<![Ss])[Tt][Aa][Rr][Tt]/ ;

Maybe you can fudge things by constructing your master regular expression from even more parts:

my $not_s= qr/(?<![Ss])/; # no explicit /i here, will never match inside ß
my $rest= qr/abc/i;
my $pattern= "$not_s$rest";
...

I don't know why it doesn't happen for statically compiled regular expressions. Maybe this is the real bug here.


Re^2: qr/STRING/ fails with certain lookbehind expressions
by dave_the_m (Parson) on Jul 19, 2013 at 10:44 UTC
    I don't know why it doesn't happen for statically compiled regular expressions. Maybe this is the real bug here.
    But it does:
    $ perl5180 -e'/(?<!ss)a/i'
    Variable length lookbehind not implemented in regex m/(?<!ss)a/ at -e line 1.
    $

    Dave.

      And in Strawberry 5.16.3.1

      C:\strawberry-perl-5.16.3.1-64bit-portable\scripts>perl -e "/(?<!ss)a/i"
      Variable length lookbehind not implemented in regex m/(?<!ss)a/ at -e line 1.
Re^2: qr/STRING/ fails with certain lookbehind expressions
by Anonymous Monk on Jul 22, 2013 at 14:54 UTC

    Thanks a lot for your quick explanations and advice. I wasn't aware that the regex engine would change my expression internally into something else (and I am actually not sure I want that). But I kind of see the problem now (being German).

    The German sharp s 'ß' doesn't exist as a capital letter in German writing. Since no German word starts with a sharp s, there is no need for a capital counterpart. If it is nevertheless used in all-capital writing, it is written as 'SS'. So at least from a 'German point of view' it makes sense that the 'ss' expansion is only applied with the case-insensitive modifier.

    Still it is not quite clear to me what happens to the 'st' example:

    my $pattern = "(?<!st)abc"; # 'st' in lookbehind
    qr/$pattern/i;              # error: 'Variable length lookbehind not implemented in regex...

    Whereas this one works (just like any other letter after 's' besides the combination 'ss' and 'st'):

    my $pattern = "(?<!sz)abc"; # 'sz' in lookbehind
    qr/$pattern/i;              # this works fine

    Is there a way to display exactly the expression that the regexp engine is using? (So, according to Corion this would be: /(?<!ß|st)abc/i in the first example). That would be very helpful.

      "ﬆ" ("st") is similar to "ss" - in certain typefaces, it is replaced by a typographic ligature. See also its Unicode codepoint.
      لսႽć ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
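      The 'st' versus 'sz' difference can be probed with a short script. This is only a sketch under assumptions: compiles_with_i is a hypothetical helper name, U+FB06 (LATIN SMALL LIGATURE ST) is taken as the ligature in question, and whether the lookbehind patterns compile depends on the perl version, so no particular result is claimed for them.

```perl
use strict;
use warnings;

# U+FB06 LATIN SMALL LIGATURE ST is a single character that
# case-folds to the two characters "st".
my $ligature_st = "\x{FB06}";
my $folds_to_st = $ligature_st =~ /\Ast\z/i ? 1 : 0;

# The same character does not fold to "sz".
my $folds_to_sz = $ligature_st =~ /\Asz\z/i ? 1 : 0;

# Hypothetical helper: try to compile a pattern under /i and report
# the outcome instead of dying. Returns 1 on success, 0 on failure.
sub compiles_with_i {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/i };
    return defined $re ? 1 : 0;
}

print "ligature matches /st/i: $folds_to_st\n";
print "ligature matches /sz/i: $folds_to_sz\n";

# The outcome for 'ss' and 'st' varies between perl versions.
for my $pattern ('(?<!ss)abc', '(?<!st)abc', '(?<!sz)abc') {
    printf "%-12s %s\n", $pattern,
        compiles_with_i($pattern) ? 'compiles' : 'fails to compile';
}
```

      As for displaying what the engine actually uses: adding use re 'debug'; (or running perl -Mre=debug) makes perl print the compiled regex program to STDERR, which should show how the 's' sequences were compiled.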
