PerlMonks  

Re: qr/STRING/ fails with certain lookbehind expressions

by Corion (Pope)
on Jul 19, 2013 at 10:28 UTC (#1045327)


in reply to qr/STRING/ fails with certain lookbehind expressions

My interpretation of the behaviour is that you're running afoul of ß (sz-ligature) matching SS when doing a case-insensitive match. This means that Perl will try to match s "inside" of ß, which then results in a variable-length look-behind pattern.

This theory easily explains the first behaviour, as Perl compiles

/(?<!ss)/i

to

/(?<!ß|ss)/i

This creates a variable-length lookbehind (ß is one character, ss is two), which is what Perl does not like.
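A minimal sketch of why the lengths differ, assuming Perl 5.16 or later so that fc() is available. It only illustrates Unicode case folding; it does not show what the regex compiler does internally, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);   # fc() needs Perl 5.16 or later

my $sharp_s = "\x{DF}";               # LATIN SMALL LETTER SHARP S
my $folded  = fc($sharp_s);           # full Unicode case folding

# One character before folding, two characters after.
printf "length before folding: %d\n", length $sharp_s;
printf "length after folding:  %d (%s)\n", length $folded, $folded;

# Under Unicode rules (/u), a case-insensitive "ss" can match the single character.
my $ss_matches_sharp_s = ($sharp_s =~ /ss/iu) ? 1 : 0;
print "sharp s matches /ss/iu: $ss_matches_sharp_s\n";
```

Since the same pattern text can match either one or two characters, a lookbehind containing it no longer has a fixed width.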

For your next examples, each "s" might start "inside" of an (expanded) ß:

ßtabc
SSTABC
STABC

I don't know why your third example compiles when the others do not, but I blame this on the inconsistency of matching ss. My suggestion would be to avoid /i and instead enumerate the alternatives if possible. That would mean expanding at least the critical words starting with s into character classes matching the upper- and lower-case variants:

/(?<!s)tart/i;
/(?<![Ss])[Tt][Aa][Rr][Tt]/;

Maybe you can fudge things by constructing your master regular expression from even more parts:

my $not_s= qr/(?<![Ss])/; # no explicit /i here, will never match inside ß
my $rest= qr/abc/i;
my $pattern= "$not_s$rest";
...
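A runnable sketch of that idea, with one assumption: the parts are joined with qr// rather than string concatenation. Either way each part keeps its own flags when interpolated. The test strings are made up for illustration.

```perl
use strict;
use warnings;

my $not_s   = qr/(?<![Ss])/;      # fixed width: exactly one character, no /i
my $rest    = qr/abc/i;           # /i applies to this part only
my $pattern = qr/$not_s$rest/;    # interpolated parts keep their own flags

my %result;
for my $text ('xABC', 'sabc', 'SABC') {
    $result{$text} = ($text =~ $pattern) ? 1 : 0;
    print "$text: ", ($result{$text} ? "match" : "no match"), "\n";
}
```

Because the lookbehind itself is compiled without /i, case folding never applies to it, so it stays one character wide.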

I don't know why it doesn't happen for statically compiled regular expressions. Maybe this is the real bug here.

Replies are listed 'Best First'.
Re^2: qr/STRING/ fails with certain lookbehind expressions
by dave_the_m (Prior) on Jul 19, 2013 at 10:44 UTC
    I don't know why it doesn't happen for statically compiled regular expressions. Maybe this is the real bug here.
    But it does:
    $ perl5180 -e'/(?<!ss)a/i'
    Variable length lookbehind not implemented in regex m/(?<!ss)a/ at -e line 1.
    $

    Dave.

      And in Strawberry 5.16.3.1

      C:\strawberry-perl-5.16.3.1-64bit-portable\scripts>perl -e "/(?<!ss)a/i"
      Variable length lookbehind not implemented in regex m/(?<!ss)a/ at -e line 1.
Re^2: qr/STRING/ fails with certain lookbehind expressions
by Anonymous Monk on Jul 22, 2013 at 14:54 UTC

    Thanks a lot for your quick explanations and advice. I wasn't aware that the regex engine would change my expression internally into something else (and I am actually not sure if I want that). But I kind of see the problem now (being German).

    The German sharp s 'ß' doesn't exist as a capital letter in German writing. Since no German word starts with a sharp s, there is no need for a capital counterpart. If it is nevertheless used in all-capital writing, it is written as 'SS'. So at least from a 'German point of view' it makes sense that the 'ss' expansion is only applied with the case-insensitive modifier.

    Still it is not quite clear to me what happens to the 'st' example:

    my $pattern = "(?<!st)abc"; # 'st' in lookbehind
    qr/$pattern/i; # error: 'Variable length lookbehind not implemented in regex...

    Whereas this one works (just like any other letter after 's' besides the combination 'ss' and 'st'):

    my $pattern = "(?<!sz)abc"; # 'sz' in lookbehind
    qr/$pattern/i; # this works fine
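To check which of these lookbehinds the running perl accepts, the compile can be wrapped in eval. This is only a probing sketch: the outcome for 'ss' and 'st' depends on the Perl version (the errors quoted in this thread are from 5.16 and 5.18), so no result is claimed for them here.

```perl
use strict;
use warnings;

my %compiles;
for my $behind (qw(ss st sz)) {
    my $pattern = "(?<!$behind)abc";

    # Compile at run time; a failure is caught and reported instead of dying.
    my $re = eval { no warnings; qr/$pattern/i };
    $compiles{$behind} = defined $re ? 1 : 0;

    if (defined $re) {
        print "$behind: compiles\n";
    }
    else {
        print "$behind: fails: $@";
    }
}
```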

    Is there a way to display exactly the expression that the regex engine is using? (So, according to Corion, this would be /(?<!ß|st)abc/i in the first example.) That would be very helpful.
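One way to inspect a pattern, sketched with the core re pragma. re::regexp_pattern returns the source and flags of a compiled regex, and use re 'debug' dumps the compiled program to STDERR. Note the assumption here: the dump is a list of internal nodes, not a rewritten regex source, so it will not literally show an alternation like the one above. The 'sz' pattern is used because it compiles on the versions discussed in this thread.

```perl
use strict;
use warnings;
use re ();    # load re without switching on any debugging

my $pattern = "(?<!sz)abc";
my $re      = qr/$pattern/i;

# Source text and flags of the compiled regex.
my ($source, $flags) = re::regexp_pattern($re);
print "source: $source\n";
print "flags:  $flags\n";

{
    # Lexically scoped: regexes compiled in this block are dumped to STDERR.
    use re 'debug';
    my $traced = qr/$pattern/i;
}
```

For a one-off check from the shell, the same pragma can be loaded with perl -Mre=debug -e '...'.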

      "st" ("ﬆ") is similar to "ss": in certain typefaces it is replaced by a typographic ligature, which has its own Unicode codepoint (U+FB06, LATIN SMALL LIGATURE ST).
      لսႽć ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
