Re: About \d \w and \s

by ambrus (Abbot) on Oct 19, 2009 at 08:59 UTC

in reply to About \d \w and \s

To clarify: does the Unicode variant treat byte strings as if they were ISO-8859-1 encoded? (There's also the question of how the use locale variant treats character strings. Currently it assumes the string was accidentally decoded as ISO-8859-1, except where it contains characters with codes above 255. But it's probably always an error to actually depend on this, so it doesn't matter.)

Strangely, it seems I don't have any obfus that use syntax like m/foobar/and (the closest I have is y//or in Ode for getprotobyname), so for a change this will be a new feature of the perl core that does not break any of my obfus.

Re^2: About \d \w and \s
by demerphq (Chancellor) on Oct 19, 2009 at 21:54 UTC

    If I remember correctly what ISO-8859-1 is, then I think so, yes. In simple terms, the rules will be those of Unicode even though the codepoints are represented as bytes. In other words, the matching would behave the same as if you had called utf8::upgrade() on the string before the match.
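A minimal sketch of that equivalence, assuming perl 5.14 or later (the release that eventually shipped the /u modifier, which postdates this 2009 discussion). It compares a native byte string holding 0xE9 ("é" in ISO-8859-1) with a utf8::upgrade()d copy, first under the default rules and then under /u. The script deliberately has no "use 5.014" line, because that would enable the unicode_strings feature and hide the difference being shown.

```perl
use strict;
use warnings;

# A byte string: one byte, 0xE9, which is "é" if read as ISO-8859-1.
my $byte = "\xE9";

# The same codepoint, upgraded so perl stores it internally as UTF-8.
my $upgraded = $byte;
utf8::upgrade($upgraded);

# Default ("depends") rules: the non-UTF-8 byte string gets ASCII-only \w...
my $byte_default = ($byte =~ /^\w$/) ? 1 : 0;

# ...while the upgraded copy is matched with Unicode rules.
my $upgraded_default = ($upgraded =~ /^\w$/) ? 1 : 0;

# /u (perl 5.14+) asks for Unicode rules regardless of the internal
# representation, so the byte string now behaves like the upgraded one.
my $byte_unicode = ($byte =~ /^\w$/u) ? 1 : 0;

print "byte, default rules:     $byte_default\n";
print "upgraded, default rules: $upgraded_default\n";
print "byte, /u rules:          $byte_unicode\n";
```

On perl 5.14 or later this should print 0, 1 and 1: only the plain byte string under the default rules fails to match \w.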

    How the regex engine works under use locale will not change, except that it won't be "all or nothing": you will be able to turn it on for sections of a pattern. I don't pretend to understand the use locale mode and I don't plan to do much with it. (I'd like it if use locale "went away", actually.)
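The idea of switching rules for sections of a pattern can be illustrated with the inline (?flags:...) groups that perl 5.14 later accepted for the character-set modifiers. This sketch uses (?a:...) for ASCII rules and (?u:...) for Unicode rules rather than (?l:...), because the outcome of a locale match depends on the system's locale settings; treat it as an illustration of per-section scoping, not of locale matching itself. The test string is purely illustrative: an ASCII "3" followed by U+0663 ARABIC-INDIC DIGIT THREE.

```perl
use strict;
use warnings;

# "3" (ASCII) followed by U+0663 ARABIC-INDIC DIGIT THREE.
my $digits = "3\x{0663}";

# One pattern, two sections: the first \d uses ASCII rules (?a:...),
# the second uses Unicode rules (?u:...). Requires perl 5.14+.
my $mixed = qr/^(?a:\d)(?u:\d)$/;

# Applying ASCII rules to both digits makes the second one fail.
my $all_ascii = qr/^(?a:\d)(?a:\d)$/;

my $mixed_ok = ($digits =~ $mixed)     ? 1 : 0;
my $ascii_ok = ($digits =~ $all_ascii) ? 1 : 0;

print "mixed rules: $mixed_ok\n";
print "ASCII only:  $ascii_ok\n";
```

Here the mixed pattern should match (1) while the ASCII-only one should not (0), since only the Unicode-rules section accepts U+0663 as \d.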


