PerlMonks  

Problems with unicode properties in regular expressions under chroot

by sgifford (Prior)
on May 10, 2013 at 06:10 UTC (#1032892=perlquestion)
sgifford has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

Recently I was working on moving a program of mine based on Net::DNS::Nameserver to a new machine, upgrading from Debian 5 (Lenny) to Debian 6 (Squeeze). This upgraded Perl a little (5.10.0 to 5.10.1), along with various core modules, and I also installed the latest Net::DNS::Nameserver (upgrading from 749 to 1096).

The program runs in a chroot environment, and provides a simple DNS service.

After upgrading and getting the system running, I started seeing mysterious failures. In particular, timestamps couldn't be parsed from the config file. The problems went away when I didn't run under chroot.

After some detective work, I found these system calls happening after the server had started up:

    open("/usr/share/perl/5.10/unicore/lib/gc_sc/SpacePer.pl", O_RDONLY) = 5
    open("/usr/share/perl/5.10/unicore/lib/gc_sc/Digit.pl", O_RDONLY) = 5

It looks like when it was running under chroot, Perl could not find these Unicode files, and so, without giving any kind of clear error, it silently misinterpreted basic Perl regular expressions (in particular \D).

To work around this, I can try to force Perl to load all of the properties it may need later before it calls chroot, since once they are loaded they are not loaded again. I'm using some code like this:

    # Bootstrap some dynamically loaded utf8 stuff.
    {
        my $str = "8 foo";
        utf8::upgrade($str);
        $str =~ /\p{Digit}/;
        $str =~ /\s/;
        $str = lc $str;
    }
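A slightly more systematic version of the same warm-up, offered only as a sketch: the helper name `warm_unicode_tables` and the particular list of classes are assumptions, so extend the list with whatever constructs your own code and its modules actually use. It matches each pattern once against a UTF-8-upgraded string and runs the case-mapping functions, which appears to be what triggers the lazy loads seen in the strace output above.

```perl
use strict;
use warnings;

# Hypothetical helper: touch each regex class and case mapping once on a
# UTF-8-upgraded string, so any Unicode tables perl loads lazily are read
# while the full filesystem is still visible (i.e. before chroot).
# The pattern list is a guess; extend it to match what your code uses.
sub warm_unicode_tables {
    my @patterns = (
        qr/\d/,        qr/\D/,        qr/\s/,        qr/\S/,
        qr/\w/,        qr/\W/,        qr/\p{Digit}/, qr/\p{Alpha}/,
        qr/\p{Space}/, qr/\p{Print}/, qr/\p{Upper}/, qr/\p{Lower}/,
    );

    my $str = "8 foo Bar";
    utf8::upgrade($str);    # force the Unicode (UTF-8) code path

    my $hits = 0;
    for my $re (@patterns) {
        $hits++ if $str =~ $re;
    }

    # Case mapping on UTF-8 strings can load tables too.
    my $mapped = lc($str) . uc($str) . ucfirst($str) . lcfirst($str);

    return $hits;
}

warm_unicode_tables();    # call this before chroot()
```

Every pattern in the list is chosen so that it matches "8 foo Bar", so the return value doubles as a quick sanity check.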

That seems to work well enough for now, but my code depends on various modules, and it's hard to know whether they will eventually try to load a unicode property that would cause another mysterious failure. And of course it's very depressing.

So my question is, is there a way to preload all Unicode properties so that I don't have to worry about this? Or maybe a way to turn off the dynamic properties and have it use the built-in defaults? Or at least a way to get a clean failure when it can't find one of these properties, instead of mysterious misbehavior? Or any other suggestion for dealing with this problem?
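On the "clean failure" point, one possible approach (a sketch, not a built-in Perl feature) is a known-answer self-test run immediately after chroot: match a few inputs whose results are certain, and die loudly if any answer is wrong. The names `@KNOWN_ANSWERS` and `unicode_self_test` are hypothetical, and the check only covers the constructs it lists, so it would catch the \D-style misbehavior described above only for those constructs.

```perl
use strict;
use warnings;

# Hypothetical known-answer table: [pattern, input, expected match (1/0)].
my @KNOWN_ANSWERS = (
    [ qr/^\d+$/,       "2013",    1 ],
    [ qr/\D/,          "2013",    0 ],
    [ qr/\D/,          "10:06",   1 ],
    [ qr/^\s+$/,       " \t",     1 ],
    [ qr/^\p{Digit}$/, "7",       1 ],
    [ qr/^\w+$/,       "foo_bar", 1 ],
);

# Run right after chroot(): if a Unicode table silently failed to load,
# at least one answer should come out wrong and we die with details.
sub unicode_self_test {
    my @failures;
    for my $case (@KNOWN_ANSWERS) {
        my ($re, $input, $expected) = @$case;
        my $str = $input;
        utf8::upgrade($str);    # exercise the Unicode code path
        my $got = ($str =~ $re) ? 1 : 0;
        push @failures, "$re on '$input': got $got, expected $expected"
            if $got != $expected;
    }
    die "Unicode self-test failed:\n  " . join("\n  ", @failures) . "\n"
        if @failures;
    return 1;
}

unicode_self_test();    # call right after chroot()
```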

Thanks!

Re: Problems with unicode properties in regular expressions under chroot (install)
by Anonymous Monk on May 10, 2013 at 06:59 UTC

    So my question is, is there a way to preload all unicode properties so that I don't have to worry about this?

    Probably, but I wouldn't try to figure out what it is; I would install perl, the modules, and everything else you need under the chroot jail, so it works like a regular perl. See links for chroot setup.

    Or at least a way to get a clean failure when it can't find one of these properties, instead of mysterious misbehavior?

    Well, AFAIK, even the buggy perl-5.10.x ought to give an error when a needed unicore file is missing, so you could try upgrading. Or you could write a minimal test case and submit it using perlbug.

      Thanks for your thoughts!

      The problem with having a separate installation of Perl for every chroot program on your system is that maintainability becomes difficult. In particular, instead of relying on your distribution to let you know when there are Perl-related security updates available, now you need a way to track all of those installations for security updates yourself. In my experience the likelihood of getting that wrong outweighs the security advantages of using chroot to begin with.

      At any rate, most chroot programs don't require large installations of software systems and libraries to work. For example, many programs chroot into /var/empty, so they have access to nothing at all. They just make sure to load up everything they need beforehand.

      One of the reasons I like to use Perl is that generally I can follow this strategy: load all the resources up front, chroot into a minimal environment, then be confident that my security risks are minimized. This particular program has run that way for several years without any issues.
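      The "load everything up front, then chroot" ordering can be sketched as a small helper. `enter_jail` is a hypothetical name: it requires a list of modules and runs an optional warm-up callback (for example, the Unicode bootstrap from the original post) before calling chroot, then changes into the new root. chroot(2) only succeeds for root; otherwise the helper dies with a clear message.

```perl
use strict;
use warnings;

# Hypothetical helper: preload modules and run a warm-up callback while
# the real filesystem is visible, then chroot into $dir.
sub enter_jail {
    my ($dir, $modules, $warm_up) = @_;

    for my $module (@$modules) {
        (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
        require $file;
    }
    $warm_up->() if $warm_up;

    # Only after everything is loaded do we lose access to /usr/share/perl.
    chroot($dir) or die "chroot($dir) failed: $!\n";
    chdir('/')   or die "chdir('/') inside chroot failed: $!\n";
    return 1;
}

# Example call (as root):
# enter_jail('/var/empty', ['Net::DNS::Nameserver'],
#            sub { my $s = "8 foo"; utf8::upgrade($s); $s =~ /\p{Digit}/ });
```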

      Really, what I would like to do is find a way to load all of that Unicode stuff up front, or else disable it for this program.

        OK, let's see here :) Looking through my stuff, I found "expand unicode property (eg \p{Print}) to regex character class range", so this seems to work:

        $ perl -Mutf8 -le " utf8->SWASHNEW(q/Print/) ; print for %INC"

        A cleaner (no warnings) version seems to be

        $ perl -le " qr{\P{Print}}; print for %INC; "

        So you might grab the property names from perluniprops and qr// up a storm, or use File::Find and require up a storm.
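        A sketch of the "qr// up a storm" route: build \p{NAME} for each property name, compile it, match it once, and collect any names perl rejects. The helper name `preload_properties` and the property names passed to it are assumptions (a real list could be taken from perluniprops). The second return value lists unicore files newly added to %INC, which is handy for seeing what a perl of this vintage reads lazily; on newer perls, which compile many tables into the binary, it may well be empty.

```perl
use strict;
use warnings;

# Hypothetical helper: for each Unicode property name, build \p{NAME},
# compile it and match it once against an upgraded string.
# Returns (names that failed, unicore files newly added to %INC).
sub preload_properties {
    my @names  = @_;
    my %before = map { $_ => 1 } keys %INC;

    my $probe = "x";
    utf8::upgrade($probe);

    my @failed;
    for my $name (@names) {
        my $pattern = '\p{' . $name . '}';
        # Unknown names die at compile or match time, depending on the perl.
        my $ok = eval { my $re = qr/$pattern/; $probe =~ $re; 1 };
        push @failed, $name unless $ok;
    }

    my @loaded = grep { !$before{$_} && m{unicore/} } keys %INC;
    return (\@failed, \@loaded);
}
```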

        Any way you look at it, it's all kludges; there needs to be an official API for this.

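        preload_unicore();
        print for keys %INC;

        sub preload_unicore {
            use File::Find::Rule;
            use Config ();
            # Perl's private library dir, e.g. /usr/share/perl/5.10 on Debian
            my $privlib = $Config::Config{installprivlib} . '/';
            # every .pl file under $privlib/unicore
            my @files = File::Find::Rule->file->name(qr/\.pl$/)->in( $privlib . 'unicore/' );
            # normalise Windows backslashes, then make paths relative to @INC
            tr{\\}{/} for $privlib, @files;
            s{^\Q$privlib\E}{} for @files;
            # require each table; ignore any that fail to load
            eval { require "$_" } for @files;
        }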
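        If File::Find::Rule isn't available, the same approach can be sketched with core modules only (File::Find, File::Spec, Config). The name `preload_unicore_core` is hypothetical; like the code above, it assumes installprivlib is on @INC so each file can be required by its unicore/... relative name. It returns the number of files successfully required, or 0 if there is no unicore directory.

```perl
use strict;
use warnings;
use Config ();
use File::Find ();
use File::Spec ();

# Hypothetical core-only variant: find every .pl file under
# $privlib/unicore and require it by its @INC-relative name.
sub preload_unicore_core {
    my $privlib = $Config::Config{installprivlib};
    my $root    = File::Spec->catdir($privlib, 'unicore');
    return 0 unless -d $root;

    my @files;
    File::Find::find(
        {
            no_chdir => 1,    # $_ holds the full path
            wanted   => sub { push @files, $File::Find::name if /\.pl\z/ && -f },
        },
        $root,
    );

    my $loaded = 0;
    for my $path (@files) {
        my $rel = File::Spec->abs2rel($path, $privlib);
        $rel =~ tr{\\}{/};    # %INC keys use forward slashes
        $loaded++ if eval { require $rel; 1 };
    }
    return $loaded;
}
```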

Front-paged by Corion