Re: One true regexp for untainting windows filenames?

by cdarke (Prior)
on Jan 08, 2009 at 08:47 UTC

in reply to One true regexp for untainting windows filenames?

Strictly speaking, it is the filesystem, not the operating system, that defines which characters are legal. A drive shared between *nix and Windows (using, say, Samba) can therefore end up with file names that are legal on one side and not on the other. Also remember that NTFS filenames can contain Unicode characters.

Take a look at the source (.pm) of File::Basename, which is a core module and should be in your base release.
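
For orientation, here is a minimal sketch of what File::Basename does with a Windows-style path. The sample path is hypothetical, and the example only illustrates the module's parsing rules; it does not untaint anything by itself.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse fileparse_set_fstype);

# Parse with Windows rules regardless of the host OS.
fileparse_set_fstype('MSWin32');

# Hypothetical sample path, used for illustration only.
my $path = 'C:\\dir\\report.txt';

# The second argument is a suffix pattern; without it, $suffix is empty.
my ($name, $dir, $suffix) = fileparse($path, qr/\.[^.]*/);

print "dir=$dir name=$name suffix=$suffix\n";
```

Reading the MSWin32 branch of fileparse in the module source shows which separators it treats as directory boundaries, which is a reasonable starting point for a pattern of your own.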
Re^2: One true regexp for untainting windows filenames?
by jaldhar (Vicar) on Jan 08, 2009 at 23:42 UTC

    Thanks for the tip. I found slightly more understandable code in File::Spec which has resulted in the following regexps: for Unix...

    qr{(\A (?: .* / (?: \.\.?\z )? )? [^/]* )}msx;
    ...and Windows (includes UNC paths)...
    qr{(\A (?: [a-zA-Z]: | (?:\\\\|//)[^\\/]+[\\/][^\\/]+ )? (?:.*[\\/](?:\.\.?\Z(?!\n))?)? .* )}msx;


      There is no string that

      qr{(\A (?: .* / (?: \.\.?\z )? )? [^/]* )}msx

      won't match.

      It's wrong for two reasons.

      • "foo" gets "untainted" as "".
      • "x/xx\0xx" is believed to be a valid file name, but it isn't.
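
      A small sketch that checks the "matches everything" claim against the Unix pattern quoted above. The sample strings are arbitrary choices for illustration.

```perl
use strict;
use warnings;

# The Unix pattern quoted above, unchanged.
my $unix_re = qr{(\A (?: .* / (?: \.\.?\z )? )? [^/]* )}msx;

# Every piece of the pattern is optional or starred and there is no
# end anchor, so the match cannot fail.
my @samples = ( '', "x/xx\0xx", "a\nb", '/' );
my @matched = grep { $_ =~ $unix_re } @samples;
printf "%d of %d samples matched\n", scalar @matched, scalar @samples;

# The embedded NUL survives into the captured ("untainted") value.
my ($captured) = "x/xx\0xx" =~ $unix_re;
print "capture contains NUL\n" if $captured =~ /\0/;
```

      A pattern that can never fail gives no protection as an untainting step: it launders the data without checking it.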

      Valid unix paths and only valid unix paths match


      (Although that doesn't mean there can ever be a file referenced by that path.)
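
      As a hedged sketch only: assuming that a valid Unix path is any non-empty string with no NUL byte (an assumption made here, not necessarily the pattern the poster had in mind), an anchored check might look like this. The helper name untaint_unix_path is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured path, or undef if the
# input is empty, undefined, or contains a NUL byte.
sub untaint_unix_path {
    my ($path) = @_;
    return undef unless defined $path;

    # Anchored at both ends; one or more non-NUL characters.
    my ($clean) = $path =~ m{\A ([^\0]+) \z}x;
    return $clean;
}

for my $candidate ( 'foo', 'x/xx', "x/xx\0xx", '' ) {
    my $result = untaint_unix_path($candidate);
    printf "%s\n", defined $result ? "accepted" : "rejected";
}
```

      This checks syntax only. Length limits and whether the path is one your program should be allowed to touch are separate questions.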
