http://www.perlmonks.org?node_id=799292

isync has asked for the wisdom of the Perl Monks concerning the following question:

I've got a script which can be run on Windows and Linux/Unix platforms. One function involves identifying if a directory entry is a dir or a file.

I differentiate with:

    my $attr = POSIX::S_ISDIR($stat[1]) ? FILE_ATTRIBUTE_DIRECTORY : FILE_ATTRIBUTE_NORMAL;

On Win32 this works fine, but when the stat is done on a Linux platform, POSIX::S_ISDIR($stat[1]) does not work correctly.

The numeric return values differ between these two platforms.
On Win32 there is a limited set of possible return values (two):
16895 for a dir and 33206 for a file.

On Linux, depending on permissions, modes are for example:
16877 for a dir, 33188 for a file

So far I've been unable to fully wrap my head around bitwise operators, but as far as I understand it, POSIX::S_ISDIR should be able to handle a wide range of possible values, especially those coming from a Linux filesystem, and not only the limited set of Win32 modes... Right? What am I missing?
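A minimal sketch, assuming only the core POSIX module: feeding the four decimal modes quoted above straight into POSIX::S_ISDIR() and POSIX::S_ISREG() shows that these macros look only at the file-type bits, so the differing permission bits should not matter. The hash %sample and its labels are illustrative names, not part of the original script.

```perl
use strict;
use warnings;
use POSIX ();

# Decimal st_mode values quoted in the question.
my %sample = (
    'Win32 dir'  => 16895,
    'Win32 file' => 33206,
    'Linux dir'  => 16877,
    'Linux file' => 33188,
);

for my $label (sort keys %sample) {
    my $mode = $sample{$label};
    # S_ISDIR/S_ISREG mask off the permission bits internally,
    # so only the file-type part of the mode is compared.
    printf "%-10s %5d = 0%o  dir=%s file=%s\n",
        $label, $mode, $mode,
        (POSIX::S_ISDIR($mode) ? 1 : 0),
        (POSIX::S_ISREG($mode) ? 1 : 0);
}
```

If this reports dir=1 for both directory values, the macro itself is fine and the problem lies elsewhere in the data flow.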

Re: POSIX::S_ISDIR() with $stat->mode values from Windows vs. Linux
by Corion (Patriarch) on Oct 05, 2009 at 16:37 UTC

    What's wrong with -d and -f?

      I am kind of wrapping stat() for a filesystem overlay, FUSE, Filesys::Virtual, client and server script stuff.

      You are right: the easy way would be to just use -d and -f, as you pointed out, and map them to my own set of flags. (I already tried that, and it worked.)

      But I would rather not introduce my own lingo and instead stick to standards. For now I only need to tell files from dirs, but in the future I might need more resolution there, and starting my own flags now would mean reinventing the whole "mode" thing later...
Re: POSIX::S_ISDIR() with $stat->mode values from Windows vs. Linux
by jakobi (Pilgrim) on Oct 05, 2009 at 16:47 UTC

    Those extra bits in (stat())[2] are the file type, the sticky bit and others (or, to pose the question exactly the other way round: the lower bits are the standard permission bits). Just mask them with a bitwise and (&) (and please consider using octal for readability; 33206 is more an illness than anything else).
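A minimal sketch of that masking, using plain octal literals. The constant names TYPE_MASK and PERM_MASK are hypothetical, chosen for this example; 0170000 is the file-type mask (S_IFMT) used by glibc on Linux, and 07777 keeps the setuid/setgid/sticky bits plus the nine rwx bits.

```perl
use strict;
use warnings;

# Hypothetical names; the values are octal (leading 0), as in the C headers.
use constant {
    TYPE_MASK => 0170000,   # file-type bits (S_IFMT on Linux/glibc)
    PERM_MASK => 07777,     # setuid/setgid/sticky plus rwx for user/group/other
};

# Decimal modes from the question: Win32 dir/file, then Linux dir/file.
for my $mode (16895, 33206, 16877, 33188) {
    my $type = $mode & TYPE_MASK;   # 0040000 = directory, 0100000 = regular file
    my $perm = $mode & PERM_MASK;   # e.g. 0755 or 0644
    printf "%5d = %07o  type=%07o  perm=%04o\n", $mode, $mode, $type, $perm;
}
```

Printed in octal, the Windows and Linux values share the same type part and differ only in the permission part.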

    If you've access to a unix box: man 2 stat; man 3 stat.

    As you do have access to an Ubuntu box: consider grepping the compiler includes (struct stat, and its st_mode field of type mode_t); they're authoritative and quite often eye-opening (even if they tend to be in that wrong, lower language called C. Still, that and the syscalls are the base for Perl).

    :)
    Peter

    PS: what's missing is, e.g., this:

        ls -ld .  ->  drwxr-xr-x 187 jakobi jakobi 16384 2009-10-05 18:56 .

    which shows the lower bits of the stat mode.

    Ignore the leftmost d for now, and you have the 3*3 permission bits for user, group and other, which probably account for most of the Windows-to-Linux difference in the stat output. (Please also ignore the overloaded output of ls, which has to abuse the x position for hinting at things like the setuid or setgid bit; worse, that display doesn't have anything to do with the layout of the bitfield from stat().) The 'd' from ls denotes the directory file type, as tested by test -d in the shell or by Perl's -d. This file type is encoded in some of the higher bits of the mode.
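To connect the ls notation with the numbers, here is a small sketch that renders a mode the way ls -l shows the type letter and the 3*3 permission bits. The helper name mode_to_ls is hypothetical, it only knows d, - and l, and it deliberately ignores the setuid/setgid/sticky overloading mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: type letter (d, -, l or ?) plus the nine rwx bits.
# Setuid/setgid/sticky are ignored on purpose.
sub mode_to_ls {
    my ($mode) = @_;
    my $type   = $mode & 0170000;              # file-type bits
    my $letter = $type == 0040000 ? 'd'
               : $type == 0100000 ? '-'
               : $type == 0120000 ? 'l'
               :                    '?';
    my $perms = '';
    for my $shift (6, 3, 0) {                  # user, group, other
        my $bits = ($mode >> $shift) & 07;     # one rwx triple
        $perms .= ($bits & 4 ? 'r' : '-')
               .  ($bits & 2 ? 'w' : '-')
               .  ($bits & 1 ? 'x' : '-');
    }
    return $letter . $perms;
}

print mode_to_ls($_), "\n" for 16877, 33188, 16895, 33206;
```

The Linux directory value 16877 comes out as the same drwxr-xr-x shown by ls -ld above.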

Re: POSIX::S_ISDIR() with $stat->mode values from Windows vs. Linux
by ambrus (Abbot) on Oct 05, 2009 at 16:57 UTC

    Tricky question. Try $stat[2] instead of $stat[1] if you're initializing @stat with @stat = stat($filename).

      As you already rightly assumed: I in fact use a customized @stat, where mode is actually $stat[1], contrary to the standard @stat = stat($file). I omitted that... sorry.
Re: POSIX::S_ISDIR() with $stat->mode values from Windows vs. Linux
by isync (Hermit) on Oct 05, 2009 at 18:31 UTC
    jakobi's comment, I think, leads to the very core of the problem: the difference between numbers, strings and octal notation, and the fact that I still haven't quite figured out the logic behind bitmasks, bit operators and the like.

    A bit of pseudocode to illustrate my script, so someone who knows this stuff better than me can tell me where I need to throw in the oct() function or similar to make it work again:
    1. server-side script does a stat()
    2. server-side script returns a HTTP::Response with content in the form "$name\t$size\t$mode\t$nlink\t$ctime\t$atime\t$mtime"
    3. client-side script receives the response content as a string
    4. client-side script does @stat  = split(/\t/, $stat); on the string
    5. client-side script does if(POSIX::S_ISDIR(int($stat[1]))){ ... } # int() to make sure it's numeric, but something got lost here

    Step 5 worked in a test setup where both the "server side" and the "client side" ran on Windows.
    Step 5 broke when I migrated the "server side" to a Debian machine.
    (How do I encode/decode $mode so it stays intact while represented as a string?)
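A minimal sketch of steps 1 to 5, assuming the tab-separated layout from step 2; encode_entry and decode_entry are hypothetical names. Interpolating $mode produces a decimal string that survives the trip unchanged, and Perl numifies it automatically, so neither int() nor oct() should be needed. oct() only comes in if you deliberately send octal text.

```perl
use strict;
use warnings;
use POSIX ();

# Wire format from step 2: "$name\t$size\t$mode\t$nlink\t$ctime\t$atime\t$mtime"
sub encode_entry {
    my ($name, @stat) = @_;              # @stat from a real stat() call
    my ($mode, $nlink, $size, $atime, $mtime, $ctime) = @stat[2, 3, 7, 8, 9, 10];
    return join "\t", $name, $size, $mode, $nlink, $ctime, $atime, $mtime;
}

sub decode_entry {
    my ($line) = @_;
    my ($name, $size, $mode, $nlink, $ctime, $atime, $mtime) = split /\t/, $line;
    # The decimal string (e.g. "16877") numifies on use; no oct() needed.
    return { name => $name, size => $size, mode => $mode + 0, nlink => $nlink,
             ctime => $ctime, atime => $atime, mtime => $mtime };
}

my $line  = encode_entry('.', stat('.'));
my $entry = decode_entry($line);
printf "%s mode=0%o dir=%s\n", $entry->{name}, $entry->{mode},
    (POSIX::S_ISDIR($entry->{mode}) ? 'yes' : 'no');

# If you prefer octal on the wire, convert explicitly in both directions.
my $wire_octal = sprintf '%o', $entry->{mode};   # e.g. "40755"
die "octal round trip failed" unless oct($wire_octal) == $entry->{mode};
```

The design choice is simply to pick one representation and convert at both ends; mixing decimal on one side with oct() on the other is what silently corrupts the mode.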

    BTW: man (2) stat is full of insight... ;-)

      Ok, I'll bite

      First of all, I don't see any need for oct(). This is a bitfield - so you might want to do a perl -e 'printf "0%o\n", 16895' for human consumption. And if you ever see that on Unix and it isn't called tmp, you'll probably also want to give the culprit a lecture about security risks. The readable form of this number is 040777, with 777 being the permission bits of read/write/executable for all of user, group and other (read maybe as 040ugo:) ).

      Now let's peek at the includes, pretending that the 040 wasn't an obvious hint to us for the encoding of the directory file type.

      This might be a useful trick of the trade for more difficult future questions, in situations where Perl offers few or no suitable abstractions, or where you need some background to make proper use of them.

          $ grep -lri mode_t /usr/include
          $ # as usual, do sceptically eyeball sys and linux,
          $ # but for now, this sounds like a hit:
          $ less /usr/include/bits/stat.h
          /* Encoding of the file mode.  */
          #define __S_IFMT        0170000 /* These bits determine file type.  */

          /* File types.  */
          #define __S_IFDIR       0040000 /* Directory.  */
          #define __S_IFCHR       0020000 /* Character device.  */
          #define __S_IFBLK       0060000 /* Block device.  */
          #define __S_IFREG       0100000 /* Regular file.  */
          #define __S_IFIFO       0010000 /* FIFO.  */
          #define __S_IFLNK       0120000 /* Symbolic link.  */
          #define __S_IFSOCK      0140000 /* Socket.  */

          /* POSIX.1b objects.  Note that these macros always evaluate to zero.
             But they do it by enforcing the correct use of the macros.  */
          #define __S_TYPEISMQ(buf)  ((buf)->st_mode - (buf)->st_mode)
          #define __S_TYPEISSEM(buf) ((buf)->st_mode - (buf)->st_mode)
          #define __S_TYPEISSHM(buf) ((buf)->st_mode - (buf)->st_mode)

          /* Protection bits.  */
          #define __S_ISUID       04000   /* Set user ID on execution.  */
          #define __S_ISGID       02000   /* Set group ID on execution.  */
          #define __S_ISVTX       01000   /* Save swapped text after use (sticky).  */
          #define __S_IREAD       0400    /* Read by owner.  */
          #define __S_IWRITE      0200    /* Write by owner.  */
          #define __S_IEXEC       0100    /* Execute by owner.  */

          # strange - this file doesn't define access macros for group/other..

      The numbers above are octal numbers, for improved readability, as hinted at by the leading 0.
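Those header values translate directly into a lookup table. A minimal sketch, assuming the glibc values from the bits/stat.h excerpt above; the helper file_type and the hash %type_name are hypothetical names. Note that a numeric literal on the left of => is not quoted, so 0040000 becomes the key "16384", which matches the masked mode.

```perl
use strict;
use warnings;

# File-type values from the __S_IF* defines quoted above (octal).
my $S_IFMT = 0170000;                 # selects the file-type bits
my %type_name = (
    0040000 => 'directory',
    0020000 => 'character device',
    0060000 => 'block device',
    0100000 => 'regular file',
    0010000 => 'FIFO',
    0120000 => 'symbolic link',
    0140000 => 'socket',
);

# Hypothetical helper: name the file type encoded in an st_mode value.
sub file_type {
    my ($mode) = @_;
    return $type_name{ $mode & $S_IFMT } // 'unknown';
}

printf "0%o: %s\n", $_, file_type($_) for 16877, 33188, 0120777, 0060660;
```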

      So in Perl you write $mode & 0777 to restrict $mode to the actual file permissions (cf. chmod arguments!), or $mode & 07 to check the permissions for others, i.e. the world (but remember in the background that there is also filesystem-specific stuff like ACLs and extended attributes, which can modify the traditional permissions; hopefully somebody already made a module for checking on these headaches :))

      To test for a directory without POSIX, using just the bitwise operators directly (which is all the minimal wrapping by the POSIX module does):

          warn "a dir\n" if ((stat ".")[2] & 0170000) == 0040000;

      with 0170000 being __S_IFMT (the file-type mask) and 040000 being __S_IFDIR from the includes above. Recognize POSIX::S_IFDIR?
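Two details are easy to get wrong when writing this test by hand, and a small sketch makes them visible. First, in Perl == binds more tightly than &, so an unparenthesized $mode & 040000 == 040000 actually computes $mode & 1. Second, checking only the 040000 bit also matches block devices (060000) and sockets (0140000). The sample modes below are made up for illustration.

```perl
use strict;
use warnings;

# Pitfall 1: == binds tighter than &, so this computes $m & (040000 == 040000),
# i.e. $m & 1, which only tests the "others may execute" bit.
my $unparenthesized = sub {
    no warnings 'precedence';    # silence the compile-time hint on purpose
    my $m = shift;
    return $m & 040000 == 040000;
};

# Pitfall 2: testing the single 040000 bit, even when parenthesized, also
# matches block devices (060000) and sockets (0140000), which contain that bit.
my $single_bit = sub { my $m = shift; return ($m & 040000) == 040000 };

# The S_ISDIR approach: mask all file-type bits (S_IFMT), then compare.
my $is_dir = sub { my $m = shift; return ($m & 0170000) == 0040000 };

# Made-up sample modes: two dirs, a block device, an executable file.
for my $mode (0040755, 0040750, 0060660, 0100755) {
    printf "0%06o: unparenthesized=%d single_bit=%d is_dir=%d\n", $mode,
        ($unparenthesized->($mode) ? 1 : 0),
        ($single_bit->($mode)      ? 1 : 0),
        ($is_dir->($mode)          ? 1 : 0);
}
```

Only the masked comparison gets all four cases right: the unparenthesized form misses the 0750 directory and accepts the executable file, and the single-bit form accepts the block device.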

      I think that should be enough background on bitwise masking. Note that besides and, or and xor you can also shift bits left or right (man perlop: search for &, |, ^, <<, >>).

      AFAIS, the missing bits / misunderstandings were:

      • not printing the values in octal for readability during debugging,
      • possibly testing modes for equality as-is, without masking out the irrelevant bits
      • S_ISDIR() masks the mode with S_IFMT (0170000) and compares the result with S_IFDIR, both on Windows and in the Linux VFS. There is exactly one type of directory on both platforms for all filesystems :) (excluding, for now, the more interesting non-standard abuses of FUSE).

      If possible, however, use the normal Perl tests like -d, as already suggested.

      cu
      Peter

          my $d  = "16895";
          my $f  = "33206";
          my $od = sprintf("%5o", $d);
          my $of = sprintf("%5o", $f);
          print "$d->$od\n$f->$of\n";

          $d  = "16877";
          $f  = "33188";
          $od = sprintf("%5o", $d);
          $of = sprintf("%5o", $f);
          print "$d->$od\n$f->$of\n";

          # get the three lowest bits (use the INTEGER value for this, not the octal string!)
          my $l3b = $d & 7;    # 0000111 in binary, or '7' in octal and in dec
          print "lowest three bits of octal $od is $l3b\n";

          # get the three bits starting at bit 4 (111000 in bin, 70 in oct, 56 dec)
          my $l43b = $d & oct(70);    # still 6 bits though,
          $l43b = $l43b >> 3;         # so shift them down by 3
          print "$od and oct(70) shifted 3 bits to the right = ", sprintf("%o", $l43b), " in octal\n";

      gives

          16895->40777
          33206->100666
          16877->40755
          33188->100644
          lowest three bits of octal 40755 is 5
          40755 and oct(70) shifted 3 bits to the right = 5 in octal
Re: POSIX::S_ISDIR() with $stat->mode values from Windows vs. Linux
by isync (Hermit) on Oct 05, 2009 at 20:13 UTC
    Slap on the forehead!!
    This monk begs your pardon...

    I've found the cause of this strange behavior, i.e. why the "dir test" broke when running on Debian. It was this line:

        my $attr = int($mode) == 16895 ? FILE_ATTRIBUTE_DIRECTORY : FILE_ATTRIBUTE_NORMAL;

    An old, dirty hack further down in my code that I had forgotten about. It stupidly tests against the two Win32 cases of dir/file... It works on Windows, but can't handle the richer modes coming from the Debian filesystem.

    Thank you very much for the lecture, everyone.
    And the monks who feel I wasted their time might take comfort from the fact that, while this thread grew, I went through just about every permutation of oct(), hex(), sprintf("%5o",$mode) and $isdir =~ /^40/ there is...

      :)

      One more thing to check, given NTFS: are you certain that a native Windows Perl will _ALWAYS_ see just those 2 values? Say, on a host with 2 user accounts...

        I thought of that, but at least on my XP box it seems to be just those two.
        Anyway, since I threw out that line and now check solely with POSIX::S_ISDIR(), I think I am on the safe side.
Re: POSIX::S_ISDIR() with $stat->mode values from Windows vs. Linux
by afoken (Chancellor) on Oct 06, 2009 at 16:40 UTC

    Perl's stat() emulation for Windows returns at least three different values, see Re^3: Inline.pm and untainting.

    You DO NOT NEED the POSIX macro emulations and that ugly decimal notation of file modes if you just want to test whether a name belongs to a directory or a plain file. Use Perl's -X functions, especially -d to test for a directory and -f to test for a plain file.

        print "$somename is a real directory" if -d($somename);
        print "$somename is a real plain file" if -f($somename);

    Note that these functions usually test the target of a symlink, except of course for -l. If you want "real" files or directories and not symlinks, either test explicitly for a symlink or use lstat(). Think twice before insisting on "real" files or directories: most people expect symlinks to be completely transparent to applications, so don't violate that expectation except for a very good reason (like running with root privileges in a publicly writable directory such as /tmp).

        # way 1: explicit test
        print "$somename is a real directory"  if !-l($somename) && -d _;
        print "$somename is a real plain file" if !-l($somename) && -f _;

        # way 2: using lstat()
        lstat($somename);
        print "$somename is a real directory" if -d _;
        lstat($somename);
        print "$somename is a real plain file" if -f _;

    Note that using a single underscore as the argument reuses the struct stat from the last lstat()/stat()/-X call, saving slow system calls.
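A self-contained sketch of the symlink behavior described above, assuming a platform with symlink support (e.g. Linux). It builds a scratch directory with File::Temp, so the names real and alias are made up for the example: -d follows the link, while -l followed by -d _ (which reuses the lstat() buffer) tells the real directory apart from the link.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Scratch directory holding a real directory and a symlink pointing to it.
# Assumes a platform with symlink support (e.g. Linux); names are made up.
my $base = tempdir(CLEANUP => 1);
my $dir  = File::Spec->catdir($base, 'real');
my $link = File::Spec->catfile($base, 'alias');
mkdir $dir or die "mkdir $dir: $!";
symlink($dir, $link) or die "symlink $link: $!";

for my $pair ([real => $dir], [alias => $link]) {
    my ($label, $path) = @$pair;
    my $follows = -d($path) ? 1 : 0;                # stat(): follows the symlink
    my $real    = (!-l($path) && -d _) ? 1 : 0;     # -l does lstat(); _ reuses it
    printf "%-5s -d=%d real_dir=%d\n", $label, $follows, $real;
}
```

Both names pass plain -d; only the real directory passes the explicit symlink check.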

    Update:

    For the stat() return values on Windows, and the reason why stat() is emulated that way under Windows, see Re^3: Inline.pm and untainting.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Alexander talked enough about symlinks that I want to add this caution on both symlinks and hardlinks to the thread:

      If you write to existing files in any way other than append-without-backup, and there's the tiniest risk of a user symlinking or hardlinking that file: explicitly specify how you write out the file. Funny things happen, e.g. with sed -i.bak or perl -i.bak, and with that Fedora sed patch that was recently removed, changing sed's semantics with respect to symlinks and surprising everyone.

      date > 1; ln -s 1 2; perl -i.bak -lpe '' 2; ls -l [12]* # enjoy

      So at the very least, document how and when you're breaking symlinks or hardlinks on writing. Or not breaking them. Which might be just as bad.
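A minimal sketch of the two write strategies, assuming a platform with symlink support; the file names and the helpers slurp and spew are hypothetical. Writing through the existing name keeps the link and changes the target, while writing a new file and renaming it over the name replaces the link with a plain file, which is the effect the perl -i one-liner above shows.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Assumes symlink support. 'target' is the real file, 'alias' links to it.
my $base   = tempdir(CLEANUP => 1);
my $target = File::Spec->catfile($base, 'target');
my $alias  = File::Spec->catfile($base, 'alias');

sub slurp { my ($p) = @_; open my $fh, '<', $p or die "$p: $!"; local $/; <$fh> }
sub spew  { my ($p, $s) = @_; open my $fh, '>', $p or die "$p: $!"; print {$fh} $s; close $fh or die "$p: $!" }

spew($target, "v1\n");
symlink('target', $alias) or die "symlink: $!";

# Strategy A: open the existing name for writing (truncates the target).
# The symlink survives and the target's content changes.
spew($alias, "v2\n");
printf "A: alias is link=%d, target=%s", (-l $alias ? 1 : 0), slurp($target);

# Strategy B: write a temporary file, then rename it over the name.
# The link itself is replaced by a plain file; the target keeps its old content.
my $tmp = "$alias.tmp";
spew($tmp, "v3\n");
rename($tmp, $alias) or die "rename: $!";
printf "B: alias is link=%d, target=%s", (-l $alias ? 1 : 0), slurp($target);
```

Neither strategy is wrong in itself; the point is to choose one deliberately and document it.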

      cu
      Peter

      Footnote: -d _ reuses the previous stat, as Alexander writes, while a bare -d does a new stat on $_. So this is an example of terseness possibly costing more in both debugging time and runtime :).