http://www.perlmonks.org?node_id=112867

grinder has asked for the wisdom of the Perl Monks concerning the following question:

A few days ago, someone posted a question about reading directories. To discard the current and parent directories that are returned by readdir, someone suggested this:

    opendir(DIR, "/users/foo/");
    @bar = readdir(DIR);
    closedir(DIR);
    for (1..2) {shift @bar;} #get rid of '.' and '..'

Notwithstanding the lack of error checking, and a dubious array hack better handled by splice, merlyn subsequently pointed out that "There is no promise that the first two elements returned from a readdir are dot and dotdot," and I nodded in agreement.

Then I reflected on the fact that I have been using Perl for nearly ten years; I have used Perl on half a dozen Unix variants, on VMS, on MS-DOS and on Win32. In all that time, grovelling through filesystems has always been a large part of my work, and yet I cannot recall a single time when . and .. were returned in any but the first two positions. Maybe it happens once in a while to some script in production; I'm talking about what I see when stepping through development code I'm testing.

I heed the warning and know that this behaviour is not guaranteed, so I dutifully code:

    while( defined( my $file = readdir(DIR) )) {
        next if $file eq '.' or $file eq '..';
        munge($file);
    }

But maybe it would be more elegant to write code that throws away the first two results returned by readdir and then have a while block that doesn't begin with a next if ... check. What I mean is that it's a nearly invariant test that could be hoisted out of the loop if readdir were a little more deterministic.
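One way to get the . and .. test out of the processing loop, without relying on their position, is to filter once when the directory is read. This is a minimal sketch, not the thread's code; the sub name read_entries is hypothetical, and munge is only a stand-in that prints its argument.

```perl
use strict;
use warnings;

# Hypothetical helper: read every entry of $dir, dropping '.' and '..'
# by name (not by position), so the caller's loop needs no special case.
sub read_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    return @entries;
}

# Hypothetical stand-in for the real per-file work.
sub munge {
    my ($file) = @_;
    print "$file\n";
}

munge($_) for read_entries('.');
```

The check still runs once per entry; it has simply moved out of the loop body into the reading step.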

My questions are

  1. Does there exist a platform where readdir returns . and .. in other than the first two return values?
  2. Is readdir guaranteed to return . and .. as the first two return values on any platform (specifically *BSD, Linux, Solaris and Win32)? (or more specifically the filesystem in use, such as ext2, ffs, ntfs...)
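Question 1 can at least be probed empirically for a particular directory. A minimal sketch that reports where . and .. appear in raw readdir order; the sub name dot_positions is hypothetical, and a result from one directory says nothing about what the OS promises.

```perl
use strict;
use warnings;

# Hypothetical diagnostic: return the 0-based positions at which '.' and
# '..' appear in the raw readdir order of $dir (undef if not returned).
sub dot_positions {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my @entries = readdir($dh);
    closedir($dh);
    my %pos;
    for my $i (0 .. $#entries) {
        my $e = $entries[$i];
        $pos{$e} = $i if $e eq '.' or $e eq '..';
    }
    return ($pos{'.'}, $pos{'..'});
}

# Directory to inspect: first command-line argument, else the current one.
my $dir = @ARGV ? $ARGV[0] : '.';
my ($dot, $dotdot) = dot_positions($dir);
printf "%s: '.' at %s, '..' at %s\n", $dir,
    map { defined($_) ? $_ : 'nowhere' } $dot, $dotdot;
```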

I realise my assumption is based on the notion that I consider a directory stream to be a linear list. Walking down it is akin to accessing array elements. A hash-based directory stream would produce unordered results, but I'm not sure I've ever encountered one, at least as far as the visible behaviour from userspace is concerned.

--
g r i n d e r

Re: Is readdir ever deterministic?
by merlyn (Sage) on Sep 17, 2001 at 18:52 UTC
    readdir on Unix returns the underlying raw directory order. Additions and deletions to the directory use and free-up slots. The first two entries to any directory are always created as "dot" and "dotdot", and these entries are never deleted under normal operation.

    However, if a directory entry for either of these gets incorrectly deleted (through corruption, or using the perl -U option and letting the superuser unlink it, for example), the next fsck run has to recreate the entry, and it will simply add it. Oops, dot and dotdot are no longer the first two entries!

    So, defensive programming mandates that you do not count on the slot order. And there's no promise that dot and dotdot are the first two entries, because Perl can't control that, and the underlying OS doesn't promise it either.

    -- Randal L. Schwartz, Perl hacker
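Since neither Perl nor the OS promises a slot order, code that needs a stable order has to impose one itself. A minimal sketch, assuming plain string (cmp) order is acceptable; the sub name sorted_entries is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: entries of $dir minus '.' and '..', in string order.
# The order comes from sort, not from whatever the filesystem returns.
sub sorted_entries {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    @entries = sort @entries;
    return @entries;
}

print "$_\n" for sorted_entries('.');
```

Note that string order puts 10 before 2; a numeric or "natural" order would need its own comparison block.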

(dkubb) Re: (2) Is readdir ever deterministic?
by dkubb (Deacon) on Sep 17, 2001 at 19:43 UTC

    If you're trying to write cross-platform code, you may want to keep something else in mind: another thing that isn't absolutely guaranteed is the set of names the OS uses for the current and parent directories.

    I use File::Spec::Functions's no_upwards() function to check whether the filename is the current or parent directory:

    #!/usr/bin/perl -wT
    use strict;
    use File::Spec::Functions qw(catdir rootdir no_upwards);
    use constant START_DIR => catdir(rootdir, qw(users foo));

    opendir DIR, START_DIR or die 'Could not open directory ', START_DIR, ": $!";
    while( defined(my $file = readdir DIR) ) {
        next unless no_upwards($file);
        print "$file\n";
        #...do something with $file
    }
    closedir DIR;

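no_upwards() takes a list and returns only the entries that are not the current or parent directory, so it can also filter the whole readdir list in one call instead of once per entry. A minimal sketch that lists the current directory:

```perl
use strict;
use warnings;
use File::Spec::Functions qw(curdir no_upwards);

# Filter the complete readdir list in one call; no_upwards() drops the
# platform's names for the current and parent directories.
my $dir = curdir();
opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
my @entries = no_upwards(readdir($dh));
closedir($dh);

print "$_\n" for @entries;
```

On Unix, no_upwards('.', '..', 'a', '.hidden', '...') keeps a, .hidden and ..., in that order.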
Re: Is readdir ever deterministic?
by VSarkiss (Monsignor) on Sep 17, 2001 at 19:14 UTC

    As a further point on defensive programming, consider that even if some platform guarantees to return . and .. as the first two entries, it may not keep that guarantee in the future. If you can make your code more robust in the face of potential future changes at no additional cost, why not do so?

    To me, the real point is this: your code should say what you mean. If you intend to exclude . and .., then write that, as shown in your example code. Even if every platform in existence guaranteed the contents of the first two directory entries, IMHO it'd be better to code it as in your example than to eliminate the first two elements and have to explain in a comment what the code is intended to achieve.

      I concur, especially since testing for . and .. is cheap. Sure, there are two extra string comparisons per directory entry, but unless a profiler tells you that is a specific performance problem for your program, IMO it's not worth the potential maintenance problems to "optimize" it away. Later programmers might not immediately understand why the code did that, and it could waste the time of those assigned to maintain that code.
Re: Is readdir ever deterministic?
by arturo (Vicar) on Sep 17, 2001 at 18:54 UTC

    I'm not qualified to comment on the variations among systems or the lack of determinism, but as far as coding defensively goes, you could just filter the entries with grep:

    foreach my $file ( grep { !/^\.{1,2}$/ } readdir DIR ) { munge($file); }
    perl -e 'print "How sweet does a rose smell? "; chomp ($n = <STDIN>); $rose = "smells sweet to degree $n"; *other_name = *rose; print "$other_name\n"'
      Well, being truly defensive, you should have written:
      foreach my $file ( grep { !/\A\.{1,2}\z/ } readdir DIR ) { munge($file); }
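      Or else, the day someone creates a file named dot-newline or dotdot-newline, you'll be very angry: $ also matches just before a trailing newline, so those real files get discarded along with . and .., whereas \z matches only at the very end of the string.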

      -- Randal L. Schwartz, Perl hacker
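A quick way to see how the two patterns treat such names. The filenames are made up for illustration, and the predicate names loose_match and strict_match are hypothetical wrappers around the two regexes from the replies above.

```perl
use strict;
use warnings;

# Hypothetical filenames; on Unix a newline is a legal character in a filename.
my @names = (".", "..", ".\n", "..\n", "...", ".profile");

# The two patterns from the replies above, wrapped as predicates.
sub loose_match  { return $_[0] =~ /^\.{1,2}$/ }     # $ also matches before a final "\n"
sub strict_match { return $_[0] =~ /\A\.{1,2}\z/ }   # \z matches only at the very end

for my $name (@names) {
    (my $shown = $name) =~ s/\n/\\n/g;    # make the newline visible
    printf "%-10s loose=%s strict=%s\n", $shown,
        (loose_match($name)  ? 'skip' : 'keep'),
        (strict_match($name) ? 'skip' : 'keep');
}
```

With the loose pattern, ".\n" and "..\n" are skipped along with . and ..; with the \A...\z pattern only . and .. are.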

Re: Is readdir ever deterministic?
by ncw (Friar) on Sep 17, 2001 at 23:22 UTC
    Does there exist a platform where readdir returns . and .. in other than the first two return values?

    Hmm, good question! If any filesystem would do this, it would be ReiserFS under Linux. ReiserFS uses a hashed directory structure rather than the linear directory structure used by most other filesystems, and it returns directory entries in hash order (a familiar concept to Perl programmers!).

    I just tried it, and the answer is that ReiserFS always puts "." and ".." first! The hash order isn't as random as a Perl hash either, which is interesting too.

Re: Is readdir ever deterministic?
by ChemBoy (Priest) on Dec 17, 2001 at 23:54 UTC

    Does there exist a platform where readdir returns . and .. in other than the first two return values?

    Yes! Though it took me three months after reading this question to realize it... or possibly something changed in the system setup during those three months. But in any event, I have an example case:

    % ls -a
    ./ ../ 10 11 2 4 5 6 8 9
    % perl -le 'opendir DIR, "."; print join " ", readdir DIR'
    . 2 4 5 6 8 9 .. 10 11

    The files were created in the order 2, 4, 6, 8, 10, 5, 9, 11 (you could say this was an artificial test case, I suppose...), but the creation order does not seem to affect the order readdir returns. The result is the same running on 5.004/IRIX and 5.005/Linux: the key seems to be the underlying filesystem, which is SGI XFS (under IRIX 6.5).

    As a further (and yet more psychotically artificial) test, we have

    % touch ,
    % ls -a
    , ./ ../ 10 11 2 4 5 6 8 9
    % perl -le 'opendir DIR, "."; print join " ", readdir DIR'
    , . 2 4 5 6 8 9 .. 10 11

    I haven't done an exhaustive investigation of exactly what XFS does for all cases, but it seems fairly clear that one-character filenames will, in general, break the common assumptions of readdir on that filesystem.
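Taking the first XFS readdir order reported above as literal data, the positional hack from the original question and a name-based filter give different results. This sketch hard-codes that observed list rather than reading a real directory.

```perl
use strict;
use warnings;

# Order reported by readdir on the XFS directory in the first example above.
my @raw = ('.', '2', '4', '5', '6', '8', '9', '..', '10', '11');

# Positional hack: assume the first two entries are '.' and '..'.
my @positional = @raw;
splice(@positional, 0, 2);

# Name-based filter: drop '.' and '..' wherever they appear.
my @by_name = grep { $_ ne '.' && $_ ne '..' } @raw;

print "positional: @positional\n";
print "by name:    @by_name\n";
```

The positional version silently drops the real file 2 and hands .. to whatever processes the list; the name-based filter returns all eight files.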



    If God had meant us to fly, he would *never* have given us the railroads.
        --Michael Flanders