Re: Is it possible to localize the stat/lstat cache?

Hi jeff,

When I come to this site with some spare time, I try to work through some script that stretches my game a little bit. I had to add print statements to figure out your syntax but wanted to ask for clarification.

$ perl stat1.pl
files are ./causes2.txt ./fears1.pl ./fears1.pl~ ./fears2.txt ./stat1.pl ./stat1.pl~ ./template_stuff
240
282
242
63
396
362
4096
subroutine says this is your hash: 
key: ./stat1.pl, value: HASH(0xa1519ac)
key: ./causes2.txt, value: HASH(0xa0fe7ec)
key: ./fears1.pl, value: HASH(0xa118598)
key: ./fears1.pl~, value: HASH(0xa117ddc)
key: ./fears2.txt, value: HASH(0xa12c59c)
key: ./stat1.pl~, value: HASH(0xa17581c)
key: ./template_stuff, value: HASH(0xa22a8d4)
$

Q1) Why are directories always 4096 on my linux machine, regardless of whatever is in it?

I really couldn't understand the map and resulting hash until I saw that the values were themselves hash references. I'm not suggesting that I added to your script in any way to improve it; rather it is simply more verbose:

$ cat stat1.pl
use strict;
use warnings;
use 5.010;
use lib "template_stuff";
use utils1 qw(print_hash);

my @files = glob('./*');
my %stat = map {
    $_ => {
        r => (-r $_),
        w => (-w $_),
        x => (-x $_),
        s => (-s $_),
    }
} @files;

say "files are @files";
# print out all sizes, as an example
print $stat{$_}{s}, $/ for @files;
my $hashref = \%stat;
print_hash ( $hashref );
$

Q2) Do I have it correct that each value in the stat hash is a hash reference, pointing to a hash with the filetest letters as keys and their results for the given file as values?

Q3) How would I enumerate them, that is, display all their values for a directory?

Thanks for your interesting post and comment,


subroutine says this is your hash: 
key: ./stat1.pl, value: HASH(0xa1519ac)

Use Data::Dumper or similar to dump the hash content.
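
A minimal sketch of what that could look like, which also covers the "how would I enumerate them" question. The sub names build_stat_hash and format_stat_hash are made up for this example; only Data::Dumper and the filetest operators come from the thread:

```perl
use strict;
use warnings;
use 5.010;
use Data::Dumper;

# Hypothetical helper: filename => { test letter => result }, as in the thread.
sub build_stat_hash {
    my @files = @_;
    return map {
        my $f = $_;
        $f => { r => (-r $f), w => (-w $f), x => (-x $f), s => (-s $f) };
    } @files;
}

# Hypothetical helper: one printable line per file, listing every stored test.
sub format_stat_hash {
    my ($stat) = @_;
    my @lines;
    for my $file (sort keys %$stat) {
        my $tests = $stat->{$file};
        my @pairs;
        for my $test (sort keys %$tests) {
            my $value = $tests->{$test} // 'undef';
            push @pairs, "$test=$value";
        }
        push @lines, "$file: @pairs";
    }
    return @lines;
}

my %stat = build_stat_hash(glob('./*'));

# Dump the whole structure, with sorted keys for stable output.
$Data::Dumper::Sortkeys = 1;
print Dumper(\%stat);

# Or walk both levels by hand.
say for format_stat_hash(\%stat);
```

A false test result is stored as an empty string, so a line may show something like "x=" for a file that is not executable.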

Q1) Why are directories always 4096 on my linux machine, regardless of whatever is in it?

They aren't. Directories on ext2/3/4 filesystems have a minimal size of one block, which is 4096 bytes on typical large filesystems. Smaller filesystems may use block sizes of 1024 or 2048 bytes. Directories filled with many files grow larger than one block. Removing the files will NOT make the directory shrink. Other filesystems may give completely different results. Unless you are writing low-level code to check, repair, or back up filesystems, it is best to completely ignore any size value for anything but plain files.
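
If you want to watch this on your own machine, here is a rough sketch. The helpers dir_size and dir_sizes_around_fill are invented for this example, and the numbers depend entirely on the filesystem holding the temporary directory, so no particular output is promised:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: size in bytes that lstat reports for a directory.
sub dir_size {
    my ($dir) = @_;
    my @st = lstat($dir) or die "Can't lstat $dir: $!";
    return $st[7];    # element 7 of the stat list is the size
}

# Hypothetical helper: directory size when empty, after creating $count
# empty files, and after deleting them again.
sub dir_sizes_around_fill {
    my ($count) = @_;
    my $dir   = tempdir(CLEANUP => 1);
    my $empty = dir_size($dir);

    my @names;
    for my $i (1 .. $count) {
        push @names, sprintf('%s/file_with_a_longish_name_%05d', $dir, $i);
    }
    for my $name (@names) {
        open my $fh, '>', $name or die "Can't create $name: $!";
        close $fh;
    }
    my $full = dir_size($dir);

    unlink @names;
    my $emptied = dir_size($dir);

    return ($empty, $full, $emptied);
}

my ($empty, $full, $emptied) = dir_sizes_around_fill(2000);
print "empty: $empty, with 2000 files: $full, after deleting them: $emptied\n";
```

On an ext filesystem I would expect the second and third numbers to be larger than the first, but on other filesystems the pattern can look quite different.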

my %stat = map {
    $_ => {
        r => (-r $_),
        w => (-w $_),
        x => (-x $_),
        s => (-s $_),
    }
} @files;

Note that this code is not as efficient as it may seem. It hides four stat calls per file, one for each filetest operator, and because the file can change between those calls, it may also cause race conditions. To really reduce the number of (l)stat calls, use one explicit (l)stat and the special filehandle _ instead of $_:

my %stat = map {
    lstat($_) or die "Can't lstat $_: $!";
    $_ => {
        r => (-r _),
        w => (-w _),
        x => (-x _),
        s => (-s _),
    }
} @files;

fishmonger gave a much better hint: File::stat's stat and lstat functions both return an object that can be stored in the hash, allowing you to run all the tests you need without storing each test's result in the %stat hash:

use v5.12;
use File::stat 1.02 qw( stat lstat );
# ...
my %stat = map { $_ => lstat($_) } @files;
# ...
for my $fn (@files) {
  say $fn,' is ',(-d $stat{$fn} ? 'a directory' : -x $stat{$fn} ? 'executable' : 'not executable');
  say $fn,' has a size of ',$stat{$fn}->size(),' bytes, uses ',$stat{$fn}->blocks(),' "blocks" of 512 bytes, the filesystem uses a block size of ',$stat{$fn}->blksize(),' bytes';
}

Update: Note that stat and lstat often return st_blocks for the historic block size of 512, even if the filesystem uses a different block size. This conforms to POSIX:

The unit for the st_blocks member of the stat structure is not defined within IEEE Std 1003.1-2001. In some implementations it is 512 bytes. It may differ on a file system basis. There is no correlation between values of the st_blocks and st_blksize, and the f_bsize (from <sys/statvfs.h>) structure members.

Traditionally, some implementations defined the multiplier for st_blocks in <sys/param.h> as the symbol DEV_BSIZE.
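
A small sketch of how one might turn st_blocks into an estimate of allocated bytes. The 512-byte unit is an assumption that holds on Linux but, as quoted above, is not guaranteed by POSIX; the sub name allocated_bytes is invented for this example:

```perl
use strict;
use warnings;
use File::stat;    # exports object-returning stat and lstat by default

# ASSUMPTION: st_blocks counts 512-byte units (true on Linux, not
# guaranteed by POSIX), so the result is only an estimate elsewhere.
my $ASSUMED_BLOCK_UNIT = 512;

# Hypothetical helper: estimated bytes allocated on disk for $path, or
# undef where the platform does not report a block count.
sub allocated_bytes {
    my ($path) = @_;
    my $st = lstat($path) or die "Can't lstat $path: $!";
    my $blocks = $st->blocks;
    return undef unless defined $blocks && length $blocks;
    return $blocks * $ASSUMED_BLOCK_UNIT;
}

for my $path (glob('./*')) {
    my $st    = lstat($path) or next;
    my $alloc = allocated_bytes($path);
    printf "%s: size %d bytes, allocated %s bytes\n",
        $path, $st->size, defined $alloc ? $alloc : 'unknown';
}
```

The allocated figure can be larger than the size (small files still occupy at least one filesystem block on many filesystems) or smaller (sparse files).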

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)


Thanks for your comments and scripts, Alexander. This might be a sheer banality to those with greater experience and understanding than me, but this material is right where I am on the learning curve. Data::Dumper truly makes this version pretty (abridged for length):

$ ./stat3.pl
$VAR1 = {
          './stat1.pl' => {
                            'w' => 1,
                            'r' => 1,
                            'x' => '',
                            's' => 393
                          },
          './causes2.txt' => {
                               'w' => 1,
                               'r' => 1,
                               'x' => '',
                               's' => 299
                             },
...
                          },
          './stat3.pl' => {
                            'w' => 1,
                            'r' => 1,
                            'x' => 1,
                            's' => 293
                          },
          './template_stuff' => {
                                  'w' => 1,
                                  'r' => 1,
                                  'x' => 1,
                                  's' => 4096
                                },

        };
$ cat stat3.pl
#!/usr/bin/perl -w
use strict;
use v5.12;
use Data::Dumper;

my @files = glob('./*');
my %stat  = map {
  lstat($_) or die "Can't lstat $_: $!";
  $_ => {
    r => ( -r _ ),
    w => ( -w _ ),
    x => ( -x _ ),
    s => ( -s _ ),
    }
} @files;
my $hashref = \%stat;
print Dumper($hashref);
$

This other version shows the same material but with blocks used and the (abridged) output from Dumper:

$ ./stat2.pl
...
./stat3.pl is executable
./stat3.pl has a size of 293 bytes, uses 8 "blocks" of 512 bytes, the filesystem uses a block size of 4096 bytes
./template_stuff is a directory
./template_stuff has a size of 4096 bytes, uses 8 "blocks" of 512 bytes, the filesystem uses a block size of 4096 bytes
$VAR1 = {
...
          './stat1.pl' => bless( [
                                   2049,
                                   404418,
                                   33204,
                                   1,
                                   1000,
                                   1000,
                                   0,
                                   393,
                                   1429336542,
                                   1429336472,
                                   1429336472,
                                   4096,
                                   8
                                 ], 'File::stat' ),
...

          './template_stuff' => bless( [
                                         2049,
                                         533854,
                                         16893,
                                         5,
                                         1000,
                                         1000,
                                         0,
                                         4096,
                                         1429385812,
                                         1429348668,
                                         1429348668,
                                         4096,
                                         8
                                       ], 'File::stat' ),

Both outputs show that even the small files take up 8 blocks of 512 bytes, which is one 4096-byte filesystem block. I've been scratching my head to figure out all these fields, and they appear to be the equivalent of the structure in stat(2):

struct stat {
    dev_t     st_dev;     /* ID of device containing file */
    ino_t     st_ino;     /* inode number */
    mode_t    st_mode;    /* protection */
    nlink_t   st_nlink;   /* number of hard links */
    uid_t     st_uid;     /* user ID of owner */
    gid_t     st_gid;     /* group ID of owner */
    dev_t     st_rdev;    /* device ID (if special file) */
    off_t     st_size;    /* total size, in bytes */
    blksize_t st_blksize; /* blocksize for file system I/O */
    blkcnt_t  st_blocks;  /* number of 512B blocks allocated */
    time_t    st_atime;   /* time of last access */
    time_t    st_mtime;   /* time of last modification */
    time_t    st_ctime;   /* time of last status change */
};
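
One thing to watch: the 13 values in the dumped objects do not follow the order of this C struct. They follow the order of Perl's own stat list, where the three timestamps come before blksize and blocks. A minimal sketch that labels the values, using the core lstat and an invented helper named_stat:

```perl
use strict;
use warnings;
use 5.010;

# Field names in the order of Perl's core stat/lstat return list.
my @STAT_FIELDS = qw(dev ino mode nlink uid gid rdev size
                     atime mtime ctime blksize blocks);

# Hypothetical helper: hash ref mapping field name => value for $path.
sub named_stat {
    my ($path) = @_;
    my @values = lstat($path) or die "Can't lstat $path: $!";
    my %named;
    @named{@STAT_FIELDS} = @values;
    return \%named;
}

my $info = named_stat('.');
for my $field (@STAT_FIELDS) {
    my $value = $info->{$field};
    # The mode is easier to read in octal (file type plus permission bits).
    $value = sprintf '0%o', $value if $field eq 'mode';
    printf "%-8s %s\n", $field, $value // '';
}
```

Read this way, the 33204 in the dump above is 0100664 in octal (a regular file with permissions 0664), and 16893 is 040775 (a directory with permissions 0775).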

The script:

#!/usr/bin/perl -w
use strict;
use warnings;
use v5.12;
use File::stat 1.02 qw( stat lstat );
use Data::Dumper;
my @files = glob('./*');
say "files are @files";
my %stat = map { $_ => lstat($_) } @files;
# print out all sizes, as an example
for my $fn (@files) {
  say $fn, ' is ',
    ( -d $stat{$fn} ? 'a directory'
    : -x $stat{$fn} ? 'executable'
    :                 'not executable' );
  say $fn, ' has a size of ', $stat{$fn}->size(), ' bytes, uses ',
    $stat{$fn}->blocks(),
    ' "blocks" of 512 bytes, the filesystem uses a block size of ',
    $stat{$fn}->blksize(), ' bytes';
}
my $hashref = \%stat;
print Dumper($hashref);

Q1) Why does Data::Dumper bless this? I understand just enough about "bless" to be completely-miffed by it, much like in its religious context.

As to which is "better," that would clearly depend on the user's needs. Maybe the user doesn't want certain information in a large hash. For me, it was a worthwhile exercise both ways.


I could do that, but we're talking about a very large number of files, so memory starts to become an issue in that case.
