
Re: Is it possible to localize the stat/lstat cache?

by jeffa (Bishop)
on Apr 17, 2015 at 17:53 UTC


in reply to Is it possible to localize the stat/lstat cache?

Why not store the results based on the files themselves?

use strict;
use warnings;

my @files = glob('./*');

my %stat = map {
    $_ => {
        r => (-r $_),
        w => (-w $_),
        x => (-x $_),
        s => (-s $_),
    }
} @files;

# print out all sizes, as an example
print $stat{$_}{s}, $/ for @files;

Now you can call stat or lstat and still have a lookup table for cached values that you can always overwrite if you wish.
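A minimal sketch of the "overwrite if you wish" part, assuming a hypothetical helper named `file_tests` (not in the original post) that wraps the four file tests so a single entry can be refreshed later:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): run the four file tests
# for one path and return the results as a hash reference.
sub file_tests {
    my ($path) = @_;
    return {
        r => (-r $path),
        w => (-w $path),
        x => (-x $path),
        s => (-s $path),
    };
}

my @files = glob('./*');
my %stat  = map { $_ => file_tests($_) } @files;

# Later, when one file is known to have changed, overwrite only its entry.
if (@files) {
    my $changed = $files[0];
    $stat{$changed} = file_tests($changed);
}
```

Every other entry keeps its cached values, so how stale the table gets depends entirely on when the caller chooses to refresh.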

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)

Re^2: Is it possible to localize the stat/lstat cache?
by Aldebaran (Curate) on Apr 18, 2015 at 06:10 UTC

    Hi jeff,

    When I come to this site with some spare time, I try to work through some script that stretches my game a little bit. I had to add print statements to figure out your syntax but wanted to ask for clarification.

    $ perl stat1.pl
    files are ./causes2.txt ./fears1.pl ./fears1.pl~ ./fears2.txt ./stat1.pl ./stat1.pl~ ./template_stuff
    240
    282
    242
    63
    396
    362
    4096
    subroutine says this is your hash:
    key: ./stat1.pl, value: HASH(0xa1519ac)
    key: ./causes2.txt, value: HASH(0xa0fe7ec)
    key: ./fears1.pl, value: HASH(0xa118598)
    key: ./fears1.pl~, value: HASH(0xa117ddc)
    key: ./fears2.txt, value: HASH(0xa12c59c)
    key: ./stat1.pl~, value: HASH(0xa17581c)
    key: ./template_stuff, value: HASH(0xa22a8d4)
    $

    Q1) Why are directories always 4096 on my linux machine, regardless of whatever is in it?

    I really couldn't understand the map and resulting hash until I saw that the values were themselves hash references. I'm not suggesting that I added to your script in any way to improve it; rather it is simply more verbose:

    $ cat stat1.pl
    use strict;
    use warnings;
    use 5.010;
    use lib "template_stuff";
    use utils1 qw(print_hash);

    my @files = glob('./*');

    my %stat = map {
        $_ => {
            r => (-r $_),
            w => (-w $_),
            x => (-x $_),
            s => (-s $_),
        }
    } @files;

    say "files are @files";

    # print out all sizes, as an example
    print $stat{$_}{s}, $/ for @files;

    my $hashref = \%stat;
    print_hash ( $hashref );
    $

    Q2) Do I have it correct that the stat hash has an array reference as its value, where it references a hash with the letters for filetests as keys and their stat'ed values for any given file as values?

    Q3) How would I enumerate them, that is, display all their values for a directory?

    Thanks for your interesting post and comment,

      subroutine says this is your hash: key: ./stat1.pl, value: HASH(0xa1519ac)

      Use Data::Dumper or similar to dump the hash content.
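As a rough illustration of both approaches, the sketch below walks a hash shaped like %stat by hand and then hands the same structure to Data::Dumper. The file names and values are made up for the example, and `format_stat` is a hypothetical helper, not part of any module.

```perl
use strict;
use warnings;
use 5.010;
use Data::Dumper;

# Illustrative data in the same shape as %stat (names and values are made up).
my %stat = (
    './a.txt' => { r => 1, w => 1, x => '', s => 240 },
    './b.pl'  => { r => 1, w => 1, x => 1,  s => 282 },
);

# Hypothetical helper: build one line per file listing every test result.
sub format_stat {
    my ($stat) = @_;
    my @lines;
    for my $file (sort keys %$stat) {
        my $tests = $stat->{$file};    # inner hash reference
        my @pairs = map { $_ . '=' . ($tests->{$_} // 'undef') } sort keys %$tests;
        push @lines, $file . ': ' . join(' ', @pairs);
    }
    return @lines;
}

say for format_stat(\%stat);

# Data::Dumper prints the whole nested structure in one call.
$Data::Dumper::Sortkeys = 1;
print Dumper(\%stat);
```

The manual loop makes the two levels explicit: the outer keys are file names, and each value is a reference to an inner hash keyed by the test letter.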

      Q1) Why are directories always 4096 on my linux machine, regardless of whatever is in it?

      They aren't. Directories on ext2/3/4 filesystems have a minimal size, 1 block, which is 4096 bytes on typical large filesystems. Smaller filesystems may use block sizes of 1024 or 2048. Directories filled with many files grow larger than one block. Removing the files will NOT make the directory shrink. Other filesystems may give completely different results. Unless you are writing low-level code to check, repair, or backup filesystems, it is best to completely ignore any size value for anything but plain files.
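One way to follow that advice is to record a size only for plain files. This is a sketch under that assumption; `plain_file_size` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the size in bytes for plain files, and undef
# for anything else (directories, symlinks, devices, missing paths).
sub plain_file_size {
    my ($path) = @_;
    my @st = lstat($path);
    return unless @st;      # path missing or not accessible
    return unless -f _;     # "_" reuses the result of the lstat above
    return $st[7];          # element 7 of the stat list is the size
}

for my $path (glob('./*')) {
    my $size = plain_file_size($path);
    print "$path: ", (defined $size ? "$size bytes" : 'size ignored (not a plain file)'), "\n";
}
```

Because the check uses lstat, a symlink is reported as "not a plain file" even when it points at one; use stat instead if links should be followed.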

      my %stat = map {
          $_ => {
              r => (-r $_),
              w => (-w $_),
              x => (-x $_),
              s => (-s $_),
          }
      } @files;

      Note that this code is not as efficient as it may seem. It hides four (l)stat calls per file, and so it may cause race conditions. To really reduce the number of (l)stat calls, use one explicit (l)stat and the special file handle _ instead of $_:

      my %stat = map {
          lstat($_) or die "Can't lstat $_: $!";
          $_ => {
              r => (-r _),
              w => (-w _),
              x => (-x _),
              s => (-s _),
          }
      } @files;

      fishmonger gave a much better hint: File::stat's stat and lstat functions both return an object that could be stored in the hash, allowing you to run all the tests you need without storing each test's result in the %stat hash:

      use v5.12;
      use File::stat 1.02 qw( stat lstat );
      # ...
      my %stat = map { $_ => lstat($_) } @files;
      # ...
      for my $fn (@files) {
          say $fn, ' is ', (-d $stat{$fn} ? 'a directory' : -x $stat{$fn} ? 'executable' : 'not executable');
          say $fn, ' has a size of ', $stat{$fn}->size(), ' bytes, uses ', $stat{$fn}->blocks(), ' "blocks" of 512 bytes, the filesystem uses a block size of ', $stat{$fn}->blksize(), ' bytes';
      }

      Update: Note that stat and lstat often return st_blocks for the historic block size of 512, even if the filesystem uses a different block size. This conforms to POSIX:

      The unit for the st_blocks member of the stat structure is not defined within IEEE Std 1003.1-2001. In some implementations it is 512 bytes. It may differ on a file system basis. There is no correlation between values of the st_blocks and st_blksize, and the f_bsize (from <sys/statvfs.h>) structure members.

      Traditionally, some implementations defined the multiplier for st_blocks in <sys/param.h> as the symbol DEV_BSIZE.
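To make the distinction between logical size and allocated space concrete, here is a small sketch that multiplies st_blocks by 512. The 512-byte unit is an assumption (it matches the Linux output shown in this thread, but POSIX leaves the unit undefined), and `size_and_allocation` is a hypothetical helper name.

```perl
use strict;
use warnings;
use 5.010;
use File::stat qw( lstat );

# Assumption: st_blocks counts 512-byte units. True on Linux, not guaranteed
# by POSIX, so treat the result as an estimate on other systems.
my $BLOCK_UNIT = 512;

# Hypothetical helper: return (logical size, estimated allocated bytes),
# or an empty list if the path cannot be lstat'ed.
sub size_and_allocation {
    my ($path) = @_;
    my $st = lstat($path) or return;
    my $blocks = $st->blocks;
    my $allocated = (defined($blocks) && length($blocks))
        ? $blocks * $BLOCK_UNIT
        : undef;    # some platforms do not report a block count
    return ($st->size, $allocated);
}

for my $path (glob('./*')) {
    my ($size, $allocated) = size_and_allocation($path) or next;
    say "$path: $size bytes of data, about ", $allocated // 'unknown', ' bytes allocated';
}
```

With that unit, a file reported as using 8 blocks occupies 8 * 512 = 4096 bytes on disk, which is exactly one 4096-byte filesystem block.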

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        Thanks for your comments and scripts, alexander. This might be a sheer banality to those with greater experience and understanding than me, but this material is right where I can tread the learning curve. Data::Dumper truly makes this version pretty (abridged for length):

        $ ./stat3.pl
        $VAR1 = {
            './stat1.pl' => {
                'w' => 1,
                'r' => 1,
                'x' => '',
                's' => 393
            },
            './causes2.txt' => {
                'w' => 1,
                'r' => 1,
                'x' => '',
                's' => 299
            },
            ...
            },
            './stat3.pl' => {
                'w' => 1,
                'r' => 1,
                'x' => 1,
                's' => 293
            },
            './template_stuff' => {
                'w' => 1,
                'r' => 1,
                'x' => 1,
                's' => 4096
            },
        };
        $ cat stat3.pl
        #!/usr/bin/perl -w
        use strict;
        use v5.12;
        use Data::Dumper;

        my @files = glob('./*');

        my %stat = map {
            lstat($_) or die "Can't lstat $_: $!";
            $_ => {
                r => ( -r _ ),
                w => ( -w _ ),
                x => ( -x _ ),
                s => ( -s _ ),
            }
        } @files;

        my $hashref = \%stat;
        print Dumper($hashref);
        $

        This other version shows the same material but with blocks used and the (abridged) output from Dumper:

        $ ./stat2.pl
        ...
        ./stat3.pl is executable
        ./stat3.pl has a size of 293 bytes, uses 8 "blocks" of 512 bytes, the filesystem uses a block size of 4096 bytes
        ./template_stuff is a directory
        ./template_stuff has a size of 4096 bytes, uses 8 "blocks" of 512 bytes, the filesystem uses a block size of 4096 bytes
        $VAR1 = {
            ...
            './stat1.pl' => bless( [ 2049, 404418, 33204, 1, 1000, 1000, 0, 393, 1429336542, 1429336472, 1429336472, 4096, 8 ], 'File::stat' ),
            ...
            './template_stuff' => bless( [ 2049, 533854, 16893, 5, 1000, 1000, 0, 4096, 1429385812, 1429348668, 1429348668, 4096, 8 ], 'File::stat' ),

        This shows, in two different ways, that even the small files take up 8 blocks. I've been scratching my head to figure out all these fields; they appear to be the equivalent of the fields in stat(2):

        struct stat {
            dev_t     st_dev;     /* ID of device containing file */
            ino_t     st_ino;     /* inode number */
            mode_t    st_mode;    /* protection */
            nlink_t   st_nlink;   /* number of hard links */
            uid_t     st_uid;     /* user ID of owner */
            gid_t     st_gid;     /* group ID of owner */
            dev_t     st_rdev;    /* device ID (if special file) */
            off_t     st_size;    /* total size, in bytes */
            blksize_t st_blksize; /* blocksize for file system I/O */
            blkcnt_t  st_blocks;  /* number of 512B blocks allocated */
            time_t    st_atime;   /* time of last access */
            time_t    st_mtime;   /* time of last modification */
            time_t    st_ctime;   /* time of last status change */
        };
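The C struct and Perl's list do not use the same order: in Perl's built-in stat/lstat list, blksize and blocks come last, after the three timestamps, and the array inside the File::stat object in the Dumper output above appears to follow that order. The sketch below labels the 13 values printed for ./stat1.pl; `label_stat` is a hypothetical helper written for this illustration.

```perl
use strict;
use warnings;

# Field order of Perl's built-in stat/lstat list (see perldoc -f stat).
# blksize and blocks come last, after the three timestamps.
my @STAT_FIELDS = qw(
    dev ino mode nlink uid gid rdev size
    atime mtime ctime blksize blocks
);

# Hypothetical helper: turn a 13-element stat list into a name => value hash.
sub label_stat {
    my @values = @_;
    my %labeled;
    @labeled{@STAT_FIELDS} = @values;
    return \%labeled;
}

# Values copied from the Dumper output for ./stat1.pl in this thread.
my $st = label_stat(
    2049, 404418, 33204, 1, 1000, 1000, 0, 393,
    1429336542, 1429336472, 1429336472, 4096, 8,
);

printf "%-8s %s\n", $_, $st->{$_} for @STAT_FIELDS;

# The mode is easier to read in octal: 33204 is 0100664,
# i.e. a regular file with permissions 0664.
printf "mode in octal: %o\n", $st->{mode};
```

Read this way, 393 is the size, the three ten-digit numbers are atime, mtime and ctime as epoch seconds, and the trailing 4096 and 8 are blksize and blocks.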

        The script:

        #!/usr/bin/perl -w
        use strict;
        use warnings;
        use v5.12;
        use File::stat 1.02 qw( stat lstat );
        use Data::Dumper;

        my @files = glob('./*');
        say "files are @files";

        my %stat = map { $_ => lstat($_) } @files;

        # print out all sizes, as an example
        for my $fn (@files) {
            say $fn, ' is ',
                ( -d $stat{$fn} ? 'a directory'
                : -x $stat{$fn} ? 'executable'
                :                 'not executable' );
            say $fn, ' has a size of ', $stat{$fn}->size(),
                ' bytes, uses ', $stat{$fn}->blocks(),
                ' "blocks" of 512 bytes, the filesystem uses a block size of ',
                $stat{$fn}->blksize(), ' bytes';
        }

        my $hashref = \%stat;
        print Dumper($hashref);

        Q1) Why does Data::Dumper bless this? I understand just enough about "bless" to be completely miffed by it, much like in its religious context.
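A short sketch that may help with this question: Data::Dumper does not bless anything itself, it only reports the class a reference already belongs to. The File::stat object is blessed by the module when it is created, which is what lets method calls such as ->size work on it. The sketch assumes the current directory can be lstat'ed.

```perl
use strict;
use warnings;
use File::stat qw( lstat );
use Scalar::Util qw( blessed reftype );

my $st = lstat('.') or die "Can't lstat .: $!";

# blessed() returns the package the reference was blessed into;
# reftype() returns the underlying data type regardless of blessing.
print 'class:   ', blessed($st), "\n";
print 'reftype: ', reftype($st), "\n";

# Because the reference is blessed, Perl can look up methods in its class.
print 'size:    ', $st->size, "\n";
```

Given the Dumper output earlier in this thread, the class should come out as File::stat and the underlying type as an array, matching the bless( [ ... ], 'File::stat' ) notation.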

        As to which is "better," that would clearly depend on the user's needs. Maybe the user doesn't want certain information in a large hash. For me, it was a worthwhile exercise both ways.

Re^2: Is it possible to localize the stat/lstat cache?
by Anonymous Monk on Apr 20, 2015 at 12:19 UTC
    I could do that, but we're talking about a very large number of files, so memory starts to become an issue in that case.
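If memory is the constraint, one option is to keep a single short string per file instead of a nested hash. The sketch below is only an illustration of that idea: the flag constants, `compact_entry`, and `expand_entry` are hypothetical names, and the actual saving has not been measured here.

```perl
use strict;
use warnings;

# Hypothetical bit flags for the three permission tests.
use constant { CAN_R => 1, CAN_W => 2, CAN_X => 4 };

# Hypothetical helper: one lstat per file, packed into a "flags size" string.
sub compact_entry {
    my ($path) = @_;
    my @st = lstat($path);
    return unless @st;
    my $flags = ((-r _) ? CAN_R : 0)
              | ((-w _) ? CAN_W : 0)
              | ((-x _) ? CAN_X : 0);
    return "$flags $st[7]";
}

# Hypothetical helper: unpack one entry back into the r/w/x/s hash shape.
sub expand_entry {
    my ($entry) = @_;
    my ($flags, $size) = map { $_ + 0 } split ' ', $entry;
    return {
        r => (($flags & CAN_R) ? 1 : 0),
        w => (($flags & CAN_W) ? 1 : 0),
        x => (($flags & CAN_X) ? 1 : 0),
        s => $size,
    };
}

my %stat;
for my $path (glob('./*')) {
    my $entry = compact_entry($path);
    $stat{$path} = $entry if defined $entry;
}
```

Each value is then a plain scalar rather than a reference to a four-key hash, at the cost of a small decoding step whenever an entry is read.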
