PerlMonks  

reading through dated directories

by Anonymous Monk
on Jun 17, 2021 at 18:13 UTC (#11133966)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I am reading through a large set of dated directories, where the number of <date> directories is much larger than the number of parent <dir> directories. Each parent has only one date directory in the future; the rest are in the past. I just have to find these future dates.

    dir1 / <date1>
    dir1 / <date2>
    dir2 / <date3>
    dir3 / <date4>

I am currently using glob("*/*") to loop through all the dates and find the ones in the future. My glob has become annoyingly slow and I'd like to speed it up. Any ideas how I can use readdir() to return an array of "dir / <date>" entries for only the future dates?

Thank you for your time--always appreciate the help & advice.

Best,

Michael

Replies are listed 'Best First'.
Re: reading through dated directories
by choroba (Archbishop) on Jun 17, 2021 at 20:03 UTC
    I created the following Makefile to simulate your situation:

    The gen.sh is used to generate the directories:

    glob.pl uses glob. It returns the entries sorted, so the last one is the future one, as dates sort alphabetically in the YYYYMMDD form (at least for the nearest future).

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };
    use Time::Piece;

    my $now = localtime->ymd("");
    for my $dir (glob '???/') {
        say substr +(glob "$dir*/")[-1], 0, -1;
    }
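The remark about YYYYMMDD names sorting alphabetically can be checked in isolation. The following is only a small sketch: the sample names and the fixed value of `$today` are made up for illustration and do not come from the original post.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

# Hypothetical sample directory names (not from the original post).
my @dates = qw( 20210105 20191231 20220301 20210617 );

# A plain string sort; for fixed-width YYYYMMDD names this is also
# chronological order, so the last element is the latest date.
my @sorted = sort @dates;
my $latest = $sorted[-1];

# Fixed "today" so the example is reproducible (an assumption for the demo).
my $today = '20210617';

# String comparison with "gt" is enough to pick out the future dates.
my @future = grep { $_ gt $today } @sorted;

say "latest: $latest";
say "future: @future";
```

This shortcut depends on the fixed-width, year-first format. With a format such as MMDDYY the string order would no longer match the chronological order.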

    path.pl uses Path::Tiny.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };
    use Time::Piece;
    use Path::Tiny qw{ path };

    my $now = localtime->ymd("");
    for my $dir (grep $_->is_dir, path('.')->children) {
        for my $date (grep $_->is_dir, $dir->children) {
            say $date if $date->basename > $now;
        }
    }

    readdir.pl uses opendir and readdir.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };
    use Time::Piece;

    my $now = localtime->ymd("");
    opendir my $dir, '.' or die $!;
    while (my $pdir = readdir $dir) {
        # skip "." and ".." and anything that is not a directory
        next if $pdir =~ /^\.{1,2}$/ || ! -d $pdir;
        opendir my $ddir, $pdir or die $!;
        while (my $date = readdir $ddir) {
            # test the entry itself, not the parent directory handle
            next if $date =~ /^\.{1,2}$/ || ! -d "$pdir/$date";
            say "$pdir/$date" if $date > $now;
        }
    }

    Run make compare to verify all three scripts give the same output.

    The results on my machine were

    make glob > /dev/null
    real 0m0.109s
    user 0m0.054s
    sys  0m0.053s

    make readdir > /dev/null
    real 0m0.074s
    user 0m0.045s
    sys  0m0.029s

    make path > /dev/null
    real 0m0.440s
    user 0m0.380s
    sys  0m0.060s

    As you can see, readdir.pl has the most verbose code because it's the most low-level, but it's also the fastest. Glob is a bit slower, but still pretty good. Path::Tiny didn't really shine, but maybe its code can be improved.
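Since the original question asked for an array of "dir/date" entries, the readdir approach can also be wrapped in a function. This is only a sketch under stated assumptions: `future_dates` is a hypothetical helper name, the date directories are assumed to be named YYYYMMDD, and each parent is assumed to hold at most one future date (as the question states), which lets the inner loop stop at the first match.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
use Time::Piece;

# future_dates() is a hypothetical helper, not part of the original scripts.
# It returns a list of "parent/date" strings for date directories later than $now.
sub future_dates {
    my ($root, $now) = @_;
    $now //= localtime->ymd("");    # today as YYYYMMDD unless given

    my @found;
    opendir my $top, $root or die "$root: $!";
    while (defined(my $pdir = readdir $top)) {
        next if $pdir =~ /^\.{1,2}$/ || ! -d "$root/$pdir";
        opendir my $dh, "$root/$pdir"
            or do { warn "$root/$pdir: $!"; next };
        while (defined(my $date = readdir $dh)) {
            # cheap string checks first, the stat call (-d) only for candidates
            next unless $date =~ /^[0-9]{8}\z/ && $date gt $now;
            next unless -d "$root/$pdir/$date";
            push @found, "$pdir/$date";
            last;    # assumption: only one future date per parent
        }
        closedir $dh;
    }
    closedir $top;
    return @found;
}

say for future_dates('.');
```

Doing the name comparison before the `-d` test avoids a stat call for every past date, which may matter when past dates vastly outnumber future ones. If a parent can hold more than one future date, the `last` has to go.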

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Thank you thank you for the amazing examples & the benchmarks! Wow. You monks never cease to amaze me!
Re: reading through dated directories
by 1nickt (Abbot) on Jun 17, 2021 at 18:46 UTC

    I suggest using Path::Tiny for working with the files: its visit method can recursively walk a directory tree and apply a subroutine to each filename found, accumulating a list of matching files. The solution to matching the filenames will depend on the naming format used.

    Update: since you showed the format elsewhere ... try something like this:

    use strict;
    use warnings;
    use feature 'say';
    use Data::Dumper;
    use Path::Tiny;
    use Time::Piece;

    my $root  = "/Users/1nickt/perlmonks/11133966";
    my $today = localtime->strftime('%Y%m%d');
    my %list;

    path($root)->visit(sub {
        my $path = shift;
        return unless -d $path;    # skip plain files; "next" cannot be used inside a sub
        my $date = substr($path, -8);
        if ( $date =~ /^[0-9]{4}[0-2][0-9][0-3][0-9]$/ && $date gt $today ) {
            # group future date directories under their parent directory
            push(@{ $list{$path->parent} }, $path =~ s{.+/}{}r);
        }
    }, {
        recurse => 1,
    });

    say Dumper \%list;

    Hope this helps!


    The way forward always starts with a minimal test.
      Thank you very much for your suggestion!! I will try this out!
Re: reading through dated directories
by choroba (Archbishop) on Jun 17, 2021 at 18:38 UTC
    What is the format of the date strings? YYYYMMDD, MMDDYY, or something even wilder?

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Thank you for your response :) Yes, it's YYYYMMDD
