
Re: using grep on a directory to list files for a single date

by zejames (Hermit)
on Dec 01, 2004 at 13:43 UTC (#411450)


in reply to using grep on a directory to list files for a single date

Just for fun, I wanted to measure the speed difference between grepping and just using while.

So I created, in a test directory, lots of small files :

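my $dir = "test";
unless (-d $dir) {
    mkdir $dir or die "Unable to create dir : $!";
}
chdir $dir or die "Unable to chdir to $dir : $!";

# One single-byte file per name, 'aaa' through 'zzz'
foreach my $name ('aaa' .. 'zzz') {
    # Some names (e.g. device names on Windows) may not open; skip those
    open my $fh, '>', $name
        or do { warn "Unable to create $name : $!\n"; next };
    print $fh chr(97 + int rand 10);
    close $fh;
}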

Then I tried to list each file of this directory, and compare:

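use Benchmark qw/cmpthese/;

my $dir = "test";

cmpthese(1000, {
    'grep' => sub {
        opendir DIR, $dir or die "Unable to open dir : $!\n";
        my @list = grep(!/^(\.+?)$/, readdir(DIR));
        closedir DIR;
    },
    'while' => sub {
        opendir DIR, $dir or die "Unable to open dir : $!\n";
        my @list;
        # Assign explicitly: a bare while (readdir(DIR)) does not set $_ on 5.8
        while (defined(my $entry = readdir(DIR))) {
            push @list, $entry unless $entry =~ /^(\.+?)$/;
        }
        # closedir belongs after the loop. The rates quoted below come from a
        # run with closedir inside the loop, where the loop stopped after the
        # first entry, so they overstate the gap.
        closedir DIR;
    },
});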

As expected, the difference is huge :

D:\Perl\bin>perl test2.pl
        Rate   grep  while
grep  6.51/s     --  -100%
while 2667/s 40833%     --

D:\Perl\bin>

With grep, Perl evaluates readdir in list context, so it builds and returns the whole list of files in the directory, which is huge.

With while, Perl returns the file names one at a time, which is much cheaper in memory.
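
To make the two contexts concrete, here is a minimal sketch. The scratch directory from File::Temp and the three file names are my own assumptions for illustration; they are not part of the benchmark above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch directory with three empty files (illustration only).
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a b c)) {
    open my $fh, '>', "$dir/$name" or die "Unable to create $name : $!";
    close $fh;
}

# List context: readdir returns every remaining entry in one go.
opendir my $dh, $dir or die "Unable to open dir : $!";
my @all = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

# Scalar context: readdir returns one entry per call, undef when exhausted.
opendir my $dh2, $dir or die "Unable to open dir : $!";
my @one_by_one;
while (defined(my $entry = readdir $dh2)) {
    next if $entry eq '.' || $entry eq '..';
    push @one_by_one, $entry;
}
closedir $dh2;

print scalar(@all), " entries via list context, ",
      scalar(@one_by_one), " via the loop\n";
```

Both approaches see the same entries; the difference is only whether the full list exists in memory before filtering starts.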

So, in your case : use while.

For information, I was using Windows XP SP1 and ActivePerl 5.8.4 on an NTFS file system.

HTH


--
zejames

Replies are listed 'Best First'.
Re^2: using grep on a directory to list files for a single date
by markkneen (Acolyte) on Dec 01, 2004 at 14:53 UTC
    OK, sort of got something working, but I'm sure there is a more "efficient" way to do it, as it's still returning a large array and loads of the elements are empty???
    sub list{
        my $path=shift;
        my $comp=shift;
        if (! -e $path){die "Error : $path $!\n";}
        opendir(DIR,$path) or die "Error : $path $!\n";
        return sort map {
            my ($d,$m,$y) = (localtime( (stat "$path/$_")[9] ) )[3..5];
            $m+=1;
            $y+=1900;
            $m=($m<10)?"0$m":$m;
            $d=($d<10)?"0$d":$d;
            my $date = "$d/$m/$y";
            if($date eq $comp){"$_\n"};
        } grep(!/^(\.+?)$/,readdir(DIR));
    }
    Any ideas??
    Thanks for your help on this so far.
    (going to try the while() loop next)

      What is the if in the map trying to do?
      If $date eq $comp is false, map adds an empty value (the false result of the comparison) to the returned list.
      If $date eq $comp is true, map returns "$_\n".
      Below, I assume that you were trying to filter out dates that don't match. Filtering is grep's job, not map's. The "empty" elements you're getting are the values map returns when $date eq $comp is false.
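
A small sketch of the difference, using a made-up list of names and a made-up condition (both are assumptions for illustration, not the original poster's data):

```perl
use strict;
use warnings;

# Hypothetical input (illustration only).
my @names = qw(alpha beta gamma delta);

# map with a bare "if": an element that fails the test still contributes
# one empty value, so the result is as long as the input.
my @mapped = map {
    my $name = $_;
    if (length($name) == 4) { $name }
} @names;

# grep keeps only the elements for which the block is true.
my @grepped = grep { length($_) == 4 } @names;

print scalar(@mapped),  " elements from map\n";
print scalar(@grepped), " elements from grep\n";
```

The map version returns one element per input name, most of them empty, which matches the "large array with empty elements" symptom described above.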

      $! doesn't have any meaningful value after calling -e.

      The -e is redundant. opendir will fail if the dir doesn't exist, and you already handle that.

      The capture in /^(\.+?)$/ wastes time. The ? is meaningless. I wonder if $_ eq '.' || $_ eq '..' would be faster.
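
      One way to find out is to measure it. This is only a sketch: the sample entry list and the sub names by_regex and by_eq are invented for illustration, and the result will vary by Perl version and platform. Note that the two tests are not strictly equivalent, since the regex also rejects names such as '...'.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample of directory entries (illustration only).
my @entries = ('.', '..', map("file$_", 1 .. 1000));

# Filter with the regex (capture and ? removed).
sub by_regex { return grep { !/^\.+$/ } @entries }

# Filter with two plain string comparisons.
sub by_eq    { return grep { $_ ne '.' && $_ ne '..' } @entries }

# Run each filter for about one CPU second and print the comparison table.
cmpthese(-1, {
    regex => sub { my @kept = by_regex() },
    eq    => sub { my @kept = by_eq() },
});
```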

      It's probably faster to split $comp into day, month, and year than to convert all the mtimes to strings.

      sub list {
          my ($path, $comp) = @_;

          $comp =~ m#^(..)/(..)/(....)$#
              or die("Error: Badly formatted \$comp.\n");

          # Match localtime's conventions: months count from 0, years from 1900
          my $comp_d = $1;
          my $comp_m = $2 - 1;
          my $comp_y = $3 - 1900;

          local *DIR;
          opendir(DIR, $path)
              or die("Error: Unable to open directory $path: $!\n");

          my @filtered_listing;
          # A dirhandle must be read with readdir, not <DIR>
          while (defined(my $file = readdir(DIR))) {
              next if $file =~ /^\.+$/;

              my ($mtime_d, $mtime_m, $mtime_y) =
                  (localtime( (stat "$path/$file")[9] ) )[3..5];

              next unless (
                  $mtime_d == $comp_d &&
                  $mtime_m == $comp_m &&
                  $mtime_y == $comp_y
              );

              push(@filtered_listing, $file);
          }
          closedir(DIR);

          return sort @filtered_listing;
      }

        oh, you might want to return a reference to the array instead of the array itself, especially if it's big.

        ...
        return [ sort @filtered_listing ];
        }

        my $array = list(...);
        foreach (0..$#$array) {
            ... $$array[$_] ...
        }
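
        For reference, a self-contained sketch of the same idea. The sub sorted_names and its input are hypothetical stand-ins for list() and a real directory listing:

```perl
use strict;
use warnings;

# Hypothetical stand-in for list(): returns a reference to a sorted array.
sub sorted_names {
    my @names = @_;
    return [ sort @names ];
}

my $names = sorted_names(qw(pear apple fig));

# Iterate over the referenced array directly instead of by index.
for my $name (@$names) {
    print "$name\n";
}

# Individual elements are reached with the arrow syntax.
print "first: $names->[0]\n";
```

        Returning the reference avoids copying the sorted list back to the caller, which is where the saving comes from when the listing is big.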

      ikegami has already posted an excellent reply, showing exactly how to do it with while. That is probably the best way to solve this particular problem, but I thought I would show you how to use map to filter out elements, for your future reference:

      my @array = qw(foo bar baz qux);
      my @newarray =
          map {
              my $foo = $_;
              $foo =~ s/./\u$&/;    # useless example
              $foo =~ /Ba/ ? $foo : ()
          }
          grep { /a/ }
          @array;

      The key here is to return an empty list when the condition fails. It's neat that we can do this with map, but it's usually better to use another grep:

      my @array = qw(foo bar baz qux);
      my @newarray =
          grep { /Ba/ }
          map {
              my $foo = $_;
              $foo =~ s/./\u$&/;    # useless example
              $foo
          }
          grep { /a/ }
          @array;

      HTH
