Re: using grep on a directory to list files for a single date

Just for fun, I wanted to measure the speed difference between using grep and just using while.

So I created, in a test directory, lots of small files :

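my $dir = "test";

mkdir $dir or die "Unable to create dir : $!" if not ( -d $dir );
chdir $dir or die "Unable to chdir to $dir : $!";

# One small file per name from 'aaa' to 'zzz' (26**3 = 17576 files)
foreach ( 'aaa' .. 'zzz' ) {
    open F, '>', $_ or die "Unable to create $_ : $!";
    my $data = chr(97 + int rand 10);   # a random letter from 'a' to 'j'
    print F $data;
    close F;
}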

Then I tried to list each file of this directory, and compare :

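use strict;
use warnings;
use Benchmark qw/cmpthese/;

my $dir = "test";

cmpthese(1000, {
  'grep' => sub {
               opendir DIR, $dir or die "Unable to open dir : $!\n";
               # readdir in list context returns every entry at once
               my @list = grep { !/^\.+$/ } readdir(DIR);
               closedir DIR;
            },
  'while' => sub {
               opendir DIR, $dir or die "Unable to open dir : $!\n";
               my @list;
               # readdir in scalar context returns one entry per call;
               # assign explicitly, since older perls do not set $_ here
               while (defined(my $file = readdir(DIR))) {
                 push @list, $file unless $file =~ /^\.+$/;
               }
               # close only after the loop; closing inside it stops after one entry
               closedir DIR;
             }
});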

As expected, the difference is huge :

D:\Perl\bin>perl test2.pl
        Rate   grep  while
grep  6.51/s     --  -100%
while 2667/s 40833%     --

D:\Perl\bin>

With grep, perl evaluates readdir in list context, so it builds and returns the whole list of files in the directory, which is huge.

With while, perl returns the file names one at a time, which is much cheaper in memory.
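The two contexts can be compared on a small scratch directory. This is only a minimal sketch: it assumes the core File::Temp module is available, and the three file names (a, b, c) are made up for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with three hypothetical files
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a b c)) {
    open my $fh, '>', "$dir/$name" or die "Unable to create $name : $!";
    close $fh;
}

# List context: a single readdir call returns every remaining entry
opendir my $dh_list, $dir or die "Unable to open dir : $!";
my @all = grep { !/^\.+$/ } readdir $dh_list;
closedir $dh_list;

# Scalar context: each readdir call returns one entry, then undef at the end
opendir my $dh_scalar, $dir or die "Unable to open dir : $!";
my @one_by_one;
while (defined(my $entry = readdir $dh_scalar)) {
    push @one_by_one, $entry unless $entry =~ /^\.+$/;
}
closedir $dh_scalar;

print scalar(@all), " entries via grep, ", scalar(@one_by_one), " via while\n";
```

Both approaches end up with the same three names; what differs is whether readdir hands them over all at once or one per call.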

So, in your case : use while.

For information, I was using Windows XP SP1 and ActivePerl 5.8.4 on an NTFS file system.

HTH

--
zejames

Re^2: using grep on a directory to list files for a single date by markkneen (Acolyte) on Dec 01, 2004 at 14:53 UTC
OK, sort of got something working, but I'm sure there is a more "efficient" way to do it, as it's still returning a large array and loads of the elements are empty???

sub list{
    my $path=shift;
    my $comp=shift;
    if (! -e $path){die "Error : $path $!\n";}
    opendir(DIR,$path) or die "Error : $path $!\n";
    return sort map {
        my ($d,$m,$y) = (localtime( (stat "$path/$_")[9] ) )[3..5];
        $m+=1;
        $y+=1900;
        $m=($m<10)?"0$m":$m;
        $d=($d<10)?"0$d":$d;
        my $date = "$d/$m/$y";
        if($date eq $comp){"$_\n"};
    } grep(!/^(\.+?)$/,readdir(DIR));
}

Any ideas?? Thanks for your help on this so far. (Going to try the while() loop next.)
Re^3: using grep on a directory to list files for a single date by ikegami (Patriarch) on Dec 01, 2004 at 15:07 UTC
What is the `if` in the map trying to do? If `$date eq $comp` is false, `map` adds a false value (the result of the failed comparison) to the returned list. If `$date eq $comp` is true, `map` returns "$_\n". Below, I assume that you were trying to filter out dates that don't match.

Filtering is `grep`'s job, not `map`'s. The "empty" elements you're getting are the false values returned by `map` when `$date eq $comp` is false.

`$!` doesn't have any meaningful value after calling `-e`.

The `-e` is redundant. `opendir` will fail if the dir doesn't exist, and you already handle that.

The capture in `/^(\.+?)$/` wastes time. The `?` is meaningless. I wonder if `$_ eq '.' || $_ eq '..'` would be faster.

It's probably faster to split `$comp` into day, month and year than to convert all the mtimes to strings.

sub list {
    my ($path, $comp) = @_;

    $comp =~ m#^(..)/(..)/(....)$#
        or die("Error: Badly formatted \$comp.\n");
    # Convert to localtime's conventions: months are 0-based, years count from 1900
    my $comp_d = $1;
    my $comp_m = $2 - 1;
    my $comp_y = $3 - 1900;

    local *DIR;
    opendir(DIR, $path)
        or die("Error: Unable to open directory $path: $!\n");

    my @filtered_listing;
    # readdir, not <DIR>: DIR is a directory handle
    while (defined(my $file = readdir(DIR))) {
        next if $file =~ /^\.+$/;

        my ($mtime_d, $mtime_m, $mtime_y) =
            (localtime( (stat "$path/$file")[9] ) )[3..5];

        next unless (
            $mtime_d == $comp_d &&
            $mtime_m == $comp_m &&
            $mtime_y == $comp_y
        );

        push(@filtered_listing, $file);
    }
    closedir(DIR);

    return sort @filtered_listing;
}
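Since filtering is `grep`'s job, the same selection can also be written with `grep` alone. The following is a rough sketch rather than a drop-in replacement: it assumes the dd/mm/yyyy format from the question, and the helper name `files_for_date` and the scratch files `old` and `new` are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Time::Local qw(timelocal);

# Hypothetical helper: names of entries in $path last modified on $comp (dd/mm/yyyy)
sub files_for_date {
    my ($path, $comp) = @_;
    my ($d, $m, $y) = $comp =~ m#^(\d\d)/(\d\d)/(\d{4})$#
        or die "Error: Badly formatted date '$comp'\n";

    opendir my $dh, $path or die "Error: Unable to open directory $path: $!\n";
    my @names = grep {
        my @t = localtime( (stat "$path/$_")[9] );
        # localtime months are 0-based and years count from 1900
        $t[3] == $d && $t[4] == $m - 1 && $t[5] == $y - 1900;
    } grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return sort @names;
}

# Scratch data: two files, one of them back-dated to 1 December 2004
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(old new)) {
    open my $fh, '>', "$dir/$name" or die "Unable to create $name : $!";
    close $fh;
}
my $stamp = timelocal(0, 0, 12, 1, 11, 2004);   # noon on 01/12/2004, local time
utime $stamp, $stamp, "$dir/old" or die "utime failed : $!";

my @found = files_for_date($dir, '01/12/2004');
print "@found\n";
```

Only the back-dated file is selected, and no empty elements appear, because `grep` drops non-matching entries instead of mapping them to a false value.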
Re^4: using grep on a directory to list files for a single date by ikegami (Patriarch) on Dec 01, 2004 at 16:25 UTC
Oh, you might want to return a reference to the array instead of the array itself, especially if it's big.

    ...
    return [ sort @filtered_listing ];
}

my $array = list(...);
foreach (0..$#$array) {
    ... $$array[$_] ...
}
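For readers new to references, here is a small self-contained sketch of the same idea. The function `sorted_names` and its sample data are hypothetical stand-ins for `list`, used only so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical stand-in for list(): returns a reference to a sorted array
sub sorted_names {
    my @names = @_;
    return [ sort @names ];
}

my $array = sorted_names(qw(pear apple fig));

# @$array is the whole list, $array->[$i] (or $$array[$i]) a single element
print scalar(@$array), " names, first is $array->[0]\n";
print "$_\n" for @$array;
```

Returning the reference hands back a single scalar, so the sorted list is not copied element by element into the caller's array.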
Re^3: using grep on a directory to list files for a single date by revdiablo (Prior) on Dec 01, 2004 at 18:25 UTC
ikegami has already posted an excellent reply, showing exactly how to do it with while. That is probably the best way to solve this particular problem, but I thought I would show you how to use map to filter out elements, for your future reference:

my @array = qw(foo bar baz qux);
my @newarray = map {
                   my $foo = $_;
                   $foo =~ s/./\u$&/; # useless example
                   $foo =~ /Ba/ ? $foo : ()
               }
               grep { /a/ } @array;

The key here is to return an empty list when the condition fails. It's neat that we can do this with map, but it's usually better to use another grep:

my @array = qw(foo bar baz qux);
my @newarray = grep { /Ba/ }
               map {
                   my $foo = $_;
                   $foo =~ s/./\u$&/; # useless example
                   $foo
               }
               grep { /a/ } @array;

HTH

