Did some benchmarking today. I really like knowing the most effective way to solve
a certain problem, and when Corion posted his code above, I got curious. To Corion,
I would like to say that this is no "I'm right - you're wrong" kind of thing. I've
enjoyed your code for a long time (since I love and use eConsole), and I really didn't
know which of these ways was most effective, so please don't take this the wrong way.
If anyone else has ideas about this, please give it a shot with your own code.
I use directory traversal quite often, so I would really be glad to be able to use the
most effective code in my programs.
Here we go:
use Benchmark;
use File::Spec;
use File::Find;
$t0 = new Benchmark;
&t1('C:\\Program');
$t1 = new Benchmark;
&t2('C:\\Program');
$t2 = new Benchmark;
&t3('C:\\Program');
$t3 = new Benchmark;
&t4('C:\\Program');
$t4 = new Benchmark;
&t5('C:\\Program');
$t5 = new Benchmark;
print "t1: ",timestr(timediff($t1, $t0)),"\n";
print "t2: ",timestr(timediff($t2, $t1)),"\n";
print "t3: ",timestr(timediff($t3, $t2)),"\n";
print "t4: ",timestr(timediff($t4, $t3)),"\n";
print "t5: ",timestr(timediff($t5, $t4)),"\n";
# Opens a dirhandle to read files, another to read sub-dirs, and
# recursively calls itself for each subdir it finds
sub t1 {
  my $Dir = shift;
  opendir(DIR, $Dir) || die "Can't opendir $Dir: $!";
  my @Files = grep { /.txt/ && -f "$Dir/$_" } readdir(DIR);
  closedir DIR;
  opendir(DIR, $Dir) || die "Can't opendir $Dir: $!";
  my @Dirs = grep { /^[^.].*/ && -d "$Dir/$_" } readdir(DIR);
  closedir DIR;
  foreach my $file (@Files) { print $Dir."-".$file."\n"; }
  foreach my $SubDir (@Dirs) { &t1(join("\\",$Dir,$SubDir)); }
}
# Opens a dirhandle to read files, rewinds it to read sub-dirs, and
# recursively calls itself for each subdir it finds
sub t2 {
  my $Dir = shift;
  opendir(DIR, $Dir) || die "Can't opendir $Dir: $!";
  my @Files = grep { /.txt/ && -f "$Dir/$_" } readdir(DIR);
  rewinddir(DIR);
  my @Dirs = grep { /^[^.].*/ && -d "$Dir/$_" } readdir(DIR);
  closedir DIR;
  foreach my $file (@Files) { print $Dir."-".$file."\n"; }
  foreach my $SubDir (@Dirs) { &t2(join("\\",$Dir,$SubDir)); }
}
# Opens a dirhandle to read all directory contents and
# recursively calls itself for each subdir it finds
# Uses File::Spec, which makes it portable
sub t3 {
  my ($Dir) = shift;
  my ($entry,@direntries,$fullpath);
  opendir( DIR, $Dir ) or die "Can't opendir $Dir: $!";
  @direntries = readdir( DIR ) or die "Error reading $Dir : $!\n";
  closedir DIR;
  foreach $entry (@direntries) {
    next if $entry =~ /^\.\.?$/;
    $fullpath = File::Spec->catfile( $Dir, $entry );
    if (-d $fullpath ) {
      &t3($fullpath);
    } elsif ( -f $fullpath && $entry =~ /.txt/) {
      print $Dir."-".$entry."\n";
    }
  }
}
# Opens a dirhandle to read all directory contents and
# recursively calls itself for each subdir it finds
sub t4 {
  my ($Dir) = shift;
  my ($entry,@direntries,$fullpath);
  opendir( DIR, $Dir ) or die "Can't opendir $Dir: $!";
  @direntries = readdir( DIR ) or die "Error reading $Dir : $!\n";
  closedir DIR;
  foreach $entry (@direntries) {
    next if $entry =~ /^\.\.?$/;
    $fullpath = join("\\",$Dir,$entry);
    if (-d $fullpath ) {
      &t4($fullpath);
    } elsif ( -f $fullpath && $entry =~ /.txt/) {
      print $Dir."-".$entry."\n";
    }
  }
}
# Uses File::Find (whatever it does...)
sub t5 {
  my ($Dir) = shift;
  find(\&found, $Dir);
}
sub found {
  /.txt/ && print $File::Find::dir."-".$_."\n";
}
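A side note on t5: by default File::Find chdir()s into each directory it scans and sets $_ to the bare entry name, which is why found() can print $File::Find::dir and $_ separately. As a hedged sketch (not part of the benchmark above), here is a variant that avoids the chdir() and uses an anchored pattern:

```perl
use File::Find;

# Variant of t5: no_chdir leaves the working directory alone and
# sets $_ (and $File::Find::name) to the full path of each entry.
sub t5_nochdir {
    my $dir = shift;
    find({
        no_chdir => 1,
        wanted   => sub {
            print "$File::Find::name\n" if /\.txt$/ && -f;
        },
    }, $dir);
}
```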
This test was run on a Pentium 233 with 128 MB RAM, Windows 2000, FAT32 filesystem.
C:\Program holds 13477 files in 1206 folders, of which 137 match *.txt.
t1: 27 wallclock secs ( 8.40 usr + 16.76 sys = 25.17 CPU)
t2: 24 wallclock secs ( 7.69 usr + 15.57 sys = 23.26 CPU)
t3: 47 wallclock secs (20.30 usr + 23.85 sys = 44.15 CPU)
t4: 36 wallclock secs (11.04 usr + 23.33 sys = 34.37 CPU)
t5: 30 wallclock secs (11.12 usr + 18.02 sys = 29.13 CPU)
/brother t0mas
I've just run your program (with slight modifications) under Linux
on a dual SMP P2-350 machine, on my home directory, whose
subdirectories contain about 20 text files and quite a lot
(about 500 MB) of HTML files in several directories.
The results amazed me, so I ran this test four times in a row;
the last three results were identical, and really amazing:
t1: 7 wallclock secs ( 2.43 usr + 4.27 sys = 6.70 CPU)
t2: 7 wallclock secs ( 2.43 usr + 4.32 sys = 6.75 CPU)
t3: 14 wallclock secs ( 8.25 usr + 5.73 sys = 13.98 CPU)
t4: 7 wallclock secs ( 1.62 usr + 4.77 sys = 6.39 CPU)
t5: 1 wallclock secs ( 0.84 usr 0.21 sys + 0.00 cusr 0.01 csys = 0.00 CPU)
The trend we can see is that everything is faster in general,
by about a factor of 3 or 4, but what is really amazing is how
little time &t5() takes - only 1 wallclock second.
So I interchanged &t4() and &t5() to see
whether that result was order dependent:
...
t4: 1 wallclock secs ( 0.95 usr 0.18 sys + 0.00 cusr 0.01 csys = 0.00 CPU)
t5: 7 wallclock secs ( 1.75 usr + 4.65 sys = 6.40 CPU)
But it wasn't. This is really strange and sheds some new light
on File::Find, which I had always considered clumsy, and which is
one of the slower routines under Win32. Wonders of Perl :).
To see how the results would change, I then reran your
test for files matching .html. (While going through
the source code, I noticed some things about your regular
expressions: the /.txt/ pattern will match any name of at least
four characters with "txt" anywhere but at the start, since the
dot is unescaped and the pattern is unanchored, and the directory
matching will leave out directories that start with a "."
- so Unix "hidden" directories will not be searched.) I ran
the test three times and threw away the first results,
on about 500 MB of HTML files.
t1: 8 wallclock secs ( 2.59 usr + 4.65 sys = 7.24 CPU)
t2: 8 wallclock secs ( 2.47 usr + 4.66 sys = 7.13 CPU)
t3: 17 wallclock secs ( 8.65 usr + 5.90 sys = 14.55 CPU)
t4: 9 wallclock secs ( 1.67 usr + 5.42 sys = 7.09 CPU)
t5: 2 wallclock secs ( 1.04 usr 0.23 sys + 0.00 cusr 0.01 csys = 0.00 CPU)
And amazingly, the trend continues, with &t5()
beating the rest by far, even though I had thought the whole
thing would have become console bound anyway - but that
wasn't so.
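To illustrate the regular-expression point with a small sketch (the file names below are made up for illustration): the unanchored /.txt/ only requires some character followed by "txt" somewhere in the name, while /\.txt$/ requires a literal ".txt" at the very end:

```perl
# Compare the unanchored benchmark pattern with an anchored one.
my @names = ('notes.txt', 'mytxtfile.dat', 'atxt', 'txt', 'README');

my @loose  = grep { /.txt/   } @names;  # any char, then "txt", anywhere
my @strict = grep { /\.txt$/ } @names;  # literal dot, "txt" at the end

print "loose:  @loose\n";   # notes.txt mytxtfile.dat atxt
print "strict: @strict\n";  # notes.txt
```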
I wonder what my tests under NT 4 will bring us :)
It seems that File::Find is better implemented on *nix systems, or that it does a better job reading inodes than reading the FAT. I was quite amazed that the opendir stunt beat it on Win32.
Good work. I eagerly await the NT 4 tests.
/brother t0mas
t1: 17 wallclock secs ( 6.66 usr + 9.89 sys = 16.55 CPU)
t2: 16 wallclock secs ( 5.89 usr + 8.47 sys = 14.36 CPU)
t3: 41 wallclock secs (16.67 usr + 18.16 sys = 34.83 CPU)
t4: 27 wallclock secs ( 8.37 usr + 16.88 sys = 25.26 CPU)
t5: 15 wallclock secs ( 7.75 usr + 7.07 sys = 14.82 CPU)
NTFS drive (slight HD activity for the later parts of the HD)
t1: 96 wallclock secs (30.07 usr + 59.09 sys = 89.17 CPU)
t2: 87 wallclock secs (27.73 usr + 53.18 sys = 80.91 CPU)
t3: 179 wallclock secs (72.02 usr + 96.92 sys = 168.94 CPU)
t4: 142 wallclock secs (36.63 usr + 96.15 sys = 132.78 CPU)
t5: 81 wallclock secs (35.33 usr + 43.25 sys = 78.58 CPU)
So here File::Find is again on par with the solution reading each directory twice and the solution using rewinddir(), and my favourite method of doing stuff, &t4(), doesn't look that good either if you are going for peak performance. The fastest solution takes only half the time, and scanning the whole NTFS HD did take some time, as you see :). So once again rule number one of optimizing holds: Benchmark, benchmark, benchmark.
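For what it's worth, the Benchmark module can also do the bookkeeping and comparison itself via cmpthese(), instead of the manual timediff() calls in the script above. A minimal sketch, with placeholder workloads standing in for the real t1..t5 traversals:

```perl
use Benchmark qw(cmpthese);

# Placeholder workloads; in the real test these would call t1..t5.
sub cheap  { my $x = 0; $x += $_ for 1 .. 100;  $x }
sub costly { my $x = 0; $x += $_ for 1 .. 1000; $x }

# Run each sub 10000 times and print a rate comparison table.
cmpthese(10_000, {
    cheap  => \&cheap,
    costly => \&costly,
});
```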
Thanks Corion for the testing.
As you say - benchmark, benchmark, benchmark. Speed is king in many circumstances, but maybe not all.
It seems that t1, t2, and t5 are best for this simple kind of search, but in more complex cases with lots of heavy evaluations
and file operations, t3 and t4 (or a more complex t5) are perhaps better.
/brother t0mas
Hello t0mas!
It always amazes me in which places I find users of eConsole - never would I have thought to find a user on perlmonks :)!
Thanks for doing these tests - I didn't even know there was a Benchmark module! What amazes me is that the method of reading a directory twice (as done in t1 and
t2) is faster than reading it once and checking for file/directory afterwards - you never stop learning, I guess... I will run these tests on my machine (a lowly P-100 running NT 4) and maybe on a Linux machine as well, to get a more complete view of the behaviour :)
Hi Corion.
The same thing amazes me. I guess that doing a regexp on the whole list at once with grep is faster than doing it on every $entry separately. I don't know how Perl handles this stuff internally. Maybe it recompiles the regexp every time it uses it, or something.
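On the recompilation guess: a constant pattern like /.txt/ is compiled once at compile time, so grep does not pay a recompilation cost per entry; recompilation only becomes an issue when the pattern interpolates a variable. Since Perl 5.005 such a pattern can be precompiled once with qr// and reused - a small sketch:

```perl
# Precompile a pattern once with qr// and reuse it for every entry.
my $ext    = 'txt';
my $txt_re = qr/\.\Q$ext\E$/;   # compiled once, despite interpolation

my @entries = ('a.txt', 'b.html', 'c.txt');
my @matches = grep { $_ =~ $txt_re } @entries;

print "@matches\n";  # a.txt c.txt
```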
Please do run the test. I would like to see if the results you get are along the same lines as the ones I got.
And about eConsole I would like to say - Transparency Rules...
/brother t0mas
No offence taken. I think it's a good thing to discuss and show different ways to solve the same problem, and I guess we all have our own toolkits of code snippets that we throw into every program we write.
Maybe I'll try to benchmark some of these ways when I find some time.
/brother t0mas