Re: Re: Efficient processing of large directory
by Elliott (Pilgrim) on Oct 02, 2003 at 16:47 UTC
Thanks for the tip - but why? (Most of all I want it to work - but I also want to understand)
It has to do with how foreach (and for, which is an exact synonym in Perl) works. foreach constructs the entire list first, then iterates through it. This can be very memory-intensive, which slows processing (due to cache misses and virtual memory issues). A nearly exact rewrite of foreach in terms of while would look something like:
foreach my $n (<*.*>)
{
    # do stuff
}
----
my @list = <*.*>;
my $i = 0;
while ($i <= $#list)
{
    my $n = $list[$i];
    # do stuff
}
continue
{
    $i++;
}
------
We are the carpenters and bricklayers of the Information Age.

The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
Er, why on earth do you tell him to use while and then use it to do the exact same thing the foreach loop had been doing? Your method still has to build the 17,000-element list and iterate over it; it just uses a more explicit form. A rewrite that actually gets around this would be simply:
while (defined(my $x = <*.*>))
{
    do_stuff($x);
}
This will only read a single file at a time and has no need to create huge lists.
Re: Re: Efficient processing of large directory
by Elliott (Pilgrim) on Oct 03, 2003 at 15:37 UTC
You should use readdir instead. Also, if this is running as a CGI (I guess that's what the timeout refers to), make sure to send the client a few bytes of data every now and then so it doesn't give up waiting.
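A minimal sketch of both suggestions together, assuming the per-file work lives in a hypothetical process_file() and that dots are an acceptable heartbeat for the client:

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;    # disable output buffering so heartbeat bytes reach the client

# Stream directory entries one at a time with readdir instead of
# building the whole list in memory, printing a heartbeat every
# 1000 files so a CGI client doesn't time out.
sub process_directory {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open $dir: $!";
    my $count = 0;
    while (defined(my $entry = readdir($dh))) {
        next if $entry eq '.' or $entry eq '..';
        # process_file("$dir/$entry");    # hypothetical per-file work
        $count++;
        print '.' if $count % 1000 == 0;  # heartbeat to the client
    }
    closedir($dh);
    return $count;
}
```

Unlike a glob, readdir never holds more than one entry at a time, so memory use stays flat no matter how many files the directory contains.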
Makeshifts last the longest.