http://www.perlmonks.org?node_id=883449


in reply to Fastest way to recurse through VERY LARGE directory tree

Benchmark.

It may very well be that whatever you come up with is only faster than File::Find on some combinations of disks, volume managers, and filesystems, and slower on others.

You need to benchmark to find out.

Now, in theory, carefully handcrafting something that does exactly what you need is going to be faster than a more general tool like File::Find. But whether that's actually measurable is a different question.

So, benchmark.

Of course, as you describe the problem, the bottleneck might very well be your I/O.

Hence, benchmark.

Have I said you should benchmark? No? Well, benchmark!
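To make the advice concrete, here is a minimal sketch using the core Benchmark module to compare File::Find against a hand-rolled readdir walk. The helper names (count_with_file_find, count_with_readdir) and the small temporary tree are assumptions made up for illustration; point $root at your real data for a meaningful measurement.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Find ();
use File::Spec;
use File::Temp qw(tempdir);

# Build a small throwaway tree so the sketch runs anywhere.
# For a real measurement, point $root at your actual data instead.
my $root = tempdir(CLEANUP => 1);
for my $d (1 .. 5) {
    my $dir = File::Spec->catdir($root, "dir$d");
    mkdir $dir or die "mkdir $dir: $!";
    for my $f (1 .. 20) {
        my $file = File::Spec->catfile($dir, "file$f.txt");
        open my $fh, '>', $file or die "open $file: $!";
        close $fh;
    }
}

# Candidate 1: File::Find, counting plain files.
sub count_with_file_find {
    my ($top) = @_;
    my $count = 0;
    File::Find::find(
        { wanted => sub { $count++ if -f $_ }, no_chdir => 1 },
        $top,
    );
    return $count;
}

# Candidate 2: hand-rolled walk with an explicit stack (hypothetical helper).
sub count_with_readdir {
    my ($top) = @_;
    my $count = 0;
    my @todo  = ($top);
    while (defined(my $dir = pop @todo)) {
        opendir my $dh, $dir or next;    # skip unreadable directories
        while (defined(my $entry = readdir $dh)) {
            next if $entry eq '.' or $entry eq '..';
            my $path = "$dir/$entry";
            if    (-l $path) { next }                 # do not follow symlinks
            elsif (-d _)     { push @todo, $path }    # reuse the lstat result
            elsif (-f _)     { $count++ }
        }
        closedir $dh;
    }
    return $count;
}

# Both candidates must agree before their timings mean anything.
my $n_find    = count_with_file_find($root);
my $n_readdir = count_with_readdir($root);
die "results differ: $n_find vs $n_readdir" unless $n_find == $n_readdir;

# A negative count means: run each candidate for at least that many CPU seconds.
cmpthese(-1, {
    'File::Find' => sub { count_with_file_find($root) },
    'readdir'    => sub { count_with_readdir($root) },
});
```

Repeated runs over the same tree are served largely from the operating system's cache, so a comparison like this mostly measures CPU overhead. If I/O is the real bottleneck, the two candidates may look nearly identical on a cold cache.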

Re^2: Fastest way to recurse through VERY LARGE directory tree
by Anonymous Monk on Feb 08, 2018 at 09:00 UTC
    Currently, the best way to do a lightweight scan of a very big directory tree is to use the File::Find::Object::Rule library.

    With this library you can create a safe iterator object that does not load the whole scanned tree into memory before it starts working. Using it in iterator mode is very simple:

    use File::Find::Object::Rule;

    my $rule = File::Find::Object::Rule->new();
    # Chain the filters you need here (placeholder names, read the library's examples):
    # $rule->Some_filter_method_read_library_examples(parameters)->eventually_next_filter();
    $rule->start(@paths);    # @paths: one path or a list of paths (placeholder).
                             # This initializes the iterator. Don't panic, it will
                             # not load the whole big directory structure.
    while (1) {
        # Read one single item. I prefer to do it here rather than in the while
        # condition, so that a false-looking name cannot break the loop.
        my $item = $rule->match();
        last unless defined $item;    # stop looping after the last element
        # Do anything with $item here; it is a path. Example:
        printf "Fetched [%s]\n", $item;
        if (-l $item) { print "it is symbolic link\n" }
    }

    You can leave this loop in any state and, for example, start the next scan by calling $rule->start(@new_searches). The rule is reinitialized; for me it works. Of course, in that situation you'll use the same filters as before. If you want different filters, call .....->new() and $rule->some_filters() again. Warning: this is a fork of the File::Find::Rule and File::Find libraries, unmaintained for a long time; I found this notice on MetaCPAN.
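The idea of pulling one path at a time, without building the whole listing first, can also be sketched with core Perl only. This is a stand-in to illustrate the technique, not how File::Find::Object::Rule is implemented; make_tree_iterator is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: returns a closure that yields one path per call
# and undef once the tree is exhausted.
sub make_tree_iterator {
    my @pending = @_;    # paths discovered but not yet reported
    return sub {
        my $path = shift @pending;
        return unless defined $path;
        # Expand a directory only when it is reached; skip symlinks to avoid cycles.
        if (-d $path && !-l $path) {
            if (opendir my $dh, $path) {
                push @pending,
                    map  { "$path/$_" }
                    grep { $_ ne '.' && $_ ne '..' } readdir $dh;
                closedir $dh;
            }
        }
        return $path;
    };
}

# Tiny throwaway tree so the sketch runs anywhere.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "mkdir: $!";
for my $file ("$root/a.txt", "$root/sub/b.txt") {
    open my $fh, '>', $file or die "open $file: $!";
    close $fh;
}

my $next = make_tree_iterator($root);
my @seen;
while (1) {
    my $item = $next->();
    last unless defined $item;    # same loop shape as the example above
    push @seen, $item;
}
printf "Fetched %d paths\n", scalar @seen;
```

The queue only ever holds entries of directories that have already been opened, so memory use follows the width of the tree at the current point rather than its total size. Because the queue is processed first in, first out, the walk is breadth-first; taking paths from the end of the queue instead would make it depth-first.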