Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Re: Optimizing performance for script to traverse on filesystem

by kejohm (Hermit)
on Feb 01, 2012 at 21:37 UTC ( #951307=note: print w/ replies, xml ) Need Help??


in reply to Optimizing performance for script to traverse on filesystem

Rather than rolling your own code, you should probably use a module like File::Find. Here is an example that should do want you're trying to do (untested):

#!perl use 5.012; use File::Find; use List::Util qw(min); my $workdir = ...; my %top_size_mailbox; my $box_size; my $all_mailbox_size = 0; my $all_mailbox_count = 0; my $empty_mail_box = 0; find( { -wanted => sub { next if m/^\.+$/; next if m/\.(dat|mdb|snapshot)$/; if ( -d and m/^\d+\@\w/ ) { $all_mailbox_count++; $box_size = 0; } elsif ( m/\.msg$/ ) { my $msg_size = -s _; if ( $msg_size < 4096) { $box_size += 4096; } else { $box_size += $msg_size; } } }, -postprocess => sub { if ( $File::Find::dir =~ m/(\d+)\@/ ) { my $msisdn = $1; $all_mailbox_size += $box_size; if ( $box_size == 0 ) { $empty_mailbox++; } else { top_size_mailbox( $msisdn, $box_size ); } } }, } $workdir, ); sub top_size_mailbox { my ( $msisdn, $box_size ) = @_; if ( keys( %top_size_mailbox ) < $num_top_size_box ) { $top_size_mailbox{$box_size} = $msisdn; } else { my $min = min( keys %top_size_mailbox ); if ( $box_size > $min ) { delete $top_size_mailbox{$min}; $top_size_mailbox{$box_size} = $msisdn; } } } __END__

There are other similar modules like File::Find::Rule that you could also try.


Comment on Re: Optimizing performance for script to traverse on filesystem
Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://951307]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (7)
As of 2015-07-28 08:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (254 votes), past polls