PerlMonks  

Perl Directory file extension stats.

by sidsinha (Acolyte)
on Apr 15, 2014 at 21:32 UTC (#1082395=perlquestion)
sidsinha has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a question about Perl directory statistics. Are there any Perl packages I can use to count the files of each extension in a given directory?

For example, it should walk an input directory (including subdirectories) and return the number of files of each type (e.g. 200 .bmp, 50 .jpg, 20 .txt).

Please guide me. Thanks.

Re: Perl Directory file extension stats.
by choroba (Canon) on Apr 15, 2014 at 21:55 UTC
    Shell solution (*nix):
    find . -type f -name '*.*' | rev | cut -f1 -d. | rev | sort | uniq -c | sort -n
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Perl Directory file extension stats.
by Laurent_R (Prior) on Apr 15, 2014 at 22:06 UTC
    The File::Find module should, I think, be able to give you the answer if you provide it with the right callback function.
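To make "the right callback" a little more concrete, here is a minimal sketch of what File::Find hands to a callback. The temporary directory and the two file names are made up purely so the example runs on its own; they are not from the original question.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Spec;
use File::Temp qw(tempdir);

# Build a tiny throwaway tree (hypothetical names) so the sketch is self-contained.
my $root = tempdir( CLEANUP => 1 );
mkdir "$root/sub" or die "mkdir: $!";
for my $rel ( 'a.txt', 'sub/b.pl' ) {
    open my $fh, '>', "$root/$rel" or die "open $rel: $!";
    close $fh;
}

my @seen;    # plain files found, as paths relative to $root
find(
    sub {
        # Inside the callback, $_ is the bare file name, $File::Find::dir is
        # the directory being visited, and $File::Find::name joins the two.
        # find() has chdir'ed into that directory, so -f can test $_ directly.
        return unless -f;
        push @seen, File::Spec->abs2rel( $File::Find::name, $root );
    },
    $root
);

print "$_\n" for sort @seen;
```

Everything else (counting, grouping, summing sizes) is just a matter of what the callback does with `$_`.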

      Assuming a %stats hash has been declared, the File::Find callback can be as simple as:

      sub wanted {
          return unless -f;
          my ($ext) = /(?<!^)[.]([^.]+)$/;    # undef when there is no extension
          ++$stats{$File::Find::dir}{$ext // ''};
      }

      That will cause all files without an extension, as well as hidden files without an extension (e.g. .bashrc), to be counted together under a '' key. Everything else will be counted under keys without a leading '.', e.g. 'pl', 'txt', 'png', etc.

      If counts for individual directories aren't required, that can be simplified even further by changing the last statement to:

      ++$stats{$ext // ''};

      -- Ken
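For completeness, the callback idea above can be wrapped into a small script. This is only a sketch: the function name `count_extensions`, the `(none)` label and the output format are my own choices, and it needs Perl 5.10 or later for the `//` operator.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Count plain files per extension under $dir, recursively.
# Files with no extension, and dotfiles such as .bashrc, are counted under ''.
sub count_extensions {
    my ($dir) = @_;
    my %stats;
    find(
        sub {
            return unless -f;
            my ($ext) = /(?<!^)[.]([^.]+)$/;    # undef if nothing matched
            ++$stats{ $ext // '' };
        },
        $dir
    );
    return \%stats;
}

# Directory comes from the command line, defaulting to the current one.
my $stats = count_extensions( shift // '.' );

# Most common extension first; ties broken alphabetically.
for my $ext ( sort { $stats->{$b} <=> $stats->{$a} || $a cmp $b } keys %$stats ) {
    printf "%6d  %s\n", $stats->{$ext}, $ext eq '' ? '(none)' : ".$ext";
}
```

Run it as `perl script.pl /some/dir`. Extensions are compared case-sensitively here, so `.JPG` and `.jpg` are separate keys unless you `lc` the captured value.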

Re: Perl Directory file extension stats.
by 2teez (Priest) on Apr 15, 2014 at 22:59 UTC

      After a few trials, I was able to do what I want using the File::Find module and a few regexes. The code below gives me the number of files for each unique extension in a directory.

      Now I am trying to find the cumulative size of each type of file in the directory. For example, if there are 50 HTML files, I want the sum of the sizes of all 50 HTML files, and likewise for every extension that my array @exts contains.

      I tried using File::Find::Rule->..->name( foreach @ext ) to walk the tree and find the sizes, but couldn't get it right. Could someone please guide me? Thanks.

      use strict;
      use warnings;
      use File::Basename;
      use File::Find;
      use Data::Dumper;
      $Data::Dumper::Sortkeys = 1;

      my $start_dir = "F:/";
      my @exts;

      find(\&print_all_directories, $start_dir);
      print "Parsing\n";

      # Called by find() for every entry; records the suffix of each non-directory.
      sub print_all_directories {
          return if -d;
          my ($name, $path, $suffix) = fileparse($File::Find::name, qr/\.[^.]+$/);
          push @exts, $suffix;
      }

      my %counts;
      $counts{$_}++ for @exts;
      #print Dumper(\%counts);

      # Least common extension first.
      foreach my $name (sort { $counts{$a} <=> $counts{$b} } keys %counts) {
          printf "%s %s\n", $name, $counts{$name};
      }

        You mean something like this, using a hash to get all the statistics you need:

        use warnings;
        use strict;
        use File::Find qw(find);
        use Data::Dumper;

        # don't hard-code the directory
        my $start_dir = $ARGV[0] || '.';
        my $ext = qr/\.[^.]+$/;    # last dot-suffix of the file name
        my %file;                  # hash to collect statistics

        find( \&finder => $start_dir );

        sub finder {
            return unless -f;      # skip directories and other non-files
            if (/($ext)/) {
                my $size = -s _;
                $file{$1}++;               # number of files per extension
                $file{size}{$1} += $size;  # cumulative bytes per extension
            }
        }

        {
            local $Data::Dumper::Sortkeys = 1;
            print Dumper \%file;
        }

        If you tell me, I'll forget.
        If you show me, I'll remember.
        if you involve me, I'll understand.
        --- Author unknown to me
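If a plain report is preferred over a Data::Dumper dump, the same count-plus-size idea can be sketched as below. The function name `ext_stats`, the nested `count`/`bytes` keys, the `(none)` bucket and the lower-casing of extensions (so that .HTML and .html are summed together) are all assumptions of mine, not part of the code above.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Return { ext => { count => N, bytes => B } } for plain files under $dir.
sub ext_stats {
    my ($dir) = @_;
    my %stats;
    find(
        sub {
            return unless -f;
            my $bytes = ( -s _ ) || 0;    # reuse the stat from -f; empty files give 0
            my ($ext) = /(?<!^)([.][^.]+)$/;
            $ext = defined $ext ? lc $ext : '(none)';    # assumption: fold case
            $stats{$ext}{count}++;
            $stats{$ext}{bytes} += $bytes;
        },
        $dir
    );
    return \%stats;
}

# Directory comes from the command line, defaulting to the current one.
my $stats = ext_stats( shift // '.' );
for my $ext ( sort keys %$stats ) {
    printf "%-12s %6d files %12d bytes\n",
        $ext, $stats->{$ext}{count}, $stats->{$ext}{bytes};
}
```

Keeping counts and sizes under separate inner keys avoids mixing the two kinds of number at the top level of one hash, which makes the report loop simpler.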

Node Type: perlquestion [id://1082395]
Approved by Old_Gray_Bear
Front-paged by toolic