
Perl Directory file extension stats.

by sidsinha (Acolyte)
on Apr 15, 2014 at 21:32 UTC ( #1082395=perlquestion )

sidsinha has asked for the wisdom of the Perl Monks concerning the following question:


I have a question regarding Perl directory statistics. Are there any Perl packages I can use to list the number of unique file extensions in a given directory?

For example, it should parse an input directory (including subdirectories) and return the count for each unique file type (for example, 200 .bmp, 50 .jpg, 20 .txt).

Please guide me. Thanks.

Replies are listed 'Best First'.
Re: Perl Directory file extension stats.
by Laurent_R (Canon) on Apr 15, 2014 at 22:06 UTC
    The File::Find module should, I think, be able to give you the answer if you provide it with the right callback function.
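A minimal sketch of that approach (the helper name, the lower-casing, and the '(none)' bucket for extension-less files are illustrative choices, not anything File::Find prescribes):

```perl
use strict;
use warnings;
use File::Find;

# Count files per extension under a directory tree.
# Returns a hash ref: extension (lower-cased, without the dot) => count;
# files without an extension are grouped under '(none)'.
sub ext_counts {
    my ($dir) = @_;
    my %count;
    find(sub {
        return unless -f;              # skip directories and other non-files
        my ($ext) = /\.([^.]+)\z/;     # text after the last dot, if any
        $count{ defined $ext ? lc $ext : '(none)' }++;
    }, $dir);
    return \%count;
}

my $counts = ext_counts('.');
printf "%6d  %s\n", $counts->{$_}, $_
    for sort { $counts->{$b} <=> $counts->{$a} } keys %$counts;
```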

      Assuming a %stats hash has been declared, the File::Find callback can be as simple as:

      sub wanted {
          return unless -f;
          my ($ext) = /(?<!^)[.]([^.]+)$/;
          ++$stats{$File::Find::dir}{ $ext // '' };
      }

      Capturing into a lexical avoids relying on a stale $1 when the match fails. All files without an extension, as well as hidden files without an extension (e.g. .bashrc), will be counted together under a '' key. Everything else will be counted under keys without a leading '.', e.g. 'pl', 'txt', 'png', etc.

      If counts for individual directories aren't required, that can be simplified even further by just using:

      ++$stats{ $ext // '' };
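Assembled into a complete, runnable script (the default of the current directory, the output layout, and the '(none)' label for extension-less files are assumptions):

```perl
use strict;
use warnings;
use File::Find;

my %stats;    # directory => { extension => count }

sub wanted {
    return unless -f;
    # (?<!^) stops a leading dot (hidden files like .bashrc) from counting as an extension
    my ($ext) = /(?<!^)[.]([^.]+)\z/;
    ++$stats{$File::Find::dir}{ $ext // '' };
}

find(\&wanted, shift // '.');

for my $dir (sort keys %stats) {
    print "$dir\n";
    printf "  %-10s %d\n", $_ eq '' ? '(none)' : $_, $stats{$dir}{$_}
        for sort keys %{ $stats{$dir} };
}
```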

      -- Ken

Re: Perl Directory file extension stats.
by choroba (Archbishop) on Apr 15, 2014 at 21:55 UTC
    Shell solution (*nix):
    find . -name '*.*' | rev | cut -f1 -d. | rev | sort | uniq -c | sort -n
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Perl Directory file extension stats.
by 2teez (Vicar) on Apr 15, 2014 at 22:59 UTC

      After a few trials, I was able to do what I wanted using the File::Find module and a few regexes. The code below gives me the number of files for each unique extension in a directory.

      Now I am trying to find the cumulative size of each type of file in the directory. For example, if there are 50 HTML files, I am trying to find the sum of the sizes of all 50 HTML files, and likewise for every extension that my array @exts contains.

      I tried using File::Find::Rule->..->name( foreach @ext ) to parse through and find the sizes, but couldn't get it right. Could someone please guide me? Thanks.

      use strict;
      use warnings;
      use feature "switch";
      use File::Basename;
      use File::Find;
      use Data::Dumper;
      $Data::Dumper::Sortkeys = 1;

      my $start_dir = "F:/";
      my @exts;
      find(\&print_all_directories, $start_dir);
      print "Parsing\n";

      sub print_all_directories {
          return if -d;
          my $full_dir_path = $File::Find::name;
          my ($ext) = $full_dir_path =~ /(\.[^.]+)$/;
          my ($name, $path, $suffix) = fileparse($full_dir_path, qr"\..[^.]*$");
          push @exts, $suffix;
      }

      my %counts;
      $counts{$_}++ for @exts;
      #print Dumper(\%counts);
      foreach my $name (sort { $counts{$a} <=> $counts{$b} } keys %counts) {
          printf "%s %s\n", $name, $counts{$name};
      }
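For the cumulative-size question, File::Find::Rule isn't strictly necessary: core File::Find can collect counts and byte totals per extension in one pass. A sketch (the helper name, hash layout, and '(none)' label are assumptions, not the poster's code):

```perl
use strict;
use warnings;
use File::Find;

# One pass over the tree: count files and sum bytes per extension.
# Returns a hash ref: ext => { count => N, bytes => N }.
sub ext_sizes {
    my ($dir) = @_;
    my %stat;
    find(sub {
        return unless -f;
        my ($ext) = /\.([^.]+)\z/;
        $ext = defined $ext ? lc $ext : '(none)';
        $stat{$ext}{count}++;
        $stat{$ext}{bytes} += -s _;    # reuse the stat buffer from -f
    }, $dir);
    return \%stat;
}

my $stat = ext_sizes(shift // '.');
printf "%-8s %6d files %12d bytes\n", $_, $stat->{$_}{count}, $stat->{$_}{bytes}
    for sort { $stat->{$b}{bytes} <=> $stat->{$a}{bytes} } keys %$stat;
```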

        You mean something like this, using a hash to get all the statistics you need:

        use warnings;
        use strict;
        use File::Find qw(find);
        use Data::Dumper;

        # don't fix the directory
        my $start_dir = $ARGV[0] || '.';
        my $ext = qr[\.[^.]+];
        my %file;    # hash to collect statistics; must be declared before find() runs

        find( \&finder => $start_dir );

        sub finder {
            return if $_ eq '.';
            if (/($ext)/) {
                my $size = -s;
                $file{$1}++;
                $file{size}{$1} += $size;
            }
        }

        {
            local $Data::Dumper::Sortkeys = 1;
            print Dumper \%file;
        }

        If you tell me, I'll forget.
        If you show me, I'll remember.
        If you involve me, I'll understand.
        --- Author unknown to me

Node Type: perlquestion [id://1082395]
Approved by Old_Gray_Bear
Front-paged by toolic