PerlMonks  

Re^2: fast count files

by gautamparimoo (Beadle)
on Jan 06, 2012 at 08:50 UTC (#946551)


in reply to Re: fast count files
in thread fast count files

Windows OS, NTFS filesystem.


Re^3: fast count files
by BrowserUk (Pope) on Jan 06, 2012 at 09:48 UTC

    If you literally just want to count the files on the entire disk, this is by far the fastest simple method I know of.

    It counts the 1.2 million files on my cold-cache, 640GB (400GB used) drive in a little under 7 minutes:

        $t=time; $n = `attrib /s c:\\* | wc -l`; printf "$n : %.f\n", time()-$t;;
        1233597 : 394

    Try it and see how you fare. I vaguely remember finding a faster method years ago, and I'll try to remember enough to look it up.

    Note: Don't do my @files = `attrib /s c:\\*`; my $n = scalar @files; as all the memory allocation slows things down horribly.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      You might have issues using wc -l on Windows, unless you have installed it from somewhere like GNU. However, it is easy to implement your own in Perl.
      Based on BrowserUk's ideas:
          use strict;
          use warnings;

          my $t=time;
          # $n = `attrib /s c:\\* | wc -l`;
          open (my $pipe, '-|', 'attrib /s c:\\*') or die "attrib: $!";
          my $n = 0;
          while (<$pipe>) { $n++ }
          close $pipe;
          printf "$n : %.f\n", time()-$t;
      Gives: 380366 : 135 on a used space of 297GB.

      Thanks for that quick reply. But I have to use my @files = `attrib /s c:\\*`; my $n = scalar @files; as without it, it says that wc is an unrecognised command. Also, I am showing the code of the fastest method I know:

          use strict;

          my $f;    # number of files
          my $d;    # number of dirs

          sub count_files {
              my ($ref) = @_;
              foreach my $dir (@$ref) {
                  $dir = readlink $dir and chop $dir if -l $dir;    # read link
                  next unless opendir(my $dir_h, $dir);             # open dir or next
                  my @dirs;
                  while (defined(my $file = readdir $dir_h)) {
                      if ($file eq '.' or $file eq '..') {
                          next;
                      }
                      if (-d "$dir/$file") {
                          ++$d;                                     # counting dirs
                          push @dirs, "$dir/$file";
                      }
                      elsif(-f _) {
                          ++$f;                                     # counting files
                      }
                  }
                  closedir $dir_h;
                  count_files(\@dirs);
              }
              [$f, $d];
          }

          foreach my $arg (@ARGV) {
              my @dir = -d $arg ? $arg : next;
              ($f, $d) = (0, 0);
              print "$arg\nFiles\t: $$_[0]\nFolders\t: $$_[1]\n" for count_files(\@dir);
          }

      Please suggest a faster method, and also how to run it without using my @files = `attrib /s c:\\*`; my $n = scalar @files;

        But I have to use my @files = `attrib /s c:\\*`; my $n = scalar @files;

        I did warn you against loading the entire list into perl. It really slows things down.

        as without it says that wc is an unrecognised command

        There are various possible cures for that:

        • Download yourself a copy of wc.exe for Windows.

          It is easy to find, and it's a program that is just too useful to be without.

        • Use attrib /s c:\* | perl -nE"}{say $."

          It is a poor substitute but works for this use.

        • Use the suggestion by cdarke.

          Simple and effective.
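
        For anyone puzzled by the one-liner in the second bullet: -n wraps the code in a while (<>) { ... } loop, so the stray }{ closes that loop early and say $. runs once after it, printing the number of the last line read. Below is a minimal sketch of roughly the same thing written out longhand. The name count_lines is hypothetical, and the in-memory handle is only there to make the demonstration self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: read a handle to the end, keeping nothing,
# and return how many lines were seen.
sub count_lines {
    my ($fh) = @_;
    local $_;                  # do not clobber the caller's $_
    while (<$fh>) { }          # discard each line; Perl keeps the count in $.
    return $. // 0;            # $. is the line number of the last handle read
}

# Demonstration on an in-memory handle (an assumption for illustration).
my $text = "first\nsecond\nthird\n";
open( my $demo, '<', \$text ) or die "open: $!";
print count_lines($demo), "\n";
close $demo;
```

        To count real output, pass \*STDIN instead of $demo and pipe the command in, as with the one-liner above. Nothing is stored per line, so memory use stays flat however long the listing is.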

        But I remembered a faster method. This uses the Windows Script Host to do the donkey work via Win32::OLE and runs 3 times faster on my machine.

        2:14 instead of 6:40 on my machine. It also counts the directories as it goes, which may or may not be useful to you:

            #! perl -slw
            use strict;
            use Time::HiRes qw[ time ];
            use Win32::OLE qw[in];

            my $start = time;

            my $fso = Win32::OLE->new( 'Scripting.FileSystemObject' );
            my @folders = $fso->GetFolder( $ARGV[0] );

            my $cFolders = 0;
            my $cFiles = 0;
            while( @folders ) {
                local $^W;
                my $folder = pop @folders;
                $cFiles += $folder->Files->Count;
                $cFolders += $folder->Subfolders->Count;
                for my $subFolder ( in $folder->SubFolders ) {
                    $cFiles += $subFolder->Files->Count;
                    $cFolders += $subFolder->SubFolders->Count;
                    push @folders, $_ for in $subFolder->SubFolders ;
                }
            }

            my $seconds = time - $start;
            my $minutes = int( $seconds / 60 );
            $seconds %= 60;

            printf "Folders:$cFolders Files:$cFiles [%u:%.2f]\n", $minutes, $seconds;

            __END__
            [12:05:12.81] c:\test>countFiles c:\
            Folders:68860 Files:1234105 [2:14.00]
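
        If Win32::OLE or Scripting.FileSystemObject is not available, the same stack-driven walk can be sketched with nothing but opendir and readdir. This is an untimed illustration and makes no claim about speed. The name count_tree is hypothetical, and skipping symbolic links is an assumption made here so that link cycles cannot keep the loop going forever.

```perl
use strict;
use warnings;

# Hypothetical helper: count entries beneath $root using an explicit
# stack of pending directories rather than recursion.
sub count_tree {
    my ($root) = @_;
    my ( $cFiles, $cFolders ) = ( 0, 0 );
    my @folders = ($root);
    while (@folders) {
        my $folder = pop @folders;
        opendir( my $dh, $folder ) or next;    # unreadable directory: skip it
        while ( defined( my $entry = readdir $dh ) ) {
            next if $entry eq '.' or $entry eq '..';
            my $path = "$folder/$entry";
            next if -l $path;                  # assumption: ignore symbolic links
            if ( -d _ ) {                      # reuse the lstat done by -l
                ++$cFolders;
                push @folders, $path;          # visit this directory later
            }
            else {
                ++$cFiles;                     # anything that is not a directory
            }
        }
        closedir $dh;
    }
    return ( $cFiles, $cFolders );
}

# Report on each directory named on the command line.
for my $root (@ARGV) {
    next unless -d $root;
    my ( $files, $folders ) = count_tree($root);
    print "$root Folders:$folders Files:$files\n";
}
```

        Unlike the script above, this issues a stat call per entry, which is the cost the FileSystemObject Count properties avoid.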

