Re: Finding and sorting files in massive file directory

by dave_the_m (Parson)
on Jan 20, 2013 at 20:38 UTC


in reply to Finding and sorting files in massive file directory

You don't make it clear whether all these files are in a single directory, or in a directory hierarchy. I'm going to assume the former. Most shell commands and much perl code will appear to hang forever, since they will attempt to read the entire directory listing into memory, then sort it, before doing anything else. What you need (in general terms) is the following perl code, which must be run from the directory in question:

#!/usr/bin/perl

use warnings;
use strict;

opendir my $dir, '.' or die "opendir .: $!\n";

my $file;
my $count = 0;

while (defined($file = readdir($dir))) {
    # give yourself some progression feedback
    $count++;
    print "file $count ...\n" unless $count % 1000;

    # skip all files not beginning with b
    next unless $file =~ /^b/;

    # if you've created directories, may need to skip them;
    # this will slow things down, so don't do so unless necessary
    next unless -f $file;

    # do something with the file
    rename $file, "b/$file" or die "rename $file b/$file: $!\n";
}
This example deals with directory entries at the most efficient and lowest level. In this case, it just moves all files starting with "b" into the subdirectory b/.

Obviously it needs adapting to your particular needs. For example, the rename could become

system "gzip", $file;
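If disk space is the concern, it's probably also worth checking that gzip actually succeeded before relying on it; an untested variant of that line:

# system returns the child's exit status; 0 means gzip succeeded
system("gzip", $file) == 0
    or die "gzip $file failed: exit status $?\n";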

Dave.


Re^2: Finding and sorting files in massive file directory
by CColin (Scribe) on Jan 20, 2013 at 21:28 UTC
    > You don't make it clear whether all these files are in a single directory

    Yes, they are. So basically this reads one file into memory at a time? Do you know how to deal with zipping and tarring each file up in that case?
      So basically this reads one file into memory at a time?
      No, it reads one filename at a time into memory.
      Do you know how to deal with zipping and tarring each file up in that case?
      Well I've already shown you the easy way to compress individual files with the external gzip command. If you want to combine multiple files into a single tar file (possibly compressed), you're going to have to be more specific: how many files approximately are you wanting to put in a single tar file? All 2 million of them? Or just a select few? And do you want to delete the files afterwards?

      You are likely to need to use the Archive::Tar module.
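      A minimal, untested sketch of the Archive::Tar approach (the filenames in @files and the output archive name are just placeholders) might look like:

      use strict;
      use warnings;
      use Archive::Tar;

      # whatever subset of filenames you've decided belongs in one archive
      my @files = ('b_0001.log', 'b_0002.log');   # placeholder names

      my $tar = Archive::Tar->new;

      # note: Archive::Tar holds the file contents in memory, so add files
      # in manageable batches rather than all 2 million at once
      $tar->add_files(@files) or die "add_files failed: " . $tar->error . "\n";

      # write out a gzip-compressed archive
      $tar->write('batch.tar.gz', COMPRESS_GZIP)
          or die "write failed: " . $tar->error . "\n";

      # only delete the originals once the archive has been written successfully
      for my $f (@files) {
          unlink $f or warn "unlink $f: $!\n";
      }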

      Dave

        Hi,

        I'd like to be able to combine the distinct file types into single compressed tar files. There are approximately 5 types: one type has roughly 1 million files, and the other 4 have around 300k - 500k each.

        Yes, the files need to be deleted afterwards to make disk space for incoming.

        Colin
