Re: Finding and sorting files in massive file directory

by pvaldes (Chaplain)
on Jan 20, 2013 at 21:04 UTC (#1014332)

in reply to Finding and sorting files in massive file directory

I have a directory with c. 2-3 million files (and growing) that I need to categorise and compress.

shell commands on the directory - find, ls etc - just hang!

Use wildcards in bash or glob in Perl. For example, split the files across several directories with mv a*.* /A-dir, then compress the files beginning with b with something like gzip b*.*

Replies are listed 'Best First'.
Re^2: Finding and sorting files in massive file directory
by CColin (Scribe) on Jan 20, 2013 at 21:30 UTC
    I seem to recall that when I tried something like this, it failed because too many arguments were passed to the command, i.e. you can't run a million or so files through a single gzip command?
      If you put something like this in a while (there are files that start w/ a) loop, you should get around the issue of too many arguments. (Be sure to set increment=1 before the loop).
      mkdir dir_a.$increment
      mv `ls a* | head -n1000` dir_a.$increment/
      let increment++
        The problem is you will get "too many arguments" from ls a* already.
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
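
One way around the limit is to let the shell expand the glob inside a for loop: the list is never handed to an external command, so the kernel's argument-size limit is not involved. This is only a minimal sketch, assuming a POSIX sh; the helper name batch_move and the dir_<prefix>.<n> naming are illustrative (borrowed from the suggestion above), not something the posters tested.

```shell
#!/bin/sh
# Sketch: move files whose names start with a prefix into numbered
# subdirectories, at most batch_size files per directory.
# batch_move is a hypothetical helper name, not from the thread.
batch_move() {
    prefix=$1
    batch_size=$2
    increment=1
    count=0
    # The glob is expanded by the shell itself and "for" is a builtin,
    # so no huge argument list is ever passed to an external command.
    for f in "$prefix"*; do
        [ -f "$f" ] || continue      # skip directories and an unmatched glob
        if [ "$count" -eq 0 ]; then
            mkdir -p "dir_$prefix.$increment"
        fi
        mv -- "$f" "dir_$prefix.$increment/"
        count=$((count + 1))
        if [ "$count" -ge "$batch_size" ]; then
            increment=$((increment + 1))
            count=0
        fi
    done
}
```

It would be called as batch_move a 1000 from inside the big directory. It runs one mv per file, so it trades speed for staying under the limit, and expanding a glob over millions of entries can still take a while.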

      See the xargs command. It can help with breaking apart large command lines.

      find . -type f -name b\* -depth -2 | xargs $command_that_can_be_run_multiple_times

      Update: added example.
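
A slightly fuller version of the find | xargs idea, as a sketch only. It assumes a find and xargs that support -maxdepth, -print0 and -0 (GNU and BSD versions do, but they are extensions to POSIX), and it uses gzip as the command because that is what was asked about. The helper name compress_prefix and the batch size of 1000 are illustrative choices.

```shell
#!/bin/sh
# Sketch: gzip every regular file in the current directory whose name
# starts with a given prefix, letting xargs split the list into batches.
# compress_prefix is a hypothetical helper name, not from the thread.
compress_prefix() {
    prefix=$1
    # -maxdepth 1    : stay in the current directory (GNU/BSD extension)
    # ! -name '*.gz' : skip files that are already compressed
    # -print0 / -0   : NUL-separated names, safe for spaces and newlines
    # -n 1000        : at most 1000 file names per gzip invocation
    find . -maxdepth 1 -type f -name "$prefix*" ! -name '*.gz' -print0 |
        xargs -0 -n 1000 gzip
}
```

Run from inside the directory as compress_prefix b. Note that GNU xargs still runs the command once when find matches nothing, unless -r is added.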

