http://www.perlmonks.org?node_id=1066128


in reply to system "gzip $full_name" is taking more time

There are several things you can do to improve your script.

The first thing I'd do would be to use the File::Find module. It handles the recursion to traverse a directory tree.
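
For example, a minimal sketch of collecting every file under a tree with File::Find (the top-level directory name is a placeholder):

    use strict;
    use warnings;
    use File::Find;

    my $top_dir = '/path/to/logs';    # placeholder for your top-level directory

    my @files;
    find(
        sub {
            # find() chdirs into each directory; $_ is the current entry's
            # basename and $File::Find::name is its full path.
            push @files, $File::Find::name if -f $_;
        },
        $top_dir,
    );

    print scalar(@files), " files found under $top_dir\n";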

I'd also use another one of the CPAN modules, such as IO::Compress::Gzip, to replace your system call to gzip.
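
A sketch of that replacement, with the file name as a placeholder (in the real script it would come from the File::Find walk):

    use strict;
    use warnings;
    use IO::Compress::Gzip qw(gzip $GzipError);

    my $full_name = '/path/to/logs/app.log';    # placeholder file name

    # Compress in-process instead of shelling out to gzip.
    gzip $full_name => "$full_name.gz"
        or die "gzip failed for $full_name: $GzipError\n";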

Rather than compressing each individual file by itself, I'd probably group them into one or more tar.gz archives. Maybe one archive per directory.
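
A sketch of the one-archive-per-directory idea using Archive::Tar (the directory name is a placeholder):

    use strict;
    use warnings;
    use Archive::Tar;

    my $dir = '/path/to/logs/2013-12-01';    # placeholder directory

    # Gather the plain files in this directory and bundle them into
    # a single gzip-compressed tar archive.
    opendir my $dh, $dir or die "Cannot open $dir: $!\n";
    my @files = grep { -f } map { "$dir/$_" } readdir $dh;
    closedir $dh;

    my $tar = Archive::Tar->new;
    $tar->add_files(@files);
    $tar->write("$dir.tar.gz", COMPRESS_GZIP)
        or die "Could not write $dir.tar.gz: ", $tar->error, "\n";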


Re^2: system "gzip $full_name" is taking more time
by dushyant (Acolyte) on Dec 07, 2013 at 15:41 UTC

    I tried using IO::Compress::Gzip, but it does not replace the original file; it creates a separate compressed file. So I have to make another system call to remove the original file, and this will not improve the performance.

    I can't create one compressed file for everything. I have to follow certain rules and standards in my production environment.

    I am thinking of running 5-6 copies of the same script, dividing the 120 directories between them.
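
    A rough sketch of how that split could look from a single driver script using fork (process_dir is a hypothetical stand-in for the per-directory compression work):

        use strict;
        use warnings;

        my @dirs    = glob '/path/to/logs/*';    # placeholder: the 120 directories
        my $workers = 6;

        for my $w (0 .. $workers - 1) {
            my $pid = fork;
            die "fork failed: $!\n" unless defined $pid;
            next if $pid;    # parent: keep launching workers

            # Worker $w handles every $workers-th directory.
            for my $i (grep { $_ % $workers == $w } 0 .. $#dirs) {
                process_dir($dirs[$i]);    # hypothetical per-directory routine
            }
            exit 0;
        }

        1 while wait != -1;    # parent waits for all workers to finish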

      So I have to make another system call to remove the original file

      You mention that you are aware of Perl's built-in unlink.

      Better yet, use the move function from File::Copy, which ships with every Perl.
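
      For instance, a minimal sketch of the whole step done in Perl with no extra system call, using the same IO::Compress::Gzip approach and a placeholder file name:

          use strict;
          use warnings;
          use IO::Compress::Gzip qw(gzip $GzipError);
          use File::Copy qw(move);

          my $full_name = '/path/to/logs/app.log';    # placeholder

          # Compress in-process, then drop the original with the built-in unlink.
          gzip $full_name => "$full_name.gz"
              or die "gzip failed for $full_name: $GzipError\n";
          unlink $full_name
              or warn "could not remove $full_name: $!\n";

          # If the compressed file needs to end up somewhere else, File::Copy's
          # move() can relocate it without another external command, e.g.:
          # move("$full_name.gz", '/path/to/archive/') or warn "move failed: $!\n";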

      Cheers, Sören

      Creator of mobile bugs - let loose once, run everywhere.
      (hooked on the Perl Programming language)

      There are other similar modules to choose from, and I'm sure at least one of them can replace the original file. I haven't looked at the related modules in any detail, so I can't say which one would be best suited to your needs.

      What do you think would be a reasonable amount of time to compress an average-sized file from one of the directories at the command line? Would you say a tenth of a second is reasonable? Multiply that time by 90,000 and you get a very rough estimate of the required time per directory: at a tenth of a second per file, that's 90,000 × 0.1 s = 9,000 s, or roughly two and a half hours per directory, and that's not including the overhead of executing the system function for each file.

      Having an average of 90,000 files per directory seems to be a major factor in the overall problem. Can you rework your policies to work on 30-day intervals rather than 90-day intervals?