
Re: Perl Script Causing high CPU usage

by graff (Chancellor)
on Sep 30, 2013 at 06:14 UTC ( #1056283 )

in reply to Perl Script Causing high CPU usage

If I understand you correctly, you're saying that when the job is I/O-bound (because it's doing lots of "mv, zip or unlink" work), CPU usage is "okay" (4-11%), but when there's relatively little disk content to alter, CPU usage shoots up as high as 83%, which is bad for some reason.

The replies above about reducing how often you invoke "stat" will help some, but I would also worry about having so many files in a single directory that everything bogs down: it simply takes a long time to scan a directory with that many entries, especially if you're doing multiple stat calls on every file instead of using all the information you get from a single stat call.
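To illustrate the single-stat-call point: a minimal sketch, assuming the script is checking both the age and the size of each file (the directory path and the age threshold here are hypothetical, since the OP's actual script is not shown):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical directory; substitute the real log location.
my $dir = '/var/log/app';

for my $file ( glob("$dir/*.log") ) {
    # One stat() call returns everything at once. Separate -M, -s, -e
    # style tests would each hit the filesystem again, unless you
    # deliberately reuse the magic "_" filehandle after the first test.
    my ( $size, $mtime ) = ( stat $file )[ 7, 9 ];
    next unless defined $size;    # file may have vanished mid-scan

    my $age_days = ( time - $mtime ) / 86400;
    print "$file: $size bytes, $age_days days old\n" if $age_days > 7;
}
```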

If you're seeing 190,000 files being created in less than a week (over 27,000 a day), you might want to see if you can divide that up among different directories, to limit the number of files per directory. File age seems to be what matters most, and it's apparently okay to move things around, so you might try creating a directory for each date and moving files into daily directories according to their age. That would make things a lot easier to manage, in addition to reducing the overall load on both the CPU and the disk subsystem.
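A minimal sketch of that daily-bucketing idea, assuming files should be grouped by their modification date (the source path is hypothetical, and in practice you'd want the date directories on the same filesystem so the move is a cheap rename):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Path qw(make_path);
use POSIX qw(strftime);

# Hypothetical source directory; adjust to the real log location.
my $src = '/var/log/app';

for my $file ( glob("$src/*.log") ) {
    my $mtime = ( stat $file )[9] or next;

    # Bucket by modification date, e.g. /var/log/app/2013-09-30/
    my $day_dir = "$src/" . strftime( '%Y-%m-%d', localtime $mtime );
    make_path($day_dir) unless -d $day_dir;

    move( $file, $day_dir )
        or warn "could not move $file to $day_dir: $!\n";
}
```

Downstream jobs then only scan the one or two date directories they care about, instead of a directory holding the whole backlog.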

Replies are listed 'Best First'.
Re^2: Perl Script Causing high CPU usage
by Anonymous Monk on Oct 01, 2013 at 01:24 UTC

    Hi All, thanks a lot for so many ideas. To be clearer: in a single day the system generates 5-6 lakh files in just 3-4 directories, consuming around 10-14 GB of space per day (individual files are not huge: sizes are 5-6 digit numbers of bytes, e.g. 12345 bytes). There are files of 4-5 different patterns, but the count is huge. There is no date timestamp in the file names (e.g. ABC_YYYYMMDD.log), just a random 8-digit number (i.e. ABC_????????.log).

    As per my observation, glob is not consuming high CPU; the spikes start as soon as we enter the while loop. After commenting out mv, zip, and unlink, keeping just the print statements, I still saw CPU around 80%. After adding the line below in the "next" section, CPU is under control at 10-12%, sometimes even as low as 4%:

    select (undef, undef, undef, 0.250);

    But the process is already running slowly because of the huge number of files and stat calls, as highlighted by you all above, so I can't afford the sleep. Once again, thanks to all of you; I will try out the various ideas you've given. Thanks for letting me know about File::Find::Rule as well.
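For what it's worth, the four-argument select above is the classic idiom for a sub-second pause; Time::HiRes spells the same thing more readably. A sketch of one possible compromise (this is an assumption, not something from the thread): throttle only every Nth file instead of on every iteration, so the loop still yields the CPU without multiplying the total runtime by as much. The path and the tuning constants here are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(sleep);    # lets sleep() take fractional seconds

# Hypothetical file list; the real script globs lakhs of files per day.
my @files = glob('/var/log/app/*.log');

my $count = 0;
for my $file (@files) {
    # ... the mv / zip / unlink work would go here ...

    # Pause only every 100th file rather than every iteration.
    # (The select(undef, undef, undef, 0.250) idiom above achieves the
    # same kind of sub-second pause without loading Time::HiRes.)
    sleep(0.05) if ++$count % 100 == 0;
}
```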

      Good luck in the hunt -- if the loop is using lots of CPU just to do print statements, then I don't know that there's much you can do to limit its CPU utilization. Sounds like the OS is giving your script everything it needs; are you sure it's a problem?

      As a complete side note, I am deadly curious -- what is a lakh file? Is that some kind of special log format or something?

        I suspect that the poster is using the Indian term "lakh", meaning 100,000.

