
Re^2: cron script best practices

by jimbus (Friar)
on Aug 10, 2005 at 21:35 UTC (node #482770)

in reply to Re: cron script best practices
in thread cron script best practices

Here is my crontab:
reports@clarkkent/home/reports(7): crontab -l
00 06 * * * /home/reports/ftp/SMSC0/
00 06 * * * /home/reports/ftp/MMSC1/
20,35,50,05 * * * * /home/reports/ftp/YTSMSC50/
20,35,50,05 * * * * /home/reports/ftp/FDSMSC/
00 06 * * * /home/reports/ftp/proptima/

I'm checking on the script with "ps -ef|grep loadData|wc -l". The first two entries should run once a day at 6am and the next two every fifteen minutes, which is 96 times each per 24-hour period. I'm assuming the problem is with those two, which are the same script but for different boxes.

These scripts digest a log file containing a series of reports from about 12 nodes; each report has between 50 and 225 key/value pairs, one per line. I loop through the nodes, building a hash of the key/value pairs, and then build one huge INSERT from it. With up to 225 columns, the INSERT is built dynamically.
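A minimal sketch of the per-node step (key/value lines in, one INSERT out), written as a bash function around awk. It assumes each report line looks like `key=value`; the thread does not show the real log format, and `build_insert` and the table name are hypothetical. Keys that are not plain identifiers are skipped because they become column names, and single quotes in values are doubled so the SQL stays well-formed.

```shell
#!/usr/bin/env bash
# build_insert TABLE < one_node_report
# Hypothetical helper. Assumes "key=value" lines (real format not shown in the thread).
# Prints a single INSERT statement, or nothing if no usable lines were found.
build_insert() {
  local table=$1   # trusted: not validated here
  awk -F'=' -v table="$table" -v q="'" '
    NF >= 2 {
      key = $1
      # Column names come from the log, so accept only plain identifiers.
      if (key !~ /^[A-Za-z_][A-Za-z0-9_]*$/) next
      val = substr($0, length(key) + 2)   # everything after the first "="
      gsub(q, q q, val)                   # double any single quotes for SQL
      cols = cols (n ? ", " : "") key
      vals = vals (n ? ", " : "") q val q
      n++
    }
    END {
      if (n) printf "INSERT INTO %s (%s) VALUES (%s);\n", table, cols, vals
    }'
}
```

Something like `build_insert smsc_stats < node_report.txt | mysql your_db` (all three names hypothetical) would load one node's report. Inside the Perl script itself, DBI placeholders (`?` in the statement passed to `prepare`, values passed to `execute`) avoid hand-quoting altogether.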

I have filled /usr a couple of times, once recently. I thought things would recover, but I end up with all these processes still running and mysqld using 60-70% of the CPU.

I guess the real issue is that I'm resource-strapped, inexperienced with Perl, and getting a bit overwhelmed by the amount of data being thrown at me. I was hoping to find someone who had documented what it takes to write mature cron/logging scripts :) For both Perl and JDBC in JSP, I find all kinds of simplest-case examples on the web, but not much on what I would consider typical usage patterns.



Never moon a werewolf!

Re^3: cron script best practices
by anthski (Scribe) on Aug 10, 2005 at 23:09 UTC
    20,35,50,05 * * * * /home/reports/ftp/YTSMSC50/
    20,35,50,05 * * * * /home/reports/ftp/FDSMSC/

    Do these scripts need to run simultaneously? You could immediately reduce the number of connections to your DB if one script ran to completion and exited before the other was started.


    20,35,50,05 * * * * /home/reports/ftp/YTSMSC50/; /home/reports/ftp/FDSMSC/
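The `;` chaining above already serializes the two jobs. A slightly more talkative variant, as a bash sketch: a function that runs each job to completion before starting the next and reports which one failed. `run_sequentially` is a hypothetical name, and the job names in the usage note are placeholders, since the real script names are not shown in the crontab.

```shell
#!/usr/bin/env bash
# run_sequentially JOB...
# Runs each job to completion before starting the next, so at most one of
# them holds a database connection at a time. Unlike "job1; job2" in a
# crontab line, it reports which job failed. Returns 1 if any job failed.
run_sequentially() {
  local job failed=0
  for job in "$@"; do
    if ! "$job"; then
      echo "run_sequentially: job failed: $job" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Saved in a wrapper script ending with something like `run_sequentially /path/to/YTSMSC50_job /path/to/FDSMSC_job` (hypothetical paths), the crontab needs a single entry, and the failure message lands in cron's mail or log.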

    I've got to ask: do any of your inserts work at all? As mentioned in another response, if your script works fine by hand but not from cron, it may be an environment issue. I'd modify my crontab to something like

    20,35,50,05 * * * * /bin/env > /tmp/env.output; /home/reports/ftp/YTSMSC50/; /home/reports/ftp/FDSMSC/

    and then check the contents of /tmp/env.output and compare them with the output of env run at a command line, looking for any important or potentially relevant differences. You could then set those environment variables in your Perl script.
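Comparing the two environments by eye is tedious with a long env listing. A bash sketch that sorts both sides and diffs them: pass it the file captured by the crontab line above (/tmp/env.output) and run it from a login shell. `env_diff` is a hypothetical helper, and values containing newlines are not handled.

```shell
#!/usr/bin/env bash
# env_diff CRON_ENV_FILE
# Diffs the environment captured under cron against the current shell's.
# Output lines starting with "<" exist only under cron, ">" only here.
# Exit status follows diff: 0 = identical, 1 = differences, 2 = trouble.
env_diff() {
  local cron_env=$1 tmp status
  tmp=$(mktemp) || return 2
  env | sort > "$tmp"
  # Sort both sides so diff compares variables, not their ordering.
  sort "$cron_env" | diff - "$tmp"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

PATH is a common culprit, since cron typically starts jobs with a much shorter one than a login shell has.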

    Another obvious thing is to make sure that you're disconnecting from the DB. And if it isn't running properly from cron, schedule it only once, debug that run, and make sure it works from cron before filling your crontab with multiple runs each hour.
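For the run-it-once-and-debug step, it helps to keep everything a single cron run prints. A bash sketch of a wrapper that appends a start line, the job's stdout and stderr, and its exit status to a log file, then returns that status. `run_logged` and any log path you pass it are hypothetical.

```shell
#!/usr/bin/env bash
# run_logged LOGFILE COMMAND [ARGS...]
# Appends a start marker, all output of COMMAND, and its exit status to
# LOGFILE, then returns COMMAND's exit status.
run_logged() {
  local log=$1 status
  shift
  {
    echo "=== start $(date '+%Y-%m-%d %H:%M:%S') pid $$: $*"
    "$@"
    status=$?
    echo "=== end status $status"
  } >> "$log" 2>&1
  return "$status"
}
```

Putting the log on a filesystem with room to spare (not /usr) keeps the debugging aid from causing the disk problem discussed below.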

    Finally, why is your /usr filling up on a regular basis as a result of this script?

Re^3: cron script best practices
by polettix (Vicar) on Aug 11, 2005 at 11:37 UTC
    Some thoughts:
    • Do you really need 225 columns in one table? Maybe you could split data over different tables, keeping columns together based on what they represent - these "classes" shouldn't be difficult to spot among 225 possible keys of a log from an SMSC.
    • It seems that you basically replicate the script inside many directories; I hope this is done via (hard|sym)linking instead of plain copies. You could add an input parameter (e.g. the box name) to a single script kept in one known location: that would improve maintainability.
    • As others said, you should avoid running them concurrently. This could mean avoiding cron entirely: I was once bitten by a similar problem (collection and processing of data from RNCs or from provisioning nodes), and I eventually resorted to a single scheduling script that runs the jobs *sequentially* instead of in parallel. OTOH, if you need to stick with cron, measure how long each process takes to run, and separate their start times by at least those durations (for the repetitive tasks it would probably be wise to use 05,20,35,50 for one and 12,27,42,57 for the other).
    • If you fill your disks... you need bigger ones. A monitoring script with alarm capabilities would probably help too.
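For the monitoring suggestion, a minimal bash check, assuming a POSIX `df -P` is available (its second line is the filesystem, and the fifth column is the Use% figure). `check_disk` and any threshold you pick are hypothetical choices; run from cron, whatever it prints is typically mailed to the crontab owner, which serves as a crude alarm.

```shell
#!/usr/bin/env bash
# check_disk MOUNTPOINT THRESHOLD_PERCENT
# Returns 0 if usage is below the threshold, 1 (with a warning on stderr)
# if it is at or above it, and 2 if the usage could not be read.
check_disk() {
  local mount=$1 limit=$2 used
  # df -P prints one unwrapped line per filesystem; strip the "%" from Use%.
  used=$(df -P "$mount" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  if [ -z "$used" ]; then
    echo "check_disk: cannot read usage for $mount" >&2
    return 2
  fi
  if [ "$used" -ge "$limit" ]; then
    echo "WARNING: $mount is ${used}% full (limit ${limit}%)" >&2
    return 1
  fi
  return 0
}
```

A crontab entry calling a script that runs `check_disk /usr 90` every few minutes (the 90% is arbitrary) would flag the next fill-up before mysqld and the loaders pile up.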

    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.
Re^3: cron script best practices
by Anonymous Monk on Aug 11, 2005 at 13:02 UTC
    Questions, remarks:
    • How come the script is filling up /usr? Where is it writing, with whose permission, and why? Ideally, the size of /usr should only change when installing patches, or upgrading your Operating Environment.
    • Do your scripts actually do what they are supposed to do? Do your scripts connect to the database, or are they just hanging there, trying to log on?
    • How fast do your scripts run "by hand"? If it takes 20 minutes by hand, and you start one every 15 minutes, you will run into problems.
    • To avoid having too many instances running, when I write cron jobs that make database connections and fire every 15 minutes, I use a lock file to prevent multiple instances from running. Policies can vary: the one failing to get the lock exits; the one failing to get the lock kills the one holding it; or a combination of the two (exit if the lock is held by a process that started less than $X minutes ago, else kill the one holding the lock). Waiting for the lock usually isn't a good idea.
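A bash sketch of the first policy (the instance that fails to get the lock exits). It uses a directory as the lock because `mkdir` either creates it or fails atomically, so two instances cannot both succeed. The function names and the lock path in the comments are hypothetical. Stale locks left behind by a `kill -9` or a reboot are not cleaned up here; the age-based policy described above would be the next step.

```shell
#!/usr/bin/env bash
# acquire_lock LOCKDIR: returns 0 if this process now holds the lock,
# 1 if another instance already holds it.
acquire_lock() {
  local lockdir=$1
  if mkdir "$lockdir" 2>/dev/null; then
    echo "$$" > "$lockdir/pid"   # record the holder, useful for debugging
    return 0
  fi
  return 1
}

# release_lock LOCKDIR: removes the lock directory and its pid file.
release_lock() {
  rm -rf "$1"
}

# Typical use at the top of a cron job (hypothetical lock path):
#   acquire_lock /tmp/loadData.lock || exit 0
#   trap 'release_lock /tmp/loadData.lock' EXIT
```

Exiting quietly with status 0 when the lock is held keeps cron from mailing an error every 15 minutes while a long run is still going; logging a line there instead would show how often runs overlap.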
