PerlMonks  

help me fork

by mhearse (Hermit)
on Jul 20, 2004 at 13:09 UTC ( #375891=perlquestion )
mhearse has asked for the wisdom of the Perl Monks concerning the following question:

Howdy monks. I'm trying to benchmark syslogd on my loghost. I've created a shell script to do this, but I really want to beat the machine up. With Perl, can I use fork to increase the number of messages sent to syslogd (Within limits of course. The current script sends 3000 per minute)? I have never used fork before. As seen below, I am using the logger command to interact with syslogd. Thanks in advance.
#!/bin/sh
start=`date "+%M:%S"`
x=0
# On Ctrl-C (signal 2), print a summary and exit.
trap 'echo ""
      echo "started at $start"
      echo "finished at `date "+%M:%S"`"
      echo "sent $x messages"
      echo "found `grep TEST_MESSAGE /var/log/messages | wc -l` messages"
      exit' 2
# Loop until interrupted, sending one numbered message per pass.
while :
do
    logger -p syslog.notice TEST_MESSAGE_$x
    x=`expr $x + 1`
    echo sent $x
done

Re: help me fork
by Joost (Canon) on Jul 20, 2004 at 13:14 UTC
      If yes, how?
Re: help me fork
by mutated (Monk) on Jul 20, 2004 at 13:22 UTC
    I'm going to say no, well, not much anyway, unless you have a multiprocessor system (if you do, forking off as many processes as you have processors would be a good thing). When you start forking, all you are doing is wasting time switching between contexts; it doesn't create more processor time, it just shares the processor between more processes. I suspect what you really want to do is increase the priority of your process; see the manpage for the Unix command nice. The big thing here is that the bottleneck probably isn't your program, it's syslog blocking while it tries to write your log entry.


    daN.
      Ahh. So fork is not the best answer. Better to make the program nicer. Will give it a try.
        Perhaps some additional info is warranted. The loghost in question is servicing messages from about 40 other machines. Some of the messages aren't being recorded properly (if at all). I assumed either a UDP issue or syslogd. A sniffer and netstat showed no UDP issues, so I am trying to determine how far I can push syslogd.
      This is bad advice. More precisely, it is advice that only applies to CPU-bound jobs. If your job spends a significant fraction of its time waiting for network or disk, then you're wrong.

      Increasing the priority of something that isn't waiting for CPU does you no good at all, since it isn't having trouble there. Adding processes is good because it is not that much extra work to have 5 processes waiting for something external rather than 1. And while a disk is spinning around to where one process has its data, nothing stops it from reading or writing somewhere else for another process. Note that syslogd is an I/O-bound process, so unless there is a global lock preventing two copies doing work at once, it will benefit from running multiple times. Of course, too many waiting jobs run into trouble as the disk tries to do too many things at once.
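One rough way to tell the two cases apart is to compare the CPU time a job burns with the wall-clock time it takes. The sketch below is only a heuristic, not a rule: the `measure` helper and the two stand-in workloads are made up for illustration, and the resolution of `times` is coarse on many systems.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Run a code ref; return (wall-clock seconds, CPU seconds used by this process).
sub measure {
    my ($job) = @_;
    my $wall0 = time;
    my ($user0, $sys0) = times;
    $job->();
    my ($user1, $sys1) = times;
    my $wall = time - $wall0;
    my $cpu  = ($user1 - $user0) + ($sys1 - $sys0);
    return ($wall, $cpu);
}

# Stand-in workloads (hypothetical): one spins the CPU, one only waits.
my %job = (
    cpu_heavy  => sub { my $n = 0; $n += $_ for 1 .. 2_000_000 },
    waits_only => sub { sleep 0.3 },
);

for my $name (sort keys %job) {
    my ($wall, $cpu) = measure($job{$name});
    my $ratio = $wall > 0 ? $cpu / $wall : 0;
    printf "%-10s wall=%.2fs cpu=%.2fs cpu/wall=%.2f\n", $name, $wall, $cpu, $ratio;
}
```

A cpu/wall ratio near 1 suggests the job is CPU-bound; a ratio near 0 suggests it spends its time waiting, which is where extra processes can help.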

      What the optimal threshold is for any particular job is highly dependent on your exact hardware and configuration. Test and benchmark it. The last time that I did this for an I/O bound job, I found that on the machine I tested for the job that I was doing, I got maximum throughput at 5 jobs. I therefore batched my report to run 5 at a time. For a database-bound job I found that I got the best results at 7 copies. Had I taken your advice in either case I would have only used 2 processes - and would have got less than half the throughput that I did.

        Thanks for the reply. I'm definitely learning something here. Just to clarify, you are suggesting that I experiment to find the optimum number of simultaneous instances of my benchmark program. You mentioned running them in batches. Would this best be done using the aforementioned Parallel::ForkManager module? I've been reading up on fork. I don't believe that the plain fork function has the ability to control the number of children, does it? Is there a general rule to tell whether a process is CPU or I/O bound?
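On the question of capping the number of children: plain fork has no such control built in, but the parent can enforce a limit itself by calling wait before forking again. The following is a minimal sketch of that idea, not a replacement for Parallel::ForkManager; `run_limited` and the stand-in jobs are hypothetical names invented for this example.

```perl
use strict;
use warnings;
use POSIX ();

# Run each job (a code ref) in its own child process, never more than
# $max_children at once. Returns how many children exited with status 0.
sub run_limited {
    my ($max_children, @jobs) = @_;
    my %kid;
    my $ok = 0;

    my $reap_one = sub {
        my $pid = wait;                        # block until any child exits
        if ($pid == -1) { %kid = (); return }  # no children left to wait for
        $ok++ if $? == 0;
        delete $kid{$pid};
    };

    for my $job (@jobs) {
        # At the cap: wait for a slot before forking another child.
        $reap_one->() while keys(%kid) >= $max_children;
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                       # child
            my $status = eval { $job->(); 1 } ? 0 : 1;
            POSIX::_exit($status);             # leave without running the parent's END blocks
        }
        $kid{$pid} = 1;                        # parent: remember the child
    }
    $reap_one->() while %kid;                  # reap the stragglers
    return $ok;
}

# Eight stand-in jobs that just pause briefly, at most three running at once.
my @jobs = map { my $n = $_; sub { select undef, undef, undef, 0.02 * $n } } 1 .. 8;
my $done = run_limited(3, @jobs);
print "$done of ", scalar(@jobs), " children finished cleanly\n";
```

Changing the first argument to `run_limited` is then the knob to turn when benchmarking different numbers of simultaneous instances.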
Re: help me fork
by pbeckingham (Parson) on Jul 20, 2004 at 13:24 UTC

    I would guess that using Perl as a load driver for syslogd is more likely to beat up your machine because of the forking Perl.

Re: help me fork
by Zaxo (Archbishop) on Jul 20, 2004 at 15:33 UTC

    If you are running a central log server, you can connect several machines through UDP on port 514 as well as any other sockets you have instructed syslogd to listen on. You don't need to mess with the call to logger if you know what ports or devices you can use. You can see what your syslogd is listening to with the command lsof -p `pidof syslogd`.

    I think that some monks' warnings about fork not helping this are bogus. This is an I/O-heavy application, and any process talking over a port or to a disk file spends a lot of time sleeping, waiting for the I/O system to respond. Having many processes active uses time the sleepers have relinquished.

    I think your question is really about how to use fork. Here is a snippet which will open a connection to udp port 514 on host "logserver", and then spawn fifty processes to all talk at once. Untested, I don't have remote logging set up here.

    use IO::Socket::INET;

    # UDP socket aimed at the syslog port on the log server.
    my $log = IO::Socket::INET->new(
        PeerAddr => 'logserver',
        PeerPort => 514,
        Proto    => 'udp',
    ) or die "Cannot create socket: $@";

    That provides an IO::Socket handle to the syslogd port on the server. The handle will be duplicated by all the child processes we spawn. You spoke of wanting to beat 3000 messages per minute; run the script under time to check. We'll fork 50 children to each send 60 messages, 3000 in total.

    my %kid;
    for (1..50) {
        my $pid = fork;
        $kid{$pid} = undef, next if $pid;   # parent: remember the child
        next if not defined $pid;           # fork failed: skip this one
        # in child
        undef %kid;
        close STDERR; close STDOUT; close STDIN;
        for (1..60) {
            # $$ is this child's PID, so messages can be told apart
            $log->send(sprintf "<DEBUG> Child %d: Message #%d\n", $$, $_);
        }
        exit 0;
    }
    # No Zombies!
    delete $kid{wait()} while %kid;

    The socket code will probably need twiddling, but the fork related stuff is what you wanted. In particular, it may be desirable to move the socket constructor inside the loop so each child has its own connection to the server.

    The %kid hash is how the parent process keeps track of what child processes there are. Using wait at the end reaps child exits to prevent zombies from forming, and also causes the parent to hang around until they are done. That makes the entire operation easy to time.
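As one example of the socket-side twiddling: the BSD syslog protocol (RFC 3164) expects each datagram to start with a numeric priority in angle brackets, computed as facility * 8 + severity. The helper below is a sketch under that assumption; `syslog_packet` is a hypothetical name, the facility table is deliberately partial, and whether your syslogd accepts a bare `<PRI>text` payload without a timestamp and hostname is something to verify locally.

```perl
use strict;
use warnings;

# A few facility codes and all severity codes from RFC 3164.
my %FACILITY = (kern => 0, user => 1, mail => 2, daemon => 3,
                auth => 4, syslog => 5, local0 => 16);
my %SEVERITY = (emerg => 0, alert => 1, crit => 2, err => 3,
                warning => 4, notice => 5, info => 6, debug => 7);

# Build one datagram payload: "<PRI>text", where PRI = facility * 8 + severity.
sub syslog_packet {
    my ($facility, $severity, $text) = @_;
    die "unknown facility '$facility'\n" unless exists $FACILITY{$facility};
    die "unknown severity '$severity'\n" unless exists $SEVERITY{$severity};
    my $pri = $FACILITY{$facility} * 8 + $SEVERITY{$severity};
    return "<$pri>$text";
}

# Same priority the shell script asks logger for: syslog.notice
print syslog_packet('syslog', 'notice', 'TEST_MESSAGE_1'), "\n";   # prints "<45>TEST_MESSAGE_1"
```

The returned string could be passed to `$log->send(...)` in place of the literal message in the child loop.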

    After Compline,
    Zaxo

      Thanks. I'm going to vary the number of children vs. log messages per child to see what combination produces 3000 messages the fastest.
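To pick the combinations to try, it may help to list every split of the 3000 messages that divides evenly. This is only a sketch of the bookkeeping; `split_evenly` and the cap of 100 children are assumptions made for illustration, and each pair would still need to be timed against the real loghost.

```perl
use strict;
use warnings;

# All [children, messages per child] pairs whose product is exactly $total,
# for child counts from 1 up to $max_children.
sub split_evenly {
    my ($total, $max_children) = @_;
    return map  { [ $_, $total / $_ ] }
           grep { $total % $_ == 0 } 1 .. $max_children;
}

for my $pair (split_evenly(3000, 100)) {
    my ($children, $per_child) = @$pair;
    printf "%3d children x %4d messages each\n", $children, $per_child;
}
```

Each pair can be plugged into the outer and inner loop bounds of the fork snippet and the whole run timed.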
