Sun Solaris (SPARC processor) + Threads + performance/optimization

by gulden (Monk)
on Apr 15, 2009 at 15:22 UTC
gulden has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I'm developing a Perl script to import data from a file into a MySQL database. I'm using a Thread::Queue to hold each line of the file, and N threads in parallel read from that queue and insert into the DB. On Sun Solaris (UltraSPARC T2) we must use parallelism in order to take advantage of the CPU. However, I'm not getting any improvement when I increase the number of threads processing the Thread::Queue: the processing times are very similar with and without threads. I've already tried using two parallel queues and the results are similar. Any tips? Snippet of the script:
#!/opt/coolstack/bin/perl -w
use strict;
use POSIX;
use threads ('yield', 'stack_size' => 64*4096, 'exit' => 'threads_only', 'stringify');
use Thread::Queue;

my $nthreads = 16;
my $indata   = new Thread::Queue;

print "START PARSER: " . localtime() . "\n";

# $fname, load() and analyse() are defined elsewhere in the full script (not shown).
my @tloaders;
print "LAUNCH LOADER THREADS\n";
my ($thr) = threads->create(\&load, $fname);
push @tloaders, $thr->tid();
print "LOADER THREADS LAUNCHED\n";

print "LAUNCH WORKER THREADS\n";
for (my $i = 0; $i < $nthreads; $i++) {
    my ($thr) = threads->create(\&analyse);
}
print "WORKER THREADS LAUNCHED\n";

print "WAITING FOR LOADERS TO FINISH\n";
foreach my $i (@tloaders) {
    my $thr = threads->object($i);
    print "WAITING ON THREAD: " . $thr->tid() . "\n";
    $thr->join();
}
print "LOADERS HAVE FINISHED: " . localtime() . "\n";

# One STOP marker per worker, then wait for all remaining threads.
for (my $i = 0; $i < $nthreads; $i++) {
    $indata->enqueue("STOP");
}
foreach my $thr (threads->list()) {
    print "WAITING ON THREAD: " . $thr->tid() . "\n";
    $thr->join();
}
print "END PARSER: " . localtime() . "\n";
exit;
Perl Version
$ perl -v
This is perl, v5.8.8 built for sun4-solaris-thread-multi
Copyright 1987-2006, Larry Wall

Re: Sun Solaris (SPARC processor) + Threads + performance/optimization
by BrowserUk (Pope) on Apr 15, 2009 at 15:43 UTC

    You are unlikely to see much improvement using threads (or processes), because the DB will almost certainly serialise all the DB activity. Especially if the inserts are all into the same table.

    You'd be much quicker to use the bulk loader for your DB.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I've replaced the insert_into_db() call with just
      $indata->dequeue();
      and I'm still getting no advantage from increasing the number of threads, and CPU usage is only about 10%! Something is wrong: I'm not taking advantage of parallelism/threads. Should I use more threads? More Thread::Queues? More Thread::Queues passed to different processing threads? Is there any problem with Perl and/or threads on Solaris? Should I use forks instead of threads?
Re: Sun Solaris (SPARC processor) + Threads + performance/optimization
by perrin (Chancellor) on Apr 15, 2009 at 17:07 UTC

    Ok, two comments (rough sketches of both follow below):

    On Solaris, or any other Unix, you will usually get better performance by forking than by using perl threads.

    This is the wrong way to speed up a MySQL data load. You should be using LOAD DATA INFILE. There is also a fast parallel CSV loader in the Maatkit tools.
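
    On the first comment, here is a rough, hypothetical sketch of a fork-based split, in which each child re-reads the input file and keeps only its own stripe of lines. That is simple but wasteful (splitting on byte offsets would avoid the repeated reads). The file name and the process_line() body are placeholders, not code from the original post.

    #!/opt/coolstack/bin/perl -w
    use strict;

    my $nworkers = 8;
    my $fname    = 'input.dat';          # placeholder input file

    # Fork $nworkers children; child $i handles every $nworkers-th line.
    my @pids;
    for my $i (0 .. $nworkers - 1) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                 # child
            open my $fh, '<', $fname or die "open $fname: $!";
            my $lineno = 0;
            while (my $line = <$fh>) {
                next unless $lineno++ % $nworkers == $i;
                process_line($line);     # hypothetical per-record work (parse + insert)
            }
            close $fh;
            exit 0;
        }
        push @pids, $pid;                # parent keeps the pid
    }
    waitpid($_, 0) for @pids;            # wait for all children to finish

    sub process_line {
        my ($record) = @_;
        # placeholder: parse the record and load it (or write it to a per-child file)
    }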

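    And on the second comment, a minimal sketch of LOAD DATA LOCAL INFILE driven from DBI. The DSN, credentials, table name and file path are placeholders to adjust for your setup; mysql_local_infile=1 is the DBD::mysql connection flag that enables the LOCAL variant.

    #!/opt/coolstack/bin/perl -w
    use strict;
    use DBI;

    # Placeholder DSN, credentials, table and file -- adjust to your setup.
    my $dbh = DBI->connect(
        'DBI:mysql:database=mydb;host=localhost;mysql_local_infile=1',
        'user', 'password',
        { RaiseError => 1 },
    );

    my $file = '/path/to/data.tsv';
    my $rows = $dbh->do(qq{
        LOAD DATA LOCAL INFILE '$file'
        INTO TABLE import_table
        FIELDS TERMINATED BY '\\t'
        LINES TERMINATED BY '\\n'
    });
    print "Loaded $rows rows\n";
    $dbh->disconnect;

    One statement replaces the whole queue-plus-workers pipeline and lets the server handle the work internally; mysqlimport is the command-line wrapper around the same statement.
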
Re: Sun Solaris (SPARC processor) + Threads + performance/optimization
by gulden (Monk) on Apr 15, 2009 at 17:45 UTC
    Your comments were very helpful. However, I remain intrigued that such a simple procedure doesn't take advantage of the CPU...
      Isn't this something that would be well suited for DTrace? I am assuming you are running Solaris 10... Sorry, I was in a meeting the first go around. Brendan Gregg would do something like this:

      #!/usr/sbin/dtrace -s
      /*
       * mysqld_pid_etime.d - measure mysqld query execution latency.
       *                      Written for Solaris 10 (needs DTrace).
       *
       * 01-Jun-2007, ver 0.50
       *
       * USAGE: ./mysqld_pid_etime.d -p `pgrep -x mysqld`
       *
       * This prints distribution plots of the elapsed time during the execution
       * of MySQL statements, with a plot for each query string traced. This
       * measures the execution stage only, not the parse or plan stages.
       *
       * This is written using the DTrace pid provider, which means it uses an
       * unstable interface and is likely to stop working for future versions of
       * mysql (this was tested on mysql-5.1.17-beta).
       *
       * 01-Jun-2007  Brendan Gregg  Created this.
       */

      #pragma D option quiet

      dtrace:::BEGIN
      {
          printf("Tracing... Hit Ctrl-C to end.\n");
      }

      pid$target::*mysql_parse*:entry
      {
          self->query = copyinstr(arg1);
      }

      pid$target::*mysql_execute_command*:entry
      {
          self->start = timestamp;
      }

      pid$target::*mysql_execute_command*:return
      /self->start/
      {
          this->elapsed = timestamp - self->start;
          @time[self->query] = quantize(this->elapsed);
          self->query = 0;
          self->start = 0;
      }

      dtrace:::END
      {
          printf("MySQL Query execution latency (ns):\n");
          printa(@time);
      }

      There is a lot more detail to be found HERE

      Hope this points you in the right direction. Cheers - Jeffery
Re: Sun Solaris (SPARC processor) + Threads + performance/optimization
by gulden (Monk) on Apr 16, 2009 at 15:59 UTC
    Now I have one loader putting data into a Thread::Queue and 8 threads reading from that Thread::Queue and writing to a single output file.
    #--------------------------
    # Insert Into File
    #--------------------------
    sub insert_into_file {
        my $fh     = shift;
        my $record = shift;
        lock($x);                 # $x is a shared variable (declared elsewhere) used as a global lock
        $fh->print("$record\n");
    }
    The problem isn't with I/O; see below the number of threads waiting for I/O, it's negligible...
    $ iostat -xndz 1
        r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
        1.0   21.0    8.0  958.5  0.1  0.7    4.0   33.2   9   9 md/d10
        0.0   21.0    0.0  958.5  0.0  0.6    0.0   30.6   0   7 md/d11
        1.0   21.0    8.0  958.5  0.0  0.5    0.0   24.5   0   8 md/d12
        1.0   26.0    8.0  959.5  0.0  0.7    0.0   24.6   0  10 c0t1d0
        0.0   26.0    0.0  959.5  0.0  0.8    0.0   30.3   0  10 c0t0d0
    
    Wait:   number of threads queued for I/O
    Actv:   number of threads performing I/O
    wsvc_t: average time spent waiting in the queue
    asvc_t: average time spent performing I/O
    %w:     time spent waiting for I/O; only useful if one thread is running on the entire machine
    %b:     device utilization; only useful if the device can do just one I/O at a time (invalid for arrays etc.)
    
    Output of truss
    syscall               seconds   calls  errors
    read                     .560    7092
    write                    .201    2363
    lwp_park                3.621  132604
    lwp_unpark              2.656  125751
    yield                    .031    1148
                         --------  ------   ----
    sys totals:             7.070  268958      0
    usr time:              80.730
    elapsed:               39.650
    
      see below the number of threads waiting for I/O, it's negligible...

      Your test is bogus.

      You cannot speed up I/O to a single file by writing to it from multiple threads. The bottleneck is the disk transfer rate, not the CPU. By adding threads to the equation, you are adding unnecessary context switches and locking to the problem, which will slow things down, not speed them up.

      Your statistics are meaningless in the context of your original question, and your interpretation of them is even more so.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        As you said, what I wrote above is bogus: I'm not writing to a single file, but to a different file per thread.

        You're also right in the remaining comment: I'm not taking advantage of the CPU because I serialize the processing.
        Because all the threads are competing for a lock on the same single data structure, only one thread will ever be doing anything useful at any given time. There are simple ways of avoiding this problem, but which is applicable depends upon what you are doing in your program.
        Your comment in this node is quite clear. Tomorrow I will change the code and do some more tests...
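
        To illustrate that point, here is a minimal, hypothetical sketch of one way to cut the contention: each worker writes to its own file (so no lock() on a shared handle is needed) and drains the queue in batches (so the queue's internal lock is taken a couple of times per batch rather than once per record). It assumes $indata is the shared Thread::Queue from the original snippet and that a CPAN Thread::Queue new enough to support dequeue_nb(COUNT) is installed (the one bundled with 5.8.8 may not be); the file name pattern and batch size of 100 are arbitrary.

        sub analyse {
            my $tid = threads->tid();

            # One output file per worker: no shared handle, no lock() needed.
            open my $out, '>', "records.$tid" or die "open records.$tid: $!";

            while (1) {
                # Block for one item, then grab up to 99 more that are already
                # queued, without blocking. grep { defined } guards against an
                # empty non-blocking result.
                my @batch = grep { defined }
                            ( $indata->dequeue(), $indata->dequeue_nb(99) );

                my $stops = grep { $_ eq 'STOP' } @batch;
                print {$out} "$_\n" for grep { $_ ne 'STOP' } @batch;

                if ($stops) {
                    # STOP markers are enqueued last, so any extras we grabbed
                    # belong to other workers -- put them back, then quit.
                    $indata->enqueue('STOP') for 2 .. $stops;
                    last;
                }
            }
            close $out;
        }

        Whether that helps on the T2 still depends on the real per-record work; if the workers do almost nothing, the queue itself remains the serial section.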
