PerlMonks  

Segmentation fault: problem with perl threads

by katharnakh (Novice)
on Sep 15, 2008 at 11:47 UTC (#711429=perlquestion)
katharnakh has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

Background:
I am working on a replication project in which I have to replicate the dependencies of projects. I use the rsync command to do this. Because a project's dependency list varies and is usually huge, I want to replicate it in parallel by passing each rsync command a file that lists a batch of file names (using --files-from=<filename>). I am doing this with Perl's threads module.

Problem:
When I have more than one project (the common case), I create 10 threads per project to run the rsync commands, so that all dependencies are replicated in parallel. The main thread waits for all child threads to finish using the join() method of the threads module. The script ran fine for some days, but recently it has started terminating with 'Segmentation fault'; no line number is printed.

I have tried to debug the script (using perl -d <script-name>), but the debugger stops responding when it reaches the statement that creates a thread (threads->create(...)). I waited a long time, but it never came back, so I had to kill the process.

I cannot post the whole code because it is big and uses some private packages, but here is how I create 10 threads for each project.

The script terminates by printing 'Segmentation fault' after executing the last print statement in the _replicate function. I am not able to make out where and why that error occurs. Or is there another way to achieve parallel processing without using threads? Could someone help me here, please?

I am using perl, v5.8.3 built for x86_64-linux-thread-multi

    sub _replicate{
        my $ref = shift;
        my $logger = get_logger();

        print "Starting replication of dependency files", $/;
        foreach my $sc(@{$ref}){
            next unless (defined $sc);
            mkdir($LOG_FOLDER."/".$sc->{sc_name});
            print "\tScenario: ".$sc->{sc_name}, $/;
            print "\tLatest Dependencies: ".$sc->{total_dep}." of size "._get_readable_size($sc->{total_size}), $/;

            my @thr_arr = ();
            print "Creating parallel threads", $/;
            foreach my $robj(@{$sc->{rsync}}){
                my $th = threads->create(\&worker, $robj); # i create threads this way
                push @thr_arr, $th;
            }

            #$logger->info("\twaiting for threads to finish its job...");
            print "\twaiting for threads to finish its job...", $/;
            foreach my $t(@thr_arr){
                if (defined $t){
                    my $k = $t->join(); # this is how i wait for all threads to finish
                }
            }
            #map {my $k = $_->join} threads->list;
            # map{
            #     my $th = $_;
            #     my $k = $th->join if($th); # just a blind belief whether this might cause 'Segmentation fault', hence the check.
            # }@thr_arr;
            #$logger->info("\tFinished replicating dependencies of ".$sc->{sc_name});
            print "\tFinished replicating dependencies of ".$sc->{sc_name}, $/;
        }
    }

    sub worker{
        my $robj = shift;
        my ($rsync, $server, $from, $to) = @{$robj->{elements}};
        my $alt_server = $RSYNC_CONN_STR_2;
        my $rsync_cmd = $rsync.$server.$from.$to;
        print "Thread-",threads->self->tid," executing ", $rsync_cmd;
    }

Thanks in advance,
katharnakh.

Re: Segmentation fault: problem with perl threads
by moritz (Cardinal) on Sep 15, 2008 at 12:02 UTC
    There are basically two reasons why your program segfaults (at least I can think of two).

    The first is a bug in perl. You can try to run your script with perl-5.10.0 or perl-5.8-maint (the soon-to-be 5.8.9). Both contain many fixes for bugs in perl-5.8.

    The second possibility is a non-thread-safe or buggy XS module that you use. Without knowing which modules you use, this is impossible to diagnose remotely.

      Hi,

      Thanks for the quick reply. I am actually using the following modules.

          use Log::Log4perl qw(get_logger);
          use Log::Log4perl::Appender;
          use Log::Log4perl::Layout;
          use XML::Simple;
          use File::stat;
          use FileHandle;
          use threads;
          use Net::SFTP;

      As for trying an upgraded version of perl, I will have to look into it.

      Thanks very much,
      katharnakh.

        That is a large number of modules that you require to be "thread-safe". Threads get an exact copy of the parent thread at the time of creation, and if for some reason (and it usually happens) a thread doesn't clean itself up completely, it will hang around wasting memory, which then gets incorporated into the next thread. Since you are not sharing data between threads in real time, and are apparently logging to a file, you could easily switch to a forked solution and save all the hassles you are experiencing with threads.

        I would be particularly worried about the thread-safety of Net::SFTP. A quick google for "Net::SFTP thread safety" indicates it is not safe for thread usage.
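
        A forked version of the parallel step could look roughly like the sketch below. It is only an illustration: run_parallel is a made-up helper, and the perl interpreter ($^X) stands in for the real rsync command lines so that the example runs anywhere fork() is available. Each command gets its own child process, and the parent collects the exit codes with waitpid().

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run each command (an array ref in list form) in its
# own child process and return the exit codes in the order of the commands.
sub run_parallel {
    my @commands = @_;
    my %slot_for_pid;

    for my $i (0 .. $#commands) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: replace this process with the command.
            # If exec fails, leave at once without running parent cleanup code.
            exec { $commands[$i][0] } @{ $commands[$i] } or POSIX::_exit(127);
        }
        $slot_for_pid{$pid} = $i;    # parent remembers which job this pid is
    }

    my @exit_codes;
    while (%slot_for_pid) {
        my $pid = waitpid(-1, 0);    # block until any child ends
        last if $pid == -1;          # no children left
        next unless exists $slot_for_pid{$pid};
        $exit_codes[ delete $slot_for_pid{$pid} ] = $? >> 8;
    }
    return @exit_codes;
}

# Stand-in commands; a real call would pass rsync and its arguments instead.
my @codes = run_parallel(
    [ $^X, '-e', 'exit 0' ],
    [ $^X, '-e', 'exit 3' ],
);
print "exit codes: @codes\n";
```

        Because every child is a separate process, a crash in one rsync run cannot take the parent down with it, and no module has to be thread-safe.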


        I'm not really a human, but I play one on earth Remember How Lucky You Are
Re: Segmentation fault: problem with perl threads
by BrowserUk (Pope) on Sep 15, 2008 at 12:38 UTC
    I am using perl, v5.8.3

    That's 5 years and many releases old. There have been a lot of fixes to threads in the meantime. You must upgrade to find a solution.

    I would move up to 5.8.6 as that was the most stable version for threading, with subsequent changes making it less reliable. Hopefully, the imminent 5.8.9 will have resolved some of the new quirks, but only time will tell.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Hopefully, the imminent 5.8.9 will have resolved some of the new quirks, but only time will tell.

      That sounds a bit fatalistic.

      One way you can help to actually make it better is to test it now. If you have an application with heavy threads usage, download 5.8-maint now and report any errors.

      Hi,

      _replicate() is passed a data structure that looks like this:

          $VAR1 = [
            {
              'sc_name' => 'XYZ_Scenario',
              'rsync' => [
                {
                  'elements' => [
                    'rsync --archive --relative --stats --verbose --links --copy-links --copy-unsafe-links --safe-links --times --files-from=\'./Sep_17_2008(22h.50m.52s)/gen/file-from/XYZ_Scenario/rsync_input-1.txt\' ',
                    'rsync://xxx.yyy.zzz.corp:1873/',
                    'contexts',
                    ' /var/workshare/contexts >> \'./Sep_17_2008(22h.50m.52s)/log/XYZ_Scenario/rsync_input-1.log\' 2>&1'
                  ],
                  'statistics' => {
                    'total_files' => 863,
                    'size' => 232563375
                  },
                  'status' => undef # here i write something meaningful for db updation later.
                },
                ...
                ...
              ]# this array will have 10 or less, such objects with diff. 'rsync_input-*.txt' files to replicate
              'total_size' => 2184209735,
              'total_dep' => 13725
            },
            ...
            ...
          ]# varies, based on scenarios

      I checked this data structure carefully; it looks like what I intended, so there is no problem up to here.

      Below is the same set of functions I posted earlier, because I think the problem lies here. I tried running my script on perl 5.8.8 and I still get 'Segmentation fault', whether I actually execute the rsync command in the thread or just print the rsync command in the thread and return. I strongly suspect that I am calling join() on a thread object whose thread has already died after finishing its job, so that I am dereferencing a reference that has been deallocated (maybe?).

      I think this happens because, when 10 threads are running in parallel and I wait for, say, the 2nd thread to join, the 3rd or 4th or 8th (any of them up to the 10th) might meanwhile have finished running. Once the 2nd joins, the main thread tries to call join() on the next thread object in the array (either the one returned by threads->list or the one in which I keep the thread objects), and that object may no longer exist, or I have no way of knowing whether the thread is joinable.

      I tried to make sure that the thread is still running, as you can see in _replicate() in the code below:

          sub _replicate{
              my $ref = shift;
              my $logger = get_logger();

              print "Starting replication of dependency files", $/;
              $logger->info("Starting replication of dependency files");
              foreach my $sc(@{$ref}){
                  next unless (defined $sc);
                  mkdir($LOG_FOLDER."/".$sc->{sc_name});
                  $logger->info("\tScenario: ".$sc->{sc_name});
                  $logger->info("\tLatest Dependencies: ".$sc->{total_dep}." of size "._get_readable_size($sc->{total_size}));

                  my @thr_arr = ();
                  foreach my $robj(@{$sc->{rsync}}){
                      # I will add a key to this datastructure, to check whether thread is joinnable or it is still running?
                      $robj->{thr} => 'running';
                      my $th = threads->create(\&worker, $robj);
                      $logger->info("\tThread-".$th->tid.", Total files: ".$robj->{statistics}->{total_files}.", Size: "._get_readable_size($robj->{statistics}->{size})."[".$robj->{statistics}->{size}."B]");
                      $logger->info("\tcmd: ".join("", @{$robj->{elements}}));
                      push @thr_arr, $th->tid;
                  }
                  $logger->info("\twaiting for threads to finish its job...");

                  # 3rd try
                  foreach my $k(0..$#thr_arr){
                      # lets check tid and then access the thread object!
                      print $k," ",$sc->{rsync}->[$k]->{thr}, $/;
                      if ($sc->{rsync}->[$k]->{thr} eq 'running'){# if not, thread might have died and we try to acces the mem. which is deallocated after thread's death
                          my $t = $thr_arr[$k];
                          my $th = threads->object($t);
                          $th->join() if ($th);
                      }
                  }

                  # 2nd try
                  # map{
                  #     my $th = $_;
                  #     # just a blind belief whether this might cause 'Segmentation fault', hence the check. But here may, the thread object im referring might have been deallocated due to death of thread, hence i get 'Segmentation fault' .... ?
                  #     my $k = $th->join if($th);
                  # }@thr_arr;

                  # 1st try
                  # May be the thread objects returned by threads->list are unjoined, but are they joinnable? no clue...!
                  #map {my $k = $_->join} threads->list;
                  $logger->info("\tFinished replicating dependencies of ".$sc->{sc_name});
              }
          }

          sub worker{
              my $robj = shift;
              my ($rsync, $server, $from, $to) = @{$robj->{elements}};
              my $alt_server = $RSYNC_CONN_STR_2;

              print "Thread-".threads->self->tid." running";
              my $i = 0;
              while(++$i <= $MAX_REPL_ATTEMPT){
                  #$logger->info("\t\t[Attempt-".$i."]Thread-".threads->self->tid." executing [".$rsync_cmd."]");
                  #$logger->info("\t\t\tTotal files: ".$robj->{statistics}->{total_files}.", Size: "._get_readable_size($robj->{statistics}->{size})."[".$robj->{statistics}->{size}."B]");
                  my $rsync_cmd = $rsync.$server.$from.$to;
                  `$rsync_cmd`;
                  if ($?){ # because of connection refusal from server, command fails
                      $robj->{status} = "Completed with error!";
                      $rsync_cmd = $rsync.$server.$from.$to;
                      $server = ($i%2) ? $RSYNC_CONN_STR_1 : $RSYNC_CONN_STR_2; # just a small trick to use other port on the same server for connection
                      #$logger->error("ERROR: Thread-".threads->self->tid." says, replication Attempt-".$i." failed, trying again after 2 mins.");
                      sleep(120);
                  }else{
                      $robj->{status} = "Completed";
                      last;
                  }
              }
              $robj->{thr} = 'done';
              my $etime = time;
              my $spent_time = $etime - $stime;
              my $logger = get_logger();
              $logger->info("\t\t[Attempt-".$i."]Thread-".threads->self->tid." took "._format_spent_time($spent_time)." time");
          }

      So my question is: is there any way to make sure all threads have finished, or to call join() only on those threads that are joinable? Or do I have to go with the other solution suggested earlier, fork()ing processes instead of using threads?

      Thanks in advance,
      katharnakh.

        I think this happens because, when 10 threads are running in parallel and I wait for, say, the 2nd thread to join, the 3rd or 4th or 8th (any of them up to the 10th) might meanwhile have finished running. Once the 2nd joins, the main thread tries to call join() on the next thread object in the array (either the one returned by threads->list or the one in which I keep the thread objects), and that object may no longer exist, or I have no way of knowing whether the thread is joinable.

        This is a red herring. When non-detached threads end, they wait until you call join on them before being cleaned up. You do not need to check anything before calling join. If the thread has ended before you call join, it will return immediately. If the thread is still running, it will block until the thread ends. This is how they are designed to work. Your problem lies elsewhere.
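
        The join() behaviour described above can be checked with a few lines. This is a minimal sketch, assuming a perl built with ithreads; the one-second sleep is only there to make sure every thread has ended before the first join() is called.

```perl
use strict;
use warnings;
use threads;

# Start three short-lived threads; each one just returns ten times its argument.
my @workers = map { threads->create(sub { return $_[0] * 10 }, $_) } 1 .. 3;

# Give every thread ample time to end before anything is joined.
sleep 1;

# join() on a thread that has already ended returns its result immediately;
# the thread's resources are only released once it has been joined.
my @results = map { $_->join } @workers;
print "joined: @results\n";
```

        Note that the array holds the thread objects returned by threads->create, which is what join() has to be called on.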

        You keep posting these snippets of code, but they are so dependent upon the rest of the program, which you are not posting, that it is impossible for anyone to run them in order to try to help. They are also full of lumps of commented-out code, rambling comments that wrap 3 times and, worst of all, all this insane "logger" crap which completely obscures the structure of the code. It is not surprising that you cannot get this to work, as you cannot see what it is that your own code is doing.

        So, a lot of criticism which you may not like, so I'll try to show you that the criticism can help.

        Here is your code above, with all the crap stripped away, a few extra spaces and blank lines etc.

            sub _replicate{
                my $ref = shift;

                foreach my $sc ( @{ $ref } ) {
                    next unless (defined $sc);
                    mkdir( $LOG_FOLDER . "/" . $sc->{sc_name} );

                    my @thr_arr = ();
                    foreach my $robj( @{ $sc->{ rsync} } ){
                        $robj->{thr} => 'running';
                        my $th = threads->create( \&worker, $robj );
                        push @thr_arr, $th->tid;
                    }

                    $_->join for @thr_arr;
                }
            }

            sub worker{
                my $robj = shift;
                my( $rsync, $server, $from, $to ) = @{ $robj->{ elements } };
                my $alt_server = $RSYNC_CONN_STR_2;

                for my $i ( 0 .. $MAX_REPL_ATTEMPT ){
                    my $rsync_cmd = $rsync . $server . $from . $to;
                    `$rsync_cmd`;
                    if ($?){
                        $rsync_cmd = $rsync . $server . $from . $to;
                        $server = ( $i % 2 ) ? $RSYNC_CONN_STR_1 : $RSYNC_CONN_STR_2;
                        sleep(120);
                    }
                    else{
                        $robj->{status} = "Completed";
                        last;
                    }
                }
                $robj->{thr} = 'done';
            }

        Now the structure and essentials of the code are clear and easy to follow, and it is easy to pick out several problems:

        1. You create your thread here: my $th = threads->create( \&worker, $robj );

          but then you do: push @thr_arr, $th->tid;

          which means that @thr_arr contains a list of thread ids, not thread objects!

          This means that when you come to join your threads, you are trying to call the method join() on a number, and that obviously isn't going to work.

          Now that should not segfault. You should be seeing an error message (assuming you are using strict & warnings) along the lines of:

          Can't call method "join" without a package or object reference at...

          And you should have seen that error the very first time you ran this code, and every time you've run it since.

          Instead of fixing the actual problem, you've guessed as to what the cause might be and basically wasted your time trying to fix a problem that doesn't exist.

          Please note: I'm not saying your code will work once you've fixed that problem. I am saying that it will never work until you do.

        2. You are calling rsync using backticks: `$rsync_cmd`;, but you are doing nothing with any output produced.

          That means you are having the system build a pipe and collect the output, and then just throwing it all away.

          Have you heard of system?

        3. And now for the biggest problem, the design of your code in _replicate().

          You have 2 nested loops. Within the outer loop you run the inner loop, which creates a bunch of threads all trying to contact the same server.

          You then block until that finishes, with several retries and 120-second waits, before starting another bunch of threads to contact the next server. This is fundamentally bad design.

          If one server is slow, or broken, with all your threads trying to talk to the same server, you will basically be doing a lot of nothing, when you could be talking to one or more of the other servers in parallel.
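
        The three points above can be sketched together as follows. This is not the original poster's program, only an illustration under assumptions: the scenario data is invented, $^X (the running perl) stands in for rsync, and two worker threads is an arbitrary count. Jobs from all scenarios go into one Thread::Queue, the thread objects themselves are kept for join(), and system() is used in list form because the output is not needed.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Invented data: two scenarios, each with a list of commands to run.
my @scenarios = (
    { sc_name => 'A', cmds => [ [ $^X, '-e', 'exit 0' ], [ $^X, '-e', 'exit 0' ] ] },
    { sc_name => 'B', cmds => [ [ $^X, '-e', 'exit 1' ] ] },
);

# Flatten the jobs of every scenario into one list, so that a slow server
# for one scenario does not hold up the work for the others.
my @jobs;
for my $sc (@scenarios) {
    push @jobs, { sc_name => $sc->{sc_name}, cmd => $_ } for @{ $sc->{cmds} };
}

# The queue carries plain job indices; each thread has its own copy of @jobs.
my $queue = Thread::Queue->new;
$queue->enqueue(0 .. $#jobs);

sub worker {
    my @done;
    # All jobs are queued before the threads start, so an empty queue means done.
    while (defined(my $i = $queue->dequeue_nb)) {
        my $job = $jobs[$i];
        system { $job->{cmd}[0] } @{ $job->{cmd} };    # run it; no output captured
        push @done, "$job->{sc_name}:" . ($? >> 8);    # scenario name and exit code
    }
    return @done;
}

# Keep the thread objects (not their tids) so that join() can be called on them.
my @threads = map { threads->create(\&worker) } 1 .. 2;
my @report  = map { $_->join } @threads;
@report = sort @report;
print "$_\n" for @report;
```

        Each worker returns its results through join(), so nothing has to be written back into a structure that the threads do not actually share.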

        If you are going to be doing multi-processing, whether through threads or forks, the secret is to start simple. Write your worker subroutine in a standalone, single threaded program, and make it work.

        Once you've made sure it is working that way, then try running two copies concurrently using threads or forks.

        Once you've got that working reliably, only then try to scale it up!
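
        As a starting point for the "start simple" advice, the retry logic can be written and exercised on its own, with no threads or forks. In this sketch replicate_with_retries, the server names and the fake runner are all invented for illustration; a real runner would presumably wrap a system() call to rsync.

```perl
use strict;
use warnings;

# Hypothetical standalone worker: try the servers in turn until one attempt
# succeeds. $run is a code ref that takes a server string and returns an exit
# status (0 means success), so the logic can be tested without a network.
sub replicate_with_retries {
    my ($run, $servers, $max_attempts, $delay) = @_;

    for my $attempt (1 .. $max_attempts) {
        # Alternate between the available servers on each attempt.
        my $server = $servers->[ ($attempt - 1) % @$servers ];
        my $status = $run->($server);
        return { status => 'Completed', attempts => $attempt, server => $server }
            if $status == 0;
        # Wait before the next attempt, but not after the final one.
        sleep $delay if $delay && $attempt < $max_attempts;
    }
    return { status => 'Completed with error!', attempts => $max_attempts };
}

# Fake runner for the demonstration: only 'port-b' accepts the connection.
my $fake_rsync = sub { my ($server) = @_; return $server eq 'port-b' ? 0 : 1 };

my $result = replicate_with_retries($fake_rsync, [ 'port-a', 'port-b' ], 4, 0);
print "$result->{status} after $result->{attempts} attempt(s) via $result->{server}\n";
```

        Once a routine like this behaves correctly by itself, running two copies of it concurrently is a much smaller step.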

        You asked whether you should move to using forks. If you have a native fork on the platform you are working on, then there is nothing obvious from the code you have posted that requires threads, so you probably could use forks.

        But, on the basis of the code you've posted, I think that you are likely to have just as many problems trying to work in that environment as you are having with threads.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
