PerlMonks  

Net::OpenSSH killing script

by jaiieq (Novice)
on Feb 02, 2011 at 15:43 UTC ( [id://885771]=perlquestion )

jaiieq has asked for the wisdom of the Perl Monks concerning the following question:

Here is a quick synopsis of what I am doing, and the error(s) I am receiving.

I have about 40 computers that I use Net::OpenSSH to connect to. It will connect to one of the remote computers, check for the existence of a process using a simple shell command (using ssh->system). If the process is not running, it then uses the ssh->spawn method to launch it. Finally, I use the capture2 method to read some data that is generated by the process that was spawned (or running) earlier.

This process loops through the ~40 remote computers constantly for a given period of time (which can be days).

Everything works fine... for a while. Randomly, the Net::OpenSSH module will puke and kill the script (Killed by Signal 1). I start to see the following before the script is killed:

"Use of uninitialized value $check in pattern match (m//) at /usr/local/lib/perl5/site_perl/5.12.2/Net/OpenSSH.pm line 662"
"Use of uninitialized value $check in pattern match (m//) at /usr/local/lib/perl5/site_perl/5.12.2/Net/OpenSSH.pm line 671"

And $ssh->error prints "unable to establish master SSH Connection: Unknown error"

Any ideas as to why everything would work fine, but eventually (and randomly) start spitting out these errors and finally kill the script?

This is all on Mac OS X (10.5 or above).

If any more information is needed, please let me know and I'll post more details as needed. Thanks!

Replies are listed 'Best First'.
Re: Net::OpenSSH killing script
by salva (Canon) on Feb 02, 2011 at 16:31 UTC
    Randomly, the Net::OpenSSH module will puke and kill the script (Killed by Signal 1).
    Are you sure that the script is being killed? This message is usually generated by the ssh master process.

    Maybe you have in your code something similar to...

    $ssh->error and die ...
    I have reviewed the module and can't find any place where it could be killing the perl process.
    Use of uninitialized value $check in pattern match (m//) at /usr/local/lib/perl5/site_perl/5.12.2/Net/OpenSSH.pm line 662

    This is caused by a harmless bug in the module; it will be fixed in the next development release (BTW, I am the Net::OpenSSH author).

    If any more information is needed, please let me know and I'll post more details as needed

    Add the following line at the beginning of your script:

    $Net::OpenSSH::debug = 5;

    Restart it, wait for it to fail, and post the generated debugging information here.

    Also include the script source code; send it to me by email if you don't want to post it here.

    Finally, a capture of the process with dtrace may also help, though it will contain sensitive information such as SSH keys or passwords. You should review it before sharing.

      Thanks. I posted here because I knew you (the author) frequented the site.

      It could very well be how I've coded it, as I'm no expert by any means, but I do tend to know enough to be dangerous. I have an array of ~40 computers that I loop through continuously. The code posted below is run for each of those computers. I have excluded some minor mundane code (such as writing data to a file) for the sake of brevity.

      # Create connection
      my $SSH = Net::OpenSSH->new( $IP, user => 'user', password => 'pass' );
      if( !$SSH->error ) {
          # Is my process running?
          my $PROC = $SSH->capture( "ps aux | grep <process_name> | grep -v grep | wc -l | sed 's/ *//'" );
          if( $PROC == 0 ) {
              # 3rd party program is run here via perl system method
              # If it happens to fail, it has no adverse consequences on
              # the rest of the script.
              <run 3rd party program>
              if( <my above 3rd party output condition is met> ) {
                  # Read File on remote machine
                  ( $DATA, $ERR ) = $SSH->capture2( "cat <file on remote machine>" );
                  # Write/Append $DATA to local file
                  <write data to local file>
                  # Delete the remote file
                  $SSH->system( "rm -f <remote file>" );
                  # And finally, since my remote process is not currently
                  # running, spawn it on the remote machine
                  $SSH->spawn( "./script.pl" );
              }
          }
      }
      else {
          print $SSH->error
      }

      I know all machines can communicate and the main machine that runs this script also has every machine in its known_hosts.

      My only guess is that, since this happens for each of the 40 machines numerous times a day, OpenSSH (the program, not the module) is hitting some type of limit due to the sheer number of connections being made, causing it to fail.

      I have added the debug line you suggested and I will post back once it fails (hopefully today, but no later than tomorrow morning).

        It seems that you are using spawn in order to create detached processes on the remote host, but it doesn't work that way.

        spawn forks a new local ssh process that keeps running in the background until the remote process exits, and you have to take care of reaping those ssh processes with waitpid. Otherwise zombies will pile up and at some point the OS will refuse to fork new processes... and that is probably the reason for your script failing.
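
        A minimal sketch of that reaping pattern, using only core Perl. The helper names start_child and reap_finished are hypothetical, and start_child is a local stand-in for the PID that $ssh->spawn returns; in the real script the PIDs collected from spawn would go into the same hash and reap_finished would run once per pass over the hosts.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';   # provides WNOHANG

# Hypothetical stand-in for "my $pid = $ssh->spawn($cmd)": start a local
# child process and return its PID.
sub start_child {
    my @cmd = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # In the child: replace it with the command, or exit if exec fails.
        exec @cmd or POSIX::_exit(127);
    }
    return $pid;
}

# Reap every tracked child that has already exited, without blocking.
# Returns the number of children removed from the hash in this pass.
sub reap_finished {
    my ($pids) = @_;
    my $reaped = 0;
    for my $pid (keys %$pids) {
        my $res = waitpid($pid, WNOHANG);
        if ($res == $pid || $res == -1) {   # exited, or no longer our child
            delete $pids->{$pid};
            $reaped++;
        }
    }
    return $reaped;
}

my %pids;
# Three short-lived children; $^X is the path of the running perl.
$pids{ start_child($^X, '-e', '1') } = 1 for 1 .. 3;

# Poll until nothing is left to reap.
while (%pids) {
    reap_finished(\%pids);
    select(undef, undef, undef, 0.05) if %pids;   # brief pause before retrying
}
print "all children reaped\n";
```

        Polling with WNOHANG keeps the main loop from blocking on a remote process that is still running, while still clearing out the ones that have finished.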

        The right way to do that is to run the remote command with nohup:

        $ssh->system("nohup ./script.pl &");
        Though, as it seems that the remote command is actually a Perl script, you can also convert it into a daemon, letting it take the responsibility of going into the background. There are several CPAN modules that allow you to do that (e.g. Proc::Daemon).

        Besides that, there are other places where you can improve your Net::OpenSSH usage:

        # Read File on remote machine
        ( $DATA, $ERR ) = $SSH->capture2( "cat <file on remote machine>" );
        # Write/Append $DATA to local file
        <write data to local file>
        ...can just be written as...
        $ssh->system({stdout_file => ['>>', $local_file]}, cat => $remote_file);
        And in...
        my $PROC = $SSH->capture( "ps aux | grep <process_name> | grep -v grep | wc -l | sed 's/ *//'" );
        if( $PROC == 0 ) {...
        you should check that the command did not fail due to some SSH error. Otherwise, you could end up running several instances of ./script.pl.
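
        One way to sketch that check, with the decision pulled into a small helper so it can be tried without a live connection. The name should_start and the variable $count_cmd are hypothetical; the only Net::OpenSSH calls assumed are capture, error and system, which already appear in this thread.

```perl
use strict;
use warnings;

# Decide whether the remote script should be started, given the output of the
# process-count command and the SSH error (if any) from that same call.
# Returns 1 (start it), 0 (already running) or undef (cannot tell, do nothing).
sub should_start {
    my ($count_output, $ssh_error) = @_;
    return undef if $ssh_error;                  # the check itself failed
    return undef unless defined $count_output
        && $count_output =~ /^\s*(\d+)\s*$/;     # output is not a plain number
    return $1 == 0 ? 1 : 0;
}

# Intended wiring (assumes a connected Net::OpenSSH object in $ssh):
#   my $out = $ssh->capture($count_cmd);
#   $ssh->system("nohup ./script.pl &") if should_start($out, $ssh->error);

# Try the three cases: not running, already running, SSH failure.
for my $result ( should_start("0\n", 0),
                 should_start("2\n", 0),
                 should_start(undef, 'connection failed') ) {
    print defined $result ? $result : 'undef', "\n";
}
```

        Treating an SSH failure as "cannot tell" rather than "not running" is what prevents a second copy of ./script.pl from being launched when the count command never actually ran.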
Re: Net::OpenSSH killing script
by zentara (Archbishop) on Feb 02, 2011 at 16:25 UTC
      I don't want to get involved in the shouting match here, but I did want to say I always use Google and search the web before posting any questions, since 90% of questions asked are already answered somewhere else. In this case I didn't find anything pertaining to my exact situation. I also knew that salva frequents this site (and thankfully he helped me correct my mistake, even though it wasn't a Net::OpenSSH error).
      Oh, my stars... Another totally useless reply from one of the GreatMinds at PerlMonks.

      Have you googled for "unable to establish master SSH Connection: Unknown error"?

      Didn't think so. If you had, you'd have seen that it returns two (2) hits, both of which are for the root node of this thread.

      Shame on you! And to think others have probably up-voted such a pointless and unhelpful reply. But you seem to have a special knack for those.

        Have you googled for "unable to establish master SSH Connection: Unknown error"? Didn't think so.

        As a matter of fact I did, and although these perlmonks nodes are mentioned, the first link I got back from Google was a node where it was pointed out it is usually a filesystem permissions error which spits out that message.

        I didn't include the quotations in the search string. That explains the difference in our search results.

        You show why you remain anonymous, since your programming troubleshooting techniques are so poor.

        Bad wu wei man. You come here asking for free answers, then complain when you are not spoon fed an answer.


        I'm not really a human, but I play one on earth.
        Old Perl Programmer Haiku ................... flash japh

Node Type: perlquestion [id://885771]
Approved by broomduster