I need to poll and manage thousands of wireless devices
which, being wireless, may be offline at any time.
I would like a quick test of online/offline status
before attempting a login, so I am trying to use "ping".
I had problems with hangs (ping never completes),
so I added a timeout, but it still sometimes hangs immediately after line 11
(it never reaches line 18 or line 21).
code
1 #! /usr/bin/perl -w
2
3 sub ping {
4 my $ip = shift || die "ping: no ip provided\n";
5 my $debug = shift || 0;
6
7 my $TIMEOUT = 5;
8 my $remaining = 0;
9 my $result = 0 ;
10
11 print LOG "ping v 1 entry IP $ip debug $debug\n" if $debug;
12 my @lines;
13 eval {
14 local $SIG{ALRM} = sub {die "alarm\n"} ;
15 alarm $TIMEOUT ;
16 @lines = split /^/m,`ping -n -W 2 -c 5 -i 0.2 -w 1 $ip`;
17 $remaining = alarm 0;
18 print LOG "ping($ip): remaining $remaining\n" if $debug;
19 };
20 if($@) {
21 print LOG qq{ping("$ip"): timed out\n} if $debug;
22 return 0;
23 }
24 for my $line (@lines) {
25 if($line =~ /Host Unreachable/o) {
26 print LOG qq{ping($ip): Host Unreachable\n};
27 last;
28 }
29 print LOG " ping: $line" if $debug;
30 if($line =~ / from $ip/) {
31 $result++;
32 }
33 }
34 if($result > 2) {
35 printf LOG "ping($ip): returning result $result\n" if $debug;
36 return $result;
37 } else {
38 printf LOG "ping($ip): returning result 0\n" if $debug;
39 return 0;
40 }
41 }
42
43 1;
44
The module was originally embedded in a large program (~500 lines).
To simplify testing, I wrote just a short loop to observe the behavior and got even stranger symptoms.
17 while(my $s = $S->fetchrow_hashref) {
18 my $ip = $$s{d07};
19 my $start = time();
20 printf LOG "\nService $$s{number} $ip\n";
21 my $ping = ping($ip,1) ;
22 printf LOG "reply \"$ping\" \nelapsed time %d seconds.\n",
time() - $start;
23 }
24
After running (under nohup) through a hundred or so IPs, the output of the printf at line 22 disappeared from the LOG file. There were no hangs for this simple loop, also no timeouts reported.
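To rule out output buffering as the reason the log lines go missing, the log handle could be opened with autoflush turned on. This is only a minimal sketch: `open_log` is a hypothetical helper that is not part of my program, and I am assuming the log is a plain file opened for append. I have not confirmed that buffering is what makes the output disappear.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: open a log file for append and flush after every print,
# so nothing sits in a buffer if the process forks or is killed.
sub open_log {
    my ($path) = @_;
    open(my $fh, '>>', $path) or die "cannot open $path: $!\n";
    $fh->autoflush(1);
    return $fh;
}
```

Usage would be `my $log = open_log('ping.log'); print {$log} "message\n";` where the file name is just an example.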
Even stranger, additional jobs appeared out of nowhere.
[john@scan test]$ jobs
[1] Running nohup ./ping.pl &
[2] Running nohup ./ping.pl &
[3]- Running nohup ./ping.pl &
[4]+ Running nohup ./ping.pl &
[john@scan test]$
and zombies were left behind:
john 25136 0.0 0.0 4336 648 pts/3 S 15:51 0:00 ping -n -W 2 -c 5 -i 0.2 -w 1 10.105.44.202
john 25137 0.0 0.0 4336 636 pts/3 S 15:51 0:00 ping -n -W 2 -c 5 -i 0.2 -w 1 10.104.10.22
john 25138 0.0 0.0 4336 636 pts/3 S 15:51 0:00 ping -n -W 2 -c 5 -i 0.2 -w 1 10.104.41.200
john 25139 0.0 0.0 4336 640 pts/3 S 15:51 0:00 ping -n -W 2 -c 5 -i 0.2 -w 1 10.107.8.245
john 25140 0.0 0.0 2044 572 ? S 15:51 0:00 ping -n -W 2 -c 5 -i 0.2 -w 1 10.104.13.115
I think I have done something horribly wrong in the ping module. Can you spot the mistake?
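For comparison, here is a minimal sketch of a variant I could try instead of backticks. It assumes (unconfirmed) that part of the trouble is the ping child staying alive after the alarm fires, so it keeps the child's PID and kills and reaps it on timeout. `run_with_timeout` is a hypothetical name, not something from my program.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command with a hard timeout. Returns a reference
# to the output lines, or nothing if the timeout fired.
sub run_with_timeout {
    my ($timeout, @cmd) = @_;
    my ($pid, $fh, @lines);

    my $ok = eval {
        local $SIG{ALRM} = sub { die "alarm\n" };
        alarm $timeout;
        # List-form pipe open: no shell involved, and open returns the child's PID.
        $pid = open($fh, '-|', @cmd) or die "cannot run $cmd[0]: $!\n";
        @lines = <$fh>;
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;    # make sure no alarm is left pending on any exit path

    if (!$ok) {
        if ($pid) {
            kill 'KILL', $pid;    # stop the child we started
            close $fh;            # close() on a pipe waits for the child
        }
        die $err unless $err eq "alarm\n";
        return;                   # timed out
    }
    close $fh;                    # reaps the child on the normal path
    return \@lines;
}
```

In my ping sub this would replace line 16 with something like `my $out = run_with_timeout($TIMEOUT, 'ping', '-n', '-W', 2, '-c', 5, '-i', 0.2, '-w', 1, $ip);` followed by a check for an undefined result.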