PerlMonks  

Re: When starting a process, at what point does "open()" return?

by graff (Chancellor)
on Aug 18, 2003 at 03:11 UTC (node #284477)


in reply to When starting a process, at what point does "open()" return?

Our problem comes from the fact that we have a somewhat flakey executable which we use for testing. Sometimes it hangs on startup and manages to hang the test scripts as well.

And you're using a flakey executable because...? What makes you sure that you're getting stuck in the open statement? (Showing us some code around the open statement might help.)

As we're using about 30 of these processes for testing, we would like to determine which executable is causing the problem.

Do you mean you have about 30 different flakey executables, or that you are trying to run 30 instances of the same piece of crap? Are you talking about looping 30 times over an "open; do something; close" type of block, or are you trying to have 30 pipeline file handles open at once? (Don't some OS's have a problem with opening too many file handles?)

If you haven't tried this yet, you could do something like:

```perl
$|++;    # turn off output buffering on STDOUT
my @pipeproc = ("flakey_writer |", "| flakey_reader", ...);
for my $i (0 .. 29) {
    my $start = time();
    my $pid   = open( my $fh, $pipeproc[$i] );
    my $now   = time();
    print "opening $pipeproc[$i] returned ",
          (defined $pid ? $pid : 'undef'), " in ", $now - $start, " sec\n";
    close $fh if defined $pid;
}
```
(or some variant that is more relevant to your needs). If all the processes start and you see how long each one took, then you must be hanging at some other stage of your script (e.g. a read or write, or maybe something unrelated to the pipeline process). If the loop doesn't complete, you'll at least know where the problem starts. Then you'll want to see whether the problem always happens at the same iteration.
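One limitation of a loop like this is that `time()` only has one-second resolution, and the report is printed only after `open` returns. A minimal sketch of a variant, assuming the same "cmd |" / "| cmd" pipe strings: it prints a line *before* each `open`, so if one hangs, the last line on screen names the command, and it uses the core `Time::HiRes` module for sub-second timing. The sub name `time_open` is illustrative, not from the original post.

```perl
use strict;
use warnings;
use Time::HiRes ();    # core module since Perl 5.8; gives sub-second time()

$| = 1;    # unbuffered STDOUT so progress lines appear even if we hang

# Open one pipe spec such as "cmd |" or "| cmd", report how long open() took,
# then close it. Returns the pid from open() (undef on failure) and the elapsed time.
sub time_open {
    my ($spec, $label) = @_;
    print "[$label] about to open '$spec'\n";    # last line printed if open() hangs
    my $start   = Time::HiRes::time();
    my $pid     = open(my $fh, $spec);           # piped open returns the child's pid
    my $elapsed = Time::HiRes::time() - $start;
    if (defined $pid) {
        printf "[%s] open returned pid %s after %.3f sec\n", $label, $pid, $elapsed;
        close $fh;    # for a pipe, close() also waits for the child to exit
    }
    else {
        printf "[%s] open failed after %.3f sec: %s\n", $label, $elapsed, $!;
    }
    return ($pid, $elapsed);
}
```

Inside the loop you would call `time_open($pipeproc[$i], $i)` in place of the open/print/close lines. Note that `close` on a pipe blocks until the child exits, so a child that starts but never finishes will also stall the loop at that point.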

Maybe you've been over this ground already, but your post didn't really give enough information.

Re: Re: When starting a process, at what point does "open()" return?
by tid (Beadle) on Aug 20, 2003 at 23:55 UTC

    Thanks for your response, and apologies for the incomplete post. I was trying to pose a more general question rather than focus on the specifics of my situation. In response to your questions:

    And you're using a flakey executable because...?

    From bitter experience, I would say that complex software projects tend to require moderately complex tools to test some aspects of them. The unfortunate part is that it is rare to find the same effort put into testing tools as into the original project.

    I am currently on a short-term contract (six weeks in total) and haven't had the opportunity to spend time debugging their test executables. I have access to the scripts, and I do what I can.

    What makes you sure that you're getting stuck in the open statement? (Showing us some code around the open statement might help.)

    The following code is a cut from the script:

    ```perl
    open(STDOUT, "> $l_cqr_std_output_dir\\cqr_out_$l_counter_x.txt");
    $l_proc_id = open($handleName, "| $command ");
    print STDOUT "CQRD Process ID: [" . $l_proc_id . "]\n";
    ```

    When it fails (which happens only occasionally), the output log file from the first line is created, but the print line is never executed and the script hangs. If you find the rogue process created by the open call and terminate it using Task Manager, the script recovers, and the output log file shows the process ID of the CQRD process you just terminated.

    or that you are trying to run 30 instances of the same piece of crap

    Bingo.

    Don't some OS's have a problem with opening too many file handles?

    It seems to work fine unless the errant process has a problem on startup. While I can't completely discount the possibility that Windows is jamming its head firmly into something unholy, I think it's unlikely this time.

    Cheers
    Mike

      Ah. This makes it very clear. Thanks (and ++!!)

      So, while I don't understand what could cause the behavior you observe (let's hope someone else looks more closely at this thread and can explain it), there might be a way to write another Perl script that automates the recovery procedure you've been doing manually. I'm only guessing, though, because my exposure to process-control details on Windows is nil.

      But consider: if you make the writes to the log file "atomic" (accept the extra overhead of open/write/close each time you append a message, so that another process can read the log while this test script is still running), you might be able to run a separate script that loops on "check the log file; sleep". Make sure the problem script (the one launching the flakey executable) logs when it is about to start a process (including the iteration number or some other distinct ID), as well as when the open call returns.
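A minimal sketch of that open/write/close-per-message logging, assuming a plain text log that both scripts can reach. The sub name `log_line` and the timestamp format are illustrative; the `flock` call is an added precaution so two writers can't interleave a line.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use POSIX qw(strftime);

# Append one line to the shared log and close the file immediately, so a
# separate monitor script can read everything written so far.
sub log_line {
    my ($log_file, $msg) = @_;
    open(my $fh, '>>', $log_file) or die "Cannot append to $log_file: $!";
    flock($fh, LOCK_EX)           or die "Cannot lock $log_file: $!";
    print {$fh} strftime('%Y-%m-%d %H:%M:%S', localtime), " $msg\n";
    close($fh)                    or die "Cannot close $log_file: $!";   # flushes and unlocks
    return;
}
```

In the test script you might call something like `log_line($log, "START iteration $i: $command")` just before the `open`, and `log_line($log, "STARTED iteration $i pid $l_proc_id")` right after it returns (the message wording is hypothetical). A START line with no matching STARTED line then marks the iteration that hung.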

      Now the separate monitor script can tell when the problem script is hanging. It knows the PIDs of the jobs that were opened successfully (they're in the log file), so it just needs to find a PID in the Windows process table that belongs to the flakey executable but isn't in the log file. The monitor script kills that "outlier" PID (and could append a note to the log file reporting it), and the test script moves on.
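A minimal sketch of the "find the outlier PID" step, under several assumptions: the log contains lines with `STARTED ... pid NNNN` (a hypothetical format), and the process table is read with the Windows `tasklist` command using CSV output without a header. The image name you pass to `running_pids` (e.g. the CQRD executable's file name) is also an assumption.

```perl
use strict;
use warnings;

# Collect the pids the test script logged as successfully started.
# Assumes log lines like "... STARTED iteration 7 pid 1234" (hypothetical format).
sub logged_pids {
    my ($log_file) = @_;
    my %seen;
    open(my $fh, '<', $log_file) or return \%seen;    # no log yet: nothing started
    while (my $line = <$fh>) {
        $seen{$1} = 1 if $line =~ /\bSTARTED\b.*\bpid (\d+)/;
    }
    close $fh;
    return \%seen;
}

# Running pids that the log doesn't know about: candidates for the hung startup.
sub outlier_pids {
    my ($running, $logged) = @_;    # array ref of pids, hash ref from logged_pids
    return sort { $a <=> $b } grep { !$logged->{$_} } @$running;
}

# Windows only: pids for one image name, parsed from
# tasklist CSV rows such as "name.exe","1234","Console",...
sub running_pids {
    my ($image) = @_;
    my @pids;
    for my $row (`tasklist /FI "IMAGENAME eq $image" /FO CSV /NH`) {
        push @pids, $1 if $row =~ /^"[^"]*","(\d+)"/;
    }
    return \@pids;
}
```

A healthy instance is briefly "unlogged" between its start and the moment `open` returns, so the monitor should probably require the same outlier to appear on several consecutive checks before acting. It could then terminate it, for example with `taskkill /PID <pid> /F`, and log that it did so.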

      As I said, I'm only guessing that something like this would work; I don't know how you would actually implement it. And of course it doesn't really answer your initial question or solve the real problem. But if it works as intended, you could at least move on towards whatever your "true" objective may be.

        I'd started down that path, but had only gotten as far as timing out the log files that the test EXEs create when they start correctly. I hadn't thought to search through the task list and match the two up. Worth a shot!

        Many Thanks!
        Mike
