Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked

Calling strace from a perl program

by jason_s (Acolyte)
on Mar 11, 2008 at 07:37 UTC ( #673422=perlquestion: print w/replies, xml ) Need Help??
jason_s has asked for the wisdom of the Perl Monks concerning the following question:

Hi! I am trying to execute strace from within a perl program. Basically, I want to strace a process, and see if it is blocking while waiting for a ("flock") lock on a file. Since I'm looking for blocking processes, strace will just hang on these -- so I use "alarm(1)" to prevent my script from hanging.

The first step is figuring out how to capture the output of strace, and I'm getting stuck. Here's what I have so far, which doesn't work:
#!/usr/bin/perl -w use strict; my $pid=$ARGV[0]; my $trace_out; my $strace_pid; eval { local $SIG{ALRM} = sub { kill 2, $strace_pid || warn "couldn't kill $strace_pid: $!"; close STRACE or die "couldn't close strace"; die "exiting eval"; }; $strace_pid = open (STRACE,"strace -q -p$pid 2>&1|") or die "couldn' +t open strace"; alarm (1); undef $/; $trace_out=<STRACE> ; }; print "traceout is $trace_out";
I have a test script that takes an exclusive lock on a file, and then just goes into an infinite loop:
#!/usr/bin/perl use Fcntl ':flock'; open FH, "file.txt" ; flock FH, LOCK_EX or die "couldn't lock"; while (true) {};
If I run two instances of this, (the first instance gets the lock, the 2nd blocks while waiting for a lock), and attach strace to them, here's what I see:
>./ & [1] 17289 >./ & [2] 17290 >strace -q -p17289 >strace -q -p17290 flock(3, LOCK_EX <unfinished ...>
As expected, pid 17289 has no strace output, since it's just looping. And 17290 is blocking attempting to take a lock.

But, when I run my script on pid 17290, it's not capturing the "flock(3, LOCK_EX" text:
./hang_finder 17290 traceout is
My intention is that the output should be:
traceout is flock(3, LOCK_EX
Obviously,I'm doing something wrong, but I can't figure out what . . .

Replies are listed 'Best First'.
Re: Calling strace from a perl program
by ikegami (Pope) on Mar 11, 2008 at 07:44 UTC

    Don't know, but I spotted a second problem:
    kill 2, $strace_pid || warn "couldn't kill $strace_pid: $!";
    kill 2, ($strace_pid || warn "couldn't kill $strace_pid: $!");
    so warn will never get executed. Use lower-precedence or instead of ||, parens around kill's arguments, or parens around the kill call.

Re: Calling strace from a perl program
by cdarke (Prior) on Mar 11, 2008 at 08:31 UTC
    I can't test this right now, but I suspect you are being misled by buffering. I recall that strace writes to STDERR, which is normally non-buffered. However you are redirecting it to STDOUT (which you have to do to use the pipe) which is buffered. $|=1; before the open is worth a try.

      For starters, $|=1; in the Perl program won't affect any buffering strace might do on its output. The pipe to which strace writes a completely independent file handle from the terminal (or whatever) the Perl script writes to. Furthermore, a variable in one process won't affect a library in a different process.

      Secondly, if strace doesn't buffer what it sends to its STDERR, the output is not going to magically buffer itself if redirected.

      Unfortunately, there's no way to control whether another application buffers its output or not (unless it specifically gives you the means of doing so, of course).

Re: Calling strace from a perl program
by zentara (Archbishop) on Mar 11, 2008 at 12:48 UTC
    Another thing to try, instead of trying to capture the strace output in a filehandle, is to use the -o option to set a file for it to output to. It will probably work better. That way you can fork off the "system strace -o $pid_out -p $pid", and it shouldn't block.

    I'm not really a human, but I play one on earth. Cogito ergo sum a bum
      Thanks! I did get it to work this way. I just forked, had strace write to a file with the "-o" option, had the parent sleep a second, then send the child a SIGALRM. I then processed the strace output file in a second step. I actually found that using strace isn't the best way to find processes blocking in flock -- but I've been wanting to do other things with strace from Perl, so this will be good for future use.
Re: Calling strace from a perl program
by pc88mxer (Vicar) on Mar 11, 2008 at 21:00 UTC
    I think this is a quirk of strace when invoking it on an already running process. strace relies on the kernel to be notified when a process enters or leaves a syscall. If you run strace on a process that is already in a system call, it may not produce any output until the process is interrupted or returns from the system call.

    To implement your hang finder, you might have to use some other mechanism to indicate that a program is attempting to obtain a lock. For instance, one method is to change $0 to reflect the state of the program. On many OS's this will be observable via the ps command, e.g.:

    sub get_lock { local($0) = "(acquiring lock)"; flock(...); # returning restores previous value of $0 }
      Unfortunately, the program I need to trace locking on is a closed-source binary, so I can't change it . . .

      BUT, I found that /proc/locks has all the information I needed -- it shows both locks, and processes blocking while trying to obtain a lock. It was much easier to read through /proc/locks to get what I need than to try and control strace.

      Interesting. I tried this on my Linux system, and it was not at all as I expected. Long story short:

      • Different behavior when using ps vs. ps ww!
      • ps was not affected by the change to $0 at all.
      • ps ww did see the change, but using local didn't unmake the change as I would have expected.
      sub ps_line { my $ps = shift; print join ' ', "$ps:\t", @{ (grep { $_->[0] == $$ } map { [ split ] } `$ps | tail -n +2`)[0] }; print "\n"; } print "\$0 = $0\n"; ps_line( 'ps' ); ps_line( 'ps ww' ); { local( $0 ) = 'nyuck' x 3; print "\$0 = $0\n"; ps_line( 'ps' ); ps_line( 'ps ww' ); } print "\$0 = $0\n"; ps_line( 'ps' ); ps_line( 'ps ww' ); __END__ $0 = ./ ps: 875 pts/1 00:00:00 perl ps ww: 875 pts/1 S+ 0:00 perl ./ $0 = nyucknyucknyuck ps: 875 pts/1 00:00:00 perl ps ww: 875 pts/1 S+ 0:00 nyucknyucknyuck $0 = ./ ps: 875 pts/1 00:00:00 perl ps ww: 875 pts/1 S+ 0:00 ./
        That's very odd. I tried it using this test program:
        #!/usr/bin/perl print "my pid = $$\n"; get_lock(); print "enter return again:"; my $x = <STDIN>; sub get_lock { local($0) = "(acquiring lock)"; print "enter return:"; my $x = <STDIN>; }

        In another terminal I get:

        $ /bin/ps 27940 PID TTY STAT TIME COMMAND 27940 pts/0 S+ 0:00 (acquiring lock) $ /bin/ps 27940 PID TTY STAT TIME COMMAND 27940 pts/0 S+ 0:00 ./d0

        I'm using Ubuntu 7.10.

        Update: Ok, it does seem that you will lose the initial 'perl' from the COMMAND field when $0 is restored, but the script name should still remain. Don't know why that happens, and I also can explain yet your 'ww' output.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://673422]
Approved by ikegami
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (3)
As of 2018-03-19 03:36 GMT
Find Nodes?
    Voting Booth?
    When I think of a mole I think of:

    Results (232 votes). Check out past polls.