Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re: open3 buffering in linux vs. os x

by bruno (Friar)
on Jan 09, 2009 at 01:30 UTC ( [id://735052]=note: print w/replies, xml ) Need Help??


in reply to open3 buffering in linux vs. os x

What a small small world this is! Last month I needed to do something similar using ProFit. I didn't see the need to wrap the program with Perl though, since it's a command line application that supports scripting (you can pass your profit script using the -f flag).

So, to align an arbitrary number of structures, I created a text file that only said "fit", and then I looped using bash:

for F in *.pdb; do profit -f script.txt complex.1.pdb $F \\ >> output.txt; done
Then I used grep to just keep the RMSD value of the run for each structure:
cat output.txt | grep 'RMS' | awk '{print $2}' > rmsd.txt
Of course, for anything slightly more complicated than that, I see why you'll want to use Perl.

If you think it's worth it, consider writing a module under Bio::Tools::Run, so that you can later share your wrapper with everyone. You should look at the Bio::Tools::WrapperBase module and subclass from it; it'll give you a nice interface with all the error checking built in. I could help you do it if you want to!

Replies are listed 'Best First'.
Re^2: open3 buffering in linux vs. os x
by Lexicon (Chaplain) on Jan 09, 2009 at 02:05 UTC
    I have triplicate simulations of several homologs of a protein. The goal is to compare the structures at certain time ranges across all simulations and record statistics (avg, min, stddev of RMSD) in different subgroups. Oh, and over different zones for alignment. I imagine almost every protein simulation group has written something similar at one point or another, and now it's my turn. :)

    Right not the code is very specific for my situation, but I'm planning to generalize it for my lab after I understand the results a little better. I don't know if it would generalize well past that... over half the code is just parsing through which PDBs I have, giving them titles, organizing timepoints, etc... The part that deals with profit is basically one loop and one subroutine.

    I'll be happy to email a copy to you if you're interested, but I don't think you'll find it very enlightening. Here's the core subroutine. $filets are lists of PDBs to be compared. $ranges are the zones to use corresponding to each set. All the hard stuff is the bookkeeping, which will sadly be very user dependent.
    sub blast_filets { my ($PF_READ, $PF_WRITE); my $pid = open2($PF_READ, $PF_WRITE, $PROFIT) or die "Couldn't open pipe to profit. $!"; my $filets1 = $_[0]; my $filets2 = $_[1]; my $range1 = $_[2]; my $range2 = $_[3] || $range1; my @rmsd = (); my $count = 0; if ( @$range1 != @$range2 ) { die "Ranges do not have equivalent zone sizes."; } foreach my $f1 ( @$filets1 ) { print "REFERENCE $f1\n" if $VERBOSE >= 3; print $PF_WRITE "REFERENCE $f1\n"; foreach my $f2 ( @$filets2 ) { if ( $VERBOSE >= 3 ) { print "MOBILE $f2\n"; print "ZONE CLEAR\n"; print "ATOMS CA\n"; foreach my $i ( 0..$#$range1 ) { print "ZONE $range1->[$i][0]-$range1->[$i][1]" . ":$range2->[$i][0]-$range2->[$i][1]\n"; } print "FIT\n"; } print $PF_WRITE "MOBILE $f2\n"; print $PF_WRITE "ZONE CLEAR\n"; print $PF_WRITE "ATOMS CA\n"; foreach my $i ( 0..$#$range1 ) { print $PF_WRITE "ZONE $range1->[$i][0]-$range1->[$i][1]" . ":$range2->[$i][0]-$range2->[$i][1]\n"; } print $PF_WRITE "FIT\n"; print $PF_WRITE "\n\n\n\n\n\n\n\n\n\n"; } } print $PF_WRITE "QUIT\n"; my $result ; #print "Reading results\n"; while ( defined ($result = readline($PF_READ))) { #print "RESULT = $result\n" if $VERBOSE >= 2; if ($result =~ /RMS: ([\d\.]+)/m) { #print "Rmsd = $1\n"; push @rmsd, $1; $count++; } elsif ( $result =~ /Error/i ) { print "Error: $result\n"; } } my $result_wait = waitpid($pid, 0); if ( $result_wait != $pid ) { die "Waitpid returned $result_wait instead of $pid. $?."; } #print "DONE WITH RMSD\n"; my ( $rmsd, $stddev ) = Utility::Mean_and_Stddev (@rmsd); return ($rmsd, $stddev, $count, \@rmsd); }
      Thanks for posting that! Although the ProFit invocation there is not isolated from your case-specific code, it's still very useful to see how should the system calls be done. About the wrapper I was thinking about a cleaner, more OO interface. Something like:
      my $profitter = Bio::Tools::Run::ProFit->new( files => \@pdbfiles, reference => $pdbreference ); $profitter->fit; my %rmsds = $profitter->get_rmsds;
      But then, since I haven't used the program much, I don't know what else it could/should do. If I ever need to use it again, I'll get back to this thread and, with your permission, steal the portions of your code that successfully interact with ProFit.
        You're more than welcome to throw any of this code in a perl module. I can send you the full wrapped code whenever you want and it'll probably make more sense.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://735052]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (6)
As of 2024-03-28 11:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found