PerlMonks  

Re: Processing data with lot of math...

by ColonelPanic (Friar)
on May 12, 2004 at 14:43 UTC ( #352760=note )


in reply to Processing data with lot of math...

Your speed problem is probably caused by reading entire files into memory and using up all of your RAM. If you can figure out an algorithm that doesn't do this, you'll be much better off. Also, if you post the code you have now, it will be easier to offer additional help.
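As a rough illustration of not holding whole files in memory, here is a minimal sketch that reads a PDB file one line at a time and keeps only the coordinates. The name `read_coords` is made up for this example, and the column offsets assume the standard fixed-width PDB ATOM layout.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): read ATOM coordinates
# line by line, so the raw text of the file is never held in memory at once.
sub read_coords {
    my ($fh) = @_;
    my (@X, @Y, @Z);
    while (my $line = <$fh>) {
        next unless substr($line, 0, 4) eq 'ATOM';
        # Assumed fixed-width PDB columns; "+ 0" turns the padded
        # field into a plain number.
        push @X, substr($line, 30, 8) + 0;
        push @Y, substr($line, 38, 8) + 0;
        push @Z, substr($line, 46, 8) + 0;
    }
    return (\@X, \@Y, \@Z);
}
```

It would be called with an open filehandle, for example `my ($X, $Y, $Z) = read_coords($fh);`. Whether this helps depends on where the time really goes; for files of a megabyte or so the pair search may matter more than the reading.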



When's the last time you used duct tape on a duct? --Larry Wall


Re: Re: Processing data with lot of math...
by qhayaal (Beadle) on May 13, 2004 at 08:49 UTC
    1. RAM problem.
    RAM is 512MB and swap 1GB.
    A file typically is 1.2MB
    No. of lines (atoms) 20,000.
    Note: This is in test phase, but it can increase by about 10-50 fold.

    2. Graph Theory problem.
    This unfortunately is not a Graph-Theory problem, but more of a gas-phase problem. I need to find out all the 'interacting-pairs' (ie pairs of atoms close enough) to do a more complicated analysis.

    3. FORTRAN object file.
    OK, maybe I can make this piece into a library file, so that I should be able to use it as a POSIX function?

    4. Grid-wise calculation.
    Promising. :) I thought, but was a bit lazy to try. After brute-force I began to wonder if it would be worth the effort... Thanks. :)

    5. Chemistry::Bond::Find
    Yes, I must try this. :) Mine is a protein with a hell of a lot of water, and I have to find water-protein hydrogen bonds, at least based on distance alone. Will get back to you, itub. :D

    6. Using square of distance.
    Well, yes, I was already using the square of the distance, and yes, it is on PDB files. :)

    7. Using x:y:z boxinfo.
    Yes. I thought of it, but was lazy as I thought I would need to put in a lot of code. But now I am convinced it won't be so much. Thanks a lot BrowserUk. :)

    8. The code:
    # Open each PDB
    foreach my $pdb_file (<$pdb_list>) {
      {
        chomp($pdb_file);
        my $tmp_file;   # We would open the PDB with this handle
        open($tmp_file, "< $pdb_file") or (die "Cannot open PDB: $!");

        # Read X,Y,Z coordinates
        my @X;   # X-coordinates
        my @Y;   # Y-coordinates
        my @Z;   # Z-coordinates
        my @pdb_tmp=<$tmp_file>;
        foreach (@pdb_tmp) {
          if (substr($_,0,3) eq "ATO") {
            push @X, substr($_,30,8);
            push @Y, substr($_,38,8);
            push @Z, substr($_,46,8);
          }
        }

        # Find the interaction pairs. Also store the best angle, should it
        # occur again with another proton. Also keep track of the waters
        # that have made h-bond with the solute atoms.
        my @sel_wat;   # Array of water molecule (numbers) selected
        my %hbond;     # $hbond[$tag1:$tag2][0]=distance
                       # $hbond[$tag1:$tag2][0]=angle

        # Solute as donor and water as acceptor
        {
          my @atom_cov;   # Polar solute atom that is already covered.
          for (my $i=0; $i <= $#pol_h; $i++) {
            # If this donor is not already covered, then go ahead.
            if ( ! defined($atom_cov[$pol_h[$i][1]]) ) {
              for (my $j=0; $j <= $#wat_a; $j++) {
                my $dx=$X[$pol_h[$i][1]]-$X[$wat_a[$j]];
                my $dy=$Y[$pol_h[$i][1]]-$Y[$wat_a[$j]];
                my $dz=$Z[$pol_h[$i][1]]-$Z[$wat_a[$j]];
                my $distSq=($dx*$dx)+($dy*$dy)+($dz*$dz);
                if ($distSq <= $hb_dist) {
                  $atom_cov[$pol_h[$i][1]]=1;
                  print $idty[$pol_h[$i][1]], " ", $idty[$wat_a[$j]], $distSq,"\n";
                }
              }
            }
          }
        }
      }
    }
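The grid-wise / x:y:z box idea from items 4 and 7 could be sketched roughly as below. This is only an illustration, not the code suggested in the thread: `close_pairs` and its arguments are invented names, and the cutoff is passed as a plain distance (the posted code compares against an already squared `$hb_dist`). Each atom of the second set is binned into a cubic box with edge equal to the cutoff, so each atom of the first set only has to be compared with atoms in its own box and the 26 neighbouring ones.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: returns [i, j, distSq] for every pair (i from @$a_idx,
# j from @$b_idx) whose squared distance is <= $cut * $cut.
sub close_pairs {
    my ($X, $Y, $Z, $a_idx, $b_idx, $cut) = @_;
    my $cutSq = $cut * $cut;

    # Bin the second set into cubic boxes of edge $cut, keyed "ix:iy:iz".
    # sprintf '%d' keeps the keys as plain integers (no "-0").
    my %box;
    for my $j (@$b_idx) {
        my $key = join ':',
            map { sprintf '%d', floor($_ / $cut) } $X->[$j], $Y->[$j], $Z->[$j];
        push @{ $box{$key} }, $j;
    }

    my @pairs;
    for my $i (@$a_idx) {
        my @c = map { sprintf '%d', floor($_ / $cut) } $X->[$i], $Y->[$i], $Z->[$i];
        # Only the 27 boxes around atom $i can hold a partner within $cut.
        for my $bx ($c[0] - 1 .. $c[0] + 1) {
            for my $by ($c[1] - 1 .. $c[1] + 1) {
                for my $bz ($c[2] - 1 .. $c[2] + 1) {
                    my $list = $box{"$bx:$by:$bz"} or next;
                    for my $j (@$list) {
                        my $dx = $X->[$i] - $X->[$j];
                        my $dy = $Y->[$i] - $Y->[$j];
                        my $dz = $Z->[$i] - $Z->[$j];
                        my $dSq = $dx * $dx + $dy * $dy + $dz * $dz;
                        push @pairs, [$i, $j, $dSq] if $dSq <= $cutSq;
                    }
                }
            }
        }
    }
    return @pairs;
}
```

With roughly uniform density the work grows about linearly with the number of atoms instead of with the product of the two list sizes, which is what should matter when going from 20,000 atoms towards 100K.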
      2. Graph Theory problem. This unfortunately is not a Graph-Theory problem, but more of a gas-phase problem. I need to find out all the 'interacting-pairs' (ie pairs of atoms close enough) to do a more complicated analysis.

      Your APPLICATION may be a gas-analysis problem, but the guts of it, finding points within a certain distance of a known point, is exactly a graph-theory problem.

        Your APPLICATION may be a gas-analysis problem, but the guts of it, finding points within a certain distance of a known point, is exactly a graph-theory problem.

        You are right. What I am trying to do is to construct the adjacency matrix efficiently (quickly), which is sparse. Unfortunately, this is where graph theory begins and my problem ends. :(
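For what it's worth, a sparse adjacency matrix does not have to be stored as a matrix at all. A minimal sketch, assuming the close pairs are already available as `[i, j, distSq]` triples (`build_adjacency` is a made-up name for this example), is a hash of hashes that only holds the pairs that exist:

```perl
use strict;
use warnings;

# Hypothetical helper: turn [i, j, distSq] triples into a sparse, symmetric
# adjacency structure with $adj->{$i}{$j} = distSq. Only close pairs are stored.
sub build_adjacency {
    my (@pairs) = @_;
    my %adj;
    for my $p (@pairs) {
        my ($i, $j, $dSq) = @$p;
        $adj{$i}{$j} = $dSq;
        $adj{$j}{$i} = $dSq;
    }
    return \%adj;
}
```

The neighbours of atom `$i` are then `keys %{ $adj->{$i} }`, and memory use scales with the number of interacting pairs rather than with the square of the atom count.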

      Regarding the issue of finding H-bonds in PDB files:

      Besides all the algorithmic suggestions already given, the problem may be even "smaller" than it looks. If your PDB file has the water molecules and the protein labeled properly, you only have to consider water atom-protein atom pairs instead of every possible pair. And not even every protein atom, if you restrict your definition of H-bond to the typical O...H or N...H.
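A rough sketch of that pre-filtering, under several assumptions that are not from the thread: water residues are named HOH, WAT or SOL, the element is guessed from the first letter of the atom name (a crude heuristic that will misread ions such as NA), and only heavy atoms (O and N) are kept rather than the hydrogens. `split_candidates` is an invented name.

```perl
use strict;
use warnings;

# Assumed water residue names; adjust to whatever your files actually use.
my %IS_WATER = map { $_ => 1 } qw(HOH WAT SOL);

# Hypothetical helper: returns two lists of atom indices (counting ATOM
# records only, in file order): water oxygens and solute N/O atoms.
sub split_candidates {
    my (@lines) = @_;
    my (@water_o, @solute_no);
    my $n = -1;
    for my $line (@lines) {
        next unless substr($line, 0, 4) eq 'ATOM';
        $n++;                               # index of this ATOM record
        my $name = substr($line, 12, 4);    # atom name, PDB columns 13-16
        my $res  = substr($line, 17, 3);    # residue name, PDB columns 18-20
        next unless $name =~ /([A-Za-z])/;  # crude element guess: first letter
        my $elem = uc $1;
        if ($IS_WATER{$res}) {
            push @water_o, $n if $elem eq 'O';
        }
        elsif ($elem eq 'O' || $elem eq 'N') {
            push @solute_no, $n;
        }
    }
    return (\@water_o, \@solute_no);
}
```

The two index lists can then be fed to whatever distance search is used, so carbons and most hydrogens never enter the pair loop.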

        Yes, at the moment I am looking only at the distance, though I am leaving the code in a shape to implement the angle check when required. Even for that it was taking too long. :( The reason is that the number of atoms is about 20,000. I am looking for something that can work up to 100K.
