<?xml version="1.0" encoding="windows-1252"?>
<node id="835568" title="how to make this code more efficient? Need to Compute the Min and Max of a Temperature using an equation" created="2010-04-19 14:37:19" updated="2010-04-19 14:37:19">
<type id="115">
perlquestion</type>
<author id="798405">
BhariD</author>
<data>
<field name="doctext">
&lt;p&gt; Below is part of my code, followed by my question. The program is written in Perl, and one function uses BioPerl to generate all possible sequences from a given degenerate IUPAC string. &lt;/p&gt;
&lt;p&gt; As an example, I will use this IUPAC string:&lt;/p&gt;
&lt;p&gt; 'NNYDBAVDVHVHNGGNR' &lt;/p&gt;
&lt;p&gt; Here A, C, G and T each stand for a single base; every other code stands for 2, 3 or 4 bases, as follows: &lt;/p&gt;
&lt;p&gt; N =&gt; ACGT &lt;/p&gt; 
&lt;p&gt; Y =&gt; CT &lt;/p&gt;
&lt;p&gt; D =&gt; AGT &lt;/p&gt;
&lt;p&gt; B =&gt; CGT &lt;/p&gt;
&lt;p&gt; V =&gt; ACG &lt;/p&gt;
&lt;p&gt; H =&gt; ACT &lt;/p&gt;
&lt;p&gt; R =&gt; AG &lt;/p&gt;

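&lt;p&gt;So a single IUPAC string can expand into a very large number of sequences containing only A, C, G or T. This one has four N's, eight three-base codes (D, B, V, H) and two two-base codes (Y, R), so it expands to 4^4 x 3^8 x 2^2 = 6,718,464 sequences. &lt;/p&gt;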
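&lt;p&gt;Because each position is expanded independently, the number of concrete sequences is simply the product of the number of bases allowed at each position, so it can be checked before enumerating anything. A minimal pure-Perl sketch covering only the codes listed above; the hash %n_bases and the helper count_expansions are hypothetical names, not BioPerl API:&lt;/p&gt;

```perl
use strict;
use warnings;

# Number of bases each code stands for: the codes listed above plus A, C, G, T.
# (Other IUPAC codes such as S, W, K, M would need entries too.)
my %n_bases = (
    A => 1, C => 1, G => 1, T => 1,
    R => 2, Y => 2,
    B => 3, D => 3, H => 3, V => 3,
    N => 4,
);

# Hypothetical helper: product of the per-position choices, nothing enumerated
sub count_expansions {
    my ($iupac) = @_;
    my $count = 1;
    for my $code (split //, uc $iupac) {
        die "unsupported IUPAC code '$code'\n" unless exists $n_bases{$code};
        $count *= $n_bases{$code};
    }
    return $count;
}

printf "%d\n", count_expansions('NNYDBAVDVHVHNGGNR');   # prints 6718464
```

&lt;p&gt;Running this on a new IUPAC string gives a quick idea of whether full enumeration is feasible at all.&lt;/p&gt;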
&lt;code&gt;

#!/usr/bin/perl
use strict;
use warnings;

use Bio::Seq;
use Bio::SeqIO;
use Bio::SeqFeature::Primer;
use Bio::Tools::IUPAC;
use Bio::Tools::SeqPattern;

# parameters for the equation used to compute Tm
my $conc_primer = 0.25;
my $conc_salt   = 50;
my $mgconc      = 0;

my $iupac = 'NNYDBAVDVHVHNGGNR';

# generate all possible sequences from the IUPAC string
my @PotSeqs = find_degenerates($iupac);

# keep a running min and max instead of storing every Tm
my ($Tm_Thermodynamic_min, $Tm_Thermodynamic_max);
foreach my $PotSeq (@PotSeqs) {
    # compute Tm for each concrete sequence
    my $Tm_Thermodynamic = tm_Base_Stacking($PotSeq, $conc_primer, $conc_salt, $mgconc);
    $Tm_Thermodynamic_min = $Tm_Thermodynamic
        if !defined($Tm_Thermodynamic_min) || $Tm_Thermodynamic &lt; $Tm_Thermodynamic_min;
    $Tm_Thermodynamic_max = $Tm_Thermodynamic
        if !defined($Tm_Thermodynamic_max) || $Tm_Thermodynamic &gt; $Tm_Thermodynamic_max;
}
printf "%.4f\t%.4f\n", $Tm_Thermodynamic_min, $Tm_Thermodynamic_max;

# return every concrete sequence encoded by an IUPAC string
sub find_degenerates
{
        my ($sequence) = @_;
        my $seq_obj = Bio::Seq-&gt;new(-seq =&gt; $sequence, -alphabet =&gt; 'dna');
        my $stream  = Bio::Tools::IUPAC-&gt;new(-seq =&gt; $seq_obj);

        # note: the whole list of sequences is held in memory
        my @oligomers;
        while (my $uniqueseq = $stream-&gt;next_seq()) {
                push @oligomers, $uniqueseq-&gt;seq;
        }
        return @oligomers;
}

#Function to compute the Temperature for a given sequence (sequence valid only if contains A,C,G or T)

sub tm_Base_Stacking
{
        my ($c,$conc_primer,$conc_salt,$conc_mg) = @_;
        my $c_len = length($c);
        my $h=0;
        my $s=0;

        # enthalpy values
        my %array_h = (  'AA'=&gt; '-7.9',
        		 'AC'=&gt; '-8.4',
                         'AG'=&gt; '-7.8',
                         'AT'=&gt; '-7.2',
                         'CA'=&gt; '-8.5',
                         'CC'=&gt; '-8.0',
                         'CG'=&gt; '-10.6',
                         'CT'=&gt; '-7.8',
                         'GA'=&gt; '-8.2',
                         'GC'=&gt; '-10.6',
                         'GG'=&gt; '-8.0',
                         'GT'=&gt; '-8.4',
                         'TA'=&gt; '-7.2',
                         'TC'=&gt; '-8.2',
                         'TG'=&gt; '-8.5',
                         'TT'=&gt; '-7.9'
                    );

	# entropy values
        my %array_s = (  'AA'=&gt; '-22.2',
        		 'AC'=&gt; '-22.4',
                         'AG'=&gt; '-21.0',
                         'AT'=&gt; '-20.4',
                         'CA'=&gt; '-22.7',
                         'CC'=&gt; '-19.9',
                         'CG'=&gt; '-27.2',
                         'CT'=&gt; '-21.0',
                         'GA'=&gt; '-22.2',
                         'GC'=&gt; '-27.2',
                         'GG'=&gt; '-19.9',
                         'GT'=&gt; '-22.4',
                         'TA'=&gt; '-21.3',
                         'TC'=&gt; '-22.2',
                         'TG'=&gt; '-22.7',
                         'TT'=&gt; '-22.2'
                    );

        # correction for salt concentration
        my $salt_effect = ($conc_salt/1000) + (($conc_mg/1000) * 140);
        # effect on entropy
        $s += 0.368 * ($c_len-1) * log($salt_effect);

        # add H and S for each adjacent pair of bases in the sequence
        for (my $i = 0; $i &lt; $c_len-1; $i++) {
                my $subc = substr($c, $i, 2);
                $h += $array_h{$subc};
                $s += $array_s{$subc};
        }

        my $rlnk = ($conc_primer/2000000000);
        my $r = log10($rlnk);

        # equation to compute the Tm
        my $tm = ((1000*$h)/($s + (1.987*$r))) - 273.15;
        return $tm;
}

#function to take the log
sub log10 {
  my $n = $_[0];
  return log($n)/log(10);
}
&lt;/code&gt;
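&lt;p&gt;Most of the memory goes into lists: find_degenerates pushes every expanded sequence into @oligomers before any Tm is computed, although only two numbers are needed in the end. As a sketch, here is a hypothetical pure-Perl iterator (make_expander is an illustrative name, not part of BioPerl) that hands back one sequence per call in odometer order, so the caller can update a running min and max and drop each sequence immediately:&lt;/p&gt;

```perl
use strict;
use warnings;

# Bases for each code listed in the question, plus the four plain bases
my %IUPAC = (
    A => 'A', C => 'C', G => 'G', T => 'T',
    N => 'ACGT', Y => 'CT', D => 'AGT', B => 'CGT',
    V => 'ACG', H => 'ACT', R => 'AG',
);

# Hypothetical helper: returns a closure that yields the next concrete
# sequence on each call (odometer order), and undef once all are done.
sub make_expander {
    my ($iupac) = @_;
    my @choices = map {
        die "unsupported IUPAC code '$_'\n" unless exists $IUPAC{$_};
        [ split //, $IUPAC{$_} ];
    } split //, uc $iupac;
    my @idx  = (0) x @choices;    # current choice at each position
    my $done = 0;

    return sub {
        return if $done;
        my $seq = join '', map { $choices[$_][ $idx[$_] ] } 0 .. $#choices;

        # advance like an odometer, carrying from the rightmost position
        my $pos = $#choices;
        while ($pos >= 0) {
            $idx[$pos]++;
            last unless $idx[$pos] == @{ $choices[$pos] };   # no carry needed
            $idx[$pos] = 0;                                    # wrap and carry
            $pos--;
        }
        $done = 1 if $pos == -1;    # carried past the first position
        return $seq;
    };
}

# Usage: only the current sequence is held at any time
my $next = make_expander('RY');
my @seen;
while (defined(my $seq = $next->())) {
    push @seen, $seq;    # the real loop would call tm_Base_Stacking here instead
}
print "@seen\n";    # prints AC AT GC GT
```

&lt;p&gt;With tm_Base_Stacking called inside the while loop and a running min and max, memory stays constant, but run time still grows with the number of expansions, so this addresses the out-of-memory problem rather than the slowness. Moving %array_h and %array_s out of tm_Base_Stacking, so they are not rebuilt on every one of millions of calls, is likely another cheap saving.&lt;/p&gt;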

&lt;p&gt; My question is how to make this efficient. I only need the range, i.e. the minimum and maximum Tm (melting temperature) over all sequences a given IUPAC string can expand to. At the moment I compute Tm for every possible sequence to get an exact range, but this is too slow: I run out of memory, and it gets worse as the degeneracy of the IUPAC string increases. Is there a way to make this efficient and still get a reasonably accurate range (min and max Tm values)? &lt;/p&gt;
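&lt;p&gt;One possible direction, sketched here under stated assumptions: the Tm formula in tm_Base_Stacking is a ratio of two sums over adjacent base pairs, Tm = 1000*h/d - 273.15, where h is the summed enthalpy and d is the summed entropy plus the constant salt and primer terms. For a fixed length, the extreme Tm values are therefore the extreme values of h/d. The code below swaps in Dinkelbach's method for fractional objectives: for a trial ratio lambda it uses Viterbi-style dynamic programming over the positions (at most 4 x 4 pair choices per step) to find the sequence that optimises the linear objective -h + lambda*d, sets lambda to that sequence's ratio, and repeats until nothing improves. Each pass is linear in the string length. This is not a BioPerl feature; the helpers ratio, best_path and tm_extreme are hypothetical, the tables are copied from the question, and it assumes h and d are negative for every expansion, which holds for these tables and the question's parameters.&lt;/p&gt;

```perl
use strict;
use warnings;

# Nearest-neighbour enthalpy (H) and entropy (S) values copied from tm_Base_Stacking
my %H = (AA => -7.9,  AC => -8.4,  AG => -7.8,  AT => -7.2,
         CA => -8.5,  CC => -8.0,  CG => -10.6, CT => -7.8,
         GA => -8.2,  GC => -10.6, GG => -8.0,  GT => -8.4,
         TA => -7.2,  TC => -8.2,  TG => -8.5,  TT => -7.9);
my %S = (AA => -22.2, AC => -22.4, AG => -21.0, AT => -20.4,
         CA => -22.7, CC => -19.9, CG => -27.2, CT => -21.0,
         GA => -22.2, GC => -27.2, GG => -19.9, GT => -22.4,
         TA => -21.3, TC => -22.2, TG => -22.7, TT => -22.2);
my %IUPAC = (A => 'A', C => 'C', G => 'G', T => 'T', N => 'ACGT', Y => 'CT',
             D => 'AGT', B => 'CGT', V => 'ACG', H => 'ACT', R => 'AG');

# h/d for one concrete sequence; $k is the constant salt + primer term,
# so that Tm = 1000 * h/d - 273.15 as in tm_Base_Stacking.
sub ratio {
    my ($seq, $k) = @_;
    my ($h, $d) = (0, $k);
    for my $i (0 .. length($seq) - 2) {
        my $nn = substr $seq, $i, 2;
        $h += $H{$nn};
        $d += $S{$nn};
    }
    return $h / $d;
}

# Viterbi-style DP: pick one base per position maximising
# $sign * sum(-H + lambda * S) over adjacent pairs; returns that sequence.
sub best_path {
    my ($choices, $lambda, $sign) = @_;
    my %score = map { ($_ => 0) } @{ $choices->[0] };
    my @back;
    for my $i (1 .. $#{$choices}) {
        my %next;
        for my $cur (@{ $choices->[$i] }) {
            for my $prev (keys %score) {
                my $nn = "$prev$cur";
                my $v  = $score{$prev} + $sign * (-$H{$nn} + $lambda * $S{$nn});
                if (!exists $next{$cur} or $v > $next{$cur}) {
                    $next{$cur}     = $v;
                    $back[$i]{$cur} = $prev;
                }
            }
        }
        %score = %next;
    }
    my $end;
    for my $base (keys %score) {
        $end = $base if !defined $end or $score{$base} > $score{$end};
    }
    my @seq = ($end);
    unshift @seq, $back[$_]{ $seq[0] } for reverse 1 .. $#{$choices};
    return join '', @seq;
}

# Dinkelbach iteration: $sign = +1 for the maximum Tm, -1 for the minimum.
# Assumes h and d are negative for every expansion (true for these tables).
sub tm_extreme {
    my ($iupac, $k, $sign) = @_;
    my @choices = map {
        die "unsupported IUPAC code '$_'\n" unless exists $IUPAC{$_};
        [ split //, $IUPAC{$_} ];
    } split //, uc $iupac;
    my $seq    = join '', map { $_->[0] } @choices;    # any valid start
    my $lambda = ratio($seq, $k);
    while (1) {
        my $cand = best_path(\@choices, $lambda, $sign);
        my $r    = ratio($cand, $k);
        last unless $sign * ($r - $lambda) > 1e-12;    # no further improvement
        ($seq, $lambda) = ($cand, $r);
    }
    return (1000 * $lambda - 273.15, $seq);
}

# Usage with the parameters from the question
my ($conc_primer, $conc_salt, $mgconc) = (0.25, 50, 0);
my $iupac       = 'NNYDBAVDVHVHNGGNR';
my $salt_effect = ($conc_salt / 1000) + ($mgconc / 1000) * 140;
my $k = 0.368 * (length($iupac) - 1) * log($salt_effect)
      + 1.987 * log($conc_primer / 2000000000) / log(10);
my ($tm_min, $seq_min) = tm_extreme($iupac, $k, -1);
my ($tm_max, $seq_max) = tm_extreme($iupac, $k, +1);
printf "min %.4f (%s)\tmax %.4f (%s)\n", $tm_min, $seq_min, $tm_max, $seq_max;
```

&lt;p&gt;Comparing its output with a brute-force run on a short degenerate string is an easy sanity check before relying on it for long, highly degenerate ones.&lt;/p&gt;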
&lt;p&gt; I would appreciate any help or suggestions. Please let me know if I need to explain anything further. &lt;/p&gt;
</field>
</data>
</node>
