PerlMonks  

string mutation script behaving erratically

by zing (Beadle)
on Jun 25, 2012 at 19:05 UTC ( #978243=perlquestion )
zing has asked for the wisdom of the Perl Monks concerning the following question:

The code I wrote to mutate a hardcoded string based on random numbers isn't working properly. Here's what the code does. The string to be mutated is hardcoded in the script (line 30). The string is made up of an alphabet of 20 letters (lines 5-24). %nuc (line 4) is a 20x20 matrix stored as a hash of arrays. The program generates one random number between 0 and 1 for each character of the input string (that is, length $string times). It then looks up that character's row in %nuc. Since $string is hardcoded to consist only of "A", the program goes to line 17 every time a random number is generated. It then traverses line 17 horizontally until it encounters a number greater than the random number, and notes the value of the counter at that point. For example, our $string, which contains only "A", will mutate to another letter only if the random number is greater than .9799; otherwise it stays "A". The biggest problem with this code is that hashes don't get printed in the same order you store them, and this shuffling leads to erratic results. I tried rearranging the rows of %nuc, but that didn't help. Another error is that the output string has one more letter than the hardcoded string.
#!/usr/bin/perl -w
use Time::HiRes qw(usleep nanosleep);

%nuc = (
    F => [qw( .0001 .0001 .0001 .0000 .0000 .0000 .0000 .0001 .0002 .0008 .0006 .0000 .0004 .9944 .0000 .0000 .0001 .0003 .0028 .0000)],
    T => [qw( .0022 .0002 .0013 .0004 .0001 .0003 .0002 .0002 .0001 .0011 .0002 .0008 .0006 .0001 .0005 .0032 .9874 .0000 .0002 .0009)],
    N => [qw( .0004 .0001 .9867 .0036 .0000 .0004 .0006 .0006 .0021 .0003 .0001 .0013 .0000 .0001 .0002 .0020 .0009 .0001 .0004 .0001)],
    V => [qw( .0013 .0002 .0001 .0001 .0003 .0002 .0002 .0003 .0003 .0057 .0011 .0001 .0017 .0001 .0003 .0002 .0010 .0000 .0002 .9866)],
    K => [qw( .0002 .0037 .0025 .0006 .0000 .0012 .0007 .0002 .0002 .0004 .0001 .9858 .0020 .0000 .0003 .0008 .0011 .0000 .0001 .0001)],
    E => [qw( .0010 .0000 .0007 .0056 .0000 .0035 .9865 .0004 .0002 .0003 .0001 .0004 .0001 .0000 .0003 .0004 .0002 .0000 .0001 .0002)],
    Y => [qw( .0001 .0000 .0003 .0000 .0003 .0000 .0001 .0000 .0004 .0001 .0001 .0000 .0000 .0021 .0000 .0001 .0001 .0002 .9960 .0001)],
    Q => [qw( .0003 .0010 .0004 .0005 .0000 .9901 .0027 .0001 .0024 .0001 .0003 .0006 .0004 .0000 .0006 .0002 .0002 .0000 .0000 .0001)],
    I => [qw( .0002 .0002 .0003 .0001 .0002 .0001 .0002 .0000 .0000 .9915 .0009 .0002 .0012 .0007 .0000 .0001 .0007 .0000 .0001 .0033)],
    C => [qw( .0001 .0001 .0000 .0000 .9987 .0000 .0000 .0000 .0001 .0001 .0000 .0000 .0000 .0000 .0001 .0005 .0001 .0000 .0000 .0002)],
    L => [qw( .0003 .0001 .0003 .0000 .0000 .0006 .0001 .0001 .0004 .0022 .9871 .0002 .0045 .0013 .0003 .0001 .0003 .0004 .0002 .0015)],
    M => [qw( .0001 .0001 .0000 .0000 .0000 .0002 .0000 .0000 .0000 .0005 .0009 .0005 .9968 .0001 .0000 .0001 .0002 .0000 .0000 .0005)],
    A => [qw( .9799 .0002 .0009 .0010 .0003 .0008 .0017 .0021 .0002 .0006 .0004 .0002 .0006 .0002 .0022 .0035 .0032 .0000 .0002 .0018)],
    W => [qw( .0000 .0002 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0001 .0000 .0001 .0000 .9995 .0001 .0000)],
    S => [qw( .0028 .0011 .0034 .0007 .0011 .0004 .0006 .0016 .0002 .0002 .0001 .0007 .0004 .0003 .0017 .9800 .0038 .0005 .0002 .0002)],
    P => [qw( .0013 .0005 .0002 .0001 .0001 .0008 .0003 .0002 .0005 .0001 .0002 .0002 .0001 .0001 .9935 .0012 .0004 .0000 .0000 .0002)],
    H => [qw( .0001 .0008 .0018 .0003 .0001 .0020 .0001 .0000 .9933 .0000 .0001 .0001 .0000 .0002 .0003 .0001 .0001 .0001 .0004 .0001)],
    D => [qw( .0006 .0000 .0042 .9869 .0000 .0006 .0053 .0006 .0004 .0001 .0000 .0003 .0000 .0000 .0001 .0005 .0003 .0000 .0000 .0001)],
    R => [qw( .0001 .9929 .0001 .0000 .0001 .0010 .0000 .0000 .0010 .0003 .0001 .0019 .0004 .0001 .0004 .0006 .0001 .0008 .0000 .0001)],
    G => [qw( .0021 .0001 .0012 .0011 .0001 .0003 .0007 .9906 .0001 .0000 .0001 .0002 .0001 .0001 .0003 .0021 .0003 .0000 .0000 .0005)]);




#SVEQRISTDIGQAYQLQGLGSNLRSIRSKTGAGEVNYIDAAKSVNDNQLLAEIG
$string='AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA';
@seq=split(//,$string);
$size=scalar(@seq);
print "$size\n";
foreach $seq (@seq) {
    &mutate($seq) ;
    print "##########SEQ=$seq\tRANDOM=$ran\tK=$k################";
    usleep (1000000);
}
$k=0;
$p=0;
sub mutate # Start Of Sub
{
    $ran=`awk 'BEGIN {{srand()} {print rand()}}'`;
    for ($j = 1; $j <= 20;$j++)
    {
        $mut[$j]=$nuc{$seq}[$j];
    }
    for ($i=1; $i<=20; $i++)
    {
        print "$mut[$i]\t$ran\t";
        if ($ran < $mut[$i])
        {#print "\n$mut[$i] \t hello";
            push (@names, $i);
            print "i=$i";
            #push (@names, '-');
        }
        last if $ran < $mut[$i];
        print"\tran=$ran\t muti=$mut[$i]";
        $k =$mut[$i];
        print "K = $k";
    }
    $p++; #counter for serial number
} # End Of Sub
# $nam=join ' ',@names;
# for ($j = 0; $j <=$size;$j++)
# {print $names[$j];}
print"\n";
for ($j = 0; $j < $size;$j++)
{
    if($names[$j]==1) {$names[$j]="A";}
    elsif($names[$j]==2) {$names[$j]="R";}
    elsif($names[$j]==3) {$names[$j]="N";}
    elsif($names[$j]==4) {$names[$j]="D";}
    elsif($names[$j]==5) {$names[$j]="C";}
    elsif($names[$j]==6) {$names[$j]="Q";}
    elsif($names[$j]==7) {$names[$j]="E";}
    elsif($names[$j]==8) {$names[$j]="G";}
    elsif($names[$j]==9) {$names[$j]="H";}
    elsif($names[$j]==10) {$names[$j]="I";}
    elsif($names[$j]==11) {$names[$j]="L";}
    elsif($names[$j]==12) {$names[$j]="K";}
    elsif($names[$j]==13) {$names[$j]="M";}
    elsif($names[$j]==14) {$names[$j]="F";}
    elsif($names[$j]==15) {$names[$j]="P";}
    elsif($names[$j]==16) {$names[$j]="S";}
    elsif($names[$j]==17) {$names[$j]="T";}
    elsif($names[$j]==18) {$names[$j]="W";}
    elsif($names[$j]==19) {$names[$j]="Y";}
    else {$names[$j]="V";}
}
print "orig = $string \nmuta = ";
for ($j = 0; $j <=$size;$j++)
{print $names[$j];}
@arr = %nuc;
print "@arr";

Re: string mutation script behaving erratically
by toolic (Chancellor) on Jun 25, 2012 at 19:33 UTC
    The biggest problem with this code is that hashes don't get printed in the same order you store them,
    Tie::IxHash: "This Perl module implements Perl hashes that preserve the order in which the hash elements were added."
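Tie::IxHash is a CPAN module, not part of core Perl. Where installing it is not an option, one common alternative is to keep the intended order in a plain array next to the hash. The following is only a minimal sketch of that idea: the names @order and %row_of are made up for illustration, and the data is cut down to the first three columns of the A, R and N rows from the post.

```perl
use strict;
use warnings;

# Hypothetical, shortened data: three residues instead of twenty.
my @order = qw(A R N);    # the order we care about, stored explicitly
my %row_of = (
    A => [0.9799, 0.0002, 0.0009],
    R => [0.0001, 0.9929, 0.0001],
    N => [0.0004, 0.0001, 0.9867],
);

# Looking a row up by key never depends on the hash's internal order.
my @a_row = @{ $row_of{A} };

# Iterate in the stored order instead of relying on keys(%row_of).
my @seen;
for my $residue (@order) {
    push @seen, $residue;
    print "$residue => @{ $row_of{$residue} }\n";
}
```

Note that a lookup such as $row_of{A} always returns the right row, so hash ordering only matters when the code iterates over the whole hash or flattens it into a list.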
Re: string mutation script behaving erratically
by McA (Curate) on Jun 25, 2012 at 20:17 UTC
    Hi
    My first hints:
    a) Use 'use strict;' to protect yourself. Yes, this advice is annoying, but it is helpful.
    b) The script gives warnings at runtime: follow them.
    c) The line for ($j = 1; $j <= 20;$j++) is IMHO wrong, as a Perl array is accessed with an index starting at 0, so: for ($j = 0; $j < 20;$j++).
    d) Why do you use an external program to generate random numbers? See perldoc -f rand.
    e) Don't initialize the random number generator before every rand().
    f) Your code instantiates the %nuc hash in a way that gives you arrays of strings, which you later convert implicitly to floating-point numbers. So instantiate them accordingly:
    %nuc = ( 'F' => [0.0001, 0.0001, ...], ... )
    g) Instead of programming this "ugly" if-elsif-else statement at the end, use a mapping hash. It's more readable, more maintainable and less error-prone.
    my %mapping = ( 1 => "A", 2 => "R", 3 => "N", ... );
    if(exists $mapping{$names[$j]}) {
        $names[$j] = $mapping{$names[$j]};
    }
    else {
        # Default value (last else-Statement)
        $names[$j]="V";
    }
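Hints c), d), e) and g) can be sketched together as follows. This is an illustration under assumptions, not a drop-in fix: the helper name letter_for is hypothetical, a lookup array is used in place of the mapping hash, and the column order A R N D ... V is taken from the if/elsif chain in the original script.

```perl
use strict;
use warnings;

# Column order taken from the if/elsif chain in the original script.
my @letters = qw(A R N D C Q E G H I L K M F P S T W Y V);

# Hypothetical helper: turn the script's 1-based counter into a letter.
# Anything outside 1..20 falls back to "V", like the final else branch.
sub letter_for {
    my ($counter) = @_;
    return 'V' if !defined $counter || $counter < 1 || $counter > @letters;
    return $letters[ $counter - 1 ];    # Perl arrays are 0-based
}

# Perl's built-in rand() returns a number in [0, 1). It seeds itself on
# first use, so there is no need to call srand() (or awk) for every draw.
my @draws = map { rand() } 1 .. 5;

print join(' ', map { letter_for($_) } 1, 3, 20, 21), "\n";    # prints "A N V V"
```

An array fits here because the keys are consecutive integers; a hash, as in hint g), works equally well and makes the default case explicit with exists.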
Re: string mutation script behaving erratically
by choroba (Abbot) on Jun 25, 2012 at 20:45 UTC
    Can you be more specific? Can you describe the whole algorithm? I do not understand how a character should be changed.

      choroba, the hash named %nuc (line 4) was actually supposed to be a PAM matrix, but it isn't properly ordered. If it were, you would be looking at this matrix: http://homepages.rpi.edu/~zukerm/MATH-4961/scoring/img97.gif

      If you look at this figure, you can easily see that in order to mutate the letter A (Ala) to R (Arg), the random number generated should be > 9867 but < 9869 (i.e. 9867+2). Similarly, for A to N we need > 9869 but < 9878 (i.e. 9867+2+9). Thus we are doing a cumulative sum.

      What I was trying to do with my code was to compare the random number generated for a letter with the running cumulative sum along that letter's row in %nuc. This comparison is done under a counter. For example, if the random number generated was .9877 and the letter to be mutated is A, then the value of the counter will be 3 (see above). Finally, I collected these numeric counters for each of the letters and tried to convert them to letters based on their arrangement in %nuc.

      And this is where the code fails, because the hash gets jumbled up; its entries never come back in the same order I entered them.
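The cumulative-sum selection described above can be sketched like this. It is not code from the thread: the helper name pick_column is hypothetical, the row is the A row of %nuc as posted in the question, and the column order A R N D ... V is an assumption taken from the if/elsif chain in the original script.

```perl
use strict;
use warnings;

# Assumed column order (from the if/elsif chain in the original script).
my @letters = qw(A R N D C Q E G H I L K M F P S T W Y V);

# The A row of %nuc as posted in the question.
my @a_row = qw(.9799 .0002 .0009 .0010 .0003 .0008 .0017 .0021 .0002 .0006
               .0004 .0002 .0006 .0002 .0022 .0035 .0032 .0000 .0002 .0018);

# Hypothetical helper: walk the row while keeping a running total and return
# the 0-based index of the first column whose cumulative sum exceeds $r.
sub pick_column {
    my ($r, @row) = @_;
    my $sum = 0;
    for my $i (0 .. $#row) {
        $sum += $row[$i];
        return $i if $r < $sum;
    }
    return $#row;    # guard against rounding: the row may not sum to exactly 1
}

for my $r (0.5, 0.9800, 0.9805) {
    print "$r -> $letters[ pick_column($r, @a_row) ]\n";
}
```

With this row the three sample values select A, R and N respectively, because the running totals after the first three columns are .9799, .9801 and .9810. The letter comes from the explicit @letters array, so the result does not depend on the order in which Perl stores the hash.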
        Oh, I see. Then maybe this can help you: Update: bugfix.
