http://www.perlmonks.org?node_id=912011


in reply to Virus toy model^2

a probability of substituting any given position in the string with A C T G
Look at it as not only a project related to Perl since this posts an intersection of math and statistics (error rate, transition rates, transversion rates...etc), biology (simulating a molecular process of replication) and programming (implementing the computational algorithm using a programming language)

So, in each clone, what's the error (mutation) limit?, does the mutation position hold any significance? do you need to filter your genomes afterwards to select only those still-functional ones that were not corrupted by the mutations but where the mutation may present phenotypic differences?

My initial thought was to make a mutlidimensional array, where each successive round of replication would be represented as values in the new array dimension
If I understand you correctly, this type of array can be an arbitrarily-depth depending on the recursion of replication rounds, in Perl, it is not recommended you add or remove to an array while performing operations on it so you may have to initiate an array a-priori as indicated by the number of replications to be performed then populate that.

In light of the above, hidden markov models can use the transition probabilities while performing the mutations (maybe difficult when wanting to consider all the different cases out there), your best bet is to grab a statistician or to keep one handy. HMMs have been extensively used before in sequence alignment and detecting indels hence this scenario here is related.

A simpler approach is to use randomization to simulate mutations, check this discussion and its implementation or to consult the feasibility of using an already available solution.

UPDATE:Scanning the simulation program in the links I gave you; it seems to have a loose criteria on what base can be replaced with what so I followed another approach by introducing conditions on mutation types (favoring transitions rather than fatal transversions) and giving you control over the number of replication iterations then showing you the sequences aligned in proximity for good visuals

# a program that generates a 50 bp random DNA seq #and introduces 1 mutation per DNA replication use strict; use warnings; use Data::Dumper; #generating a 50 bp seq one base at a time. #you can read sets of seqs from a file instead my ($base,$virusGenome); my @nucleotide = ('a','t','g','c'); for(1..50){ $base = $nucleotide[rand(3)]; $virusGenome .=$base; } my @seqArray; #a hash table of the permissible mutations #no freqs rates, note the base case. my %mutationBounds=('a'=>'T','c'=>'G','g'=>'C','t'=>'A'); print "how many replications?"; chomp(my $rounds=<>); mutate(times=>$rounds, seq=>$virusGenome); sub mutate{ my %params=@_; print "original seq=> $params{seq}\n"; for my $i (0..$params{times}-1){ #convert seq from string to array @seqArray = split("",$params{seq}); #pick a base from a random position my $randPos = int(rand(length($params{seq}))); my $toReplace=$seqArray[$randPos]; #mutate based on %mutationBounds # assign to a seq the altered $virusGenome. my $seqMute = $params{seq}; substr ($seqMute, $randPos,1,$mutationBounds{$toReplace}); + print "mutant seq",$i+1,"=> ", $seqMute,"\n"; #which rep cycle is done. What was replaced where. #print "Cycle " ,$i+1, " $toReplace replaced w $mutationBounds +{$toReplace} @ $randPos\n"; } }; #print Dumper(\%mutationBounds); #print Dumper(\%params); #print Dumper(\@seqArray); ##OUTPUT hisham@bioslayer:~/Desktop$ perl virusToy.pl how many replications?3 original seq=> gaataggaataatagggagattgtaagagagtgggttagtatagttaata mutant seq1=> gaataggaataatagggagattgtaagagagtgggttagtatagttaaAa mutant seq2=> gaTtaggaataatagggagattgtaagagagtgggttagtatagttaata mutant seq3=> gaataggaataataCggagattgtaagagagtgggttagtatagttaata

David R. Gergen said "We know that second terms have historically been marred by hubris and by scandal." and I am a two y.o. monk today :D, June,12th, 2011...