Re: Using grep in a scalar context by perlhappy (Novice) on Feb 06, 2013 at 16:12 UTC
I have a good idea of what you want to do and the data you're dealing with (I do quite a lot of bioinformatics work).
Anyway, I've written two scripts for you to look at. I've kept the code simple and commented, so you should be OK with it. Over a whole chromosome this might not be as fast as it could be, but it should be fine.
Firstly, you won't want to split the sequence into an array of two-letter chunks unless you are absolutely sure you aren't going to miss a count on an odd-length run of the same amino acid. For example, FAAAD would be split into FA-AA-D or F-AA-AD, so you would only count AA once, whereas it actually contains two (overlapping) AA pairs.
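A quick way to see the problem is to count AA in FAAAD both ways. This is just an illustrative sketch, not one of my two scripts: the variable names are made up, and `unpack '(A2)*'` is one assumed way of doing the "split into pairs" step. Note that `grep` in scalar context returns the number of matching elements, which is handy for the chunked count.

```perl
use strict;
use warnings;

my $seq = 'FAAAD';

# Fixed, non-overlapping 2-character chunks: FA, AA, D
my @chunks      = unpack '(A2)*', $seq;
my $split_count = grep { $_ eq 'AA' } @chunks;   # grep in scalar context = count

# Sliding 2-character window moved one position at a time: FA, AA, AA, AD
my $window_count = 0;
for my $i (0 .. length($seq) - 2) {
    $window_count++ if substr($seq, $i, 2) eq 'AA';
}

print "split: $split_count, sliding window: $window_count\n";
```

The chunked version finds 1 AA, while the sliding window finds both overlapping pairs.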
This is the first script. It finds only AA pairs and counts them. The sequence is ASDTDAAFRASEQSAAAFDG (it's in the code), so the number of AA pairs should be 3.
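If the script itself doesn't come through, here is a minimal sketch of the same idea (count overlapping AA pairs in the sequence above). It is not necessarily how my script does it: `count_pair` is a hypothetical helper name, and it uses Perl's built-in `index` to find each match, restarting the search one character later so overlapping pairs are not skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: counts overlapping occurrences of $pair in $seq.
sub count_pair {
    my ($seq, $pair) = @_;
    my ($count, $pos) = (0, 0);
    # index() returns -1 once there are no further matches
    while (($pos = index($seq, $pair, $pos)) != -1) {
        $count++;
        $pos++;    # step by one, not length($pair), so overlaps are found
    }
    return $count;
}

my $seq = 'ASDTDAAFRASEQSAAAFDG';
my $aa  = count_pair($seq, 'AA');
print "AA pairs: $aa\n";    # prints "AA pairs: 3"
```

Stepping forward by one after each hit is what makes the AAA run count as two pairs.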
This is the second script; it's a bit more complicated. Instead of counting only the AA pairs, it counts every pair. It creates an entry for each pair on the fly, and if it encounters a pair it has already created, it just increments that pair's count. Note that for the 22 possible amino acids the number of possible pairs is much higher: 22 × 22 = 484.
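Again as a rough sketch rather than the script itself, counting every pair maps naturally onto a Perl hash keyed by the pair. The variable names are made up for illustration; the sequence is the same one as before.

```perl
use strict;
use warnings;

my $seq = 'ASDTDAAFRASEQSAAAFDG';
my %pair_count;

# Slide a 2-residue window along the sequence. A pair's hash entry is
# created the first time it appears and incremented on every later hit.
for my $i (0 .. length($seq) - 2) {
    $pair_count{ substr($seq, $i, 2) }++;
}

# Print each pair that was seen, with its count
for my $pair (sort keys %pair_count) {
    print "$pair\t$pair_count{$pair}\n";
}
```

Since there are at most 484 distinct keys, the hash stays small even when the sequence is a whole chromosome; only the loop over the sequence grows.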
I hope one of these does what you want.