Weighted frequency of characters in an array of strings
by K_Edw (Beadle) on Jun 07, 2016 at 16:32 UTC
K_Edw has asked for the
wisdom of the Perl Monks concerning the following question:
I have a script which extracts equal-length DNA sequences at user-specified positions from a genome and then calculates the frequency at which each letter (A/G/C/T) occurs at each position.
The user-specified positions can also come with a frequency, i.e. how many times that point was seen.
If this frequency is ignored (i.e. position-weighted analysis), the script runs at a satisfactory speed. Once the frequencies are taken into account, however, it becomes considerably slower.
My approach for this has been quite literal and is evidently inefficient: a sequence with frequency X is added to the array X times before counting.
Is there a better way to do weighted frequency calculations and if so, how?
EDIT: The sequences can range from 30 to 250 characters. The weights average about 26, but range from 5 to 3565 for one particular sample. In any given run, around 200,000 unique positions exist within the input data.
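One possible alternative, sketched here under assumptions rather than taken from the original script: instead of storing each sequence X times, add its frequency X directly to a per-position counter. The names `weighted_counts` and `weighted_frequencies` and the `[sequence, frequency]` record layout are hypothetical, and all sequences are assumed to have the same length.

```perl
use strict;
use warnings;

# Hypothetical input layout: each record is [ $sequence, $frequency ].
# Returns an array ref where $counts->[$pos]{$base} is the weighted count.
sub weighted_counts {
    my ($records) = @_;
    my @counts;
    for my $rec (@$records) {
        my ( $seq, $freq ) = @$rec;
        my $pos = 0;
        # Add the weight once per position instead of keeping $freq copies.
        $counts[ $pos++ ]{$_} += $freq for split //, $seq;
    }
    return \@counts;
}

# Converts weighted counts into per-position relative frequencies.
sub weighted_frequencies {
    my ($counts) = @_;
    my @freqs;
    for my $col (@$counts) {
        my $total = 0;
        $total += $_ for values %$col;
        my %f;
        $f{$_} = $col->{$_} / $total for keys %$col;
        push @freqs, \%f;
    }
    return \@freqs;
}

# Small made-up example: 'ACGT' seen 3 times, 'AGGT' seen once.
my @records = ( [ 'ACGT', 3 ], [ 'AGGT', 1 ] );
my $counts  = weighted_counts( \@records );
my $freqs   = weighted_frequencies($counts);
```

With this layout each unique sequence is walked once regardless of its weight, so with weights averaging about 26 the counting loop should touch roughly 26 times fewer characters than the replicate-then-count approach. Whether that is the real bottleneck would need profiling.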