I have the following dataset, where each DNA (ACGT) string is followed by its corresponding values. The values come in groups, and the groups are TAB-separated. For a string of length K there are K groups, one per position. Each group contains 4 values, which correspond to A, C, G, T respectively.
AGAC <TAB> 9 -29 -39 -37 <TAB> 27 -28 -39 -37 <TAB> 26 -27 -39 -37 <TAB> 27 -27 -39 12
What I want to do is extract, for each position in the DNA string, the value that corresponds to the base at that position. So for the string above, the desired output is:
$VAR = [9,-39, 26, -27];
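For reference, here is a minimal sketch of the straightforward lookup. It is probably not the fastest way to do this. It assumes the `<TAB>` markers in the sample stand for real tab characters, that the line starting with `+` is just a wrapped continuation of the same record, and that the values inside a group are separated by spaces. The name `extract_base_values` is a made-up helper, not something from a module.

```perl
use strict;
use warnings;

# Column of each base inside a group (A, C, G, T order, as described above).
my %base_index = ( A => 0, C => 1, G => 2, T => 3 );

# Hypothetical helper: takes one TAB-separated record and returns an
# array ref holding one value per base of the DNA string.
sub extract_base_values {
    my ($line) = @_;
    chomp $line;

    # First field is the DNA string; the remaining fields are the groups.
    my ( $dna, @groups ) = split /\t/, $line;
    my @bases = split //, $dna;
    die "expected one group per base in '$dna'\n" unless @groups == @bases;

    my @values;
    for my $i ( 0 .. $#bases ) {
        my $col = $base_index{ $bases[$i] };
        die "unexpected base '$bases[$i]'\n" unless defined $col;

        # split ' ' splits on runs of whitespace and ignores leading blanks.
        my @quad = split ' ', $groups[$i];
        push @values, $quad[$col];
    }
    return \@values;
}

# The sample record from above, assembled with real tabs.
my $line = join "\t", 'AGAC',
    '9 -29 -39 -37', '27 -28 -39 -37', '26 -27 -39 -37', '27 -27 -39 12';

my $result = extract_base_values($line);
print join( ',', @$result ), "\n";    # prints 9,-39,26,-27
```

This splits every group in full, even though only one value per group is needed, which is part of why I suspect something faster exists.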
Note that the string length may be greater than four (up to 100 bp). Is there a fast way to achieve this, given that there are millions of such lines?
neversaint and everlastingly indebted.......