Hello fellow Monks! I need your help finding the bug in my script... It is probably something really silly, but I can't seem to spot it.
So, suppose you have a file like the following, with an ID, a tab, and then the aligned sequence on each line:
LA5 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGTAGCG
+----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGGGCTG
+GTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG-CAG
+GTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGCCGTA
+TGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAACTGG
+GTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CAGACA
+CCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCGGTAT
+TCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----GACT
+AACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGTTTCC
+TACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAGTA--
+CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGAAG--
+CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACGGTTC
+CGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGAGAAA
+CGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACGTGGT
+ATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG---ATC
+GATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAACTCAG
+CCGCAGGCT-
RKS5078 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT
+AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG
+GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG
+-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC
+CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA
+CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA
+GACACCAAGTCTAATGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG
+GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----
+GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT
+TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG
+TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA
+AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG
+GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA
+GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG
+TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG--
+-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC
+TCAGCCGCAGGCT-
06-0676 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT
+AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG
+GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG
+-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC
+CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA
+CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA
+GACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG
+GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----
+GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT
+TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG
+TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA
+AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG
+GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA
+GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG
+TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG--
+-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC
+TCAGCCGCAGGCT-
58-6482 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT
+AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG
+GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG
+-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC
+CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA
+CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA
+GACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG
+GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----
+GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT
+TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG
+TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA
+AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG
+GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA
+GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG
+TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG--
+-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC
+TCAGCCGCAGGCT-
648905 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGTA
+GCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGGG
+CTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG-
+CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGCC
+GTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAAC
+TGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CAG
+ACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCGG
+TATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----G
+ACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGTT
+TCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAGT
+A--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGAA
+G--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACGG
+TTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGAG
+AAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACGT
+GGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG---
+ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAACT
+CAGCCGCAGGCT-
8b-1 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGTAGC
+G----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGGGCT
+GGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG-CA
+GGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGCCGT
+ATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAACTG
+GGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CAGAC
+ACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCGGTA
+TTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----GAC
+TAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGTTTC
+CTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAGTA-
+-CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGAAG-
+-CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACGGTT
+CCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGAGAA
+ACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACGTGG
+TATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG---AT
+CGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAACTCA
+GCCGCAGGCT-
22510-1 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT
+AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG
+GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG
+-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC
+CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA
+CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA
+GACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG
+GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----
+GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT
+TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG
+TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA
+AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG
+GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA
+GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG
+TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG--
+-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC
+TCAGCCGCAGGCT-
and you want to count, per position, how many A, T, G, C or - you have.
I have written this:
@all_seqs = ();
while (<>)
{
    if ($_ =~ /(.*)\t(.*)/)
    {
        $id  = $1;
        $seq = $2;
        push @all_seqs, $seq;
    }
}
for ($i = 0; $i <= $#all_seqs; $i++)
{
    $seq_to_examine       = $all_seqs[$i];
    @split_seq_to_examine = split(//, $seq_to_examine);
    for ($j = 0; $j <= 1108; $j++)
    {
        if    ($split_seq_to_examine[$j] eq 'A') { $count_A++; }
        elsif ($split_seq_to_examine[$j] eq 'T') { $count_T++; }
        elsif ($split_seq_to_examine[$j] eq 'C') { $count_C++; }
        elsif ($split_seq_to_examine[$j] eq 'G') { $count_G++; }
        elsif ($split_seq_to_examine[$j] eq '-') { $count_non++; }
        print $j."\t".$count_A."\t".$count_T."\t".$count_C."\t".$count_G."\n";
    }
}
But it keeps incrementing the same counters across every position and every sequence, instead of reporting the counts position by position. I think I need to reset the counters to 0 somewhere, but every place I tried just made the script worse...
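For reference, here is a minimal sketch of one way the tally could be kept per position. Instead of resetting scalar counters, it indexes the counts by column, using an array with one hash per position. It assumes all sequences are aligned to the same length. The helper name `count_per_position` and the three short sequences are made up for illustration and are not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): takes a list of
# aligned sequences and returns a reference to an array holding one hash
# of character counts per column.
sub count_per_position {
    my @seqs = @_;
    my @counts;    # $counts[$pos]{$char} = occurrences of $char in column $pos
    for my $seq (@seqs) {
        my @chars = split //, $seq;
        for my $pos (0 .. $#chars) {
            $counts[$pos]{ $chars[$pos] }++;
        }
    }
    return \@counts;
}

# Tiny made-up alignment, only to show the shape of the output.
my @all_seqs = ('AT-G', 'ATCG', 'TT-G');
my $counts   = count_per_position(@all_seqs);

# One output row per position; characters never seen in a column print as 0.
print join("\t", qw(pos A T C G -)), "\n";
for my $pos (0 .. $#$counts) {
    print join("\t", $pos, map { $counts->[$pos]{$_} || 0 } qw(A T C G -)), "\n";
}
```

With the real data, `@all_seqs` would be filled from the input file as in the script above, and the hard-coded limit of 1108 would no longer be needed because the loop follows the actual sequence length.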