http://www.perlmonks.org?node_id=1007821

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello fellow Monks! I need your help to find the bug in my script... It should be something really silly, but I can't seem to spot it.
So, suppose you have a file like the following:
LA5 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGTAGCG +----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGGGCTG +GTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG-CAG +GTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGCCGTA +TGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAACTGG +GTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CAGACA +CCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCGGTAT +TCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----GACT +AACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGTTTCC +TACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAGTA-- +CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGAAG-- +CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACGGTTC +CGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGAGAAA +CGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACGTGGT +ATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG---ATC +GATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAACTCAG +CCGCAGGCT- RKS5078 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT +AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG +GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG +-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC +CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA +CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA +GACACCAAGTCTAATGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG +GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG---- +GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT +TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG +TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA +AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG 
+GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA +GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG +TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG-- +-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC +TCAGCCGCAGGCT- 06-0676 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT +AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG +GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG +-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC +CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA +CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA +GACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG +GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG---- +GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT +TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG +TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA +AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG +GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA +GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG +TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG-- +-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC +TCAGCCGCAGGCT- 58-6482 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT +AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG +GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG +-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC +CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA +CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA +GACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG 
+GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG---- +GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT +TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG +TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA +AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG +GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA +GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG +TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG-- +-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC +TCAGCCGCAGGCT- 648905 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGTA +GCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGGG +CTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG- +CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGCC +GTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAAC +TGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CAG +ACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCGG +TATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----G +ACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGTT +TCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAGT +A--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGAA +G--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACGG +TTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGAG +AAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACGT +GGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG--- +ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAACT +CAGCCGCAGGCT- 8b-1 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGTAGC +G----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGGGCT 
+GGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG-CA +GGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGCCGT +ATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAACTG +GGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CAGAC +ACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCGGTA +TTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----GAC +TAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGTTTC +CTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAGTA- +-CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGAAG- +-CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACGGTT +CCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGAGAA +ACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACGTGG +TATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG---AT +CGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAACTCA +GCCGCAGGCT- 22510-1 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT +AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG +GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG +-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC +CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA +CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA +GACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG +GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG---- +GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT +TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG +TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA +AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG +GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA +GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG 
+TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG-- +-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC +TCAGCCGCAGGCT-

and you want to count, per position, how many A, T, G, C or - you have.
I have written this:
@all_seqs = ();
while (<>) {
    if ($_ =~ /(.*)\t(.*)/) {
        $id  = $1;
        $seq = $2;
        push @all_seqs, $seq;
    }
}
for ($i = 0; $i <= $#all_seqs; $i++) {
    $seq_to_examine       = $all_seqs[$i];
    @split_seq_to_examine = split(//, $seq_to_examine);
    for ($j = 0; $j <= 1108; $j++) {
        if    ($split_seq_to_examine[$j] eq 'A') { $count_A++; }
        elsif ($split_seq_to_examine[$j] eq 'T') { $count_T++; }
        elsif ($split_seq_to_examine[$j] eq 'C') { $count_C++; }
        elsif ($split_seq_to_examine[$j] eq 'G') { $count_G++; }
        elsif ($split_seq_to_examine[$j] eq '-') { $count_non++; }
        print $j."\t".$count_A."\t".$count_T."\t".$count_C."\t".$count_G."\n";
    }
}

But the counters just keep increasing across the whole input instead of being reported position by position. I think I need to reset the counters to 0 somewhere, but every place I tried only made the script worse...
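
For what it's worth, the script above uses one scalar per letter ($count_A, $count_T, ...), shared by every position of every sequence, and prints inside the inner loop, so the totals can only grow. One possible fix, sketched here rather than offered as the definitive answer, is to keep one tally per column and print only after all sequences have been read. The helper names count_columns and read_seqs, and the three-sequence sample, are made up for illustration; the sketch also assumes each line is an ID followed by whitespace and then the sequence.

```perl
use strict;
use warnings;

# Hypothetical helper: tally every symbol per alignment column.
# Returns a list of hash refs, one per column (index 0 = first column).
sub count_columns {
    my @seqs = @_;
    my @counts;
    for my $seq (@seqs) {
        my @chars = split //, $seq;
        for my $pos (0 .. $#chars) {
            # One counter per (column, symbol) pair, so nothing needs resetting.
            $counts[$pos]{ $chars[$pos] }++;
        }
    }
    return @counts;
}

# Hypothetical helper: read "ID<whitespace>SEQUENCE" lines from a filehandle.
# Assumption: the ID contains no whitespace.
sub read_seqs {
    my ($fh) = @_;
    my @seqs;
    while (my $line = <$fh>) {
        chomp $line;
        my (undef, $seq) = split ' ', $line, 2;
        next unless defined $seq;
        $seq =~ s/\s+//g;    # drop stray whitespace inside or after the sequence
        push @seqs, $seq if length $seq;
    }
    return @seqs;
}

# Made-up sample standing in for the real alignment file.
my $sample = "s1\tAT-G\ns2\tAC-G\ns3\tTT-C\n";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my @counts = count_columns(read_seqs($fh));
close $fh;

# Report once per column, after every sequence has been counted.
print join("\t", qw(pos A T C G -)), "\n";
for my $pos (0 .. $#counts) {
    print join("\t", $pos + 1, map { $counts[$pos]{$_} || 0 } qw(A T C G -)), "\n";
}
```

To run this on a real file, pass \*STDIN or a handle opened on the file to read_seqs instead of the in-memory sample. Looping over 0 .. $#chars also avoids hard-coding 1108 as the last index.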