Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hello fellow Monks! I need your help to discover the bug in my script... It should be something really silly, but I can't seem to detect it.
So, suppose you have a file like the following:
LA5 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGTAGCG +----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGGGCTG +GTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG-CAG +GTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGCCGTA +TGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAACTGG +GTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CAGACA +CCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCGGTAT +TCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----GACT +AACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGTTTCC +TACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAGTA-- +CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGAAG-- +CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACGGTTC +CGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGAGAAA +CGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACGTGGT +ATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG---ATC +GATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAACTCAG +CCGCAGGCT- RKS5078 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT +AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG +GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG +-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC +CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA +CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA +GACACCAAGTCTAATGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG +GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG---- +GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT +TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG +TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA +AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG 
+GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA +GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG +TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG-- +-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC +TCAGCCGCAGGCT- 06-0676 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT +AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG +GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG +-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC +CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA +CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA +GACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG +GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG---- +GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT +TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG +TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA +AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG +GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA +GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG +TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG-- +-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC +TCAGCCGCAGGCT- 58-6482 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT +AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG +GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG +-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC +CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA +CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA +GACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG 
+GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG---- +GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT +TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG +TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA +AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG +GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA +GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG +TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG-- +-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC +TCAGCCGCAGGCT- 648905 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGTA +GCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGGG +CTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG- +CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGCC +GTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAAC +TGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CAG +ACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCGG +TATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----G +ACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGTT +TCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAGT +A--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGAA +G--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACGG +TTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGAG +AAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACGT +GGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG--- +ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAACT +CAGCCGCAGGCT- 8b-1 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGTAGC +G----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGGGCT 
+GGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG-CA +GGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGCCGT +ATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAACTG +GGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CAGAC +ACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCGGTA +TTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----GAC +TAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGTTTC +CTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAGTA- +-CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGAAG- +-CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACGGTT +CCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGAGAA +ACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACGTGG +TATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG---AT +CGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAACTCA +GCCGCAGGCT- 22510-1 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT +AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG +GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG +-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC +CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA +CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA +GACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG +GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG---- +GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT +TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG +TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA +AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG +GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA +GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG 
+TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG-- +-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC +TCAGCCGCAGGCT-
and you want to count, per position, how many A, T, G, C or - you have.
I have written this:
@all_seqs = ();
while(<>) {
    if($_=~/(.*)\t(.*)/) {
        $id=$1;
        $seq=$2;
        push @all_seqs, $seq;
    }
}
for ($i=0; $i<=$#all_seqs; $i++) {
    $seq_to_examine=$all_seqs[$i];
    @split_seq_to_examine=split(//, $seq_to_examine);
    for($j=0; $j<=1108; $j++) {
        if ($split_seq_to_examine[$j] eq 'A') {$count_A++;}
        elsif ($split_seq_to_examine[$j] eq 'T') {$count_T++;}
        elsif ($split_seq_to_examine[$j] eq 'C') {$count_C++;}
        elsif ($split_seq_to_examine[$j] eq 'G') {$count_G++;}
        elsif ($split_seq_to_examine[$j] eq '-') {$count_non++;}
        print $j."\t".$count_A."\t".$count_T."\t".$count_C."\t".$count_G."\n";
    }
}
but the counters keep increasing instead of being reported position by position. I think I must reset the counters to 0 somewhere, but the places I tried just made the script worse...
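The counters in the script above are single scalars shared by every sequence and every position, so they can only grow. A per-position report needs one set of counters per alignment column. The following is a minimal sketch of that idea, not a definitive fix: it assumes the input is "id<TAB>sequence", one record per line, as the regex in the question implies. The names count_by_position, read_seqs and the three-line $sample are hypothetical and stand in for the real file. It also sizes the loop from each sequence instead of the hard-coded 1108, and prints the gap count that the original collects in $count_non but never reports.

```perl
use strict;
use warnings;

# Count each character separately for every alignment column.
# Returns an array ref holding one hash of counts per position.
sub count_by_position {
    my @seqs = @_;
    my @counts;
    for my $seq (@seqs) {
        my @chars = split //, $seq;
        for my $pos (0 .. $#chars) {
            $counts[$pos]{ $chars[$pos] }++;
        }
    }
    return \@counts;
}

# Read "id<TAB>sequence" lines (assumed format) and return the sequences.
sub read_seqs {
    my ($fh) = @_;
    my @seqs;
    while (my $line = <$fh>) {
        chomp $line;
        my ($id, $seq) = split /\t/, $line, 2;
        push @seqs, $seq if defined $seq;
    }
    return @seqs;
}

# Hypothetical sample standing in for the real alignment file.
my $sample = "s1\tAT-G\ns2\tATCG\ns3\tTT-G\n";
open my $fh, '<', \$sample or die "cannot open sample: $!";
my $counts = count_by_position(read_seqs($fh));
close $fh;

# One output row per position; characters never seen there print as 0.
print join("\t", 'pos', qw(A T C G -)), "\n";
for my $pos (0 .. $#$counts) {
    print join("\t", $pos, map { $counts->[$pos]{$_} // 0 } qw(A T C G -)), "\n";
}
```

To run it on a real file, the in-memory handle could be replaced by one opened on that file. Keeping the counts in an array of hashes means nothing has to be reset to 0: each position owns its own counters.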
Replies are listed 'Best First'.
Re: What is wrong in this code???
by moritz (Cardinal) on Dec 07, 2012 at 20:47 UTC
by Anonymous Monk on Dec 07, 2012 at 20:59 UTC
by moritz (Cardinal) on Dec 08, 2012 at 00:15 UTC
Re: What is wrong in this code???
by Lotus1 (Vicar) on Dec 07, 2012 at 22:01 UTC
Re: What is wrong in this code???
by Kenosis (Priest) on Dec 07, 2012 at 20:56 UTC
by Anonymous Monk on Dec 07, 2012 at 21:06 UTC
by Kenosis (Priest) on Dec 07, 2012 at 22:50 UTC
by Anonymous Monk on Dec 07, 2012 at 21:33 UTC
Re: What is wrong in this code???
by thundergnat (Deacon) on Dec 07, 2012 at 21:45 UTC
Re: What is wrong in this code???
by CountZero (Bishop) on Dec 08, 2012 at 11:28 UTC
Re: What is wrong in this code???
by brap (Pilgrim) on Dec 07, 2012 at 21:08 UTC
Re: What is wrong in this code???
by BillKSmith (Monsignor) on Dec 08, 2012 at 20:02 UTC