Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hello fellow Monks! I need your help to discover the bug in my script... It should something really silly, but I can't seem to be able to detect it..
So, suppose you have a file like the following:
LA5 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGTAGCG
+----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGGGCTG
+GTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG-CAG
+GTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGCCGTA
+TGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAACTGG
+GTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CAGACA
+CCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCGGTAT
+TCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----GACT
+AACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGTTTCC
+TACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAGTA--
+CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGAAG--
+CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACGGTTC
+CGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGAGAAA
+CGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACGTGGT
+ATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG---ATC
+GATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAACTCAG
+CCGCAGGCT-
RKS5078 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT
+AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG
+GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG
+-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC
+CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA
+CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA
+GACACCAAGTCTAATGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG
+GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----
+GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT
+TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG
+TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA
+AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG
+GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA
+GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG
+TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG--
+-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC
+TCAGCCGCAGGCT-
06-0676 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT
+AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG
+GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG
+-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC
+CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA
+CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA
+GACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG
+GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----
+GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT
+TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG
+TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA
+AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG
+GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA
+GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG
+TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG--
+-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC
+TCAGCCGCAGGCT-
58-6482 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT
+AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG
+GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG
+-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC
+CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA
+CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA
+GACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG
+GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----
+GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT
+TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG
+TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA
+AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG
+GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA
+GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG
+TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG--
+-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC
+TCAGCCGCAGGCT-
648905 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGTA
+GCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGGG
+CTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG-
+CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGCC
+GTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAAC
+TGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CAG
+ACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCGG
+TATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----G
+ACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGTT
+TCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAGT
+A--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGAA
+G--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACGG
+TTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGAG
+AAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACGT
+GGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG---
+ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAACT
+CAGCCGCAGGCT-
8b-1 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGTAGC
+G----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGGGCT
+GGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG-CA
+GGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGCCGT
+ATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAACTG
+GGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CAGAC
+ACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCGGTA
+TTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----GAC
+TAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGTTTC
+CTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAGTA-
+-CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGAAG-
+-CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACGGTT
+CCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGAGAA
+ACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACGTGG
+TATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG---AT
+CGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAACTCA
+GCCGCAGGCT-
22510-1 ATGAAAAAGAC--AGCTATCGCGATTGCAGTGGCACTGGCTGGTTTCGCTAC----CGT
+AGCG----CAGGCCG----------CTCCGAAAGATAACACCTGGTACGCTGGTGCT-----AAACTGG
+GCTGGTCTCAGTACCATGACACCGGCTTCATTCACAATGATGGCCCGACTCATGAAAACCAACTGGGCG
+-CAGGTGCTTTTGGTGGTTACCAGGTTAACCCGTATGTTGGCTTTGAAATGGGCTACGACTGGTTAGGC
+CGTATGCCGTACAAAGGCGACAACATCAATGGCGCTTATAAAGCTCAGGGCGTTCAGTTGACCGCTAAA
+CTGGGTTATCCAATCACTGACGATCTGGACG--TTTATACCCGTCTGGGTGGTATGGTATGGCGTG-CA
+GACACCAAGTCTAACGTCCCTGGC------GGCCCGTCTACTAAAGACCACGACACCGGCGTTTCCCCG
+GTATTCGCGGGCGGTATCGAGTATGCCATCACCCCTGAAATCGCAACCCGTCTGGAATACCAGTG----
+GACTAACAACATCGGTGATGCCAACACCATCGGCACCCGTCCGGACAACGGCCTGCTGAGCGTAGGTGT
+TTCCTACCGTTTCGGCCAGCAAGAAGCTGCTC-CGGTAGTAGCTCCGGCACCGGCTCCGGCTCCGGAAG
+TA--CAG---ACCAAGCACTTCACTCT-GAAGTCTGACGTACTGTTCAACTTCAACAAATCTACCCTGA
+AG--CCGGAAGGCCAGCAGGCT-CTGGATCAGCTGTACAGCCAGCTGAGCAACCTGGATCCGAAAGACG
+GTTCCGTTGTCGTTCTGGGCTTCACTGACCGTATCGGTTCTGACGC-TTACAACCAGGGTCTGT-CCGA
+GAAACGTGCTCAGTCTGTTGTTGATTACCTGATCTCCAAAGGTATTCCGTCTGACAAAATCTCCGCACG
+TGGTATGGGCGAATCTAACCCGGTTACCGGCAACACCTGTGACAACGTGAAACCTCGCGCTGCCCTG--
+-ATCGATTGCCTGGCT-CCGGATCGTCGCGTAGAGATCGAAGTTAAAG--GCGTTAAAGACGTGGTAAC
+TCAGCCGCAGGCT-
and you want to count, per position, how many A, T, G, C or - you have.
I have written this:
@all_seqs = ();
while(<>)
{
if($_=~/(.*)\t(.*)/)
{
$id=$1;
$seq=$2;
push @all_seqs, $seq;
}
}
for ($i=0; $i<=$#all_seqs; $i++)
{
$seq_to_examine=$all_seqs[$i];
@split_seq_to_examine=split(//, $seq_to_examine);
for($j=0; $j<=1108; $j++)
{
if ($split_seq_to_examine[$j] eq 'A') {$count_A++;}
elsif ($split_seq_to_examine[$j] eq 'T') {$count_T++;}
elsif ($split_seq_to_examine[$j] eq 'C') {$count_C++;}
elsif ($split_seq_to_examine[$j] eq 'G') {$count_G++;}
elsif ($split_seq_to_examine[$j] eq '-') {$count_non++;}
print $j."\t".$count_A."\t".$count_T."\t".$count_C."\t".$count_G."
+\n";
}
}
but it keeps increasing the counters, and not reporting position-by-position. I think I must somewhere set the counters to 0, but the positions I tried just made the script worse...
Re: What is wrong in this code???
by moritz (Cardinal) on Dec 07, 2012 at 20:47 UTC
|
I think I must somewhere set the counters to 0, but the positions I tried just made the script worse...
If you want to count the number of symbols per line or record, you must reset the counters before counting the symbols in each line/record.
| [reply] [Watch: Dir/Any] |
|
Yes but where, can you tell me? I tried printing outside the loop, but printed statistics only for the last position (all sequences have the same number of letters, and I want to keep statistics for each position, but for all the sequences.
So for example, since I have 200 sequences in my file, for position 56, I could have 150 A, 40 C, 25 G and 35 -.
| [reply] [Watch: Dir/Any] |
|
| [reply] [Watch: Dir/Any] |
Re: What is wrong in this code???
by Lotus1 (Vicar) on Dec 07, 2012 at 22:01 UTC
|
use warnings;
use strict;
print "LATCG-\n";
while(<DATA>) {
print $. . tr/A// . tr/T// . tr/C// . tr/G// . tr/-// . "\n";
}
__DATA__
ATGAAAAAGAC
AGCTATCGCGA
TTGCAGTGGCA
CTGGCTGGTTT
CGCTACCGTAG
CGC-A---TAG
Output:
LATCG-
171120
232330
323240
405240
522430
621224
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: What is wrong in this code???
by Kenosis (Priest) on Dec 07, 2012 at 20:56 UTC
|
use strict;
use warnings;
my %sequences;
while (<>) {
for my $char ( split '', (split)[1] ) {
$sequences{$char}++;
}
}
print "$_ => $sequences{$_}", "\n" for keys %sequences;
Output on your data set:
- => 406
A => 1708
T => 1639
C => 2057
G => 1946
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Thanks, but that is not what I need. What I must have, is the number of A, C, T, G and - per POSITION. So the output would be something like:
POS #1 A:150 | T:50 | G:0 | C:20 | -:30
POS 21 A:80 | T:60 | G:40 | C:20 | -:0
etc
So the numbers must add up the number of sequences, each time... | [reply] [Watch: Dir/Any] [d/l] |
|
use strict;
use warnings;
my %sequences;
while (<>) {
my $i = 1;
for my $char ( split '', (split)[1] ) {
$sequences{ $i++ }{$char}++;
}
}
for my $pos ( sort { $a <=> $b } keys %sequences ) {
my @charCount;
while ( my ( $char, $count ) = each %{ $sequences{$pos} } ) {
push @charCount, "$char:$count";
}
print "POS #$pos " . ( join ' | ', @charCount ) . "\n";
}
This is the output:
POS #1 a:4
POS #2 q:1 | b:3
POS #3 c:4
POS #4 x:1 | d:2 | z:1
POS #5 e:4
POS #6 f:4
POS #7 c:1 | a:1 | b:1 | d:1
POS #8 h:4
POS #9 i:4
POS #10 j:4
Run on this data:
12 abcdefahij
12 abcdefbhij
12 aqcxefchij
12 abczefdhij
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
use strict;
use warnings;
while (<>) {
my %sequences;
for my $char ( split '', (split)[1] ) {
$sequences{$char}++;
}
print "$_ => $sequences{$_}", "\n" for keys %sequences;
}
Note how I declared the hash inside the loop. Scoping variables correctly is the easiest way to reset them. | [reply] [Watch: Dir/Any] [d/l] |
Re: What is wrong in this code???
by thundergnat (Deacon) on Dec 07, 2012 at 21:45 UTC
|
use strict;
use warnings;
my @gene;
while (<>) {
my $i = 0;
for my $char ( split '', (split)[1] ) {
$gene[$i++]{$char}++;
}
}
my $i = 0;
for my $pos (@gene){
print "\nPOS: ", $i++;
for my $key ( sort keys %$pos){
print " | $key => ", $pos->{$key};
}
};
| [reply] [Watch: Dir/Any] [d/l] |
Re: What is wrong in this code???
by CountZero (Bishop) on Dec 08, 2012 at 11:28 UTC
|
use Modern::Perl;
use Data::Dump qw /dump/;
my @results;
while (<DATA>) {
my @codon = split '', (split /\s+/)[1];
while (my ($seq, $char) = each @codon) {
$results[$seq]{$char}++;
}
}
say dump(@results);
__DATA__
first ACGATT--GAT
second AGTTAAC-TTT
third TTAGCAGG-TA
Output:(
{ A => 2, T => 1 },
{ C => 1, G => 1, T => 1 },
{ A => 1, G => 1, T => 1 },
{ A => 1, G => 1, T => 1 },
{ A => 1, C => 1, T => 1 },
{ A => 2, T => 1 },
{ "-" => 1, "C" => 1, "G" => 1 },
{ "-" => 2, "G" => 1 },
{ "-" => 1, "G" => 1, "T" => 1 },
{ A => 1, T => 2 },
{ A => 1, T => 2 },
)
Alternatively you can write the while loop more compact (but less easy to read)while (<DATA>) {
my $seq;
$results[$seq++]{$_}++ for split '', (split /\s+/)[1];
}
CountZero A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: What is wrong in this code???
by brap (Pilgrim) on Dec 07, 2012 at 21:08 UTC
|
Hey anonymous,
I could be way off base here, but it almost sounds like you want an array of counters, where each element of the array represents the position of the line you're counting. Something along the order of (untested):
if ($split_seq_to_examine[$j] eq 'A') {$count_A[$j]++;}
elsif ($split_seq_to_examine[$j] eq 'T') {$count_T[$j]++;}
elsif ($split_seq_to_examine[$j] eq 'C') {$count_C[$j]++;}
elsif ($split_seq_to_examine[$j] eq 'G') {$count_G[$j]++;}
elsif ($split_seq_to_examine[$j] eq '-') {$count_non[$j]++;}
To find out what you have in, say, position 56, look at $count_A[56], $count_T[56], $count_G[56], and $count_C[56].
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: What is wrong in this code???
by BillKSmith (Monsignor) on Dec 08, 2012 at 20:02 UTC
|
A much shorter solution (probably slower) uses a regexp to copy all the A's to a temp array and then uses the size of that array as the count. Repeat for other letters.
use strict;
use warnings;
use Data::Dumper qw(Dumper);
my @letters = ('A', 'T', 'G', 'C', '-');
while (my $seq = <>){
print $seq;
my @temp;
my %cnts = map {($_, scalar ( @temp = $seq =~ /($_)/g))} @letters;
print Dumper \%cnts;
}
| [reply] [Watch: Dir/Any] [d/l] |
|
|