http://www.perlmonks.org?node_id=1007823


in reply to What is wrong in this code???

Try the following:

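    use strict;
    use warnings;

    my %sequences;

    while (<>) {
        # The second whitespace-separated field is the sequence;
        # count every character in it across all lines.
        for my $char ( split '', (split)[1] ) {
            $sequences{$char}++;
        }
    }

    print "$_ => $sequences{$_}", "\n" for keys %sequences;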

Output on your data set:

    - => 406
    A => 1708
    T => 1639
    C => 2057
    G => 1946

Re^2: What is wrong in this code???
by Anonymous Monk on Dec 07, 2012 at 21:06 UTC
    Thanks, but that is not what I need. What I need is the number of A, C, T, G and - per POSITION. So the output would be something like:

    POS #1 A:150 | T:50 | G:0 | C:20 | -:30
    POS 21 A:80 | T:60 | G:40 | C:20 | -:0
    etc

    So at each position, the numbers must add up to the number of sequences...

      My apologies; I did misunderstand. The following should work:

      use strict;
      use warnings;

      my %sequences;

      while (<>) {
          my $i = 1;    # position counter, restarted for every line

          # Tally each character under its position in the sequence.
          for my $char ( split '', (split)[1] ) {
              $sequences{ $i++ }{$char}++;
          }
      }

      # Report the positions in numeric order.
      for my $pos ( sort { $a <=> $b } keys %sequences ) {
          my @charCount;

          while ( my ( $char, $count ) = each %{ $sequences{$pos} } ) {
              push @charCount, "$char:$count";
          }

          print "POS #$pos " . ( join ' | ', @charCount ) . "\n";
      }

      This is the output:

      POS #1 a:4
      POS #2 q:1 | b:3
      POS #3 c:4
      POS #4 x:1 | d:2 | z:1
      POS #5 e:4
      POS #6 f:4
      POS #7 c:1 | a:1 | b:1 | d:1
      POS #8 h:4
      POS #9 i:4
      POS #10 j:4

      Run on this data:

      12 abcdefahij
      12 abcdefbhij
      12 aqcxefchij
      12 abczefdhij
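
The per-position code above prints only the characters that actually occur at a position, in whatever order the hash returns them. The requested format lists every symbol in a fixed order, including zero counts. Here is a minimal sketch of one way to get that. The helper names `count_by_position` and `format_position`, the symbol order `A T G C -`, and the sample sequences are my own assumptions for illustration, and the helpers take bare sequences rather than reading the two-column input.

```perl
use strict;
use warnings;

# Hypothetical helper: tally characters per column across a list of sequences.
# Returns a list of hash refs; element $i holds the counts for column $i (0-based).
sub count_by_position {
    my @seqs = @_;
    my @counts;
    for my $seq (@seqs) {
        my @chars = split //, $seq;
        $counts[$_]{ $chars[$_] }++ for 0 .. $#chars;
    }
    return @counts;
}

# Hypothetical helper: format one column using a fixed symbol order.
# Symbols that never occur at this position are reported as 0.
sub format_position {
    my ( $pos, $tally, @alphabet ) = @_;
    return "POS #$pos " . join ' | ', map { "$_:" . ( $tally->{$_} // 0 ) } @alphabet;
}

# Made-up sample sequences, not the original poster's data.
my @seqs   = qw(ACGT- ACGTA AC-TA TCGTA);
my @counts = count_by_position(@seqs);

print format_position( $_ + 1, $counts[$_], qw(A T G C -) ), "\n" for 0 .. $#counts;
```

Because every symbol is printed for every position, the counts on each line should add up to the number of sequences, provided all sequences have the same length and contain only the listed symbols. The `//` operator needs Perl 5.10 or later.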

      use strict;
      use warnings;

      while (<>) {
          my %sequences;    # declared inside the loop, so it starts empty for each line

          for my $char ( split '', (split)[1] ) {
              $sequences{$char}++;
          }

          print "$_ => $sequences{$_}", "\n" for keys %sequences;
      }

      Note how I declared the hash inside the loop. Scoping variables correctly is the easiest way to reset them.
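
To make the scoping point concrete, here is a small sketch. The helper name `count_per_line` and the sample strings are made up for illustration. Each pass through the loop gets its own `%count`, so nothing carries over from one sequence to the next.

```perl
use strict;
use warnings;

# Hypothetical helper: count the characters of each sequence separately.
sub count_per_line {
    my @results;
    for my $seq (@_) {
        my %count;                  # a fresh, empty hash on every iteration
        $count{$_}++ for split //, $seq;
        push @results, \%count;     # each reference points to a distinct hash
    }
    return @results;
}

my ( $first, $second ) = count_per_line( 'AAC', 'GG' );

print join( ', ', map { "$_=$first->{$_}" } sort keys %$first ),   "\n";    # A=2, C=1
print join( ', ', map { "$_=$second->{$_}" } sort keys %$second ), "\n";    # G=2
```

If `%count` were declared before the loop instead, the second result would also contain the counts from the first sequence, and both references would point to the same hash.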