in reply to To count letters (%identity) in DNA alignment

I've been known to reinvent wheels from time to time, but this particular wheel has been around the block a few times...

CPAN. Specifically Bio::Matrix::PSM::IO::masta. Per the docs, this module will

convert a set of aligned sequences:
ACATGCAT ACAGGGAT ACAGGCAT ACCGGCAT
to a PFM (SiteMatrix object)

Given the number of PFM-related modules within Bioperl I would imagine anything you want to do is already written.

Unfortunately I don't have time to provide an example. However, I would be very interested to see what you come up with if you go this route. Please post your solution when you are done so others can learn from your experience.

HTH

Replies are listed 'Best First'.
Re^2: To count letters (%identity) in DNA alignment
by sedm1000 (Initiate) on Jan 27, 2009 at 22:39 UTC
    Thanks for your input - very useful. I skipped the BioPerl modules as I wanted the script to be passed around non-BioPerl users. After further help and modification the final product was.
    use strict; use warnings; my $in = $ARGV[0]; my $frequencies; my $maxCol = 0; open (IN, "$in") or die "cannot open $in!\n"; while (my $line = <IN>) { chomp $line; my ($name, $seq) = split ("\t", $line); if (defined $seq) { my @bases = split '', $seq; $maxCol=@bases; for (my $i=0; $i<@bases; $i++) { ++$frequencies->{$bases[$i]}[$i]; # if ($maxCol < @bases) { # $maxCol = @bases; # } } } } print_hoa($frequencies, $maxCol); #foreach my $base (keys %frequencies) { # $frequencies{$base}[$_] ||= 0 for 0 .. $maxCol; # print "$base @{$frequencies{$base}}\n"; #} ##Subroutine that prints a hash fo arrays reference. sub print_hoa { my ($hashref, $len) = @_; for my $person (sort keys %{$hashref}) { my $array = $hashref->{$person}; my @array_print = @$array; print "$person\t"; foreach my $num (@array_print) { if (defined $num) { print "$num "; } else { print "0 "; } } my $i =@array_print; if ($i<$len) { my $z_pr = $len-$i; print "0 " x $z_pr, "\n"; } else { print "\n"; } } }