UPDATE! I fixed it :D

Could you please take a look at this and let me know if there are any logical errors? I think it works well. Also, if you have any suggestions, please do not hesitate to reply or message me. Thank you so much!! (***Special thanks to BioLion! You're awesome :D***)

#!/usr/bin/perl
# The line above tells the system which Perl interpreter to run
use warnings;
use strict;


my %bow1 = ();
my $file1 = shift;
open (FILE1, '<', $file1) || die "Failed to open $file1 for reading : $!"; # Open first file

while (<FILE1>) { # Reading first file into a hash
    my ($ID, undef, undef, undef, $Seq) = split;
    $bow1{$ID}[0] = $ID;
    $bow1{$ID}[1] = $Seq;
}
close(FILE1) || die "Failed to close $file1 : $!"; # parentheses needed: "close FILE1 || die" never reaches the die


my %bow2 = ();
my $file2 = shift;

open (FILE2, '<', $file2) || die "Failed to open $file2 for reading : $!"; # Open second file

while (<FILE2>) { # Reading second file into a hash
    my ($ID, undef, undef, undef, $Seq) = split;
    $bow2{$ID}[0] = $ID;
    $bow2{$ID}[1] = $Seq;
}
close(FILE2) || die "Failed to close $file2 : $!";


print "Match status\t$file1 ID\t$file1 Sequence\t$file2 ID\t$file2 Sequence\n"; # Print title

my $totalCount=0; #initialize variables for counting
my $identical=0;
my $diffSeq=0;
my $unique=0;

foreach my $ID (keys %bow1){ # can use (sort keys %hash) to put items in a specified order
    if (exists $bow2{$ID}){ # test the key itself; "exists $bow2{$ID}[0]" would autovivify $bow2{$ID}
        if ( $bow1{$ID}[0] eq $bow2{$ID}[0] ){
            ## id and sequence are stored as key value pairs
            if ( $bow1{$ID}[1] eq $bow2{$ID}[1] ){
                #print "Identical\t$bow1{$ID}[0]\t$bow1{$ID}[1]\t$bow2{$ID}[0]\t$bow2{$ID}[1]\n"; #display ID and sequences  -->too many: commented out
                $identical=$identical+1; #count identical pairs
            }
            else{
                print "SameID, DiffSeq\t$bow1{$ID}[0]\t$bow1{$ID}[1]\t$bow2{$ID}[0]\t$bow2{$ID}[1]\n"; #display ID and sequences
                $diffSeq=$diffSeq+1; #count pairs with different sequences but identical IDs
            }
        }
    }
    else {
        print "Unique\t$bow1{$ID}[0]\t$bow1{$ID}[1]\t - \t - \n"; #display ID and sequences
        $unique=$unique+1; #count unique IDs from first file
    }
}
$totalCount = $identical + $diffSeq + $unique; #total count - should match the total number of IDs in the first file
print "Identical\tSeq is different\tUnique in $file1\tTotal\n"; #print title
print "$identical\t$diffSeq\t$unique\t$totalCount\n"; #print numbers

exit;

Re: UPDATE! I fixed it :D
by BioLion (Curate) on Nov 09, 2009 at 12:08 UTC

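You are still comparing the IDs twice: first you check that the key exists, and then you compare the IDs again with if ( $bow1{$ID}[0] eq $bow2{$ID}[0] ){... Since the hash key is the ID, the second test is always true once the key exists, so you can streamline more there.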
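
A minimal sketch of that streamlining, assuming each hash stores the sequence directly under its ID. The sample IDs, sequences and the counter names are made up for illustration and are not taken from your files:

```perl
use strict;
use warnings;

# Hypothetical sample data: ID => sequence.
my %bow1 = ( id1 => 'ACGT', id2 => 'TTGA', id3 => 'GGCC' );
my %bow2 = ( id1 => 'ACGT', id2 => 'TTGC' );

my ( $identical, $diff_seq, $unique ) = ( 0, 0, 0 );

for my $id ( sort keys %bow1 ) {
    if ( !exists $bow2{$id} ) {
        $unique++;       # ID appears only in the first hash
    }
    elsif ( $bow1{$id} eq $bow2{$id} ) {
        $identical++;    # same ID, same sequence
    }
    else {
        $diff_seq++;     # same ID, different sequence
    }
}

print "$identical\t$diff_seq\t$unique\n";
```

One existence test per ID is enough here, because a key found in both hashes is by definition the same ID.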

Also, you don't need to read in both files this way. Simply read in the first, then compare the IDs and sequences of the second against it as you read them in. This will help you with memory issues if your files are huge.
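
Something along these lines, as a rough sketch only. The helper name compare_streams, the counter keys and the sample records are all hypothetical, and it assumes an ID is not repeated within the second file. In-memory filehandles stand in for your two real files so the example runs on its own:

```perl
use strict;
use warnings;

# Hypothetical helper: load file 1 into a hash, then stream file 2 against it.
sub compare_streams {
    my ( $fh1, $fh2 ) = @_;

    my %seq_for;    # ID => sequence, from the first file only
    while ( my $line = <$fh1> ) {
        my ( $id, undef, undef, undef, $seq ) = split ' ', $line;
        next unless defined $seq;    # skip blank or short lines
        $seq_for{$id} = $seq;
    }

    my %count = ( identical => 0, diff_seq => 0, only_in_2 => 0 );
    while ( my $line = <$fh2> ) {
        my ( $id, undef, undef, undef, $seq ) = split ' ', $line;
        next unless defined $seq;
        if    ( !exists $seq_for{$id} ) { $count{only_in_2}++ }
        elsif ( $seq_for{$id} eq $seq ) { $count{identical}++ }
        else                            { $count{diff_seq}++ }
        delete $seq_for{$id};    # assumes IDs are not repeated in file 2
    }

    # Whatever was never matched is unique to the first file.
    $count{only_in_1} = scalar keys %seq_for;
    return \%count;
}

# Made-up records in the same five-column layout as the original script.
my $data1 = "a 0 0 0 ACGT\nb 0 0 0 TTGA\nc 0 0 0 GGCC\n";
my $data2 = "a 0 0 0 ACGT\nb 0 0 0 TTGC\nd 0 0 0 AAAA\n";
open my $in1, '<', \$data1 or die "Cannot open in-memory file 1: $!";
open my $in2, '<', \$data2 or die "Cannot open in-memory file 2: $!";

my $count = compare_streams( $in1, $in2 );
print join( "\t", map { "$_=$count->{$_}" } sort keys %$count ), "\n";
```

Only the first file's sequences are held in memory; each line of the second file is compared and then discarded.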

Also, as a bit of error checking to make sure you aren't overwriting any IDs, I would check whether the ID already exists in the hash when you read in the file:

while (<$fh1>) { # Reading first file into the hash
    my ($id, undef, undef, undef, $seq) = split;
    if ( exists $bow1{$id} ){
        warn "ID '$id' already exists for hash 1!\nThe current sequence value is '$bow1{$id}', which would be replaced with '$seq'.\n";
        # .... handle the error better (maybe ignore if sequences are the same?) ....
        next; ## skip it
    }
    $bow1{$id} = $seq;
}
close($fh1) || die "Failed to close $file1 : $!";

Lastly, and this is just a matter of style, but it matters for writing code that will last and that other people can read: in Perl, variable names are usually all lower case, with words separated by an underscore ( $this_that ). All-caps names are by convention reserved for constants and global variables. This is just convention, but it makes your code easier to interpret, not just for others, but for yourself 6 months down the line...

Well done for working it out though, and thanks for posting it here!

Just a something something...
