PerlMonks  

UPDATE! I fixed it :D

by FluffyBunny (Acolyte)
on Nov 06, 2009 at 22:11 UTC ( [id://805593]=note )


in reply to Re^4: how to compare two hashes with perl?
in thread how to compare two hashes with perl?

Could you please take a look at this and let me know if there are any logical errors? I think it works well. If you have any suggestions, please do not hesitate to reply or message me. Thank you so much!! (***Special thanks to BioLion! You're awesome :D***)
#!/usr/bin/perl
# The line above is the Perl interpreter command
use warnings;
use strict;

my %bow1 = ();
my $file1 = shift;
open (FILE1, "$file1") || die "Failed to open $file1 for reading : $!"; # Open first file
while (<FILE1>) { # Reading first hash
    my ($ID, undef, undef, undef, $Seq) = split;
    $bow1{$ID}[0] = $ID;
    $bow1{$ID}[1] = $Seq;
}
close FILE1 or die "Failed to close $file1 : $!"; # "or" so that die applies to close itself

my %bow2 = ();
my $file2 = shift;
open (FILE2, "$file2") || die "Failed to open $file2 for reading : $!"; # Open second file
while (<FILE2>) { # Reading second hash
    my ($ID, undef, undef, undef, $Seq) = split;
    $bow2{$ID}[0] = $ID;
    $bow2{$ID}[1] = $Seq;
}
close FILE2 or die "Failed to close $file2 : $!";

print "Match status\t$file1 ID\t$file1 Sequence\t$file2 ID\t$file2 Sequence\n"; # Print title

my $totalCount = 0; # initialize variables for counting
my $identical = 0;
my $diffSeq = 0;
my $unique = 0;

foreach my $ID (keys %bow1) { # can use (sort keys %hash) to put items in a specified order
    if (exists $bow2{$ID}[0]) {
        if ($bow1{$ID}[0] eq $bow2{$ID}[0]) { ## id and sequence are stored as key value pairs
            if ($bow1{$ID}[1] eq $bow2{$ID}[1]) {
                #print "Identical\t$bow1{$ID}[0]\t$bow1{$ID}[1]\t$bow2{$ID}[0]\t$bow2{$ID}[1]\n"; #display ID and sequences -->too many: commented out
                $identical = $identical + 1; # count identical pairs
            }
            else {
                print "SameID, DiffSeq\t$bow1{$ID}[0]\t$bow1{$ID}[1]\t$bow2{$ID}[0]\t$bow2{$ID}[1]\n"; # display ID and sequences
                $diffSeq = $diffSeq + 1; # count pairs with different sequences but identical IDs
            }
        }
    }
    else {
        print "Unique\t$bow1{$ID}[0]\t$bow1{$ID}[1]\t - \t - \n"; # display ID and sequences
        $unique = $unique + 1; # count unique IDs from first file
    }
}

$totalCount = $identical + $diffSeq + $unique; # total count - should match with total ID in first file
print "Identical\tSeq is different\tUnique in $file1\tTotal\n"; # print title
print "$identical\t$diffSeq\t$unique\t$totalCount\n"; # print numbers
exit;

Replies are listed 'Best First'.
Re: UPDATE! I fixed it :D
by BioLion (Curate) on Nov 09, 2009 at 12:08 UTC

    You are still comparing the IDs twice: first by checking for key existence, and then again with if ( $bow1{$ID}[0] eq $bow2{$ID}[0] ){... The second test can never fail once the key exists, so you can streamline more there.
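
    A minimal sketch of that streamlining, assuming each hash stores the sequence directly as the value for its ID. The helper name compare_hashes and the sample data are made up for illustration and are not from the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: compare two ID => sequence hashes (passed by reference).
# Returns a hash ref with counts of identical, differing and first-only IDs.
sub compare_hashes {
    my ($first, $second) = @_;
    my %count = (identical => 0, diff_seq => 0, unique => 0);

    foreach my $id (sort keys %$first) {
        if (!exists $second->{$id}) {
            $count{unique}++;       # ID appears only in the first hash
        }
        elsif ($first->{$id} eq $second->{$id}) {
            $count{identical}++;    # same ID, same sequence
        }
        else {
            $count{diff_seq}++;     # same ID, different sequence
        }
    }
    return \%count;
}

# Made-up sample data, for illustration only
my %bow1 = (r1 => 'ACGT', r2 => 'GGCC', r3 => 'TTAA');
my %bow2 = (r1 => 'ACGT', r2 => 'GGCA');

my $count = compare_hashes(\%bow1, \%bow2);
print join("\t", @{$count}{qw(identical diff_seq unique)}), "\n";
```

    The single exists test is the only ID comparison needed, because a hash key that exists in both hashes is by definition the same ID.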

    Also, you don't need to read in both files this way. Simply read in the first, then compare the IDs and sequences of the second as you read them in. This will help with memory use if your files are huge.
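
    One possible way to do this is sketched here, under a few assumptions: the helper names read_sequences and compare_stream are hypothetical, the sample records are invented, and in-memory filehandles stand in for the two real files. Matched IDs are deleted from the first hash, so whatever remains at the end is unique to the first file:

```perl
use strict;
use warnings;

# Hypothetical helper: read "ID x x x Seq" lines into an ID => sequence hash.
sub read_sequences {
    my ($fh) = @_;
    my %seq_for;
    while (my $line = <$fh>) {
        my ($id, undef, undef, undef, $seq) = split ' ', $line;
        next unless defined $seq;    # skip blank or short lines
        $seq_for{$id} = $seq;
    }
    return \%seq_for;
}

# Hypothetical helper: compare the second file one line at a time.
# Note: this empties matched IDs out of the hash it is given.
sub compare_stream {
    my ($seq_for, $fh) = @_;
    my %count = (identical => 0, diff_seq => 0, unique => 0);

    while (my $line = <$fh>) {
        my ($id, undef, undef, undef, $seq) = split ' ', $line;
        next unless defined $seq;
        next unless exists $seq_for->{$id};    # not in first file, or already matched

        my $first_seq = delete $seq_for->{$id};
        if ($first_seq eq $seq) { $count{identical}++ }
        else                    { $count{diff_seq}++ }
    }

    # Anything never matched is unique to the first file
    $count{unique} = scalar keys %$seq_for;
    return \%count;
}

# Invented sample records; real code would open the two file names instead
my $data1 = "r1 a b c ACGT\nr2 a b c GGCC\nr3 a b c TTAA\n";
my $data2 = "r1 a b c ACGT\nr2 a b c GGCA\nr9 a b c CCCC\n";
open my $fh1, '<', \$data1 or die "Failed to open in-memory data 1 : $!";
open my $fh2, '<', \$data2 or die "Failed to open in-memory data 2 : $!";

my $count = compare_stream(read_sequences($fh1), $fh2);
print join("\t", @{$count}{qw(identical diff_seq unique)}), "\n";
```

    Only the first file is ever held in memory. One difference from the two-hash version: if an ID is repeated in the second file, only its first occurrence is compared here.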

    Also, as a bit of error checking to make sure you aren't overwriting any IDs, I would check for the existence of the ID in the hash when you read in the file:

    while (<$fh1>) { # Reading first hash
        my ($id, undef, undef, undef, $seq) = split;
        if ( exists $bow1{$id} ){
            warn "ID \'$id\' already exists for hash 1!\nThe current sequence value is \'$bow1{$id}\', which would be replaced with \'$seq\'.\n";
            # .... handle the error better (maybe ignore if sequences are the same?) ....
            next; ## skip it
        }
        $bow1{$id} = $seq;
    }
    close $fh1 or die "Failed to close $file1 : $!";
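
    As one possible take on "handle the error better", the following sketch ignores a repeated ID quietly when the sequence is the same and warns only on a real conflict. The helper name store_sequence, its return values, and the sample records are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: add one ID/sequence pair to the hash (passed by reference).
# Returns 'stored', 'duplicate' (same sequence seen again) or 'conflict'.
sub store_sequence {
    my ($seq_for, $id, $seq) = @_;

    if (exists $seq_for->{$id}) {
        # Same ID and same sequence: a harmless repeat, so ignore it quietly
        return 'duplicate' if $seq_for->{$id} eq $seq;

        # Same ID but a different sequence: warn and keep the first value
        warn sprintf("ID '%s' already has sequence '%s'; ignoring '%s'\n",
                     $id, $seq_for->{$id}, $seq);
        return 'conflict';
    }

    $seq_for->{$id} = $seq;
    return 'stored';
}

# Invented sample records, for illustration only
my %bow1;
my @records = (['r1', 'ACGT'], ['r2', 'GGCC'], ['r1', 'ACGT'], ['r2', 'GGCA']);
for my $record (@records) {
    my ($id, $seq) = @$record;
    my $status = store_sequence(\%bow1, $id, $seq);
    print "$id\t$seq\t$status\n";
}
```

    Keeping the first value on a conflict is a design choice made for this sketch; depending on the data, dying or keeping the last value may be more appropriate.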

    Lastly, a matter of style, but an important one for writing code that will last and that other people can read: in Perl, variable names are usually all lower case, with words separated by underscores ( $this_that ). All-caps names are conventionally reserved for constants and globals. This is just convention, but it makes your code easier to interpret, not just for others, but for yourself six months down the line...

    Well done for working it out though, and thanks for posting it here!

    Just a something something...
