 
PerlMonks  

Re: Reading concurrently two files with different number of lines

by bioinformatics (Friar)
on Apr 10, 2013 at 16:09 UTC


in reply to Reading concurrently two files with different number of lines

In the code above, the loop stops as soon as the shorter file runs out of lines. Instead, you can read each file separately into a hash keyed by line number, then compare the entries with the same key in a single pass.

Example:
    # $infile1 and $infile2 are the input filehandles, already opened
    # read file one into a hash keyed by line number
    my ( %hash1, %hash2 );
    my $line_number = 1;
    while ( <$infile1> ) {
        chomp;
        $hash1{$line_number} = $_;
        $line_number++;
    }

    # read file two into a hash keyed by line number
    $line_number = 1;
    while ( <$infile2> ) {
        chomp;
        $hash2{$line_number} = $_;
        $line_number++;
    }

    # determine which file has more lines (keys in scalar context gives the count)
    my $NoK1 = keys %hash1;
    my $NoK2 = keys %hash2;

    if ( $NoK1 > $NoK2 ) {
        for my $key ( sort { $a <=> $b } keys %hash1 ) {
            if ( !defined $hash2{$key} ) {
                next;    # line exists only in file one
            }
            elsif ( $hash1{$key} eq $hash2{$key} ) {
                next;    # identical lines: nothing to report
            }
            else {
                print "Line number $key differs between the two files\n";
            }
        }
    }
    # the rest of the code (e.g. the case where file two is longer)
    # you can figure out from here ...
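One way to fill in the rest, as a minimal sketch: walk the line numbers up to the length of the longer file, so lines present in only one file are reported as well as lines that differ, and both orderings are handled in one loop. The helper names `read_lines_into_hash` and `compare_hashes` are hypothetical, and in-memory filehandles stand in for the real files.

```perl
use strict;
use warnings;

# Hypothetical helper: read a filehandle into a hash keyed by line number (1-based).
sub read_lines_into_hash {
    my ($fh) = @_;
    my %lines;
    my $line_number = 1;
    while ( defined( my $line = <$fh> ) ) {
        chomp $line;
        $lines{$line_number} = $line;
        $line_number++;
    }
    return \%lines;
}

# Hypothetical helper: walk line numbers up to the LONGER file's length and
# report lines that differ or that exist in only one file.
sub compare_hashes {
    my ( $h1, $h2 ) = @_;
    my $n1  = keys %$h1;
    my $n2  = keys %$h2;
    my $max = $n1 > $n2 ? $n1 : $n2;
    my @report;
    for my $key ( 1 .. $max ) {
        if ( !exists $h2->{$key} ) {
            push @report, "Line $key exists only in file 1";
        }
        elsif ( !exists $h1->{$key} ) {
            push @report, "Line $key exists only in file 2";
        }
        elsif ( $h1->{$key} ne $h2->{$key} ) {
            push @report, "Line $key differs";
        }
    }
    return @report;
}

# Usage with in-memory filehandles standing in for the real files
my $text1 = "alpha\nbeta\ngamma\n";
my $text2 = "alpha\nBETA\n";
open my $in1, '<', \$text1 or die "open: $!";
open my $in2, '<', \$text2 or die "open: $!";
print "$_\n" for compare_hashes( read_lines_into_hash($in1), read_lines_into_hash($in2) );
```

With the sample data this prints "Line 2 differs" followed by "Line 3 exists only in file 1". Looping over `1 .. $max` relies on the keys being contiguous line numbers, which holds because the hashes are built by counting lines.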
Hope that helps!

Bioinformatics

Re^2: Reading concurrently two files with different number of lines
by choroba (Cardinal) on Apr 10, 2013 at 16:32 UTC
    Why a hash? If there are no gaps in the line numbers, you can use an array.
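Following the array suggestion, a minimal sketch: read each file into an array in list context (index 0 holds line 1) and compare up to the longer array's length, treating a missing line as a difference. The sub name `diff_arrays` is hypothetical, and in-memory filehandles stand in for the real files.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two arrays of lines; return the 1-based line
# numbers that differ or that exist in only one of the files.
sub diff_arrays {
    my ( $lines1, $lines2 ) = @_;
    my $max = @$lines1 > @$lines2 ? scalar @$lines1 : scalar @$lines2;
    my @differing;
    for my $i ( 0 .. $max - 1 ) {
        my $l1 = $lines1->[$i];
        my $l2 = $lines2->[$i];
        # a line missing from either file counts as a difference
        if ( !defined $l1 || !defined $l2 || $l1 ne $l2 ) {
            push @differing, $i + 1;
        }
    }
    return @differing;
}

my $text1 = "alpha\nbeta\ngamma\n";
my $text2 = "alpha\nBETA\n";
open my $in1, '<', \$text1 or die "open: $!";
open my $in2, '<', \$text2 or die "open: $!";
chomp( my @lines1 = <$in1> );    # list context reads every line at once
chomp( my @lines2 = <$in2> );
print "Lines that differ: @{[ diff_arrays( \@lines1, \@lines2 ) ]}\n";
```

With the sample data this prints "Lines that differ: 2 3". An array avoids storing a key per line, which is one reason it is the more natural fit when line numbers are contiguous.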
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^2: Reading concurrently two files with different number of lines
by frogsausage (Sexton) on Apr 11, 2013 at 08:45 UTC
    Thanks for the help! I thought about reading the two files separately, but the only problem is that I can't simply discard all the lines that match line-to-line.

    However, I will definitely put everything in a hash, since that is what I need later on for further comparisons.

    My only worry is memory consumption and slowdown. How much memory do 100,000 lines take up in a hash?
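On the memory question: if a particular comparison only needs the current pair of lines, one option (an alternative to the hash approach, not something proposed in this thread) is to read both handles in lockstep and keep going until both are exhausted, so only two lines are held in memory at a time. A minimal sketch; `compare_streams` is a hypothetical name, and in-memory filehandles stand in for the real files.

```perl
use strict;
use warnings;

# Hypothetical helper: read two filehandles in lockstep until BOTH are
# exhausted, so only the current pair of lines is held in memory.
# Returns the 1-based numbers of lines that differ or exist in one file only.
sub compare_streams {
    my ( $fh1, $fh2 ) = @_;
    my @differing;
    my $line_number = 0;
    while (1) {
        my $l1 = <$fh1>;
        my $l2 = <$fh2>;
        last if !defined $l1 && !defined $l2;    # both files finished
        $line_number++;
        chomp $l1 if defined $l1;
        chomp $l2 if defined $l2;
        if ( !defined $l1 || !defined $l2 || $l1 ne $l2 ) {
            push @differing, $line_number;
        }
    }
    return @differing;
}

my $text1 = "alpha\nbeta\ngamma\n";
my $text2 = "alpha\nBETA\n";
open my $in1, '<', \$text1 or die "open: $!";
open my $in2, '<', \$text2 or die "open: $!";
print "Lines that differ: @{[ compare_streams( $in1, $in2 ) ]}\n";
```

With the sample data this prints "Lines that differ: 2 3". Reading from a handle that has already hit end-of-file simply keeps returning `undef`, which is why the shorter file does not stop the loop. To measure what a hash of 100,000 lines actually costs rather than guess, the CPAN module Devel::Size provides `total_size`, e.g. `total_size(\%hash1)`.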

    Thanks for the code example. The rest of the code already exists; I just need to get my data into a better shape for crunching!
