vamsikrishna has asked for the wisdom of the Perl Monks concerning the following question:

Re: comparing two huges files
by GrandFather (Saint) on Jan 28, 2008 at 03:55 UTC

    This sounds very like how to find differences between two huge files. Maybe you could read that thread and find what you need, or perhaps you should talk to your workmate/classmate and see how he solved it?

    Update: hmm, on second thoughts it's not the same - it was hard to tell because of your rubbish formatting, sorry.

    Build a hash from the first (smaller) file, then use it in a single pass through the second (larger) file to figure out where stuff goes. Consider:

    use strict;
    use warnings;

    #Hello,
    #
    #Currently I'm facing problem with comparing two huges files on a particular key
    #column. One file consists of 10k records and other one 18million records. Both
    #files are | (pipe) delimited. I am comparing based on the first column in the
    #two files and redirecting them to two separate files.
    #If Key columns are same it has to pick the record from 10k records file and send
    #it to one file.
    #If the Key columns are not matching ie., the key column is present in 18 million
    #records file but not in 10k records file, it has to go into another file.
    #
    #Here I'm pasting the query what I have written, taking more time.

    my $oldFile1 = <<DAT;
    1|oldFile|another field
    5|oldFile
    DAT

    my $newFile1 = <<DAT;
    1|newFile1|z
    2|newFile1|x
    3|newFile1|y
    4|newFile1|p
    DAT

    my $oldFile2;
    my $changes1;

    open OLDFILE1, '<', \$oldFile1;

    # Build the reference hash from the 'small' file
    my %oldKeys;
    while (<OLDFILE1>) {
        chomp;
        my ($key, $tail) = split /\|/, $_, 2;
        if (exists $oldKeys{$key}) {
            warn "Key $key duplicated. Duplicate ignored!\n";
            next;
        }
        $oldKeys{$key} = $tail;
    }
    close OLDFILE1;

    # Process the new file
    open NEWFILE1, '<', \$newFile1;
    open OLDFILE2, '>', \$oldFile2;
    open CHANGES1, '>', \$changes1;
    while (<NEWFILE1>) {
        chomp;
        my ($key, $tail) = split /\|/, $_, 2;
        if (exists $oldKeys{$key}) {
            print OLDFILE2 "$key|$oldKeys{$key}\n";
        } else {
            print CHANGES1 "$key|$tail\n";
        }
    }
    close (NEWFILE1);
    close (OLDFILE2);
    close (CHANGES1);

    print "OLDFILE2:\n$oldFile2\n\n";
    print "CHANGES1:\n$changes1\n\n";


    OLDFILE2:
    1|oldFile|another field

    CHANGES1:
    2|newFile1|x
    3|newFile1|y
    4|newFile1|p
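
The example above reads from in-memory strings so that it is self-contained. A minimal sketch of the same hash-then-single-pass idea applied to files on disk might look like the following. The subroutine name `split_by_key`, its argument order, and the returned counts are illustrative assumptions, not part of the original answer. As in the original, the first record wins when a key is duplicated in the small file.

```perl
use strict;
use warnings;

# Hypothetical helper: name, argument order, and return values are illustrative.
# $small and $large are input paths; $matched and $unmatched are output paths.
sub split_by_key {
    my ($small, $large, $matched, $unmatched) = @_;

    # Index the small file: first column => rest of the record.
    my %oldKeys;
    open my $in, '<', $small or die "Can't read $small: $!\n";
    while (my $line = <$in>) {
        chomp $line;
        next unless length $line;            # skip blank lines
        my ($key, $tail) = split /\|/, $line, 2;
        next if exists $oldKeys{$key};       # keep the first record per key
        $oldKeys{$key} = $tail;
    }
    close $in;

    # Stream the large file once, routing each record by its key.
    open my $big,  '<', $large     or die "Can't read $large: $!\n";
    open my $hit,  '>', $matched   or die "Can't write $matched: $!\n";
    open my $miss, '>', $unmatched or die "Can't write $unmatched: $!\n";
    my ($hits, $misses) = (0, 0);
    while (my $line = <$big>) {
        chomp $line;
        next unless length $line;
        my ($key) = split /\|/, $line, 2;
        if (exists $oldKeys{$key}) {
            # Key known: emit the record from the small file.
            my $tail = $oldKeys{$key};
            my $rec  = defined $tail ? "$key|$tail" : $key;
            print {$hit} "$rec\n";
            ++$hits;
        }
        else {
            # Key only in the large file: pass the record through unchanged.
            print {$miss} "$line\n";
            ++$misses;
        }
    }
    for my $fh ($big, $hit, $miss) {
        close $fh or die "Close failed: $!\n";
    }
    return ($hits, $misses);
}
```

It could be called as `my ($hits, $misses) = split_by_key('small.txt', 'large.txt', 'matched.txt', 'unmatched.txt');` where the file names are placeholders. Only the 10k-record file is held in memory; the 18-million-record file is read one line at a time.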

Re: comparing two huges files
by grep (Monsignor) on Jan 28, 2008 at 04:08 UTC
    It's a little difficult to understand the specific details (that and it's hard to read w/o <code> tags), but I think I can figure out enough to help you.

    Create a hash. Read the 10K file first. Use the 1st col as the key and the rest of the record as the value.

    Then when you loop over the second file if the first col exists in the original hash, write whatever record you want to the file.

    Here is some code - It may not do exactly what you want, but that is because I'm guessing your spec.

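    ## UNTESTED
    use strict;

    my %hash;
    open(FH, '<', 'oldfile') or die "$!\n";
    # Index the 10K file: first column => rest of the record
    while (<FH>) {
        chomp;
        my ($key, $data) = split(/\|/, $_, 2);
        $hash{$key} = $data;
    }
    close FH;

    # Report the system error ($!) if either open fails
    open(NEW, '<', 'newfile') or die "$!\n";
    open(OUT, '>', 'outfile') or die "$!\n";
    # Read the large file line by line; foreach (<NEW>) would load it all into memory
    while (<NEW>) {
        chomp;
        my ($key) = split(/\|/, $_, 2);
        if ( exists $hash{$key} ) {
            print OUT "$hash{$key}\n";
        }
    }
    close NEW;
    close OUT;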
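
Both answers in this thread use `split` with a limit of 2 to separate the key from the rest of the record. The short sketch below shows what that limit does, using a sample record taken from the first reply. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $record = '1|oldFile|another field';

# With a limit of 2, split stops after the first pipe,
# so the tail keeps its own pipe characters.
my ($key, $tail) = split /\|/, $record, 2;
print "key=$key tail=$tail\n";          # key=1 tail=oldFile|another field

# Without a limit, every field is separated.
my @fields = split /\|/, $record;
print scalar(@fields), " fields\n";     # 3 fields
```

The pipe has to be escaped in the pattern because an unescaped `|` is the regex alternation operator. Keeping the tail as one string means the record from the small file can be written back out without rejoining its fields.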
