PerlMonks  

Re: searching in large file

by Laurent_R (Canon)
on Jan 06, 2018 at 10:33 UTC


in reply to searching in large file

Hi sabas,

assuming you want to find the lines of file1 that are in file2 (that's what you say at the beginning of your post), perhaps something like this:

    # Load every non-blank line of file2 into a hash for fast lookup.
    my %hash;
    open my $IN2, "<", $file2 or die "could not open $file2: $!";
    while (my $line = <$IN2>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip empty line
        $hash{$line} = 1;
    }
    close $IN2;

    # Write to file3 each line of file1 that also occurs in file2.
    open my $IN1, "<", $file1 or die "could not open $file1: $!";
    open my $OUT, ">", $file3 or die "could not open $file3: $!";
    while (my $line = <$IN1>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip empty line
        print $OUT "$line\n" if exists $hash{$line};
    }
    close $IN1;
    close $OUT;

If you want to find the lines of file2 that are in file1 (as you seem to imply later in your post), then just swap file1 and file2.
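
To make the swap explicit, here is a minimal sketch that wraps the same approach in a subroutine. The name lines_in_both and its argument order are my own invention for illustration, not something from the original question; it returns the matching lines instead of writing them to a third file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of $scan_file that also occur
# in $lookup_file, in the order they appear in $scan_file.
sub lines_in_both {
    my ($lookup_file, $scan_file) = @_;

    # Pass 1: remember every non-blank line of the lookup file.
    my %seen;
    open my $lookup_fh, '<', $lookup_file
        or die "could not open $lookup_file: $!";
    while (my $line = <$lookup_fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        $seen{$line} = 1;
    }
    close $lookup_fh;

    # Pass 2: keep the lines of the scanned file that were seen in pass 1.
    my @common;
    open my $scan_fh, '<', $scan_file
        or die "could not open $scan_file: $!";
    while (my $line = <$scan_fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        push @common, $line if exists $seen{$line};
    }
    close $scan_fh;

    return @common;
}
```

With this, lines_in_both($file2, $file1) gives the lines of file1 that are in file2, and lines_in_both($file1, $file2) gives the reverse. The set of matching lines is the same either way; only the order (and any duplicates) follows whichever file is scanned second.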

In both cases, it will be pretty fast, because a hash lookup takes roughly constant time no matter how many keys are stored (much faster than scanning an entire array each time through the input loop). You could probably do it without the two chomps, but I feel it's a bit safer to have them.
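
One case where the chomps matter is a file whose last line has no trailing newline. The following is only a sketch with made-up data: build_lookup is a hypothetical helper, and it reads from an in-memory string (opened through a scalar reference) rather than from a real file.

```perl
use strict;
use warnings;

# Illustrative data only: three lines, with no newline after the last one.
my $without_newline = "alpha\nbeta\ngamma";

# Hypothetical helper: build a lookup hash from in-memory text,
# with or without chomping each line first.
sub build_lookup {
    my ($text, $do_chomp) = @_;
    my %lookup;
    open my $fh, '<', \$text or die "could not open in-memory file: $!";
    while (my $line = <$fh>) {
        chomp $line if $do_chomp;
        $lookup{$line} = 1;
    }
    close $fh;
    return \%lookup;
}

my $raw     = build_lookup($without_newline, 0);
my $chomped = build_lookup($without_newline, 1);

# Without chomp the stored key is "gamma", while the same line read from
# a file that does end in a newline would be "gamma\n".
print exists $raw->{"gamma\n"}   ? "raw: match\n"     : "raw: no match\n";
print exists $chomped->{"gamma"} ? "chomped: match\n" : "chomped: no match\n";
```

This prints "raw: no match" followed by "chomped: match". With both sides chomped, the comparison no longer depends on whether the final newline is present.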
