Better solution to the code

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

Can someone suggest a better approach to the script below so that it runs faster?
Currently I iterate over an array with a large number of records and, for each record, search a file (30000 lines) for lines matching that string.
The matching lines are stored in a text file called Result_file.txt.


    # The array @tag contains around 40000 records
    # Input_file.dat contains 30000 lines

    open(FH1, "+>Result_file.txt") or die "Cannot create file $!\n";
    foreach my $fkey (@tag) {
        # the whole input file is re-opened and re-read once per tag
        open(FH, "<Input_file.dat") or die "Cannot read $!\n";
        while (<FH>) {
            if ($_ =~ m/$fkey/) {
                print FH1 $_;    # $_ already ends with "\n"
            }
        }
        close(FH);
    }
    close(FH1);

Can anyone please help me improve the performance of the code above?
Thanks


Re: Better solution to the code by moritz (Cardinal) on Jan 25, 2008 at 10:20 UTC
If you don't need to preserve the output order in `Result_file.txt`, you can reduce the runtime to a single pass over `Input_file.dat`:

    # if @tag contains simple words
    my $re = join '|', @tag;
    # if they can be more complicated:
    # my $re = join '|', map { "(?:$_)" } @tag;
    open my $out, '+>', "Result_file.txt" or die "Can't open file Result_file.txt for writing: $!";
    open my $in, '<', 'Input_file.dat' or die "Can't read Input_file.dat: $!";
    while (<$in>) {
        print $out $_ if m/$re/o;
    }
    close $in;
    close $out;

If you use perl 5.10.0, the match against many (constant) alternatives is blazingly fast due to the trie optimizations, demerphq++
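A minimal sketch of the single-pass idea above, wrapped in a sub so it can be tried on in-memory data. It assumes the tags are literal strings rather than regexes, so each one goes through quotemeta before the join; drop that step if @tag really holds patterns. The sub name `match_any_tag` is hypothetical.

```perl
use strict;
use warnings;

# Return the lines that contain at least one of the literal tags, using a
# single alternation regex: one match per line instead of one per tag.
sub match_any_tag {
    my ($tags, $lines) = @_;
    return () unless @$tags;    # an empty pattern would match every line
    # quotemeta escapes regex metacharacters, so each tag matches literally
    my $alt = join '|', map { quotemeta($_) } @$tags;
    my $re  = qr/$alt/;         # compiled once, reused for every line
    return grep { $_ =~ $re } @$lines;
}

my @tag   = ('foo', 'a.b');
my @lines = ("x foo y\n", "axb\n", "has a.b\n", "nothing\n");
print for match_any_tag(\@tag, \@lines);    # prints "x foo y" and "has a.b"
```

To use it on the real files, apply the same compiled regex inside the `while (<$in>)` loop. As moritz notes, the output order then follows the input file rather than @tag, and a line matching several tags is written only once.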
Re: Better solution to the code by Punitha (Priest) on Jan 25, 2008 at 09:53 UTC
Hi, try this:

    open(FH1, "+>Result_file.txt") or die "Cannot create file $!\n";
    open(FH, "<Input_file.dat") or die "Cannot read $!\n";
    while (<FH>) {
        my $data = $_;
        chomp($data);
        # match the line against each tag used as a pattern
        print FH1 "DATA:$data\n" if grep { $data =~ /$_/ } @tag;
    }
    close(FH);
    close(FH1);

Punitha.
Re^2: Better solution to the code by hipowls (Curate) on Jan 25, 2008 at 10:28 UTC
Precompiling the regexes should provide a speedup, and using List::MoreUtils::any() may help if the chance of a match is good, since the test short-circuits on the first success. Naturally you will benchmark ;-)

    use List::MoreUtils qw(any);
    open my $OUT, '>', 'Result_file.txt' or die "Cannot create file: $!\n";
    open my $IN, '<', 'Input_file.dat' or die "Cannot read file: $!\n";
    # precompile the regexes
    my @tag_rx = map { qr/$_/ } @tag;
    while ( my $data = <$IN> ) {
        print $OUT $data if any { $data =~ /$_/ } @tag_rx;
    }
    close $IN;
    close $OUT;
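The reply suggests benchmarking; one way to do that is the core Benchmark module. This is a rough sketch under several assumptions: synthetic data stands in for the real files, the tags are plain strings (hence `\Q...\E` and quotemeta), and List::Util's any() (core, version 1.33 or later) is swapped in for List::MoreUtils::any() so nothing has to be installed from CPAN. The sub names `per_tag_any` and `one_alternation` are hypothetical.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(any);    # core module; any() needs List::Util 1.33 or later

# Synthetic stand-in data (an assumption, not the poster's real files):
# 500 literal tags and 1000 lines, every 100th of which contains a tag.
my @tag   = map { "tag$_" } 1 .. 500;
my @lines = map { $_ % 100 ? "line $_ no match\n" : "line $_ tag$_\n" } 1 .. 1000;

# As in the reply: one precompiled regex per tag; any() stops at the first hit.
# \Q...\E makes each tag match literally (assumes tags are plain strings).
my @tag_rx = map { qr/\Q$_\E/ } @tag;
sub per_tag_any {
    return grep {
        my $line = $_;    # any() rebinds $_, so keep the current line
        any { $line =~ $_ } @tag_rx;
    } @lines;
}

# As in moritz's reply: one alternation of all tags, compiled once.
my $alt = do { my $s = join '|', map { quotemeta($_) } @tag; qr/$s/ };
sub one_alternation {
    return grep { $_ =~ $alt } @lines;
}

# Sanity check: both approaches must select the same lines.
my @got_any = per_tag_any();
my @got_alt = one_alternation();
die "results differ\n" unless join('', @got_any) eq join('', @got_alt);

# Run each for at least 1 CPU second and print a rate comparison table.
cmpthese(-1, {
    per_tag_any     => \&per_tag_any,
    one_alternation => \&one_alternation,
});
```

The numbers depend heavily on the data and the Perl version (the trie optimization moritz mentions arrived in 5.10), so it is worth repeating the run on a sample of the real input before picking an approach.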
Re: Better solution to the code by Lu. (Hermit) on Jan 25, 2008 at 10:30 UTC
Hi, the poor performance comes from the fact that you are opening and parsing the same (big) file many times, once per element of @tag. You would be better off reversing your strategy: open the file once, parse it, and compare each line with the contents of your array @tag. You should also, if possible, consider loading your data into a hash instead of an array. If you do that, you can use exists for a constant-time lookup instead of looping over every tag.

    # your data is in %tag
    open (IN, "<Input_file.dat") or die "Cannot read $!\n";
    open (OUT, "+>Result_file.txt") or die "Cannot create file $!\n";
    while (<IN>) {
        chomp(my $line = $_);    # assuming the keys in %tag have no trailing newline
        print OUT $_ if exists $tag{$line};
    }
    close(IN);
    close(OUT);

Lu.
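A hash lookup only helps when the complete key can be pulled out of each line, since exists compares whole strings. A minimal sketch, assuming hypothetically that each line of the input starts with the tag as its first whitespace-separated field; the sub name `match_first_field` is made up for illustration. Lines where a tag appears somewhere else are not found, which is the limitation moritz points out.

```perl
use strict;
use warnings;

# Keep the lines whose first whitespace-separated field is a known tag.
# Assumes the tag is always that field; substring matches are NOT found.
sub match_first_field {
    my ($tags, $lines) = @_;
    my %is_tag;
    @is_tag{@$tags} = ();    # hash slice: every tag becomes a key
    return grep {
        my ($key) = split ' ', $_;    # ' ' splits on whitespace, skipping leading blanks
        defined $key && exists $is_tag{$key};
    } @$lines;
}

my @lines = ("ABC123 some data\n", "XYZ999 other data\n", "\n");
print for match_first_field(['ABC123'], \@lines);    # prints "ABC123 some data"
```

Each line then costs one split and one hash lookup, no matter how many tags there are.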
Re^2: Better solution to the code by moritz (Cardinal) on Jan 25, 2008 at 10:35 UTC
The idea with the hash won't work: the regex match searches for a matching substring, while the hash lookup compares the whole string. But that reminds me of another possible optimization: if `@tag` doesn't contain regexes but only constant substrings, index might speed things up. So instead of `if ($_ =~ m/$something/){ ... }` you can write `if (0 <= index $_, $something)`.
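A sketch of the index() variant, assuming @tag holds plain substrings (index treats `.` and other regex metacharacters literally, so no escaping is needed). The sub name `match_with_index` is hypothetical. It still tests every tag against every line, so with tens of thousands of tags the single alternation regex may turn out faster; benchmarking both on real data is the safe way to decide.

```perl
use strict;
use warnings;

# Return the lines that contain at least one of the literal substrings
# in @$tags, using index() instead of a regex match.
sub match_with_index {
    my ($tags, $lines) = @_;
    my @hits;
    LINE: for my $line (@$lines) {
        for my $tag (@$tags) {
            # index returns the 0-based position, or -1 when not found
            # (note: an empty tag is found at position 0 in every line)
            if (index($line, $tag) >= 0) {
                push @hits, $line;
                next LINE;    # one hit is enough; skip the remaining tags
            }
        }
    }
    return @hits;
}

print for match_with_index(['a.b'], ["a.b here\n", "axb\n"]);    # prints "a.b here"
```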
Re^2: Better solution to the code by cdarke (Prior) on Jan 25, 2008 at 12:56 UTC
BTW, to put @tags into %tags use: `my %tags; @tags{@tags} = undef;` Yes, it's confusing calling a hash and an array the same thing.
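A small check of the hash-slice idiom. It uses `= ()` instead of `= undef`; both leave every key present with an undef value, which is all exists needs. The tag names are made up.

```perl
use strict;
use warnings;

my @tags = qw(alpha beta gamma);
my %tags;
@tags{@tags} = ();    # hash slice: creates all three keys with undef values

# exists() is true for a key even though its value is undef
print exists $tags{beta}  ? "beta found\n"  : "beta missing\n";     # beta found
print exists $tags{delta} ? "delta found\n" : "delta missing\n";    # delta missing
```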


laziness, impatience, and hubris
	PerlMonks