PerlMonks  

Re: File handles in regular expressions

by 2teez (Vicar)
on Oct 18, 2012 at 18:53 UTC


in reply to File handles in regular expressions

Hi Vikasdawar,
Please check whether your open call succeeds and print an error message if it fails, or use autodie qw(open close).
Please try the following:

#!/usr/bin/perl
use warnings;
use strict;
use Tie::File;

tie my @array_file, 'Tie::File', "file1.txt" or die "can't tie file: $!";
my $matched_lines = '';

open my $fh,  '>', "file3.txt" or die "can't open file: $!";
open my $fh2, '<', "file2.txt" or die "can't open file: $!";

while ( defined( my $line = <$fh2> ) ) {
    chomp $line;
    foreach my $match (@array_file) {
        if ( $match eq $line and $match ne "" ) {
            $matched_lines .= $match . $/;
            next;
        }
    }
}

print {$fh} $matched_lines;

close $fh2 or die "can't close file:$!";
close $fh  or die "can't close file:$!";
untie @array_file;
NOTE: The code above works (or should work), but it may not be very efficient when the files being compared are VERY large.
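A minimal sketch of the autodie alternative mentioned above. The helper name copy_lines is hypothetical and only for illustration. With autodie in effect, a failed open or close throws an exception that carries the error message, so the explicit "or die" clauses can be dropped:

```perl
use strict;
use warnings;
use autodie qw(open close);    # open and close now die on failure

# Hypothetical helper: copy every line of $in_path to $out_path.
sub copy_lines {
    my ( $in_path, $out_path ) = @_;

    open my $in,  '<', $in_path;     # no "or die" needed under autodie
    open my $out, '>', $out_path;

    while ( defined( my $line = <$in> ) ) {
        print {$out} $line;
    }

    close $in;
    close $out;
    return;
}
```

Calling copy_lines with a path that does not exist should die with a message naming the file and the system error, which is the same information the hand-written "or die" messages provide.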

If you tell me, I'll forget.
If you show me, I'll remember.
if you involve me, I'll understand.
--- Author unknown to me

Replies are listed 'Best First'.
Re^2: File handles in regular expressions
by Lotus1 (Vicar) on Oct 18, 2012 at 19:19 UTC

    Hi. Did you choose to use Tie::File instead of just reading file1.txt into an array so that very large files could be handled? If so there is a problem with concatenating all the output into a scalar. $matched_lines could end up holding the whole huge file. One possible solution is to replace $matched_lines .= $match.$/; with print $fh $match.$/; Just print incrementally.
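A rough sketch of the incremental-print variant described above, wrapped in a hypothetical helper (print_matches and its parameter names are not from the original post). It keeps the Tie::File approach but writes each match as soon as it is found, so no output accumulates in a scalar:

```perl
use strict;
use warnings;
use Tie::File;

# Hypothetical helper: for each non-empty line of $list_file, print every
# identical line found in $tied_file to $out_file, one print per match.
sub print_matches {
    my ( $tied_file, $list_file, $out_file ) = @_;

    tie my @tied_lines, 'Tie::File', $tied_file
        or die "can't tie $tied_file: $!";
    open my $out, '>', $out_file  or die "can't open $out_file: $!";
    open my $in,  '<', $list_file or die "can't open $list_file: $!";

    while ( defined( my $line = <$in> ) ) {
        chomp $line;
        next if $line eq '';
        for my $candidate (@tied_lines) {
            # Write immediately instead of appending to $matched_lines.
            print {$out} $candidate, "\n" if $candidate eq $line;
        }
    }

    close $in  or die "can't close $list_file: $!";
    close $out or die "can't close $out_file: $!";
    untie @tied_lines;
    return;
}
```

Output is still buffered by perl's I/O layer, so one print per match does not necessarily mean one write to disk per match.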

      Hi Lotus1,
      "If so there is a problem with concatenating all the output into a scalar. $matched_lines could end up holding the whole huge file. One possible solution is to replace $matched_lines .= $match.$/; with print $fh $match.$/; Just print incrementally."

      Not so. I'm afraid your suggestion would further hurt the script's performance, because the print function would be called once for every matching string, whereas accumulating into the scalar makes no print call inside the loop.
      Using a profiler (Devel::NYTProf) made that very clear.
      Try it.

      If you tell me, I'll forget.
      If you show me, I'll remember.
      if you involve me, I'll understand.
      --- Author unknown to me

        If you run out of memory, the performance won't be so good. I was questioning the logic of using a tied file while holding all the output in memory. The OP didn't state the file sizes or performance requirements. But since you seem to be focused on performance, wouldn't it be faster to read the file into an array? For larger files, the tied file keeps only part of the file in memory, so it will end up rereading the file many times.
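A sketch of the read-into-an-array alternative raised above, assuming the reference file fits in memory. The helper names read_lines and print_matches_from_array are hypothetical. The reference file is read once, and every comparison then runs against the in-memory array rather than a tied file:

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines of $path into memory, newlines removed.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "can't open $path: $!";
    chomp( my @lines = <$fh> );
    close $fh or die "can't close $path: $!";
    return @lines;
}

# Hypothetical helper: for each non-empty line of $list_file, print every
# identical line of $reference_file to the already-open handle $out.
sub print_matches_from_array {
    my ( $reference_file, $list_file, $out ) = @_;

    my @reference = read_lines($reference_file);    # read once, kept in memory

    open my $in, '<', $list_file or die "can't open $list_file: $!";
    while ( defined( my $line = <$in> ) ) {
        chomp $line;
        next if $line eq '';
        print {$out} "$_\n" for grep { $_ eq $line } @reference;
    }
    close $in or die "can't close $list_file: $!";
    return;
}
```

This trades memory for speed: the whole reference file is held in an array, which is only reasonable when it is known to be small enough.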
