PerlMonks  

File handles in regular expressions

by vikasdawar (Initiate)
on Oct 18, 2012 at 17:34 UTC (#999785=perlquestion)
vikasdawar has asked for the wisdom of the Perl Monks concerning the following question:

Hello all

I am new to Perl, and I am having trouble reading the contents of files and comparing them. I have two files; I have to match the lines from the two files, and if they match, write that line into a third file. Here's how I do it:

open(FILE1,"<", "file1.txt");
open(FILE2,"<", "file2.txt");
open(FILE3,">", "file3.txt");
my @var1=<FILE1>;
my @var2=<FILE2>;
foreach my $comp1 (@var1) {
    foreach my $comp2 (@var2) {
        if ($comp1 =~/$comp2/){
            print FILE3 $comp1;
        }
    }
}
I know that lines in file 1 match lines in file 2. If I assign a literal scalar value to $comp2, it matches, but when the values come from the file handles it does not match. Please help me.

Thanks,
Vikas

Re: File handles in regular expressions
by aitap (Deacon) on Oct 18, 2012 at 18:08 UTC
    Did you forget to chomp $comp2? Lines read from file handles can contain newline characters (usually \n).
    Sorry if my advice was wrong.
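
To illustrate the point about newlines, here is a minimal sketch. The variable names and the sample text are made up for the example; the string "item1\n" stands in for a line as it would arrive from a file handle. One case where this bites: the last line of a file that lacks a trailing newline is not equal to the same text followed by a newline.

```perl
use strict;
use warnings;

# Simulate a line as it arrives from <FILEHANDLE>: the newline is still attached.
my $line_from_file = "item1\n";
my $wanted         = 'item1';

# "item1\n" and "item1" are different strings, so this is 0.
my $equal_before = $line_from_file eq $wanted ? 1 : 0;

# chomp removes the trailing newline in place.
chomp $line_from_file;

# Both strings are now "item1", so this is 1.
my $equal_after = $line_from_file eq $wanted ? 1 : 0;

print "before chomp: $equal_before, after chomp: $equal_after\n";
# prints "before chomp: 0, after chomp: 1"
```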
Re: File handles in regular expressions
by 2teez (Priest) on Oct 18, 2012 at 18:53 UTC

    Hi Vikasdawar,
    Please check whether your open calls succeed and display an error message if they fail, or use autodie qw(open close).
    Please try the following:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Tie::File;

    tie my @array_file, 'Tie::File', "file1.txt"
      or die "can't tie file: $!";
    my $matched_lines = '';
    open my $fh,  '>', "file3.txt" or die "can't open file: $!";
    open my $fh2, '<', "file2.txt" or die "can't open file: $!";
    while ( defined( my $line = <$fh2> ) ) {
        chomp $line;
        foreach my $match (@array_file) {
            if ( $match eq $line and $match ne "" ) {
                $matched_lines .= $match.$/;
                next;
            }
        }
    }
    print {$fh} $matched_lines;
    close $fh2 or die "can't close file:$!";
    close $fh or die "can't close file:$!";
    untie @array_file;
    NOTE: The code above works (or should work), but it might not be very efficient if it has VERY, VERY LARGE files to compare.

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me

      Hi. Did you choose to use Tie::File instead of just reading file1.txt into an array so that very large files could be handled? If so there is a problem with concatenating all the output into a scalar. $matched_lines could end up holding the whole huge file. One possible solution is to replace $matched_lines .= $match.$/; with print $fh $match.$/; Just print incrementally.

        Hi Lotus1,
        If so there is a problem with concatenating all the output into a scalar. $matched_lines could end up holding the whole huge file. One possible solution is to replace $matched_lines .= $match.$/; with print $fh $match.$/; Just print incrementally.

        Not so. I'm afraid your suggestion will further hurt the performance of the script, because the print function would be called once for every matching string, whereas with the scalar no print call is made inside the loop.
        Using a Profiler (NYTProf) made that very clear.
        Try it.
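
For readers who want to measure this themselves, here is a rough sketch using the core Benchmark module. The sub names and the test data are made up, and it writes to an in-memory file handle rather than a real file, so the numbers only hint at what happens on disk. Keep in mind that Perl normally buffers output to files, so many print calls do not necessarily mean many writes to the operating system.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up test data: 10_000 short lines.
my @lines = map { "item$_" } 1 .. 10_000;

# Variant A: print each line as soon as it is found.
sub print_each {
    my $buffer = '';
    open my $fh, '>', \$buffer or die "Can't open in-memory handle: $!";
    print {$fh} $_, "\n" for @lines;
    close $fh;
    return $buffer;
}

# Variant B: collect everything in a scalar, then print once.
sub print_once {
    my $buffer = '';
    open my $fh, '>', \$buffer or die "Can't open in-memory handle: $!";
    my $collected = '';
    $collected .= "$_\n" for @lines;
    print {$fh} $collected;
    close $fh;
    return $buffer;
}

# Run each variant 200 times and print a comparison table.
cmpthese( 200, { print_each => \&print_each, print_once => \&print_once } );
```

Both variants produce the same output. Which one is faster, and by how much, depends on the Perl version and the data, so it is worth running with input that resembles the real files.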

        If you tell me, I'll forget.
        If you show me, I'll remember.
        if you involve me, I'll understand.
        --- Author unknown to me
Re: File handles in regular expressions
by tobyink (Abbot) on Oct 18, 2012 at 19:06 UTC

    What do you mean by "if they match"? Do you mean, "if they are identical strings"? If so, string comparison (using the eq operator) is almost certainly a better idea than using regexes.
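
A short sketch of why eq is the safer choice when the lines are meant to be identical. The sample strings are made up. When a line is interpolated into a regex, characters such as parentheses and dots are treated as regex syntax, so a line may fail to match itself or may match something different.

```perl
use strict;
use warnings;

my $line1 = 'price (USD)';
my $line2 = 'price (USD)';

# As a regex, the parentheses in $line2 form a capture group, so the
# pattern looks for the text "price USD" and does not find it in $line1.
my $regex_match = $line1 =~ /$line2/ ? 1 : 0;

# eq compares the two strings character by character.
my $string_equal = $line1 eq $line2 ? 1 : 0;

# \Q...\E quotes metacharacters; ^ and \z anchor the match to the whole string.
my $quoted_match = $line1 =~ /^\Q$line2\E\z/ ? 1 : 0;

# The opposite problem: a dot matches any character, so "a.c" matches "abc".
my $pattern        = 'a.c';
my $false_positive = 'abc' =~ /$pattern/ ? 1 : 0;

print "regex: $regex_match, eq: $string_equal, ",
  "quoted regex: $quoted_match, false positive: $false_positive\n";
# prints "regex: 0, eq: 1, quoted regex: 1, false positive: 1"
```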

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: File handles in regular expressions
by Kenosis (Priest) on Oct 19, 2012 at 07:04 UTC

    Hi, Vikas, and welcome to PerlMonks!

    You've enclosed one loop within another, so you're attempting to compare the first $comp1 value against all elements of @var2, and so on. And tobyink's point about "if they match" is well made, so I suspect you want $comp1 eq $comp2.

    Given this, consider the following:

    use strict;
    use warnings;

    my %matchingLines;

    open my $fh1, '<', 'File1.txt' or die $!;
    chomp( my @file1Lines = <$fh1> );
    close $fh1;

    open my $fh2, '<', 'File2.txt' or die $!;
    chomp( my @file2Lines = <$fh2> );
    close $fh2;

    for my $file1Line (@file1Lines) {
        $matchingLines{"$file1Line\n"}++ if $file1Line ~~ @file2Lines;
    }

    open my $fh3, '>', 'FileA.txt' or die $!;
    print $fh3 $_ for keys %matchingLines;
    close $fh3;

    If a line in @file1Lines is found in @file2Lines via the smart match operator (works as equality), it's added to the hash %matchingLines for later printing to a file (the hash is used to avoid the possibility of writing multiple instances of the same line to the file).

    Hope this helps!

    Update: Lotus1 correctly brought to my attention that I misunderstood the OP. Have revised the script.

      Your solution only matches if the line at the same line number in both files is the same. The OP was attempting to match each line in file1 against each line in file2. For the files listed below only the first line is matched even though there are four lines that match.

      File1.txt:
      item1
      item2
      item3
      item4

      File2.txt:
      item1
      abc
      item2
      item3
      item4

      File3.txt:
      item1

        Wow! I certainly misunderstood the OP. Will strike/revise this. Thank you for pointing this out.

Reaped: Re: File handles in regular expressions
by NodeReaper (Curate) on Oct 19, 2012 at 07:56 UTC
Re: File handles in regular expressions
by Laurent_R (Parson) on Oct 21, 2012 at 13:02 UTC

    First, three comments:

    1. check the status of the open instructions;

    2. chomp the lines you are reading to remove newline characters

    3. use the eq operator instead of a regex, unless you have good reason to use regexes.

    If the files are not too large (or, rather, if at least one of the files is not too large), read one of the files and store it in memory as a hash (using the full chomped line as the key). Once this is done, go through the other file and check if the line exists in the hash. If it exists, just print it to your output file. This will be much faster than your nested foreach loops.
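
The approach described above could look something like the following. This is only a sketch: the sub name write_common_lines and the command-line handling are assumptions made for the example, not code from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): write to $out_path every line of
# $path2 that also occurs in $path1, and return the number of lines written.
sub write_common_lines {
    my ( $path1, $path2, $out_path ) = @_;

    # Pass 1: remember every chomped line of the first file as a hash key.
    open my $in1, '<', $path1 or die "Can't open $path1: $!";
    my %seen;
    while ( my $line = <$in1> ) {
        chomp $line;
        $seen{$line} = 1;
    }
    close $in1;

    # Pass 2: stream the second file and print the lines found in the hash.
    open my $in2, '<', $path2    or die "Can't open $path2: $!";
    open my $out, '>', $out_path or die "Can't open $out_path: $!";
    my $count = 0;
    while ( my $line = <$in2> ) {
        chomp $line;
        next unless exists $seen{$line};
        print {$out} "$line\n";
        $count++;
    }
    close $in2;
    close $out or die "Can't close $out_path: $!";
    return $count;
}

# Only run when three file names are given on the command line.
if ( @ARGV == 3 ) {
    my $n = write_common_lines(@ARGV);
    print "wrote $n matching line(s)\n";
}
```

Saved under a name of your choosing, it would be called with the two input files and the output file as arguments, for example file1.txt file2.txt file3.txt. Each line of the second file costs one hash lookup instead of a scan of the whole first file. Note that a line occurring several times in the second file is written each time; track the lines already printed in a second hash if duplicates are not wanted.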

Re: File handles in regular expressions
by sundialsvc4 (Abbot) on Oct 22, 2012 at 13:36 UTC

    Of course, on a Unix/Linux system you can do this with the diff command, with appropriate options (that might be system specific).

    I say this because this is an extremely common requirement and yet it is also very common to build one-off custom programs to satisfy such requirements.   I say that without specific reference to this particular case or person.   “I need to write a program to do this” is a conclusion that is quickly and easily jumped-to, especially when the prospect of doing so seems daunting.   TMTOWTDI™, and sometimes TOWTDI isn’t Perl or a custom program at all.

      Yes, diff can be useful, and on Windows, you can use WinMerge, an open-source utility to compare files (there are most probably others). But these utilities require the files to be sorted in the same order, which might not be the case. And if you have to start sorting each file before comparing them, then a simple Perl one-liner might do the job faster.

      At my work, we use all kinds of combinations of Unix "power tools" daily, including pipes and redirections to connect diff, sort, wc, cat, grep, find, cut, sed, awk, etc. commands, but Perl very often offers a better, simpler and faster way to do things.

      And when I have to work on VMS or on Windows, where you don't have sed, cut or awk, Perl shows its superiority even more blatantly.
