 
PerlMonks  

Re^3: partial match between 2 files

by Kenosis (Priest)
on Dec 05, 2012 at 09:26 UTC (node #1007244)


in reply to Re^2: partial match between 2 files
in thread partial match between 2 files

The bitwise exclusive-or operator (^), applied to two strings, returns "\x00" for each pair of matching characters and a non-zero character for each pair that differs. Thus 'Perl' ^ 'Perl' returns "\x00\x00\x00\x00". Matching the result against [^\x00] shows where the strings differ. In your case only the first difference is requested, and that is exactly what a single (non-global) match finds. Given this, consider the following, which uses your data:

use warnings;
use strict;

open my $fh1, '<', 'first_file.txt'  or die $!;
open my $fh2, '<', 'second_file.txt' or die $!;

while ( my $s1 = <$fh1> ) {
    chomp $s1;
    chomp( my $s2 = <$fh2> );    # assumes equal line counts

    # First non-"\x00" char of the xor = first difference; $-[0] is its offset.
    if ( ( $s1 ^ $s2 ) =~ /[^\x00]/ ) {
        substr( $s2, $-[0], 0 ) = '+';
    }

    print $s2, "\n";
}

close $fh2;
close $fh1;

Output:

amaya+mAn
vismaya+mAn
soura+mA
kamal+An

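The array element $-[0] holds the offset at which the last successful match began. Here that is the position of the first non-"\x00" character in the xor result, i.e., the first difference between the two strings. That offset is passed to substr with a length of 0, so assigning '+' to it inserts a + at that point without replacing any characters.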
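The xor-and-match idea can also be wrapped in a small helper and tried on a few strings by itself. This is a sketch: the name first_diff and the -1 return value for identical strings are illustrative choices, not part of the script above. When the strings differ in length, Perl's string ^ treats the shorter operand as if it were padded with zero bits, so when one string is a prefix of the other, the difference is reported where the shorter one ends (which would give output like kamal+An if the first-file word were kamal). String bitwise operators are meant for byte/Latin-1 strings; code points above 0xFF are a fatal error in Perl 5.28 and later.

```perl
use strict;
use warnings;

# Sketch: return the 0-based offset of the first character at which $x and
# $y differ, or -1 if they are identical (the -1 convention is an assumption).
sub first_diff {
    my ( $x, $y ) = @_;

    # "\x00" marks each matching position; a shorter operand behaves as if
    # padded with zero bits, so a prefix differs where the shorter one ends.
    my $xor = $x ^ $y;
    return $xor =~ /[^\x00]/ ? $-[0] : -1;
}

for my $pair ( [ 'Perl', 'Perl' ], [ 'Perl', 'Pearl' ], [ 'kamal', 'kamalAn' ] ) {
    printf "%s vs %s: %d\n", @$pair, first_diff(@$pair);
}
# Perl vs Perl: -1
# Perl vs Pearl: 2
# kamal vs kamalAn: 5
```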


Re^4: partial match between 2 files
by lakssreedhar (Acolyte) on Dec 05, 2012 at 10:53 UTC

    The code works fine for those two files, but if I add more words to file 1 that do not match any of the file 2 words, warnings like these appear:

        Use of uninitialized value $s2 in chomp at triedsplit.pl line 9, <$fh2> line 10.
        Use of uninitialized value $s2 in bitwise xor (^) at triedsplit.pl line 11, <$fh2> line 10.
        Use of uninitialized value $s2 in print at triedsplit.pl line 13, <$fh2> line 10.

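      The script assumes the same number of words in each file, since you showed the same number of words in both word sets. When first_file.txt has more lines than second_file.txt, <$fh2> returns undef once the second file is exhausted, so $s2 is undefined and chomp, ^ and print each warn about it. Whether the extra words match anything does not matter; the line counts do. Your next step is to adapt the script to handle files with different numbers of words.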
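One possible adaptation, sketched here, reads each file into an array first and compares only as many line pairs as the shorter file has, so no undefined value reaches ^, substr or print; it then warns if the counts differ. The helper names read_lines and mark_diffs, the script name, and taking the two file names from the command line are illustrative assumptions, not part of the original script.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Read a whole file into an array of chomped lines (one word per line assumed).
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    chomp( my @lines = <$fh> );
    close $fh;
    return \@lines;
}

# Compare line i of the first list with line i of the second and insert '+'
# at the first difference. Only min(N1, N2) pairs are compared, so no
# undefined value reaches ^, substr or print.
sub mark_diffs {
    my ( $words1, $words2 ) = @_;
    my @marked;
    for my $i ( 0 .. min( scalar @$words1, scalar @$words2 ) - 1 ) {
        my $s2  = $words2->[$i];    # copy, so the input array is untouched
        my $xor = $words1->[$i] ^ $s2;
        substr( $s2, $-[0], 0 ) = '+' if $xor =~ /[^\x00]/;
        push @marked, $s2;
    }
    return @marked;
}

# Hypothetical usage: perl mark_diffs.pl first_file.txt second_file.txt
if ( @ARGV == 2 ) {
    my ( $w1, $w2 ) = map { read_lines($_) } @ARGV;
    print "$_\n" for mark_diffs( $w1, $w2 );
    warn sprintf( "Line counts differ (%d vs %d); extra lines were not compared\n",
        scalar @$w1, scalar @$w2 )
        if @$w1 != @$w2;
}
```

Reading both files up front trades a little memory for simpler logic: the loop bound is known before any comparison happens, so there is no need to test each readline for undef.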

        I am comparing the two files using a hash and then checking them character by character. But even now, when the positions of the words are changed or some words are deleted from either file, the "uninitialized value" warning still appears.
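If the words can appear in any order, pairing line i of one file with line i of the other no longer makes sense, and the xor comparison has no fixed partner string. One alternative, which swaps the xor step for hash lookups, is to load the first file's words into a hash and, for each word of the second file, look for the longest prefix that is a first-file word, inserting + after it. This sketch ASSUMES one word per line and that each first-file word is meant to be a prefix of its second-file counterpart (consistent with output such as kamal+An, but not confirmed in the thread); mark_by_prefix and the script name are hypothetical. Because nothing is read in lockstep, deleted or reordered words cannot produce undefined values.

```perl
use strict;
use warnings;

# Hypothetical position-independent variant. ASSUMPTION: each first-file
# word is meant to be a prefix of some second-file word, in any order.
# Uses hash lookups instead of the xor comparison.
sub mark_by_prefix {
    my ( $bases, $words ) = @_;    # two array refs
    my %is_base = map { $_ => 1 } @$bases;
    my @out;
    WORD: for my $word (@$words) {
        # Longest prefix first, so 'kamal' is preferred over 'kam'.
        for my $len ( reverse 1 .. length $word ) {
            next unless $is_base{ substr( $word, 0, $len ) };
            my $marked = $word;
            substr( $marked, $len, 0 ) = '+' if $len < length $word;
            push @out, $marked;
            next WORD;
        }
        push @out, $word;    # no first-file word is a prefix: leave unchanged
    }
    return @out;
}

# Hypothetical usage: perl mark_by_prefix.pl first_file.txt second_file.txt
if ( @ARGV == 2 ) {
    my @lists;
    for my $path (@ARGV) {
        open my $fh, '<', $path or die "Can't open $path: $!";
        chomp( my @lines = <$fh> );
        push @lists, \@lines;
    }
    print "$_\n" for mark_by_prefix(@lists);
}
```

Trying prefixes from longest to shortest costs up to one hash lookup per character of each word, which is cheap for short words and avoids comparing every pair of words from the two files.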
