partial match between 2 files

lakssreedhar has asked for the wisdom of the Perl Monks concerning the following question:


Re: partial match between 2 files
by Kenosis (Priest) on Dec 05, 2012 at 05:53 UTC

You said that you were interested in "...a partial character by character match between 2 files until a non matching character occurs..." Judging by the desired output, it looks like you want a character-by-character match between two words. Here, I believe, is what you've provided:

first_file      second_file       output
~~~~~~~~~~      ~~~~~~~~~~~       ~~~~~~
amayaM      ->  amayamAn     ->   amaya+mAn
souraM      ->  vismayamAn   ->   soura+mA
kamalZ      ->  souramA      ->    
            ->  kamalAn      ->

The output from the first pair of words makes sense, but I don't see a pattern between the words and the output after that. Please reformat your data using <code> tags and include enough to show the pattern. Also, please show the code that you have tried.


Re^2: partial match between 2 files

by lakssreedhar (Acolyte) on Dec 05, 2012 at 06:26 UTC

The output should be:

first_file      second_file       output
~~~~~~~~~~      ~~~~~~~~~~~       ~~~~~~
amayaM      ->  amayamAn     ->   amaya+mAn
vismayaM    ->  vismayamAn   ->   vismaya+mAn
souraM      ->  souramA      ->   soura+mA
kamalZ      ->  kamalAn      ->   kamal+An

The code I wrote won't make any sense:

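#!/usr/bin/perl
use strict;
use warnings;

# read dictionary: keep the first comma-separated field of each line
open( my $re, '<', 'file1' ) or die "Cannot open file1: $!";
my @keys;
while (<$re>)
{
    chomp;
    my @tmp = split( /,/, $_ );
    push @keys, $tmp[0] // '';
}
close($re);

# read the words to be split, one per line
open( my $re1, '<', 'file2' ) or die "Cannot open file2: $!";
chomp( my @inp_words = <$re1> );
close($re1);

# check for a partial match, pairing line N of file1 with line N of file2
for my $i ( 0 .. $#keys )
{
    next unless defined $inp_words[$i];
    my @words     = split( //, $keys[$i] );
    my @inp_word1 = split( //, $inp_words[$i] );
    my $length1   = $#words;
    my $length2   = $#inp_word1;
    if ( $length1 < $length2 )
    {
        # compare the array elements until the first mismatch
        my $pos = 0;
        $pos++ while $pos <= $length1 && $words[$pos] eq $inp_word1[$pos];
        print substr( $inp_words[$i], 0, $pos ), '+', substr( $inp_words[$i], $pos ), "\n";
    }
}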

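The character-by-character comparison described above can also be packaged as a small subroutine that walks both words in step and stops at the first mismatch. This is a minimal sketch that assumes the words have already been paired up line by line; the name `split_at_mismatch` is hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: walk both words in step and insert '+' into the
# second word at the first position where the characters differ.
sub split_at_mismatch {
    my ( $stem_word, $full_word ) = @_;
    my $max = length($stem_word) < length($full_word)
            ? length($stem_word)
            : length($full_word);
    my $pos = 0;
    # advance while the characters at $pos are identical
    $pos++ while $pos < $max
        && substr( $stem_word, $pos, 1 ) eq substr( $full_word, $pos, 1 );
    # if the words agree for all $max characters, '+' goes at position $max
    return substr( $full_word, 0, $pos ) . '+' . substr( $full_word, $pos );
}

my @pairs = (
    [ 'amayaM',   'amayamAn' ],
    [ 'vismayaM', 'vismayamAn' ],
    [ 'souraM',   'souramA' ],
    [ 'kamalZ',   'kamalAn' ],
);
print split_at_mismatch(@$_), "\n" for @pairs;
```

For the four pairs above this prints `amaya+mAn`, `vismaya+mAn`, `soura+mA` and `kamal+An`, one per line.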

Re^3: partial match between 2 files

by Kenosis (Priest) on Dec 05, 2012 at 09:26 UTC

The exclusive-or operator (^) applied to two strings returns \x00 for each pair of matching characters and a non-zero character for each non-matching pair. Thus, 'Perl' ^ 'Perl' returns "\x00\x00\x00\x00". Matching the returned string against [^\x00] shows where the strings differ. In your case, only the first difference is needed. Given this, consider the following, which uses your data:

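use warnings;
use strict;

open my $fh1, '<', 'first_file.txt'  or die $!;
open my $fh2, '<', 'second_file.txt' or die $!;

while ( my $s1 = <$fh1> ) {
    my $s2 = <$fh2>;
    last unless defined $s2;    # stop if second_file.txt runs out first
    chomp( $s1, $s2 );

    # XOR gives \x00 wherever the characters agree; find the first non-\x00
    if ( ( $s1 ^ $s2 ) =~ /[^\x00]/ ) {
        # $-[0] is the offset where that match starts
        substr( $s2, $-[0], 0 ) = '+';
    }
    print $s2, "\n";

}

close $fh2;
close $fh1;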

Output:

amaya+mAn
vismaya+mAn
soura+mA
kamal+An

After a successful match, the array element $-[0] holds the offset at which that match began, here the position of the first differing character. It is passed to substr to insert a + at that location.
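To see the exclusive-or trick in isolation, the short script below prints 'Perl' ^ 'Perl' as a dotted list of character codes and reports the first differing offset for a few pairs. The `first_diff` name is hypothetical. Per perlop, when the two strings differ in length, ^ acts as though the shorter one were padded with zero bits, so a word that is a strict prefix of the other (e.g. 'soura' vs 'souramA') still reports a difference at the end of the shorter word; only identical strings give no match.

```perl
use strict;
use warnings;

# Hypothetical helper: offset of the first character at which $x and $y
# differ, or undef if the strings are identical.
sub first_diff {
    my ( $x, $y ) = @_;
    # string XOR: "\x00" wherever both strings have the same character
    return ( $x ^ $y ) =~ /[^\x00]/ ? $-[0] : undef;
}

# 'Perl' ^ 'Perl' is four "\x00" characters
printf "%vd\n", 'Perl' ^ 'Perl';    # prints 0.0.0.0

for my $pair ( [ 'amayaM', 'amayamAn' ], [ 'kamalZ', 'kamalAn' ], [ 'Perl', 'Perl' ] ) {
    my $pos = first_diff(@$pair);
    print "$pair->[0] vs $pair->[1]: ",
        defined $pos ? "first difference at offset $pos\n" : "identical\n";
}
```

The first two pairs report offset 5, the position where the reply's loop inserts the +; the identical pair reports "identical", which is the case where the loop leaves the line unchanged.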


Re^4: partial match between 2 files

by lakssreedhar (Acolyte) on Dec 05, 2012 at 10:53 UTC

Re^5: partial match between 2 files

by Kenosis (Priest) on Dec 05, 2012 at 19:56 UTC


Re^3: partial match between 2 files

by Athanasius (Archbishop) on Dec 05, 2012 at 09:26 UTC

The following code shows one way to tackle this problem:

#! perl
use Modern::Perl;

my $file1  = 'amayaM   vismayaM   souraM  kamalZ';
my $file2  = 'amayamAn vismayamAn souramA kamalAni';
my %words1 =  map { $_ => undef } split /\s+/, $file1;
my @words2 =  split /\s+/, $file2;

for my $word2 (@words2)
{
    for my $word1 (keys %words1)
    {
        my $stem = substr($word1, 0, -1);
        my $len  = length $stem;

        if (substr($word2, 0, $len) eq $stem)
        {
            say $word1, ' -> ', $word2, ' -> ', $stem, '+', substr($word2, $len);
            last;
        }
    }
}

Output:

19:21 >perl 415_SoPW.pl
amayaM -> amayamAn -> amaya+mAn
vismayaM -> vismayamAn -> vismaya+mAn
souraM -> souramA -> soura+mA
kamalZ -> kamalAni -> kamal+Ani

19:22 >
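Because `keys %words1` returns keys in an unpredictable order, the `last` in the loop above takes whichever matching stem Perl happens to visit first. With this data each second-file word matches only one stem, but if one stem were a prefix of another (for example a hypothetical first-file word `kamalaM`, whose stem `kamala` would compete with `kamal` for `kamalamAn`), the chosen split could vary between runs. A minimal sketch of a deterministic variant tries the longest stem first; it keeps the same assumption that a stem is the first-file word minus its last character, uses core pragmas instead of Modern::Perl, and the name `best_split` is hypothetical.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return "stem+suffix" for the longest stem that is a
# prefix of $word, or undef when no stem matches.
sub best_split {
    my ( $word, @stems ) = @_;
    for my $stem ( sort { length($b) <=> length($a) || $a cmp $b } @stems ) {
        my $len = length $stem;
        return $stem . '+' . substr( $word, $len )
            if substr( $word, 0, $len ) eq $stem;
    }
    return undef;
}

my @words1 = qw(amayaM vismayaM souraM kamalZ);
my @words2 = qw(amayamAn vismayamAn souramA kamalAni);

# stem = first-file word minus its last character, as in the code above
my @stems = map { substr( $_, 0, -1 ) } @words1;

for my $word2 (@words2) {
    my $split = best_split( $word2, @stems );
    say $word2, ' -> ', defined $split ? $split : '(no match)';
}
```

Sorting longest-first means the most specific stem wins regardless of hash order; the `$a cmp $b` tie-break only fixes the order among equal-length stems.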

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


