PerlMonks  

Re^2: partial match between 2 files

by lakssreedhar (Acolyte)
on Dec 05, 2012 at 06:26 UTC (#1007213)


in reply to Re: partial match between 2 files
in thread partial match between 2 files

The output should be:

first_file     second_file     output
~~~~~~~~~~     ~~~~~~~~~~~     ~~~~~~
amayaM      -> amayamAn     -> amaya+mAn
vismayaM    -> vismayamAn   -> vismaya+mAn
souraM      -> souramA      -> soura+mA
kamalZ      -> kamalAn      -> kamal+An

The code I wrote won't make any sense.

#!/usr/bin/perl
#read dictionary
open(RE,"file1");
while(<RE>)
{
    chomp;
    my @tmp =split(/\,/,$_);
    $key="$tmp[0]";
    #print "$key\n ";
    my @words=split(//,$key);
}
close(RE);
my $length1 = $#words;
#check for a partial match
open(RE1,"file2");
while(<RE1>)
{
    $inp_word4 = $_;
    my @inp_word1 =split(//,$inp_word4);
    #print "@inp_word1";
}
close(RE1)
my $length2=$#inp_word1;
if($length1<$length2)
{
    compare the array elements in another loop
}


Re^3: partial match between 2 files
by Athanasius (Prior) on Dec 05, 2012 at 09:26 UTC

    The following code shows one way to tackle this problem:

    #! perl
    use Modern::Perl;

    my $file1 = 'amayaM vismayaM souraM kamalZ';
    my $file2 = 'amayamAn vismayamAn souramA kamalAni';

    my %words1 = map { $_ => undef } split /\s+/, $file1;
    my @words2 = split /\s+/, $file2;

    for my $word2 (@words2)
    {
        for my $word1 (keys %words1)
        {
            my $stem = substr($word1, 0, -1);
            my $len  = length $stem;

            if (substr($word2, 0, $len) eq $stem)
            {
                say $word1, ' -> ', $word2, ' -> ', $stem, '+', substr($word2, $len);
                last;
            }
        }
    }

    Output:

    19:21 >perl 415_SoPW.pl
    amayaM -> amayamAn -> amaya+mAn
    vismayaM -> vismayamAn -> vismaya+mAn
    souraM -> souramA -> soura+mA
    kamalZ -> kamalAni -> kamal+Ani

    19:22 >

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^3: partial match between 2 files
by Kenosis (Priest) on Dec 05, 2012 at 09:26 UTC

    The exclusive-or operator (^) between strings returns \x00 for each matching pair of characters, and a different value for non-matching characters. Thus, 'Perl' ^ 'Perl' would return '\x00\x00\x00\x00'. Matching a returned string for [^\x00] will show where the strings differ. In your case, only the first difference is requested. Given this, consider the following that uses your data:

    use warnings;
    use strict;

    open my $fh1, '<', 'first_file.txt' or die $!;
    open my $fh2, '<', 'second_file.txt' or die $!;

    while ( my $s1 = <$fh1> ) {
        chomp $s1;
        chomp( my $s2 = <$fh2> );

        ( $s1 ^ $s2 ) =~ /[^\x00]/;
        substr( $s2, $-[0], 0 ) = '+' if defined $-[0];
        print $s2, "\n";
    }

    close $fh2;
    close $fh1;

    Output:

    amaya+mAn
    vismaya+mAn
    soura+mA
    kamal+An

    The variable $-[0] contains the offset of the start of the last successful match. Here that is the position of the first difference between the two strings, which is passed to substr to insert a + at that location.
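The XOR technique described above can be tried in isolation. The following is a minimal sketch, not part of the original reply: the name first_diff_offset is hypothetical, and it assumes plain byte strings and that the bitwise feature (which makes ^ numeric-only) is not enabled.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset of the first character at which
# two strings differ, or undef if the strings are identical.
# When the lengths differ, Perl pads the shorter operand with zero bits,
# so a string that is a prefix of the other "differs" at its own length.
sub first_diff_offset {
    my ($s1, $s2) = @_;
    my $xor = $s1 ^ $s2;                     # \x00 wherever characters match
    return $xor =~ /[^\x00]/ ? $-[0] : undef;
}

my $s2  = 'amayamAn';
my $pos = first_diff_offset('amayaM', $s2);
substr($s2, $pos, 0) = '+' if defined $pos;  # insert '+' at the first difference
print "$s2\n";                               # amaya+mAn
```

Returning the offset from a function rather than testing $-[0] after an unchecked match avoids depending on whatever an earlier successful match left in @-.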

      The code works fine for those 2 files, but if I add more words to file 1 that do not match any of the file2 words, then errors occur, like:

      Use of uninitialized value $s2 in chomp at triedsplit.pl line 9, <$fh2> line 10.
      Use of uninitialized value $s2 in bitwise xor (^) at triedsplit.pl line 11, <$fh2> line 10.
      Use of uninitialized value $s2 in print at triedsplit.pl line 13, <$fh2> line 10

        The script assumes the same number of words in each file, since you showed the same number of words in the two different word sets. Your next step is to make a few changes to adapt this script to handle files with different numbers of words.
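One possible way to handle word lists of different lengths is to stop pairing the files line by line and instead look up each second-file word among the first-file words. The sketch below is only an illustration under stated assumptions: the helper name split_on_stem and the extra word extraQ are made up, and the stem is assumed to be the first-file word minus its final character, as in the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: for each word in the second list, find the first
# word in the first list whose stem is a prefix of it. The stem is assumed
# to be the first-list word without its final character. Words with no
# match in either list are simply skipped.
sub split_on_stem {
    my ($first, $second) = @_;               # two array references
    my @out;
    for my $word2 (@$second) {
        for my $word1 (@$first) {
            my $stem = substr $word1, 0, -1;
            next unless length $stem;         # ignore one-character words
            if (index($word2, $stem) == 0) {  # stem is a prefix of $word2
                push @out, join ' -> ', $word1, $word2,
                    $stem . '+' . substr($word2, length $stem);
                last;
            }
        }
    }
    return @out;
}

# The first list has one more word than the second; 'extraQ' matches nothing.
my @first  = qw(amayaM vismayaM souraM kamalZ extraQ);
my @second = qw(amayamAn vismayamAn souramA kamalAn);
print "$_\n" for split_on_stem(\@first, \@second);
```

In a real script the two lists would be read from the files, for example with chomp( my @first = <$fh1> ), before calling the helper.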
