PerlMonks  

Re: partial match between 2 files

by Kenosis (Priest)
on Dec 05, 2012 at 05:53 UTC (id://1007210)


in reply to partial match between 2 files

You said that you were interested in "...a partial character by character match between 2 files until a non matching character occurs...". Judging by the desired output, though, it looks like you want a character-by-character match between two words. Here, I believe, is what you've provided:

    first_file    second_file     output
    ~~~~~~~~~~    ~~~~~~~~~~~     ~~~~~~
    amayaM     -> amayamAn     -> amaya+mAn
    souraM     -> vismayamAn   -> soura+mA
    kamalZ     -> souramA      ->
               -> kamalAn      ->

The output from the first pair of words makes sense, but I don't see a pattern between the words and the output after that. Please reformat your data using <code> tags and include enough to show the pattern. Also, please show the code that you have tried.

Re^2: partial match between 2 files
by lakssreedhar (Acolyte) on Dec 05, 2012 at 06:26 UTC

    The output should be:

    first_file    second_file     output
    ~~~~~~~~~~    ~~~~~~~~~~~     ~~~~~~
    amayaM     -> amayamAn     -> amaya+mAn
    vismayaM   -> vismayamAn   -> vismaya+mAn
    souraM     -> souramA      -> soura+mA
    kamalZ     -> kamalAn      -> kamal+An

    The code I wrote won't make any sense:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # read dictionary
    my @words;
    open( RE, '<', 'file1' ) or die $!;
    while (<RE>) {
        chomp;
        my @tmp = split( /,/, $_ );
        my $key = $tmp[0];
        #print "$key\n";
        @words = split( //, $key );    # note: only the last line's word is kept
    }
    close(RE);
    my $length1 = $#words;

    # check for a partial match
    my @inp_word1;
    open( RE1, '<', 'file2' ) or die $!;
    while (<RE1>) {
        chomp( my $inp_word4 = $_ );
        @inp_word1 = split( //, $inp_word4 );
        #print "@inp_word1";
    }
    close(RE1);
    my $length2 = $#inp_word1;

    if ( $length1 < $length2 ) {
        # compare the array elements in another loop (not written yet)
    }

      The exclusive-or operator (^) applied to two strings works character by character: each matching pair of characters yields \x00, and each non-matching pair yields some other character. Thus, 'Perl' ^ 'Perl' returns "\x00\x00\x00\x00". Matching the result against /[^\x00]/ therefore finds where the strings differ. In your case, only the first difference is requested, so a single match is enough. Given this, consider the following, which uses your data:

      use warnings;
      use strict;

      open my $fh1, '<', 'first_file.txt'  or die $!;
      open my $fh2, '<', 'second_file.txt' or die $!;

      while ( my $s1 = <$fh1> ) {
          chomp $s1;
          chomp( my $s2 = <$fh2> );

          # Matching characters XOR to "\x00"; the first other character
          # marks the first difference. Only use $-[0] if the match succeeded.
          if ( ( $s1 ^ $s2 ) =~ /[^\x00]/ ) {
              substr( $s2, $-[0], 0 ) = '+';
          }
          print $s2, "\n";
      }

      close $fh2;
      close $fh1;

      Output:

      amaya+mAn
      vismaya+mAn
      soura+mA
      kamal+An

      The array element $-[0] holds the offset at which the last successful match started, i.e. the position of the first non-\x00 character. Passing it to substr with a length of 0 inserts a + at the location of the first difference between the two strings.
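The same XOR-and-match idea can be wrapped in a small function and tried on single pairs of words, without any files. This is a minimal sketch, not the code from the reply above: the name `mark_first_diff` is hypothetical, it uses the 4-argument form of `substr` instead of the lvalue form, and it assumes the `bitwise` feature is not enabled (under `use feature 'bitwise'` the string XOR operator is spelled `^.`).

```perl
use strict;
use warnings;

# Hypothetical helper: return $s2 with a '+' inserted at the first position
# where it differs from $s1. Identical strings are returned unchanged.
sub mark_first_diff {
    my ( $s1, $s2 ) = @_;

    # String XOR: equal characters become "\x00". If one string is longer,
    # its extra characters are XORed with zero bits and stay as they are.
    my $diff = $s1 ^ $s2;

    # @- is only meaningful for this match if the match succeeded.
    if ( $diff =~ /[^\x00]/ ) {
        substr( $s2, $-[0], 0, '+' );    # insert '+' without replacing anything
    }
    return $s2;
}

for my $pair ( [ 'amayaM', 'amayamAn' ], [ 'kamalZ', 'kamalAn' ], [ 'Perl', 'Perl' ] ) {
    print mark_first_diff(@$pair), "\n";
}
```

Because the longer string's extra characters survive the XOR, a first word that is a strict prefix of the second still gets a '+' at the point where the shorter word ends.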

        The code works fine for those 2 files, but if I add more words to file 1 that don't match any of the file 2 words, I get warnings like these:

        Use of uninitialized value $s2 in chomp at triedsplit.pl line 9, <$fh2> line 10.
        Use of uninitialized value $s2 in bitwise xor (^) at triedsplit.pl line 11, <$fh2> line 10.
        Use of uninitialized value $s2 in print at triedsplit.pl line 13, <$fh2> line 10

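      The following code shows one way to tackle this problem. Rather than pairing the two files line by line, it checks each word from the second file against every word from the first, so extra unmatched words do no harm: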

      #! perl
      use Modern::Perl;

      my $file1 = 'amayaM vismayaM souraM kamalZ';
      my $file2 = 'amayamAn vismayamAn souramA kamalAni';

      my %words1 = map { $_ => undef } split /\s+/, $file1;
      my @words2 = split /\s+/, $file2;

      for my $word2 (@words2) {
          for my $word1 ( keys %words1 ) {
              # The stem is the dictionary word minus its last character
              my $stem = substr( $word1, 0, -1 );
              my $len  = length $stem;

              if ( substr( $word2, 0, $len ) eq $stem ) {
                  say $word1, ' -> ', $word2, ' -> ', $stem, '+', substr( $word2, $len );
                  last;    # stop at the first stem that matches
              }
          }
      }

      Output:

      19:21 >perl 415_SoPW.pl
      amayaM -> amayamAn -> amaya+mAn
      vismayaM -> vismayamAn -> vismaya+mAn
      souraM -> souramA -> soura+mA
      kamalZ -> kamalAni -> kamal+Ani

      19:22 >
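One detail of the stem-matching approach: `keys %words1` returns the words in an unpredictable order, so if two stems are both prefixes of the same word (say "ama" and "amaya"), which one wins can change from run to run. The sketch below is a variation, not the code posted above: it sorts the dictionary words longest first so the longest stem wins, and it returns the lines instead of printing them. The sub name `split_on_stems` and the extra word `extraQ` (standing in for an unmatched word added to file 1) are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: for each word in @$second, find a word in @$first
# whose stem (the word minus its last character) is a prefix of it, and
# report the second word split at the end of that stem.
sub split_on_stems {
    my ( $first, $second ) = @_;

    # Longest words (hence longest stems) first; ties broken alphabetically,
    # so the result never depends on hash order.
    my @dict = sort { length($b) <=> length($a) || $a cmp $b } @$first;

    my @results;
    for my $word2 (@$second) {
        for my $word1 (@dict) {
            my $stem = substr( $word1, 0, -1 );
            my $len  = length $stem;
            next unless $len && substr( $word2, 0, $len ) eq $stem;
            push @results, "$word1 -> $word2 -> $stem+" . substr( $word2, $len );
            last;    # first (longest) matching stem wins
        }
    }
    return @results;
}

my @first  = qw(amayaM vismayaM souraM kamalZ extraQ);
my @second = qw(amayamAn vismayamAn souramA kamalAni);
print "$_\n" for split_on_stems( \@first, \@second );
```

As in the original, words from the second list that match no stem are skipped silently, and unmatched words in the first list simply never match.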

      Hope that helps,

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,
