PerlMonks  

Diff between two arrays

by fletcher_the_dog (Friar)
on Jul 01, 2003 at 15:22 UTC (#270509, perlquestion)
fletcher_the_dog has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am trying to come up with a way to compare two arrays. I don't want to just find the intersection and such; I want to end up with a map of where each element in the second array was derived from in the first array. Something like this:
    my @array1 = qw(I like to eat tacos);
    my @array2 = qw(Do you like to eat burritoes);
    my @diff   = get_diff(\@array1, \@array2);

    # desired result: [ element, index in @array2, index in @array1 ]
    @diff = (
        ['Do',        0, 0],
        ['you',       1, 0],
        ['like',      2, 1],
        ['to',        3, 2],
        ['eat',       4, 3],
        ['burritoes', 5, 4],
    );
Note that if one element appears to have replaced another, as "burritoes" replaces "tacos", those elements are mapped together. I was wondering whether there is some module out there that does something like this. I have been playing around with different algorithms, but I wanted to be sure I am not just reinventing the wheel.

Re: Diff between two arrays
by tilly (Archbishop) on Jul 01, 2003 at 16:13 UTC
    I would suggest starting with Algorithm::Diff and seeing whether you can get its output into the form that you want.
      I looked at this module once before and didn't think it would work, but your response made me look again, and on further inspection I think I can get it to do exactly what I want. Thanks!
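
One way the Algorithm::Diff suggestion might be turned into the map from the question is sketched below. It uses the module's traverse_balanced function, whose MATCH, CHANGE, DISCARD_A and DISCARD_B callbacks receive the indices into the first and second sequence. The get_diff name comes from the question; the $last_old bookkeeping, which points an inserted token at the most recently seen index of the first array, is an assumption made here to reproduce the 'you' => 0 entry, not something the thread specifies.

```perl
use strict;
use warnings;
use Algorithm::Diff qw(traverse_balanced);    # CPAN module, not in core

# Sketch: returns one [ token, index_in_new, index_in_old ] entry
# for every token of @$new.
sub get_diff {
    my ($old, $new) = @_;
    my @map;
    my $last_old = 0;    # assumption: inserted tokens map to the last old index seen

    # Shared handler for matched and replaced tokens
    my $paired = sub {
        my ($i, $j) = @_;    # $i indexes @$old, $j indexes @$new
        $last_old = $i;
        push @map, [ $new->[$j], $j, $i ];
    };

    traverse_balanced($old, $new, {
        MATCH     => $paired,
        CHANGE    => $paired,
        DISCARD_A => sub { },    # token removed from @$old: nothing to record
        DISCARD_B => sub {       # token present only in @$new
            my ($i, $j) = @_;
            push @map, [ $new->[$j], $j, $last_old ];
        },
    });
    return @map;
}

my @array1 = qw(I like to eat tacos);
my @array2 = qw(Do you like to eat burritoes);
print join(',', @$_), "\n" for get_diff(\@array1, \@array2);
```

How tokens dropped from the first array should appear in the map is left open in the question, so DISCARD_A is a no-op here.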
Re: Diff between two arrays
by pboin (Deacon) on Jul 01, 2003 at 16:42 UTC
    I believe the following is a working solution.

    However, the problem of equating burritos to tacos is a different question IMO. I have not attempted that at all, but you'd need a 'dictionary' of equivalents I think.

    This is my first attempt to reply with a working solution, so thank you for your patience. (Comments requested and appreciated.)

    use strict;
    use warnings;

    my @array1 = qw(I like to eat tacos);
    my @array2 = qw(Do you like to eat burritoes);
    my @diff   = get_diff(\@array1, \@array2);

    sub get_diff {
        my ($slave, $master) = @_;    # references to @array1 and @array2
        my @diff;
        # loop through @$master
        for my $m (0 .. $#$master) {
            my $matchpos = 0;         # stays 0 when the word is not found
            # loop through @$slave
            for my $s (0 .. $#$slave) {
                # is this word found?
                if ($master->[$m] eq $slave->[$s]) {
                    $matchpos = $s;
                    last;
                }
            }
            # collect every row instead of overwriting @diff on each pass
            push @diff, [ $master->[$m], $m, $matchpos ];
            print "$master->[$m],$m,$matchpos\n";
        }
        return @diff;
    }
Re: Diff between two arrays
by apsyrtes (Beadle) on Jul 01, 2003 at 16:06 UTC
    How are you to determine that "burritoes" (sic) is a replacement for "tacos" ?? What are the rules that determine that, without also determining that "Do" is a replacement for "I"?

    I'm sure you have an END RESULT in mind (i.e. why are you trying to do this?), and you are currently trying to solve a problem that is getting in the way of your solution. If you can say what you are trying to achieve, you'll likely find that somebody has a better SOLUTION, so that you won't have to solve the problem you are dealing with.

    Jason W.
      This is the end result I have in mind: what I need is a map of the connection between the two arrays. The arrays represent a certain set of tokens in two different files. What I want is very similar to doing a diff between two files, except that I don't want it on a character-by-character or line-by-line basis; I want it on a token-by-token basis, where a token matches m/(\w+|\S)/. What I need the map for is not important (well, it is to me, but I already know that I need it for what I am doing). In response to your other question, "How are you to determine that "burritoes" (sic) is a replacement for "tacos"?", I would determine that the same way that I would determine "a" is a replacement for "e" between "tall" and "tell". The algorithm would be something like: IF everything in sequence A is the same as everything in sequence B except for the token at index N, THEN token B[N] replaced A[N].
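
The tokenization and the single-replacement rule described above can be sketched roughly as follows. The m/(\w+|\S)/ pattern is taken from the post; the function names tokenize and single_replacement are hypothetical and only illustrate the stated rule for sequences of equal length.

```perl
use strict;
use warnings;

# Split text into tokens: a run of word characters, or any single
# non-whitespace character (pattern taken from the post).
sub tokenize {
    my ($text) = @_;
    return $text =~ /(\w+|\S)/g;
}

# Hypothetical helper: returns the index N when the sequences have the
# same length and differ at exactly one position, otherwise undef.
sub single_replacement {
    my ($seq_a, $seq_b) = @_;
    return unless @$seq_a == @$seq_b;
    my @differing = grep { $seq_a->[$_] ne $seq_b->[$_] } 0 .. $#$seq_a;
    return @differing == 1 ? $differing[0] : undef;
}

my @old = tokenize('I like to eat tacos.');
my @new = tokenize('I like to eat burritoes.');
my $n   = single_replacement(\@old, \@new);
print "$new[$n] replaced $old[$n] at index $n\n" if defined $n;
# prints "burritoes replaced tacos at index 4"
```

This rule alone does not cover the arrays in the original question, which differ in length; insertions and deletions need a real diff algorithm.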
Re: Diff between two arrays
by dree (Monsignor) on Jul 01, 2003 at 19:20 UTC
    A way to compare strings (and in this case you can join the arrays into strings) is to use Text::PhraseDistance, which "provides a way to compare two phrases and to give a measure of their proximity".
Re: Diff between two arrays
by Skeeve (Vicar) on Jul 02, 2003 at 06:29 UTC
    I think you should consider what you really want to find:

    1. the first match and a mapping from that on?
    2. the longest match and a mapping from that on?
    3. the most matches and a mapping between those?
    4. the best mapping (define "best")
    To give examples for each of my points, I use these arrays
    a b c d e f g h i j k l m
    z b y d e x h i j k w v m
    how would that map?
    1. this would find the first match at "b" and result in
      a|b|c|d|e|f|g|h|i|j|k|l|m
      z|b|y|d|e|x|h|i|j|k|w|v|m
    2. this would find the longest match at "h i j k" and result in something like
      a b|c|d|e|f|g|h|i|j|k|l|m
      z  |b|y|d|e|x|h|i|j|k|w|v m
    3. this would find
      a|b|c|d|e|f g|h|i|j|k|l  |m
      z|b|y|d|e|x  |h|i|j|k|w v|m
    4. I have no idea about that.
    From what I recall from my studies, at least the "most matches" case sounds to me like a problem with a complexity of O(n**2).

      You will have to compare each possible mapping, score it and compare the scores to decide which fits your needs best.

      Non-trivial, I think.
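
For the "most matches" case (point 3), one standard technique is the longest common subsequence, which dynamic programming computes in time proportional to the product of the two lengths without enumerating every mapping. The pure-Perl sketch below is an illustration added here, not code from the thread; the name lcs_pairs is hypothetical.

```perl
use strict;
use warnings;

# Returns the index pairs [ $i, $j ] of one longest common subsequence
# of @$x and @$y, using an (n+1) x (m+1) table.
sub lcs_pairs {
    my ($x, $y) = @_;
    my ($n, $m) = (scalar @$x, scalar @$y);

    # $len[$i][$j] = LCS length of the suffixes x[i..] and y[j..]
    my @len = map { [ (0) x ($m + 1) ] } 0 .. $n;
    for my $i (reverse 0 .. $n - 1) {
        for my $j (reverse 0 .. $m - 1) {
            if ($x->[$i] eq $y->[$j]) {
                $len[$i][$j] = $len[$i + 1][$j + 1] + 1;
            }
            else {
                my ($skip_x, $skip_y) = ($len[$i + 1][$j], $len[$i][$j + 1]);
                $len[$i][$j] = $skip_x >= $skip_y ? $skip_x : $skip_y;
            }
        }
    }

    # Walk forward through the table, collecting the matched positions
    my @pairs;
    my ($i, $j) = (0, 0);
    while ($i < $n && $j < $m) {
        if ($x->[$i] eq $y->[$j]) {
            push @pairs, [ $i, $j ];
            $i++;
            $j++;
        }
        elsif ($len[$i + 1][$j] >= $len[$i][$j + 1]) { $i++ }
        else                                          { $j++ }
    }
    return @pairs;
}

my @first  = qw(a b c d e f g h i j k l m);
my @second = qw(z b y d e x h i j k w v m);
my @pairs  = lcs_pairs(\@first, \@second);
print join(' ', map { "$first[$_->[0]]($_->[0],$_->[1])" } @pairs), "\n";
# prints "b(1,1) d(3,3) e(4,4) h(7,6) i(8,7) j(9,8) k(10,9) m(12,12)"
```

The elements between consecutive matched pairs are the candidates for replacements, insertions and deletions; how to pair them up is the part that still needs a definition of "best".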
