
Diff between two arrays

by fletcher_the_dog (Friar)
on Jul 01, 2003 at 15:22 UTC (#270509=perlquestion)
fletcher_the_dog has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I am trying to come up with a way to compare two arrays. I don't just want to find the intersection and such; I want to end up with a map of where each element in the second array was derived from in the first array. Something like this:
    my @array1 = qw(I like to eat tacos);
    my @array2 = qw(Do you like to eat burritoes);
    my @diff = get_diff(\@array1, \@array2);

    # desired result: [token, index in @array2, index in @array1]
    @diff = (
        ['Do',        0, 0],
        ['you',       1, 0],
        ['like',      2, 1],
        ['to',        3, 2],
        ['eat',       4, 3],
        ['burritoes', 5, 4],
    );
Note that if one element appears to have replaced another, such as "burritoes" replacing "tacos", those elements are mapped together. I was wondering if there is a module out there that does something like this. I have been playing around with different algorithms, but I wanted to be sure I am not just reinventing the wheel.

Re: Diff between two arrays
by tilly (Archbishop) on Jul 01, 2003 at 16:13 UTC
    I would suggest starting with Algorithm::Diff and seeing if you can get its output into the form that you want.
      I looked at this module once before and didn't think it would work, but your response made me look again, and on further inspection I think I can get it to do exactly what I want. Thanks!
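      Algorithm::Diff is built around a longest-common-subsequence (LCS) computation. The sketch below writes that idea out in plain Perl so it runs without installing anything; it is not Algorithm::Diff's API. The helper lcs_pairs and the gap-filling rule in get_diff are assumptions chosen to reproduce the output format from the question: tokens in an unmatched gap are paired with the old gap position by position as replacements, surplus tokens reuse the last old index of the gap, and a pure insertion gets undef.

```perl
use strict;
use warnings;

# Hypothetical helper: index pairs [$i, $j] of one longest common
# subsequence of @$x and @$y, found with the O(n*m) dynamic-programming table.
sub lcs_pairs {
    my ($x, $y) = @_;
    my ($n, $m) = (scalar @$x, scalar @$y);
    # $len[$i][$j] = LCS length of the suffixes @$x[$i..] and @$y[$j..]
    my @len = map { [ (0) x ($m + 1) ] } 0 .. $n;
    for my $i (reverse 0 .. $n - 1) {
        for my $j (reverse 0 .. $m - 1) {
            if ($x->[$i] eq $y->[$j]) {
                $len[$i][$j] = $len[$i + 1][$j + 1] + 1;
            }
            else {
                my ($down, $right) = ($len[$i + 1][$j], $len[$i][$j + 1]);
                $len[$i][$j] = $down >= $right ? $down : $right;
            }
        }
    }
    # Walk the table from the top-left corner to recover the matched pairs.
    my ($i, $j) = (0, 0);
    my @pairs;
    while ($i < $n && $j < $m) {
        if ($x->[$i] eq $y->[$j]) { push @pairs, [ $i, $j ]; $i++; $j++ }
        elsif ($len[$i + 1][$j] >= $len[$i][$j + 1]) { $i++ }
        else { $j++ }
    }
    return @pairs;
}

# Map each token of @$new to [token, index in @$new, index in @$old].
sub get_diff {
    my ($old, $new) = @_;
    # The sentinel pair closes the final gap after the last match.
    my @anchors = (lcs_pairs($old, $new), [ scalar @$old, scalar @$new ]);
    my ($gi, $gj) = (0, 0);    # where the current gap starts in @$old / @$new
    my @diff;
    for my $anchor (@anchors) {
        my ($i, $j) = @$anchor;
        # Unmatched gap: @$old[$gi .. $i-1] versus @$new[$gj .. $j-1]
        for my $k (0 .. $j - $gj - 1) {
            my $o;    # stays undef when the old side of the gap is empty
            if ($i > $gi) { $o = $gi + $k < $i ? $gi + $k : $i - 1 }
            push @diff, [ $new->[$gj + $k], $gj + $k, $o ];
        }
        push @diff, [ $new->[$j], $j, $i ] if $j < @$new;    # the match itself
        ($gi, $gj) = ($i + 1, $j + 1);
    }
    return @diff;
}

my @array1 = qw(I like to eat tacos);
my @array2 = qw(Do you like to eat burritoes);
for my $row (get_diff(\@array1, \@array2)) {
    printf "['%s',%d,%s]\n", $row->[0], $row->[1], $row->[2] // 'undef';
}
```

      With the question's two arrays this prints the six rows listed in the question, including ['burritoes',5,4]. For other inputs the gap-pairing rule is only one possible interpretation of "replaced".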
Re: Diff between two arrays
by pboin (Deacon) on Jul 01, 2003 at 16:42 UTC
    I believe the following is a working solution.

    However, the problem of equating burritos to tacos is a different question IMO. I have not attempted that at all, but you'd need a 'dictionary' of equivalents I think.

    This is my first attempt to reply with a working solution, so thank you for your patience. (Comments requested and appreciated.)

    use strict;
    use warnings;

    my @array1 = qw(I like to eat tacos);
    my @array2 = qw(Do you like to eat burritoes);
    my @diff = get_diff(\@array1, \@array2);

    sub get_diff {
        my $slave  = shift;    # reference to @array1
        my $master = shift;    # reference to @array2
        my @result;
        # loop through @$master
        for my $m (0 .. $#$master) {
            my $matchpos = 0;    # unmatched words default to index 0
            # loop through @$slave
            for my $s (0 .. $#$slave) {
                # is this word found?
                if ($master->[$m] eq $slave->[$s]) {
                    $matchpos = $s;
                    last;
                }
            }
            push @result, [ $master->[$m], $m, $matchpos ];
            print "$master->[$m],$m,$matchpos\n";
        }
        return @result;
    }
Re: Diff between two arrays
by apsyrtes (Beadle) on Jul 01, 2003 at 16:06 UTC
    How are you to determine that "burritoes" (sic) is a replacement for "tacos" ?? What are the rules that determine that, without also determining that "Do" is a replacement for "I"?

    I'm sure you have an END RESULT in mind (i.e., why are you trying to do this?), and you are currently trying to solve a problem that is getting in the way of your solution. If you can say what you are trying to achieve, you'll likely find that somebody has a better SOLUTION, so that you won't have to solve the problem you are dealing with.

    Jason W.
      This is the end result I have in mind. What I need is a map of the connections between the two arrays. The arrays represent a certain set of tokens in two different files. What I want is very similar to doing a diff between two files, except that I don't want it on a character-by-character or line-by-line basis; I want it on a token-by-token basis, where a token matches m/(\w+|\S)/. What I need the map for is not important (well, it is to me, but I already know that I need it for what I am doing).

      In response to your other question, "How are you to determine that "burritoes" (sic) is a replacement for "tacos"?", I would determine it the same way I would determine that "e" is a replacement for "a" in the diff between "tall" and "tell". The algorithm would be something like: "IF everything in sequence A is the same as everything in sequence B except for the token at index N, THEN token B[N] replaced A[N]."
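      The tokenizer and the replacement rule above can be sketched directly. The names tokenize and single_replacement are hypothetical. Note that the rule as stated only applies when both sequences have the same length and differ at exactly one index; for the question's arrays (5 and 6 tokens) it returns undef, so the general case still needs an alignment such as LCS.

```perl
use strict;
use warnings;

# Tokenize as described above: a run of word characters, or any single
# non-whitespace character. List-context m//g returns every match.
sub tokenize {
    my ($text) = @_;
    return $text =~ /(\w+|\S)/g;
}

# The stated rule: if two token lists have the same length and differ at
# exactly one index N, then $y->[N] replaced $x->[N]. Returns N, or undef
# when the rule does not apply (different lengths, or zero or several
# differing positions).
sub single_replacement {
    my ($x, $y) = @_;
    return undef unless @$x == @$y;
    my @diffs = grep { $x->[$_] ne $y->[$_] } 0 .. $#$x;
    return @diffs == 1 ? $diffs[0] : undef;
}

my @old = tokenize('I like to eat tacos.');
my @new = tokenize('I like to eat burritoes.');
my $n = single_replacement(\@old, \@new);
print "'$new[$n]' replaced '$old[$n]'\n" if defined $n;    # 'burritoes' replaced 'tacos'
```

      The same function works at the character level: single_replacement([split //, 'tall'], [split //, 'tell']) returns 1, the index of the "a"/"e" swap.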
Re: Diff between two arrays
by dree (Monsignor) on Jul 01, 2003 at 19:20 UTC
    One way to compare strings (in this case you can join the arrays into strings) is Text::PhraseDistance, which "provides a way to compare two phrases and to give a measure of their proximity".
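    The sketch below does not use Text::PhraseDistance's own interface. It swaps in a plain word-level Levenshtein distance (using List::Util, which ships with Perl) to illustrate the general idea of scoring how close two token sequences are. The function name token_distance is hypothetical, and the result is a single number, not the per-token map the question asks for.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Word-level Levenshtein distance: the minimum number of token insertions,
# deletions and substitutions that turn @$x into @$y. Keeps one DP row.
sub token_distance {
    my ($x, $y) = @_;
    my @prev = (0 .. scalar @$y);    # distance from an empty prefix of @$x
    for my $i (1 .. scalar @$x) {
        my @cur = ($i);              # deleting $i tokens reaches an empty @$y prefix
        for my $j (1 .. scalar @$y) {
            my $cost = $x->[$i - 1] eq $y->[$j - 1] ? 0 : 1;
            push @cur, min($prev[$j] + 1,             # delete $x->[$i-1]
                           $cur[$j - 1] + 1,          # insert $y->[$j-1]
                           $prev[$j - 1] + $cost);    # keep or substitute
        }
        @prev = @cur;
    }
    return $prev[-1];
}

my $d = token_distance([qw(I like to eat tacos)], [qw(Do you like to eat burritoes)]);
print "distance: $d\n";    # 3: I->Do, insert you, tacos->burritoes
```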
Re: Diff between two arrays
by Skeeve (Vicar) on Jul 02, 2003 at 06:29 UTC
    I think you should consider what you really want to find:

    1. the first match and a mapping from that on?
    2. the longest match and a mapping from that on?
    3. the most matches and a mapping between those?
    4. the best mapping (define "best")
    To give an example for each of my points, I'll use these arrays:
    a b c d e f g h i j k l m
    z b y d e x h i j k w v m
    How would that map?
    1. this would find the first match at "b" and result in
    2. this would find the longest match at "h i j k" and result in something like
      a b|c|d|e|f|g|h|i|j|k|l|m
      z  |b|y|d|e|x|h|i|j|k|w|v m
    3. this would find
      a|b|c|d|e|f g|h|i|j|k|l  |m
      z|b|y|d|e|x  |h|i|j|k|w v|m
    4. I have no idea about that one. From what I recall from my studies, at least the "most matches" option sounds to me like a problem with complexity O(n**2).

      You will have to compare each possible mapping, score it and compare the scores to decide which fits your needs best.

      Non-trivial, I think.
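    Options 2 and 3 can each be scored with a standard dynamic-programming table; a minimal sketch on the example arrays above, with hypothetical function names. The "most matches" count is the length of a longest common subsequence, which the usual table finds in O(n*m) time without enumerating every possible mapping.

```perl
use strict;
use warnings;

# Option 2: longest contiguous run of tokens common to both arrays.
sub longest_common_run {
    my ($x, $y) = @_;
    my ($n, $m) = (scalar @$x, scalar @$y);
    # $run[$i][$j] = length of the common run ending at $x->[$i-1] and $y->[$j-1]
    my @run = map { [ (0) x ($m + 1) ] } 0 .. $n;
    my ($best, $bx, $by) = (0, 0, 0);
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            next unless $x->[$i - 1] eq $y->[$j - 1];
            $run[$i][$j] = $run[$i - 1][$j - 1] + 1;
            if ($run[$i][$j] > $best) {
                $best = $run[$i][$j];
                ($bx, $by) = ($i - $best, $j - $best);
            }
        }
    }
    return ($best, $bx, $by);    # length, start in @$x, start in @$y
}

# Option 3: number of matches in a "most matches" mapping, i.e. the length
# of a longest common subsequence, keeping one DP row at a time.
sub lcs_length {
    my ($x, $y) = @_;
    my @prev = (0) x (@$y + 1);
    for my $i (1 .. scalar @$x) {
        my @cur = (0);
        for my $j (1 .. scalar @$y) {
            push @cur, $x->[$i - 1] eq $y->[$j - 1]
                ? $prev[$j - 1] + 1
                : ($prev[$j] > $cur[$j - 1] ? $prev[$j] : $cur[$j - 1]);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

my @x = qw(a b c d e f g h i j k l m);
my @y = qw(z b y d e x h i j k w v m);
my ($len, $sx, $sy) = longest_common_run(\@x, \@y);
print "longest run: ", join(' ', @x[$sx .. $sx + $len - 1]), "\n";    # h i j k
print "most matches: ", lcs_length(\@x, \@y), "\n";                    # 8 (b d e h i j k m)
```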

Node Type: perlquestion [id://270509]
Approved by broquaint
Front-paged by arthas