Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re^2: Reconciling multiple lists (similar to "merge" in CVS?) (diffN)

by tye (Sage)
on Jan 06, 2007 at 09:44 UTC ( [id://593281]=note: print w/replies, xml ) Need Help??


in reply to Re: Reconciling multiple lists (similar to "merge" in CVS?) (diffN)
in thread Reconciling multiple lists (similar to "merge" in CVS?)

And here is rather ugly but working code:

#!/usr/bin/perl -w use strict; use Algorithm::Diff 1.19 (); Main( @ARGV ); exit( 0 ); { package DiffToMerge; sub new { my $class= shift @_; my $diff= Algorithm::Diff->new( @_ ); $diff->Next(); # Start out at first chunk my $me= bless { diff => $diff, offset => 0, }, $class; if( $diff->Diff() ) { $me->{sublist}= [ $diff->Items( 1 ) ]; $me->{suboff}= $diff->Min( 1 ); } return $me; } sub Same { my( $me, $off )= @_; for( $me->{offset} ) { die if $off < $_; die if $_+1 < $off; $_= $off; } my $diff= $me->{diff}; if( $diff->Max(2) < $off ) { die if ! $diff->Next(); if( $diff->Diff() ) { $me->{sublist}= [ $diff->Items( 1 ) ]; $me->{suboff}= $diff->Min( 1 ); } if( ! $diff->Range(2) ) { die if $off <= $diff->Max(2); die if ! $diff->Next(); } die if $off < $diff->Min(2); die if $diff->Max(2) < $off; } return $diff->Same(); } sub Shift { my( $me )= @_; my $diff= $me->{diff}; die if ! $diff->Same(); return $diff->Min(1) + $me->{offset}++ - $diff->Min(2); } sub Sublist { my( $me )= @_; my $sublist= $me->{sublist}; return if ! $sublist || ! @$sublist; return $sublist; } sub SublistOffset { my( $me, $offset )= @_; return undef if ! defined $offset; shift @{ $me->{sublist} }; return $me->{suboff}++; } sub SublistIdxs { my( $me )= @_; my $beg= $me->{suboff}; $me->{suboff} += @{ $me->{sublist} || [] }; $me->{sublist}= []; return $beg .. $me->{suboff}-1; } } sub FlushSublist { my( $avOut, $d, $diff )= @_; for( $diff->SublistIdxs() ) { my @row; $row[$d]= $_; push @$avOut, \@row; } } sub DiffMerge { my( $finish, @seq )= @_; my @diff= map { DiffToMerge->new( $_, $seq[-1] ) } @seq[ 0 .. $#seq-1 ]; my @out; for my $i ( 0 .. $#{ $seq[-1] } ) { my @row; my %sublists; my @flush; for my $d ( 0 .. $#diff ) { my $same= $diff[$d]->Same($i); for( $diff[$d]->Sublist() ) { $sublists{$d}= $_ if $_; } if( $same ) { push @flush, $d if $sublists{$d}; $row[$d]= $diff[$d]->Shift(); } } if( @flush ) { for my $row ( DiffMerge( 0, values %sublists ) ) { my @subrow; for my $d ( keys %sublists ) { $subrow[$d]= $diff[$d]->SublistOffset( shift @$row + ); } push @out, \@subrow; } FlushSublist( \@out, $_, $diff[$_] ) for @flush; } $row[@diff]= $i; push @out, \@row; } if( $finish ) { for( 0 .. $#diff ) { FlushSublist( \@out, $_, $diff[$_] ) if $diff[$_]->Sublist(); } } return @out; } sub Main { my @seq= @_; @seq= qw( abdefhi azcdfghi abcjkgnhi ) if ! @seq; @seq= map [ /./gs ], @seq; my @out= DiffMerge( 1, @seq ); for( @seq ) { push @$_, '-'; } for( @out ) { for( @$_ ) { $_= '-' if ! defined $_; } print "( @$_ )\n"; } print $/; for( @seq ) { my @l= map shift @$_, @out; for( @l ) { $_= -1 if ! defined $_ || '-' eq $_; } print "( @$_[@l] )\n"; } }

can be used as:

% perl diffN.pl ( 0 0 0 ) ( 1 - 1 ) ( 2 - ) ( 3 - ) ( 4 - ) ( - 1 ) ( - 2 2 ) ( - - 3 ) ( - - 4 ) ( - 3 ) ( - 4 ) ( - 5 5 ) ( - - 6 ) ( 5 6 7 ) ( 6 7 8 ) ( a b d e f - - - - - - - - h i ) ( a - - - - z c - - d f g - h i ) ( a b - - - - c j k - - g n h i ) % perl diffN.pl tye says japhy asked ( - - 0 ) ( - 0 ) ( - 1 1 0 ) ( - - 2 ) ( - - 3 ) ( 0 - - ) ( 1 2 4 ) ( - 3 - 1 ) ( - - - 2 ) ( 2 - - 3 ) ( - - - 4 ) ( - - - - - t y - - e - ) ( - s a - - - y s - - - ) ( j - a p h - y - - - - ) ( - - a - - - - s k e d )

Update: And here is a case with a bug to be fixed:

% perl diffN.pl encyclopedia cyclops enclosure wikipedia cyclone wick +ed lonely ( - - - 0 - 0 ) ( - - - 1 - 1 ) ( 0 - 0 - - - ) ( 1 - 1 - - - ) ( 2 0 2 - 0 2 ) ( - - - 2 - 3 ) ( 3 ) ( 4 ) ( - 1 ) ( - 2 ) ( - - - - 1 ) ( - - - - 2 ) ( 5 3 3 - 3 - 0 ) ( 6 4 4 - 4 - 1 ) ( - - - - 5 - 2 ) ( - - - 3 ) ( 7 5 - 4 ) ( - 6 5 - ) ( - - 6 - ) ( - - 7 - ) ( 8 - 8 5 6 4 3 ) ( - - - - - - 4 ) ( - - - - - - 5 ) ( 9 ) ( 10 ) ( 11 ) ( - - - 6 ) ( - - - 7 ) ( - - - 8 ) ( - - - - - 5 ) ( - - e n c - y c - - - - l o - - p - - - e - - d i a - - - - ) ( - - - - c - - - y c - - l o - - p s - - - - - - - - - - - - ) ( - - e n c - - - - - - - l o - - - s u r e - - - - - - - - - ) ( w i - - - k - - - - - - - - - i p - - - e - - - - - d i a - ) ( - - - - c - - - - - y c l o n - - - - - e - - - - - - - - - ) ( w i - - c k - - - - - - - - - - - - - - e - - - - - - - - d ) ( - - - - - - - - - - - - l o n - - - - - e l y - - - - - - - ) # ^ ^ ^ ^ ^ ^ < should merge > ^ ^ ^ ^ ^ ^ ^

- tye        

Replies are listed 'Best First'.
Re^3: Reconciling multiple lists (similar to "merge" in CVS?) (diffN)
by BrowserUk (Patriarch) on Jan 09, 2007 at 10:02 UTC

    Playing around with some code for this, there are often multiple ways in which differences could be merged. For example, from your test above:

    encyclo pedia enc losur e #or encyclop edia enc lo sure

    To me, these are both equally legitimate, but is there any criteria upon which the one could be determined to be preferrable to the other?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://593281]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others meditating upon the Monastery: (5)
As of 2024-04-24 12:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found