http://www.perlmonks.org?node_id=326891

ngomong has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking for a way to join multiple arrays, but in a way that attempts to preserve the order of the elements. Here's what I mean:
my @array1 = (1, 3, 4, 6); my @array2 = (1, 2, 4, 6); my @array3 = (1, 2, 3, 5); my %seen; my @union = grep { not $seen{$_}++ } (@array1, @array2); undef %seen; @union = grep { not $seen{$_}++ } (@union, @array3); print join(' ', @union);
The output from this script is:
1 3 4 6 2 5

But, as you can see from my original arrays, 2 should come before 3, and 4 should come after 2. Is there a way to intelligently attempt to preserve the proper order of the elements? There will, of course, be data sets that can't be preserved, eg/
@array1 = (1, 2, 3, 4); @array2 = (1, 3, 2, 4);
... but the code should handle this gracefully and just choose one over the other.

Any ideas?

Thanks!
clay.

Replies are listed 'Best First'.
Re: "Intelligent" array joining (topological sort)
by tye (Sage) on Feb 05, 2004 at 22:56 UTC
    #!/usr/bin/perl -w use strict; sub tsort { my( %pred, %succ ); for my $av ( @_ ) { for my $i ( 1 .. $#$av ) { $pred{$av->[$i]}{$av->[$i-1]} ||= 1; $succ{$av->[$i-1]}{$av->[$i]} ||= 1; } $succ{$av->[-1]} ||= {}; } my @output; while( %succ ) { my( $best, $count ); for my $item ( keys(%succ) ) { my $preds= keys %{$pred{$item}}; if( ! defined($count) || $preds < $count ) { $best= $item; $count= $preds; last if 0 == $count; } } warn "Data contains a cycle, breaking at $best.\n" if 0 < $count; push @output, $best; for my $succ ( keys %{$succ{$best}} ) { delete $pred{$succ}{$best}; } delete $succ{$best}; } return wantarray ? @output : \@output; } my @array1= ( 1, 3, 4, 6 ); my @array2= ( 1, 2, 4, 6 ); my @array3= ( 1, 2, 3, 5 ); my @output= tsort( \( @array1, @array2, @array3 ) ); print "@output\n"; @output= tsort( [1,2,3,4], [1,3,2,4] ); print "@output\n";
    output is
    1 2 3 4 5 6 Data contains a cycle, breaking at 2. 1 2 3 4

    - tye        

Re: "Intelligent" array joining
by dragonchild (Archbishop) on Feb 06, 2004 at 03:40 UTC
    This sounds like a directed graph. I'd recommend looking at Graph::Directed. The following worked for me:
    use Graph::Directed; my @array1 = qw(dog cat rat mouse); my @array2 = qw(dog rat mouse bird); my @array3 = qw(cat rat fish mouse); my $graph = Graph::Directed->new; $graph->add_path(@array1); $graph->add_path(@array2); $graph->add_path(@array3); my @toposort = $graph->toposort; print "@toposort\n"; ------ dog cat rat fish mouse bird

    ------
    We are the carpenters and bricklayers of the Information Age.

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

      As a follow-up... I've noticed that any duplicates in a given array will mess up the topological sort. So, I just added a quick duplicate search:
      use Graph::Directed; my @array1 = qw(dog rat rat mouse); my @array2 = qw(dog rat mouse bird); my @array3 = qw(cat rat fish mouse); my @elements; push(@elements, \@array1, \@array2, \@array3); my $graph = Graph::Directed->new; for (0..$#elements) { # Check for duplicate GUIDs in this set my %seen = (); my @dup = (); foreach my $item (@{$elements[$_]}) { if ($seen{$item}++) { push(@dup, $item); } } unless ($#dup == -1) { print "Duplicate elements: @dup"; exit; } # If all's well, add to the path $graph->add_path(@{$elements[$_]}); } my @elements_ordered = $graph->toposort; print "@elements_ordered\n";
      Hey, wow! That's precisely what I needed. Pretty straightforward, too. I just didn't know what to look for.

      Just goes to show... with Perl, if you've got a problem, somebody's already figured it out and written a module.

      On to reading more about directed graphs and topological sorting...

      Thanks!
Re: "Intelligent" array joining
by jeffa (Bishop) on Feb 05, 2004 at 21:25 UTC

    Unless i am missing something (which i usually am), why not just sort the results?

    @union = sort grep { not $seen{$_}++ } (@union, @array3);

    UPDATE: i think i see what you want now ... let's try using CHARS instead of INTS. In order to have "intelligent" sorting, you have to provide the "intelligence" ... in this case, let's give weights to each of the items we are dealing with:

    my @array1 = ([b=>1], [d=>3], [z=>4], [e=>5]); my @array2 = ([a=>2], [b=>1], [z=>4]); my @array3 = ([d=>3], [e=>5]);
    The rest of the code is mostly the same (i did not bother to see if this could be refactored for efficiency), but since our arrays hold more arrays, we need to code appropriately:
    my %seen; my @union = grep { not $seen{$_->[0]}++ } (@array1, @array2); undef %seen; @union = map { $_->[0] } sort { $a->[1] <=> $b->[1] } grep { not $seen{$_->[0]}++ } (@union, @array3); use Data::Dumper; print Dumper \@union;
    Hope this helps. :)

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
      Well, you see... I'm not trying to sort based on numerical order. I'm trying to sort based on the original order of the arrays. Let's try this again:
      my @array1 = qw(dog cat rat mouse); my @array2 = qw(dog rat mouse bird); my @array3 = qw(cat rat fish mouse);
      The output would then be:
      dog cat rat mouse bird fish
      Here, fish should come before mouse, and before bird.
      So, I'm not sorting numerically, and I'm not sorting alphabetically... I'm trying to preserve the order of the original arrays...

      To make this (perhaps) a little more clear, here's my specific problem. I'm taking data sets, with questions, and each question has a unique identifier, a GUID. For the most part, for each data set, the GUIDs will be in the same order, but every now and then, they change a question and assign a new GUID. Or, they insert a new question into the middle.

      I need to combine all the data sets, and attempt to preserve the natural order of the questions.

      Does that make a bit more sense?
Re: "Intelligent" array joining
by valentin (Abbot) on Feb 05, 2004 at 21:50 UTC
    Let Tie::IxHash keep your order. I did not understand which order is prefered for your example, but put them in your order into the hash. That's what you get back.
    my @array1 = (1, 3, 4, 6); my @array2 = (1, 2, 4, 6); my @array3 = (1, 2, 3, 5); tie my %seen, 'Tie::IxHash'; for (0..$#array1) { my @v = sort ( $array1[$_], $array2[$_], $array3[$_] ); @seen{ @v } = 1; } print join ' ', keys %seen;
      Hmmm... this appears to be a step in the right direction, though I don't need that "sort" in there. You see, numerical order is of no importance... only the order in which it appears in the array. It's like this:

      my @array1 = (1, 3, 4, 6); my @array2 = (1, 2, 4, 6); my @array3 = (1, 2, 3, 5);
      So, the program looks as the first array, and notes its order. Then, it looks at the second array, and notices that "2" hasn't been seen before. It notes that it goes after "1" and before "4". At this point, the logical order could be either

      (1, 2, 3, 4, 6) or
      (1, 3, 2, 4, 6)

      It doesn't matter which it chooses. Then, it examines the third array and realizes that "2" comes before "3". Plus, the "5" is new, and comes after "3". So, now the order could be:

      (1, 2, 3, 4, 5, 6) or
      (1, 2, 3, 4, 6, 5)

      Does that make sense?
Re: "Intelligent" array joining
by runrig (Abbot) on Feb 05, 2004 at 23:40 UTC
    This is not very thought out or tested, probably not very efficient on large lists, but I think its interesting, and it seems to work:
    #!/usr/bin/perl use strict; use warnings; my @array1 = qw(dog cat rat mouse); my @array2 = qw(dog rat mouse bird); my @array3 = qw(cat rat fish mouse); #my @array1= ( 1, 3, 4, 6 ); #my @array2= ( 1, 2, 4, 6 ); #my @array3= ( 1, 2, 3, 5 ); init_cmp($_) for \@array1, \@array2, \@array3; my %animals; undef @animals{@array1, @array2, @array3}; print join(",", keys %animals), "\n"; my @sorted = sort { my_cmp($a, $b) } keys %animals; print join(",", @sorted), "\n"; { my %lt; sub init_cmp { my ($this, @rest) = @{$_[0]}; while ( @rest ) { $lt{$this}{$rest[0]} = 1; $this = shift @rest; } } sub is_lt { my ($first, $next) = @_; return unless exists $lt{$first}; return 1 if $lt{$first}{$next}; for my $mid ( keys %{$lt{$first}} ) { return 1 if is_lt($mid, $next); } return; } sub my_cmp { my ($first, $next) = @_; return is_lt($first, $next) ? -1 : 1; } }
Re: "Intelligent" array joining
by QM (Parson) on Feb 06, 2004 at 01:47 UTC
    I think the OP wants something like this:
    #!/your/perl/here use strict; use warnings; my @array1 = qw(dog cat rat mouse); my @array2 = qw(dog rat mouse bird); my @array3 = qw(cat rat fish mouse); my %seen; my @collection; foreach my $item ( @array1, @array2, @array3 ) { unless ( exists( $seen{$item} ) ) { push @collection, $item; $seen{$item}++; } } print "@collection\n"; __END__
    Which gives:
    dog cat rat mouse bird fish
    I'll leave it to OMAR to come up with an obfuscated compact form.

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      Doesn't work. Try the following test:
      my @array1 = qw(dog cat rat mouse); my @array2 = qw(dog rat mouse bird); my @array3 = qw(cat rat fish mouse); my %seen; my @collection; foreach my $item ( @array1, @array2, @array3 ) { unless ( exists( $seen{$item} ) ) { push @collection, $item; $seen{$item}++; } } print "@collection\n"; @collection = (); %seen = (); foreach my $item ( @array2, @array1, @array3 ) { unless ( exists( $seen{$item} ) ) { push @collection, $item; $seen{$item}++; } } print "@collection\n"; __END__

      Prints:

      dog cat rat mouse bird fish dog rat mouse bird cat fish

      Theoretically, it should always give the same order, no matter what order the initial arrays are in.

      ------
      We are the carpenters and bricklayers of the Information Age.

      Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

        Theoretically, it should always give the same order, no matter what order the initial arrays are in.
        Ah, I missed that in the specification. I suppose if the OP could have stated the problem better, s/he might have known where to find it in the first place, no? ;)

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

Re: "Intelligent" array joining
by Anonymous Monk on Feb 05, 2004 at 21:34 UTC
    topological sort, aka tsort
Re: "Intelligent" array joining
by thospel (Hermit) on Feb 07, 2004 at 13:09 UTC
    Several good answers all based on basically doing a topological sort have already been given.

    Can this problem also be solved without essentially doing a topological sort ?

    When you consider as input a sequence of arrays where each consists of two elements, that simply says the first element must come before the second, and you actually have the standard way to ask for a topological sort.

    So no, in general you can't. Any solution must also be able to do a topological sort.

Re: "Intelligent" array joining
by Not_a_Number (Prior) on Feb 06, 2004 at 19:02 UTC

    Bit late, I'm afraid, but this seems to do what you want:

    use strict; use warnings; my @array1 = qw ( ant bee ant cow ant dog ); my @array2 = qw ( ant cat bee cow rat ); my @array3 = qw ( ant 1 3 5 7 9 1 2 3 4 5 6 7 8 9 X ); my @arrays; my $length = 0; for ( \@array1, \@array2, \@array3 ) { $length = @$_ if @$_ > $length; push @arrays, $_; } my %seen; foreach my $i ( 0 .. $length - 1 ) { for ( @arrays ) { print "$_->[$i] " if $_->[$i] and not $seen{$_->[$i]}++; } }

    Output:

    ant bee cat 1 3 cow 5 rat 7 dog 9 2 4 6 8 X

    Maybe there's a way of avoiding two loops, though?

    dave

    Update: Single loop:

    my @arrays = \( @array1, @array2, @array3 ); my %seen; my $i = 0; { my $count = 0; for ( @arrays ) { ++$count and next unless $_->[$i]; print "$_->[$i] " unless $seen{$_->[$i]}++; } $i++; redo unless $count >= @arrays; }
      Doesn't work. Your method favors/regurgitates the order of the first array, but according to the second array, 'cat' should come between 'ant' and 'bee', not after.

        You're right, thanks. I should have read the later specs more carefully.

        dave

Re: "Intelligent" array joining
by jdporter (Chancellor) on Aug 12, 2004 at 04:36 UTC
    Well, I don't know if this is a topical sort -- I kind of doubt it -- but it seems to me to give satisfactory results, and handles "loops" quite gracefully. It's pretty efficient, but doesn't necessarily give the optimal result. Pass a list of arrayrefs.
    sub concensus_sort { my( %pre, %suf ); for my $ar ( @_ ) { my @a = @$ar; my @pre; while ( @a ) { my $k = shift @a; $pre{$k} ||= {}; $suf{$k} ||= {}; $pre{$k}{$_}++ for @pre; $suf{$k}{$_}++ for @a; push @pre, $k; } } if ( $main::DEBUG ) { for ( keys %pre ) { for my $p ( keys %{ $pre{$_} } ) { if ( exists $pre{$p}{$_} ) { print "$_-$p AND $p-$_ !!!\n"; } } } } my @result; for my $k ( sort { keys(%{$pre{$a}}) <=> keys(%{$pre{$b}}) or keys(%{$suf{$a}}) <=> keys(%{$suf{$b}}) } keys %suf ) { push @result, $k; # now remove all trace of it in the data structures: delete $pre{$_}{$k} for keys %{ $suf{$k} }; delete $suf{$_}{$k} for keys %{ $pre{$k} }; delete $pre{$k}; delete $suf{$k}; } @result }
    But it still doesn't give a pretty result when the input lists are disjoint.
Re: "Intelligent" array joining
by ngomong (Sexton) on Mar 11, 2004 at 17:00 UTC
    Following up on this a bit more...

    I found that duplicates in each input array would cause problems, so I put in a check for this (above, in a previous post).

    What I'm finding now, is that problems arise when the input arrays are totally unique. That is, an interection of all the arrays would yield no elements.

    Consider this code:
    #!/usr/bin/perl use strict; use warnings; use Graph::Directed; my @array1= ( 1, 2, 3, 4); my @array2= ( 5, 6, 7, 8); my @array3= ( 9, 10, 11, 12); my $graph = Graph::Directed->new; $graph->add_path(@array1); $graph->add_path(@array2); $graph->add_path(@array3); my @ordered = $graph->toposort(); print "@ordered";
    Note that, for this example, I'm using numerals. However, the desired order is not alphabetical, ASCIIbetical, numerical, or anything similar. What's important is the order that the elements appear in the arrays.

    The problem here, is that the vertices in this example are not strongly connected. So, rather than getting:
    1 2 3 4 5 6 7 8 9 10 11 12

    I get:
    5 1 2 9 10 3 4 11 12 6 7 8

    Is there a way have the topological sort expect unique arrays, with no connections, and weight them, based on the order they are received?

    You see, sometimes the arrays will be connected, sometimes they won't... so I need a system that will account for both cases, without being explicitly told which is which.