Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Reconstructing List Order From Partial Subsets

by QM (Parson)
on Jul 26, 2006 at 15:24 UTC ( #563819=perlquestion: print w/replies, xml ) Need Help??

QM has asked for the wisdom of the Perl Monks concerning the following question:

I need an algorithm for determining the order of a list of items, based on their occurrence in a file, without knowing the elements of the list ahead of time. The list is repeated multiple times in the file, but each time some elements may be missing. Order is always preserved. Each instance of the list in the file has a start marker. Here is an example list:
Alpha Beta Gamma Delta Epsilon Zeta
If each list instance is marked with Start, here is an example sequence in a file:
Start Alpha Beta Start Epsilon Zeta Start Beta Gamma Zeta Start Alpha Gamma Delta Epsilon
After I read the file, I just need to dump the list in order. It is possible that some files may be indeterminate. For example, given:
Start Alpha Beta Start Alpha Gamma
the order of Beta and Gamma cannot be determined. I just need to make note of this, issue a warning, and then assign an arbitrary order.

I'm not looking for working code, just some ideas to get started.

-QM
--
Quantum Mechanics: The dreams stuff is made of

Replies are listed 'Best First'.
Re: Reconstructing List Order From Partial Subsets
by blokhead (Monsignor) on Jul 26, 2006 at 16:11 UTC
    You want to implement some kind of topological sort.

    Every time an item A occurs before an item B in one of your sublists, that is a constraint that A must occur before B in the global list. So add the edge A->B to your graph.

    Now, you can do several things to your graph. If it has a cycle, you can give an error. Otherwise you can perform a topological sort using your favorite algorithm or graph library. This gives an ordering of the vertices that is consistent with the meaning of each of these A->B constraints.

    If you want to know if there is more than one topological sort possible, you can do something similar to what I outlined here. Just rank the vertices in the same way as I described. Suppose there are N vertices in your graph. If at the end of this process you have N ranks, then there's only one topological ordering possible. Otherwise, some rank must have more than one vertex, and the graph has more than one valid topological ordering.

    It looks like Graph.pm does everything you need except for testing for multiple topological sorts. But along with the simple algorithm I outlined for that, you should be pointed in the right direction.

    Update: It just occured to me that my "ranking" procedure in fact gives a topological sort (indirectly). Just list the things in order of rank. Within ranks, you can assign the order arbitrarily. If there's a cycle, you will fail in the first step (finding rank-0 nodes).

    blokhead

Re: Reconstructing List Order From Partial Subsets
by liverpole (Monsignor) on Jul 26, 2006 at 15:32 UTC
    Hi QM,

    How about using a hash, with the key being the string item, and the value being an index, which gets incremented each time you find a new string.

    Then you can sort numerically on the values of the hash to get the original order.

    Is that what you're looking for?

    Update:  After rereading your question more carefully, it looks like that doesn't work.  But perhaps if you use a hash where the value corresponding to each string is a hash, within which each key is the number of times "Start" has been found, and the value is the index within that sublist.  You'll know an item isn't in a given sublist, because the corresponding key won't be defined.

    For example:

    Start # This is sublist #1 Alpha # $p->{'Alpha'}->{'1'} => 1 Beta # $p->{'Beta'}->{'1'} => 2 Start # This is sublist #2 Epsilon # $p->{'Epsilon'}->{'2'} => 1 Zeta # $p->{'Zeta'}->{'2'} => 2 Start # This is sublist #3 Beta # $p->{'Beta'}->{'3'} => 1 Gamma # $p->{'Gamma'}->{'3'} => 2 Zeta # $p->{'Zeta'}->{'3'} => 3 Start # This is sublist #4 Alpha # $p->{'Alpha'}->{'4'} => 1 Gamma # $p->{'Gamma'}->{'4'} => 2 Delta # $p->{'Delta'}->{'4'} => 3 Epsilon # $p->{'Epsilon'}->{'4'} => 4 would give a hash like: $p = { 'Alpha' => { # Note 'Alpha' doesn't appear in sublist #2 or #3 1 => 1, # Alpha is #1 in the 1st sublist 4 => 1, # Alpha is #1 in the 4th sublist }, 'Beta' => { 1 => 2, 3 => 1, }, 'Epsilon' => { 2 => 1, 4 => 4, }, 'Zeta' => { 2 => 2, 3 => 3, }, 'Gamma' => { 3 => 2, 4 => 2, }, 'Delta' => { 4 => 3, }, };

    Does something like that help you?


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
Re: Reconstructing List Order From Partial Subsets
by japhy (Canon) on Jul 26, 2006 at 16:12 UTC
    Using liverpole's hash idea, I've come up with this (updated with comments):
    use strict; use warnings; my (%pos, @order); # collect the position information { local $/ = ""; while (<DATA>) { chomp; my $i = 0; # $pos{TERM}{SET #} = POSITION $pos{$_}{$. - 1} = $i++ for split /\n/; } } # extract order from the positions for my $i (1 .. keys %pos) { # get all terms who appear ONLY in position 0 # across all the sets they're in my ($first, @extra) = grep { my $t = $_; my %p = map { $_ => 1 } values %{ $pos{$t} }; $p{0} and keys(%p) == 1 } keys %pos; # if there was more than one term found, cause a fuss warn "iteration #$i: multiple candidates [$first @extra]\n" if @extr +a; # uncomment for debugging to see how %pos changes # use Data::Dumper; print Dumper(\%pos); # get the sets this term appeared in and alter # the positions of terms found in those sets for my $set (keys %{ delete $pos{$first} }) { $pos{$_}{$set} and $pos{$_}{$set}-- for keys %pos; } # store this term in the ordered list push @order, $first; } print "<@order>\n"; __DATA__ alpha beta epsilon zeta beta gamma zeta alpha gamma delta epsilon

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
      my %values; my %order; while (<DATA>) { chomp; my @l = split /\n/; @values{@l} = (); for my $i ( 1 .. $#l ) { for my $j ( 0 .. $i-1 ) { $order{"$l[$j]\t$l[$i]"} = -1; $order{"$l[$i]\t$l[$j]"} = 1; } } } my @ordered = sort { $order{"$a\t$b"} or 0 } keys %values;

      Update: Don't use this. It does not work. It may appear to work for some test cases, but it is fundamentally flawed. Sorry.

      We're building the house of the future together.
        <deniro>You. You! You... you're good.</deniro>

        Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
        How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
        Excellent.

        How do you detect indeterminate conditions?

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

        Update: Don't use this. It does not work. It may appear to work for some test cases, but it is fundamentally flawed. Sorry.
        Do you have an example of a failing test case? It may still be useful for what I had in mind...

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

Re: Reconstructing List Order From Partial Subsets
by ruoso (Curate) on Jul 26, 2006 at 17:50 UTC
    I need an algorithm for determining the order of a list of items, based on their occurrence in a file, without knowing the elements of the list ahead of time.

    Well... as you asked to do it without knowing it ahead of time... This is my shot, you can stop feeding it anytime and it will give the order stablished until then.

    #!/usr/bin/perl use strict; use warnings; use List::MoreUtils 'first_index'; my @terms = (); # Instructions my @list = (); # AoA my $last = undef; # last read. while (my $line = <STDIN>) { chomp($line); # We need to reprocess all instructions to keep old orders push @terms, $line; foreach my $item (@terms) { if ($item eq 'Start') { # nothing before start. $last = undef; next; } my ($last_position,$item_position); if (defined $last) { ($last_position) = grep { grep { $_ eq $last } + @{$_} } @list; } ($item_position) = grep { grep { $_ eq $item } @{$_} } + @list; if ($last_position && !$item_position) { my $idx = first_index { $_ == $last_position } + @list; $list[$idx+1] ||= []; push @{$list[$idx+1]}, $item; } elsif ($last_position && $item_position) { my $idx = first_index { $_ == $last_position } + @list; my $idx2 = first_index { $_ == $item_position +} @list; if ($idx == $idx2) { # disambiguation my $idx = first_index { $_ == $last_po +sition } @list; @{$last_position} = grep { $_ ne $item + } @{$last_position}; $list[$idx+1] ||= []; push @{$list[$idx+1]}, $item; } elsif ($idx > $idx2) { # complex disambiguation @{$item_position} = grep { $_ ne $item + } @{$item_position}; $list[$idx+1] ||= []; push @{$list[$idx+1]}, $item; } } elsif (!$last_position && !$item_position) { $list[0] ||= []; push @{$list[0]}, $item; } $last = $item; } } my @ambiguous = grep {defined $_->[1]} @list; if (@ambiguous) { warn 'Ambiguous items: '.join ', ', map { join '|', @{$_} } @a +mbiguous; } print join ', ', map { join '|', @{$_} } @list; print "\n";
    daniel
Re: Reconstructing List Order From Partial Subsets
by planetscape (Chancellor) on Jul 27, 2006 at 09:53 UTC
Re: Reconstructing List Order From Partial Subsets
by rbsgoat (Novice) on Jul 26, 2006 at 15:41 UTC
    set up an array of values that define the order.
    if the index of the current value is less than the previous value warn and inc the hash counter.
    when doing the final list just foreach over the array and print out the number of items in the correct order.
    print "$item\n" x $hash{$item};
Re: Reconstructing List Order From Partial Subsets
by pajout (Curate) on Jul 26, 2006 at 17:11 UTC
    Am I true when expecting that every item has zero or one direct predecessor (in the 'quasiorder' meaning) ?

    If this is true, it seems to be easy to read the sequences and build the structure of kind

    {$name => [$count_of_my_predecessorship, $my_direct_predecessor_name]}

    , and after end of building to check count of predecessorship for each key. Count == 0 => this is root, count == 1 => common item, count > 1 => anounced ambiguity. If you have huge data, it will not be so easy :>)

    Update: Count == 0 => this is leave

    Update2: Count == 0 => this is leaf, sorry for my English :>)

Re: Reconstructing List Order From Partial Subsets
by hv (Parson) on Jul 27, 2006 at 09:23 UTC

    Interesting, I do a similar thing for database upgrades - I extract related changes for a particular upgrade from the individual table descriptions, but they may express additional constraints with a "do this after that one", or "do this before that one". It looks like this:

    [version 92 replace: itemtype enum('publication') not null default 'publication' with: flavour int(11) not null using (do before publication): # database upgrade code for this change # which relies on the old version of the 'publication' table ... ]

    I resolve the ordering using an approach similar to japhy's solution above, but at each pass I look for both things that can happen first and things that can happen last; when all constraints are satisfied, anything else can go in the middle. (If constraints remain, and a pass through the data finds neither a new 'first' nor a new 'last', there is a dependency loop.)

    Hugo

Re: Reconstructing List Order From Partial Subsets
by GrandFather (Saint) on Jul 27, 2006 at 23:10 UTC

    The following works for the test cases I've tried:

    use warnings; use strict; my %afterLists; my %currItems; my @itemList; my $setNum = 0; while (<DATA>) { chomp; next if ! length; if (/start/i) { @itemList = (); %currItems = (); ++$setNum; next; } if (! exists $currItems{$_}) { my %unique; $afterLists{$_} = [] if ! exists $afterLists{$_}; @unique{@{$afterLists{$_}}} = (); @unique{@{$afterLists{$_}}} = () for @itemList; @unique{@itemList} = () if @itemList; if (exists $unique{$_}) { print "$_ order is inconsistent in set $setNum with a prev +ious set\n"; delete $unique{$_}; } $afterLists{$_} = [keys %unique] if keys %unique; $currItems{$_} = 1; push @itemList, $_; } elsif ($currItems{$_}++ == 1) { print "$_ found multiple times in set $setNum\n"; } } my @ordered = sort {$#{$afterLists{$a}} <=> $#{$afterLists{$b}}} keys +%afterLists; my %lengths; push @{$lengths{scalar @{$afterLists{$_}}}}, $_ for @ordered; my @indeterminates = grep {scalar @{$lengths{$_}} != 1} keys %lengths; print "The order of @{$lengths{$_}} can not be determined\n" for @inde +terminates; print "@ordered"; __DATA__ Start Alpha Beta Start Epsilon Zeta Start Beta Gamma Zeta Start Alpha Epsilon Gamma Start Zeta Gamma

    Prints:

    Gamma order is inconsistent in set 5 with a previous set The order of Zeta Gamma can not be determined The order of Beta Epsilon can not be determined Alpha Beta Epsilon Zeta Gamma

    DWIM is Perl's answer to Gödel
Re: Reconstructing List Order From Partial Subsets
by rsiedl (Friar) on Jul 27, 2006 at 04:25 UTC
      ...How?

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Re: Reconstructing List Order From Partial Subsets
by sfink (Deacon) on Jul 28, 2006 at 06:04 UTC
    I'm not sure if this really adds to what has already been said, but when I thought about the problem, I thought it'd be nice to at all times have a minimal graph representing everything known so far.

    So the sequences (AB, AC, BC) would produce the series of graphs

    A B
    A B C
    A B C
    To do this, consider the intermediate step where you have a minimal graph for all the previous sublists seen, and you are walking through a new sublist. Keep a pointer to your current position in the graph, and move it down along edges as long as the next element is a child of the current position. If the next element is not a child, you're going to add it as a child, which might create redundant links in the graph: (after adding B->C)
    A |\ B | |/ C
    To remove them, loop through all parents of the new node and delete the ones that are reachable via the new link (in the above example, you're adding B->C, so you look at the parents of C and discover that A is reachable via B, so you delete that link). Do the same in the other direction to prune redundant children:
    sub reachable { my ($links, $start, $dest) = @_; return 1 if $start eq $dest; foreach (keys %{ $links->{$start} }) { return 1 if reachable($links, $_, $dest); } return 0; } my %next; my %prev; my $ptr; while(<DATA>) { chomp; if ($_ eq 'Start') { undef $ptr; next; } if (defined $ptr) { if (! $next{$ptr}{$_}) { for my $parent (keys %{ $prev{$_} }) { if (reachable(\%prev, $ptr, $parent)) { delete $prev{$_}{$parent}; } } for my $child (keys %{ $next{$_} }) { if (reachable(\%next, $ptr, $child)) { delete $next{$_}{$child}; } } $next{$ptr}{$_} = 1; $prev{$_}{$ptr} = 1; } } $ptr = $_; }
    (Because this is all done via hashes, you're not really keeping a pointer to where you are in the dag, because the string itself takes you directly to the right place via %next and %prev.)

    Then you just need to look through %prev to find all keys with no values. They are the roots. If there is more than one, you have a problem. Otherwise, walk through %next to find the linear chain. If you ever have more than one child, you have a problem.

    One way to make an arbitrary decision and resolve these problems is to, whenever faced with a choice (multiple roots or multiple children), add edges from one to another and rerun the same minimization fixup.

    This isn't the fastest algorithm, because you're constantly doing full reachable() queries that could be truncated in many cases, but it's pretty straightforward.

    It might even work.

Re: Reconstructing List Order From Partial Subsets
by Moron (Curate) on Jul 27, 2006 at 15:14 UTC
    This just prints out what is found in file that is declared in the predefined list, although I suspect (but don't know) you want more than this, e.g. to compare the two orderings and organise the result of that in some way.
    my @baselist = qw ( alpha beta etc ); # or from wherever my %baselist = (); $baselist{ $baselist[$_] } = $_ for ( 0..$#baselist ); # find baselist in the order it appears in input my %rank = (); my $rank = 0; while(<>) chomp; defined( $baselist( $_ ) ) or next; if ( defined( $rank{ $_ } ) { warn "duplicate entry at line $.\n"; } else { $rank{ $_ } = ++$rank; } } # print order actually found print "$_\n" for sort { $rank{$a} <=> $rank{$b} } keys %rank;

    -M

    Free your mind

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://563819]
Approved by Corion
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (1)
As of 2021-01-24 08:19 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?