Beefy Boxes and Bandwidth Generously Provided by pair Networks Joe
Perl: the Markov chain saw
 
PerlMonks  

Patience Sorting To Find Longest Increasing Subsequence

by Limbic~Region (Chancellor)
on May 03, 2006 at 16:32 UTC ( #547199=perlmeditation: print w/ replies, xml ) Need Help??

All,
A few weeks ago, TedPride posted an interesting puzzle. kvale provided a hint by linking to Patience Sorting that no one apparently used. bart mentioned yesterday he would like to see an implementation of both the sorting algorithm as well as the way to divine the longest increasing subsequence from the result.

Patience Sorting

Each new card is placed on the left most pile where the top card in that pile is greater than the new card. If no pile meets this criteria, the new card is placed in a new pile to the right of the existing piles. When all cards are exhausted, the number of piles defines the length of the longest increasing subsequence. It is that simple. You only need to add 1 more thing to retrieve the sequence from the piles. <update> As bart points out below, this may be all that is required to find the longest increasing subsequence but the sort still isn't finished. I have added a merge sort to his reply to demonstrate how that is possible.</update>

Longest Increasing Subsequence

After each card is placed, a note is taken of the top card in the previous pile. If it is the left most pile no note is needed. When all the cards are exhausted, starting in the right most pile we can work backwards finding the cards that make up the longest increasing subsequence. The reverse of this is the sequence is what is desired.

The following is a straight forward implementation. I took advantage of the fact that you know the length of the sequence before actually finding it to alleviate the need for reverse as I fill the array.

#!/usr/bin/perl use strict; use warnings; use List::Util 'shuffle'; use constant VAL => 0; use constant PREV => 1; my @list = shuffle(1..20); print "@list\n"; print join " ", Long_Inc_Sub(@list); sub Long_Inc_Sub { my @list = @_; my (@pile, @seq); NUM: for my $num (@list) { for (0 .. $#pile) { if ($num < $pile[$_][-1][VAL]) { my $prev = $_ ? $#{$pile[$_ - 1]} : undef; push @{$pile[$_]}, [$num, $prev]; next NUM; } } my $prev = @pile ? $#{$pile[-1]} : undef; push @pile, [[$num, $prev]]; } my ($prev, $len) = ($#{$pile[-1]}, scalar @pile); while ($len--) { $seq[$len] = $pile[$len][$prev][VAL]; $prev = $pile[$len][$prev][PREV]; } return @seq; }

Things To Consider

Do the algorithms still work if the original list sorted in ascending order is non-continguous (! 1..N)? Do the algorithms still work if the original list contains duplicates? Are there ways to maintain the same speed and decrease the memory consumption? Would there be a benefit in replacing the linear card placement search with a binary one or would the overhead outweigh the benefit. If the answer is "it depends", is there a way to know ahead of time to dispatch the appropriate method?

These and other interesting things to ponder are all left as an exercise for the reader.

Cheers - L~R

Comment on Patience Sorting To Find Longest Increasing Subsequence
Download Code
Re: Patience Sorting To Find Longest Increasing Subsequence
by TedPride (Priest) on May 03, 2006 at 17:34 UTC
    use strict; use warnings; my (@arr, @sequence, @result, $p, $node); @arr = 1..20; randomize(\@arr); print "@arr\n"; for (@arr) { $p = simplefind(\@sequence, $_); $node = {'key' => $_}; if (!$p) { $sequence[1] = $node; } else { $node->{'last'} = $sequence[$p]; $sequence[$p+1] = $node; } } $node = $sequence[-1]; while ($node) { push @result, $node->{'key'}; $node = $node->{'last'}; } print join ' ', reverse @result; sub simplefind { my ($p, $key, $i) = @_; for ($i = 1; $i <= $#$p; $i++) { last if $key < $p->[$i]{'key'}; } return $i-1; } sub randomize { my ($p, $size, $swap, $key); $p = $_[0]; $size = $#$p; for (0..($size-1)) { $swap = int rand($size - $_ + 1) + $_; $key = $p->[$_]; $p->[$_] = $p->[$swap]; $p->[$swap] = $key; } }
    You build an array of sequences..

    1-sequence
    2-sequence
    3-sequence
    etc.

    Where each n-sequence is the n-sequence with the smallest possible end item. By the nature of the structure, the end items will always be listed in increasing order - the end item for the 2-sequence will always be less than or equal to the end item for the 3-sequence.

    Now, for each new item, just find the sequence with the largest end item that is smaller than or equal to the current item. Link your current item to this sequence and link the slot for the next largest sequence to your current item. Rinse and repeat until all items are processed.

    For instance, given 3, 5, 2, 1, 7:

    3: (3)
    5: (3, 3-5))
    2: (2, 3-5))
    1: (1, 3-5))
    7: (1, 3-5, 3-5-7)

    The only really hard part is coming up with a binary find for the largest item that's smaller than or equal to the current item. I know it's possible, but the exact algorithm eludes me. I'm currently using simplefind() for test purposes - an O(n) algorithm rather than the proper O(lg n). Fixing this will make the entire algorithm O(n lg n). (EDIT: Actually, it's O(n lg s), where s is the longest sequence, but worst-case is s = n)

    (You may also notice, by the way, that the entire sequence is reconstructed from data stored along the way. It's also possible to store just the the subscript of the end item in each n-length sequence, then do a linear find backwards from that point, recreating the sequence as you go from all the items smaller than or equal to the current part of the sequence.)

      TedPride,
      Per our /msg conversation, here is a version of my implementation that uses a binary search:
      sub Long_Inc_Sub { my @list = @_; my (@pile, @seq); for my $num (@list) { if (@pile) { if ($num < $pile[-1][-1][VAL]) { my ($beg, $end) = (0, $#pile); while ($beg <= $end) { my $mid = int(($beg + $end) / 2); if ($num < $pile[$mid][-1][VAL]) { $end = $mid - 1; } else { $beg = $mid + 1; } } my $prev = $beg ? $#{$pile[$beg - 1]} : undef; push @{$pile[$beg]}, [$num, $prev]; } else { push @pile, [[$num, $#{$pile[-1]}]]; } } else { push @pile, [[$num, undef]]; } } my ($prev, $len) = ($#{$pile[-1]}, scalar @pile); while ($len--) { $seq[$len] = $pile[$len][$prev][VAL]; $prev = $pile[$len][$prev][PREV]; } return @seq; }
      It does not currently handle exact matches since there should not be any duplicates in the list (1..N). I did leave this open as a question to ponder and it should be fairly trivial to adapt if you decide it is safe to do so ;-)

      Cheers - L~R

Re: Patience Sorting To Find Longest Increasing Subsequence
by bart (Canon) on May 05, 2006 at 07:51 UTC
    Your code doesn't do the complete patience sorting, it does more or less sort, but you're left with some sorted piles of cards after it finishes — typically 11 piles for a deck of 52 cards, I read. You still need to do the second step: pick the lowest card from every pile, but you don't know what pile that is — except when you just start, then it's the leftmost one. It would be some form of merge sort.

    I found one of the original scientific papers in PDF format (30 pages, 277k), in year 1999, item 2.

    On to your questions.

    Do the algorithms still work if the original list sorted in ascending order is non-continguous (! 1..N)?
    Of course they do. All you use for the sorting action, is compare two items. Apart from that, their actual value is never used.

    Do the algorithms still work if the original list contains duplicates?
    Yes, but depending on what you do when they compare the same, your sort might be a stable sort, or not. You would get a stable sort if you treat the current card as the larger one, when they do agree.

    Are there ways to maintain the same speed and decrease the memory consumption?
    If so, I haven't found one.

    Would there be a benefit in replacing the linear card placement search with a binary one or would the overhead outweigh the benefit.
    No idea... In practice, I found binary search often to be slower than linear search, for few items — sometimes not even that few, as I've seen linear search being faster for a search in 500 items.

    But we're still stuck with an incomplete sorting.

    Perhaps binary search, or rather, a binary tree, could be benificial to complete the sorting: when you have constructed the piles, put (just) the top cards in the binary tree, remove the lowest value from it and from the pile, and insert the next one from the same pile that it came from. Repeat until all piles are expleted and the tree is empty.

      bart,
      Your code doesn't do the complete patience sorting, it does more or less sort, but you're left with some sorted piles of cards after it finishes typically 11 piles for a deck of 52 cards, I read. You still need to do the second step: pick the lowest card from every pile, but you don't know what pile that is except when you just start, then it's the leftmost one. It would be some form of merge sort.

      No additional sorting is required to find the longest increasing subsequence. As far as picking the lowest card from every pile - that is easy, it is the top card in each pile.

      print "$_->[-1][VAL]\n" for @pile;
      I think you tripped over your wording. I believe you intended to say that at the end of the game, the cards are still not in complete order. While you can get the 1st card for free (the top card on the left most pile), the second card still requires scanning the lowest top card from each pile.

      As I said, this is unnessary for finding the longest increasing subsequence which is what this meditation was about. The straight forward approach to finish the sorting would indeed be a merge sort. To make it even more efficient, the piles could be part of a balanced btree. You always select from the left-most pile and then re-insert that pile back into tree based off the card underneath.

      The questions to ponder were from the perspective of still finding the longest increasing subsequence.

      As far as the binary search. I have 2 versions here to get to the partial sort (sufficient to find the longest increasing subsequence) so benchmarking should be trivial. As far as to complete the sorting - I mentioned one possible way of making it more efficient then a merge sort above but I am not planning on taking it that far. IOW - left as an exercise for the reader. Here is the merge sort though so you have a complete answer to your original question in the CB:

      Cheers - L~R

      bart,
      With regards to your follow up in the CB concerning efficiency. Without resorting to complex data structures, the partial patience sort can be done in O(N Log N) assuming the binary search. Finding the LIS is at max an additional N worst case so O(N Log N + N). The question you posed is if the merge sort O(N^2) was the most efficient way to finish the sort.

      If you take advantage of the fact that the top cards are already in ascending order, you can select the top card of the left most pile and then move that pile to keep the new top card in ascending order. To find the new location using a binary search you have O(Log N). To insert in the middle is O(N). Since you have to do this for N items, you result in O(N^2 Log N). A merge sort is only O(N^2) worst case so no, I don't think so.

      As I said in the other reply, using a different datastructure could make the finishing of the sort more efficient but it also adds a great deal more complexity. You are welcome to use a Van_Emde_Boas_tree which claims to be able to do the whole thing in O(N Log N) but that is an exercise left for the reader.

      Cheers - L~R

        A merge sort is only O(N^2) worst case so no,

        No, a merge sort is worst case O(n log n).

        You are welcome to use a Van_Emde_Boas_tree which claims to be able to do the whole thing in O(N Log N)

        Actually the paper by Bespamyatnikh & Segal contains a proof that you can do it in O(N log log N) time. I havent verified it tho.

        But I doubt that the vEB based algorithm would in practice beat a simpler algorithm to do patience sorting. Unless I guess if you were dealing with a deck with tens of thousands or even millions or billions of cards. The overhead of maintaining a vEB tree is prohibitive for small datasets. The cost of doing binary operations on the keys, maintaining the vEB tree and etc, would most likely outweigh that of a simpler less efficient algorithm.

        As bart said, sometimes a binary search algorithm is not as fast a scan, even though one is O(log N) and the other is O(N). The reason of course is that big-oh notation glosses over issues like cost per operation, and only focuses on the "overwhelming factor" involved. So in a binary search if it takes 4 units of work to perform an operation and in linear search it takes 1 then binary search only wins when 4 * log N < N, so for lists shorter than 13 elements there would be no win to a binary search. And I'd bet that in fact the ratio is probably something like 20:1 and not 4:1. Apply this kind of logic to a deck of 52 cards, and IMO its pretty clear that vEB trees are not the way to go for this, regardless of the proof.

        ---
        $world=~s/war/peace/g

Re: Patience Sorting To Find Longest Increasing Subsequence
by demerphq (Chancellor) on May 08, 2006 at 06:53 UTC

    This is just a simple implementation of Patience Sorting, finding the LIS using backrefs, and doing the final merge phase to produce a sorted list. I didnt bother with anything special like binsearch or what not.

    use strict; use warnings; use List::Util qw(shuffle); use Text::Wrap qw(wrap); sub patience { my ($deck)=@_; my @piles; CARD: foreach my $card (@$deck) { foreach my $pid ( 0..$#piles ) { if ( $card < $piles[$pid][-1][0] ) { push @{$piles[$pid]}, [ $card, $pid ? $piles[$pid-1][-1] : undef ]; next CARD; } } push @piles,[ [ $card, @piles ? $piles[-1][-1] : undef ] ]; } return \@piles } sub lis { my ($piles)= @_; my $iter= $piles->[-1][-1]; my @lis; while ($iter) { push @lis, $iter->[0]; $iter= $iter->[1]; } @lis=reverse @lis; return \@lis } sub inorder { my ($piles)= @_; my @inorder=(pop(@{$piles->[0]})->[0]); while (@$piles) { my $min=0; foreach my $pid (0..$#$piles) { if (@{$piles->[$pid]} && $piles->[$pid][-1][0] < $piles->[$min][-1][0] ){ $min=$pid; } } push @inorder,(pop(@{$piles->[$min]})->[0]); shift @$piles while @$piles && !@{$piles->[0]}; pop @$piles while @$piles && !@{$piles->[-1]}; } return \@inorder } sub print_ary { my ($head,$list)=@_; print wrap($head,"",join ", ",@$list),"\n---\n"; } my @deck= shuffle(1..52); my $piles=patience(\@deck); my $lis=lis($piles); my $inorder=inorder($piles); print_ary("DECK : ",\@deck); print_ary("LIS : ",$lis); print_ary("INORDER: ",$inorder);

    Ive been trying to put together a vEB tree implementation to play with, but so far the documentation on it that I've found hasnt been as great as it looks at first glance. Notably ive been going a little barmy working out how you update the "max" property correctly when deleting the maximum element in a tree. Probably i just tried to work on it too long and if i rereview the documentation again it will clarify itself...

    ---
    $world=~s/war/peace/g

      cool thing to do is using binary..search.. if you have word cat means 3 alpha...2^m=2^3=8 so 000= 0 now sub sequences. 001=t 010=a 011=at 100=c 101=at 110=ca 111=cat

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://547199]
Approved by herveus
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (7)
As of 2014-04-18 00:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (460 votes), past polls