Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

knapsack problem solved by regex

by rubasov (Friar)
on Mar 14, 2010 at 14:42 UTC ( #828578=perlmeditation: print w/ replies, xml ) Need Help??

On an evening some days ago when I was bored, I've challanged myself to solve something non-trivial by regexes. The resulting code follows, it is a solution to the 0-1 knapsack problem.
use strict; use warnings; use re 'eval'; use List::Util 'sum'; my $max_weight = 400; my %items = ( # value, weight: positive integers 'map' => { v => 150, w => 9 }, 'compass' => { v => 35, w => 13 }, 'water' => { v => 200, w => 153 }, 'sandwich' => { v => 160, w => 50 }, 'glucose' => { v => 60, w => 15 }, 'tin' => { v => 45, w => 68 }, 'banana' => { v => 60, w => 27 }, 'apple' => { v => 40, w => 39 }, 'cheese' => { v => 30, w => 23 }, 'beer' => { v => 10, w => 52 }, 'suntan_cream' => { v => 70, w => 11 }, 'camera' => { v => 30, w => 32 }, 't_shirt' => { v => 15, w => 24 }, 'trousers' => { v => 10, w => 48 }, 'umbrella' => { v => 40, w => 73 }, 'waterproof_trousers' => { v => 70, w => 42 }, 'waterproof_overclothes' => { v => 75, w => 43 }, 'note_case' => { v => 80, w => 22 }, 'sunglasses' => { v => 20, w => 7 }, 'towel' => { v => 12, w => 18 }, 'socks' => { v => 50, w => 4 }, 'book' => { v => 10, w => 30 }, ); my $str = 'v' x sum( map $_->{v}, values %items ) . '0' . 'w' x $max_w +eight; #print $str, "\n"; my $i; my $left = my $right = ''; for ( keys %items ) { $left .= sprintf "(?<%s>(?:%s)?)\n", $_, 'v' x $items{$_}{v}; $right .= sprintf "(?(?{ \$%d })%s|)\n", ++$i, 'w' x $items{$_}{w}; } my $re = sprintf "%s0\n(?=\n%s)\n", $left, $right; #print $re; my $sum = join '+', map length, grep length, $str =~ /$re/x; print $sum, '=', eval $sum, "\n"; print join "\n", grep { length $+{$_} } keys %+; print "\n"; __END__

The input data came from rosettacode.org. With this input it runs for circa five minutes on my machine. If you want to play with it, just replace the input with something simpler, for instance:

# a partition problem as a special case my $max_weight = 136; my %items = ( a => { w => 9, v => 9 }, b => { w => 12, v => 12 }, c => { w => 14, v => 14 }, d => { w => 17, v => 17 }, e => { w => 23, v => 23 }, f => { w => 32, v => 32 }, g => { w => 34, v => 34 }, h => { w => 40, v => 40 }, i => { w => 42, v => 42 }, j => { w => 49, v => 49 }, );

To understand what it does, just uncomment the print lines, supply it with some simple input, and see the contents of $str and $re for yourself. Probably you also want to replace the named captures with standard capturing parentheses in the $left .= ... line as the naming is not essential, they're just there for the pretty printing of the output.

I'm still not fully satisfied with this solution, because the regex is not pure. (By saying pure I do not mean the compsci sense, but that it does not contain embedded code.) Anyway it's close to it, as the only non-pure part of it is something like this: (?(?{ $1 })ww|). But I wasn't able to use (?(1)ww|) instead, because my captures are always matching, though sometimes matching the empty string.

For the educated monks there's probably nothing new in this since much of my inspiration came from Abigail's previous works:

Also there's much more on this topic in the monastery and elsewhere, however I haven't read them (yet):

If you've found bugs in the code above, please let me know.

Cheers,
rubasov

p.s. I've hesitated to post it to the Cool Uses for Perl section instead of Meditations, but all the similar posts were in Meditations so I posted here. Feel free to move it, if it fits better in the CUfP section.

Comment on knapsack problem solved by regex
Select or Download Code
Re: knapsack problem solved by regex
by blokhead (Monsignor) on Mar 14, 2010 at 16:02 UTC
    Update: answered my own question, see below.

    Can you explain a little more how it works? I can see how it selects items based on value, and then enforces that there is enough weight for the combined items selected. But what I'm not seeing is how it only selects the highest-weight solution.

    Here is what I thought would happen with the following instance:

    my $max_weight = 10; my %items = ( # value, weight: positive integers 'a' => { v => 1, w => 2 }, 'b' => { v => 11, w => 9 }, );
    The regex engine would try the following:
    first half second half $1 $2 attempt 1: v vvvvvvvvvvv --> ww + wwwwwwwww = 11, not enough w's! backtrack attempt 2: v <empty> --> ww = 2, enough w's for this, so accept +ed
    In other words, I would have thought that ($1, $2, $3, $4, ...), each of which has two possible assignments, would get tried out in a canonical ordering like 11111, 11110, 11101, ...

    But that doesn't seem to be the case. I modified the code as follows, to print the value of $1, $2, etc, upon each backtracking branch:

    # used to be: # my $re = sprintf "%s0\n(?=\n%s)\n", $left, $right; use Data::Dumper; my $re = sprintf qq[%s0\n(?{ print "trying " . Dumper \\%%+})\n(?=\n%s +)\n], $left, $right; print $re;
    Then with the example above, I always get the following output, independent of whether item 'a' or 'b' comes first in the matching:
    STRING: vvvvvvvvvvvv0wwwwwwwwww REGEX: (?<b>(?:vvvvvvvvvvv)?) (?<a>(?:v)?) 0 (?{ print "trying " . Dumper \%+}) (?= (?(?{ $1 })wwwwwwwww|) (?(?{ $2 })ww|) ) OUTPUT: trying $VAR1 = { 'a' => 'v', 'b' => 'vvvvvvvvvvv' }; trying $VAR1 = { 'a' => '', 'b' => 'vvvvvvvvvvv' };
    STRING: vvvvvvvvvvvv0wwwwwwwwww REGEX: (?<a>(?:v)?) (?<b>(?:vvvvvvvvvvv)?) 0 (?{ print "trying " . Dumper \%+}) (?= (?(?{ $1 })ww|) (?(?{ $2 })wwwwwwwww|) ) OUTPUT: trying $VAR1 = { 'a' => 'v', 'b' => 'vvvvvvvvvvv' }; trying $VAR1 = { 'a' => '', 'b' => 'vvvvvvvvvvv' };
    In both cases, its first backtracking change is to change the choice of 'a', no matter whether the 'a' alternative or 'b' alternative is last!

    Can you help demystify me? Somehow you are getting the regex engine to go through possible matches in the v-string not in some canonical ordering of alternatives (an easy local way to backtrack), but in order of their longest combined length (a global property of the set of alternatives)? This is the only interpretation under which the regex would actually be guaranteed to output the correct answer!

    Update: I think I have demystified myself. It has to do with the string being anchored at the central '0':

    • Regex engine backtracks through a ton of choices, trying to match many combinations of v's starting at pos=0, followed immediately by a '0' character. Suppose there are N v's, then if no combination of alternatives sums up to N, then this can't match. So...
    • Regex backtracks tries to match the regex from pos=1, thus searching for a combination of alternatives that sums up to N-1. If that doesn't work,
    • Backtrack to match from pos=2, ...
    Thus, thanks to this anchoring at the '0' character, the regex engine does indeed backtrack through combinations of v's in descending order of their sum. As I mentioned above, this must happen if you have any chance of outputting the optimal solution.

    That is clever!

    My confusion above would have been appropriate for a regex that matched like this:

    ^ (?<a>(?:v)?) (?<b>(?:vvvvvvvvvvv)?) v* 0
    but not the one used in the OP, where the choice of v's must be snug up against the '0' character:
    (?<a>(?:v)?) (?<b>(?:vvvvvvvvvvv)?) 0

    blokhead

      It has to do with the string being anchored at the central '0'.
      Thus, thanks to this anchoring at the '0' character, the regex engine does indeed backtrack through combinations of v's in descending order of their sum.
      Exactly. That "middle anchoring" is one of the main tricks in this regex. I also had to go through your previous to last regex example to find the last while crafting this. :-)
Re: knapsack problem solved by regex
by BrowserUk (Pope) on Mar 14, 2010 at 16:37 UTC

    This in no way detracts from your very nice regex abu-solution :)

    I read the wikipedia link you gave to the 0-1 knapsack problem, and as usual, the algorithmic and complexity descriptions left me cold. They talk about dynamic programming and pseudo-polynomial solutions and give an algorithm which they assess as O(nW) (time & space).

    But, I think the following very straighforward algorithm, which produces the same output as yours in less than a second, is O(n)?

    #! perl -slw use strict; use List::Util qw[ sum ]; my $max_weight = 400; my %items = ( # value, weight: positive integers 'map' => { v => 150, w => 9 }, 'compass' => { v => 35, w => 13 }, 'water' => { v => 200, w => 153 }, 'sandwich' => { v => 160, w => 50 }, 'glucose' => { v => 60, w => 15 }, 'tin' => { v => 45, w => 68 }, 'banana' => { v => 60, w => 27 }, 'apple' => { v => 40, w => 39 }, 'cheese' => { v => 30, w => 23 }, 'beer' => { v => 10, w => 52 }, 'suntan_cream' => { v => 70, w => 11 }, 'camera' => { v => 30, w => 32 }, 't_shirt' => { v => 15, w => 24 }, 'trousers' => { v => 10, w => 48 }, 'umbrella' => { v => 40, w => 73 }, 'waterproof_trousers' => { v => 70, w => 42 }, 'waterproof_overclothes' => { v => 75, w => 43 }, 'note_case' => { v => 80, w => 22 }, 'sunglasses' => { v => 20, w => 7 }, 'towel' => { v => 12, w => 18 }, 'socks' => { v => 50, w => 4 }, 'book' => { v => 10, w => 30 }, ); for my $key ( keys %items ) { my $r = $items{ $key }; $r->{ score } = $r->{ v } / $r->{ w }; } my @orderedKeys = sort{ $items{ $b }{ score } <=> $items{ $a }{ score } } keys %items; my $weight = sum map $_->{ w }, values %items; $weight -= $items{ pop @orderedKeys }{ w } while $weight > $max_weight +; my $value = 0; $value += $items{ $_ }{ v } for @orderedKeys; printf "%22s : %4d %4d\n", $_, @{ $items{ $_ } }{ 'v', 'w' } for @orde +redKeys; printf "%22s : %4d %4d\n", ' ', $value, $weight; __END__ C:\test>knapsack.pl map : 150 9 socks : 50 4 suntan_cream : 70 11 glucose : 60 15 note_case : 80 22 sandwich : 160 50 sunglasses : 20 7 compass : 35 13 banana : 60 27 waterproof_overclothes : 75 43 waterproof_trousers : 70 42 water : 200 153 : 1030 396

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I've just skimmed your code, but if I'm right then your algorithm is lucky to find the correct output for this particular input, but does not give the correct answer in general.

      You're filling the knapsack with the items in the order of their relative value (v/w), so consider the following case when your knapsack is almost full (let's suppose there's still room for w=4):

      nr w v v/w 1x 3 3.1 31/30 # case a 2x 2 2 1 # case b

      Your algorithm will choose case 'a', increasing the full value by 3.1, however the correct choice is to choose case 'b' increasing the full value by 4. That's why you have to backtrack.

        Cool. I knew (guessed) it was too easy, but it worked for several test sets. Thanks for the explanation.

        Actually, scrap the above, because according to the wikipedia page,

        The most common formulation of the problem is the 0-1 knapsack problem, which restricts the number xj of copies of each kind of item to zero or one.

        So you can't have 2 of one item?


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: knapsack problem solved by regex
by jwkrahn (Monsignor) on Mar 15, 2010 at 18:19 UTC

    On a hunch I changed:

    for ( keys %items ) {

    To:

    for ( sort { $items{ $b }{ v } <=> $items{ $a }{ v } } keys %items ) {

    Which produced a nice speed-up compared to the default hash order.

    Note that the converse of:

    for ( sort { $items{ $a }{ v } <=> $items{ $b }{ v } } keys %items ) {

    Was a lot slower than the default hash order.

Re: knapsack problem solved by regex
by Dominus (Parson) on Mar 15, 2010 at 18:44 UTC
Re: knapsack problem solved by regex
by Anonymous Monk on Jul 12, 2010 at 10:47 UTC
    I have a Polynomial time algorithm for the Knapsack problem that handles 1 and 2 dimensional data, unbounded, decimal or integer and that Iíve tested on 10,000 values. I have sample data sets on my website(http://homepage.ntlworld.com/walter.barker2/Knapsack%20Problem.htm) in case anyone would like to compare their results. All comments welcome! Regards, Walt

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://828578]
Approved by Corion
Front-paged by almut
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (4)
As of 2014-11-28 01:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My preferred Perl binaries come from:














    Results (191 votes), past polls