Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options

The greedy change-making problem using regexes

by blokhead (Monsignor)
on Mar 10, 2005 at 00:59 UTC ( #438118=perlmeditation: print w/replies, xml ) Need Help??

Here's another problem related to money systems and making change that I've found very interesting.

In the American system of coinage (1c, 5c, 10c, 25c), we can make change for any amount of money with the fewest number of coins possible by using a greedy algorithm: starting with the largest coin amount, take as many as you can without exceeding the amount, then move to the next largest amount, and so on. In fact, proving the optimality of this algorithm (with these coin amounts) is often a first example or exercise in introductory algorithms classes.

However, the fact that this algorithm is optimal is a property of the coin amounts, not of the algorithm in general. To see a case where the greedy algorithm is suboptimal, consider the slight modification to the American system where we replace the 5c piece with a 6c piece. Now to make change 12c, the greedy algorithm uses three coins (10+1+1), but 12c can be made using only two coins (6+6).

This suggests the obvious algorithmic decision problem:

Given a set of coin denominations, does the greedy change-making algorithm use the fewest possible number of coins on all amounts?
This is known in the algorithms community as the (greedy) change-making problem. We can assume that a 1-unit denomination is included, otherwise we can't represent all amounts. We can also assume that at least 3 denominations are included, otherwise the greedy algorithm is always optimal. Other than these simple cases, this is a lot harder than it might first appear. In fact, this problem as stated above is NP-hard.

About all that is known about the mathematical structure of this problem is that if c1 < c2 < ... < cm are the coin denominations, and the greedy algorithm is not optimal, then there is a counterexample (an amount for which the greedy algorithm gives more coins than are necessary) between (c3+2) and (cm+cm-1-1), inclusive. If you don't believe me, you can read all about it in this paper.

So given a set of coins, all we have to do to see if it is optimal is see if any amount in that range is a counterexample. How on earth do we do that? There are a lot of ways to make change for a given amount, how can we be sure what is the fewest number of coins necessary? As with my previous post about change-making, many kinds of problems on money systems can be formulated quite naturally in terms of unary finite automata, making regexes a good way to check solutions. So I went about building some regex machinery for this problem. What follows is an account of my journey into backtracking and exploitation of the regex engine...

First here's the code. Keep in mind that this is purely regex fun for the sake of regex fun. This solution is very slow, and a solution dealing natively with integers (i.e, manipulating the integer N instead of a string of N characters) will most certainly be faster. However, exploiting the backtracking mechanism allows us to express this in a more logic/rules-based manner. For hard brute-search problems, this is a nice convenience. Additionally, I am well aware that there is still room for improvement in the regex that is generated..

use re 'eval'; use Test::More 'no_plan'; sub greedy_is_optimal { my @coins = @_; my $greedy = join "", map { "( 1{$_} (?{ \$^R+1 }) )* (?!1{$_}) +" } reverse @coins; my $one_coin = join "|", map { "1{$_}" } @coins; for ( $coins[2]+2 .. $coins[-1]+$coins[-2]-1 ) { local $x; (1 x $_) =~ / ^ $greedy (?{ $x = $^R }) x | ^ ( ($one_coin) (?{ $^R+1 }) )* (?(?{ $^R < $x }) $ | x ) /x and return 0; } return 1; } is greedy_is_optimal(1,5,10,25), 1; is greedy_is_optimal(1,6,10,25), 0; is greedy_is_optimal(1,5,7), 0;
Let's look at what the heck is going on with this insane regex. Here's the (reformatted) regex when the coins are (1,5,7). Remember we are trying to match a string of all 1's.
^ ( 1{7} (?{ $^R+1 }) )* (?!1{7}) ( 1{5} (?{ $^R+1 }) )* (?!1{5}) ( 1{1} (?{ $^R+1 }) )* (?!1{1}) (?{ $x = $^R }) x | ^ ( (1{1}|1{5}|1{7}) (?{ $^R+1 }) )* (?(?{ $^R < $x }) $ | x )
Notice that the whole regex is one large alternation. Let's look at the first alternation first, since that's what the regex engine will do: The first three lines accomplish a greedy change-making procedure -- here's how. Ignoring the (?{code}) blocks, we simply have
^ (1{7})* (?!1{7}) (1{5})* (?!1{5}) (1{1})* (?!1{1})
This says, match characters 7 at a time, as many as possible (i.e, until there aren't 7 more characters to be had). Then match them 5 at a time until there aren't 5 left, etc. That's precisely the greedy algorithm. Notice that this pattern can only match (any string of 1s) in one way.

Now, what were the (?{ $^R+1 }) things doing? Well, within a regex, $^R returns the value of the last-executed (?{code}) block. It is also automatically localized properly so that backtracking "rolls back" the values. So a bunch of code blocks that return $^R+1 simply accomplish setting $^R to the number of code-blocks that were executed (along this matching path). Since we have such a (?{code})-block each time we use a coin, when we reach the end of these three lines, the value of $^R must be the number of coins used by the greedy algorithm.

Finally, the fourth line of the regex says

(?{ $x = $^R }) x
This says to set $x to $^R (the number of coins used by the greedy algorithm), then try to match a literal "x" character .. There are no x's in this string, so it must fail. But we did not localize $x, so even when we backtrack past the code-block, the value of $x remains. As we mentioned before, these first lines can only match in one way, so we don't have to worry about getting possibly multiple values for $x. So now, these entire 4 lines must fail (with the side-effect of setting $x), so the regex will try to match the second alternation. It looks like this:
^ ( (1{1}|1{5}|1{7}) (?{ $^R+1 }) )* (?(?{ $^R < $x }) $ | x )
The first line repeatedly chooses a coin (either 1, 5, or 7) and uses the $^R trick to count how many are used (remember that we have backtracked all the way to the beginning to try to match this alternation, so $^R restarts at zero). The second line is a regex conditional. It says if this value of $^R is less than the value we stored in $x, match the end of the string, otherwise try to match a literal "x" character. Matching a literal "x" isn't possible, so we will backtrack and try to match again, with a different combination of "coins."

All in all, if there is a way to match all the characters in the string using less than $x coins, then the conditional $^R < $x can eventually succeed, and we will match the end of the string and the entire regex will succeed. Otherwise, every time we get to the conditional at the end of the input, we can never get $^R < $x and the regex fails. This means that the greedy algorithm for making change gave us the fewest number of coins possible.

Final Thoughts: I think this solution is very slick, but slow. How would you have approached the problem? Also, how would a solution that just used the backtracking of the regex engine without string-matching (i.e, just localized integer manipulation) stack up?


Replies are listed 'Best First'.
Re: The greedy change-making problem using regexes
by japhy (Canon) on Mar 10, 2005 at 01:47 UTC
    I'd approach it inside out. I'd make my regex try to match like so:
    ("1" x $money) =~ /^(1{10}|1{6}|1{5}|1{1}){1,}?$/
    Then I'd add to that regex code blocks that store the "winning" combination. The concept is, it tries to match the string using only one coin. Once that fails, it tries to match it using two coins. Etc. This is probably slow.

    Hrm, this probably won't work easily with Perl's regexes. My first efforts have proved fruitless.

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
      Ooh, that is quite elegant indeed. One of my lessons here was "Know in which order thine regex engine tries things" but I completely missed this little optimization. ;) It really matches up temporaly with the spirit of the problem.

      I suppose I could also use (?>pattern) to prevent backtracking and perform the greedy match without the negative lookaheads?

      (1 x $money) =~ /^(?>(1{10})*)(?>(1{6})*)(?>(1{5})*)(?>(1{1})*)$/
      Although I can't say I've ever really used (?>pattern)...

      Anyway, the pain in the butt is counting the number of coins. I like my little (?{$^R+1}) idiom!


      Personally I would avoid the 1{10} stuff. For two reasons, first is that 1{10} is less efficient to match than "1" x 10, and second reason is that the speed of the regex once my trie patch gets applied will be massively faster as without the curlies it will be optimised and with them it wont. :-)


        ...which suggests that the RE engine should convert a trivial match with a trivial curly in to the equivlent $match x $curly atoms. Of course, somehow I suspect that wouldn't really be a help for /x{2047}/.

        Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

      No need for code blocks, and it works very fast as well (as you'd expect from a greedy algorithm).
      my $dime = '(?:' . ('1' x 10) . ')'; my $halfdozen = '(?:' . ('1' x 6) . ')'; my $nickel = '(?:' . ('1' x 5) . ')'; my $cent = '(?:' . ('1' x 1) . ')'; sub greedy_change { my $change = shift; my ($c10, $c6, $c5, $c1) = ('1' x $change) =~ /^($dime*)($halfdozen*)($nickel*)($cent*)$/; return (length ($c10 || "") / 10, length ($c6 || "") / 6, length ($c5 || "") / 5, length ($c1 || "") / 1); }
        No, that doesn't work: it ends up using three coins (a dime and two pennies) for 12 cents, instead of two six-cent coins.
        Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
        How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: The greedy change-making problem using regexes
by Anonymous Monk on Mar 10, 2005 at 11:06 UTC
    Here are two subs which solve the make change problem using regexes. First one is the greedy algorithm, second one is the best change. Both subs take as first argument the amount to make change for, then a list of denominations. First sub uses 'pure' regexes, second sub uses a code fragment.
    sub greedy_change { my $change = shift; my @coins = sort {$b <=> $a} @_; my $re; $re .= sprintf '((?:%s)*)', '1' x $_ for @coins; if (my @chunks = ('1' x $change) =~ /$re/) { return map {length($chunks[$_])/$coins[$_]} 0 .. $#coins; } else { return; } } sub best_change { my $change = shift; my @coins = sort {$b <=> $a} @_; our $S = $change + 1; our @C = (); my $re; $re .= sprintf '((?:%s)*)', '1' x $_ for @coins; $re .= '$(?{my @c = map {length($$_)/$coins[$_-1]} 1 .. @coins; my $s = 0; $s += $_ for @c; if ($s < $S) {$S = $s; @C = @c}})'; $re .= '(?!)'; use re 'eval'; ('1' x $change) =~ /^$re/; @C; }
Re: The greedy change-making problem using regexes
by tall_man (Parson) on Mar 10, 2005 at 22:22 UTC
    What about implementing David Pearson's algorithm instead? It will find the smallest counterexample in polynomial time O(N^3) when N is the number of coins.

    Determining whether or not a given coin system can be done by the greedy algorithm is not NP-complete. Finding optimum change in a given coin system that does not allow the greedy algorithm is NP-complete.

    It looks quite easy to implement. I will try it when I get a chance.

    Update: Code added below. Update2: Rewrote greedy_change with a simple loop instead of the regular expression one I borrowed from AnonyMonk.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://438118]
Approved by Tanktalus
Front-paged by grinder
[marto]: good morning all, TCIF
[gnosti]: I'm reading from ffmpeg lengths of video clips in H:M:S and needing to add them and format them back to H:M:S for feeding back to ffmpeg
[marioroy]: kcott Thanks
[Corion]: gnosti: I would cheat and use my approach instead of dealing with durations as objects, or a module
[Corion]: You could even do the calculation directly - my $duration = $H * 3600+ $M*60 + $S; strftime '%H:%M:%S', timelocal($duratio n)
[Corion]: (and that strftime call is largely superfluous too, converting back to H:M:S isn't hard either)

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (9)
As of 2017-08-18 09:00 GMT
Find Nodes?
    Voting Booth?
    Who is your favorite scientist and why?

    Results (297 votes). Check out past polls.