http://www.perlmonks.org?node_id=508698

SmugX has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

An interesting-sounding (to me!) problem has landed on my desk.

I have a number of photographs, each of which are either of landscape or portrait orientation. The order of these photos is important, and is for the purposes of the excercise fixed.

I also have a number of potential page layouts (or templates), each of which contain one or more spaces for photos; the spaces are either landscape or portrait in nature. (e.g. layout 1 may contain one "Landscape" ,layout 2 "Portrait then Portrait", layout 3 "Landscape then Portrait then Portrait" etc.)

I need to find a combination of page layouts that holds the photos in their correct order, which each photo fitting into a space with matching orientation.

There are a couple of additional contraints; the number of layouts that can be in the solution is fixed, because I must produce a book containing exactly X layouts. Also, I want to minimise the re-use of layouts where possible, and I definitely don't want to use the same layout consecutively unless absolutely necessary.

My first thought was this sounded like a tree-type problem, so I knocked up the following code using a simple recursive algorithm. My scoring algorithm is a little arbitrary at the moment, but should suffice for testing. (Also of note is my short-cut assumption that the score will never get better as subsequent pages are added.)

The good news is the code appears to work. However, the problem is that as more potential page layouts are added, the time taken increases vastly. I gave the program 40 possible layouts and 40 images, and ... well, let's just say it's still running. :-)

So, I now need to figure out ways to improve the speed of this program. (For the record, whilst a good solution is required, the best-possible solution isn't really necessary.)

Can anyone suggest a better methodology, either by changing my existing code, or perhaps by using one of the many Algorithm:: modules on CPAN, virtually all of which very quickly go right over my head as soon as I start to read the documentation? :-)

Any help will be greatly appreciated.

Many thanks,
Neil.

use strict; my $desired_page_count = 10; # Structure holding all the different page layouts. ('l' = Landsca +pe, 'p' = Portrait) my $pages = [ {'ll' => '01'}, {'pp' => '02'}, {'lp' => '03'}, {'pl' => '04'}, {'lpp' => '05'}, {'pll' => '06'}, {'plp' => '07'}, {'lpl' => '08'}, {'lll' => '09'}, {'ppp' => '10'}, {'ppl' => '11'}, {'llp' => '12'}, {'ppll' => '13'}, {'llpp' => '14'}, ]; # String of orientation of photos to be used, in the correct order +. my $photos = "llppllpplpppllplpplpplpllplpll"; my $best_score = 999999; my $best_solution = []; my $solution = []; do_rest_of_book($photos,$solution); print "\nBest: " . join (',', @$best_solution); print "\nWith score of $best_score\n"; sub do_rest_of_book { my $cur = shift; my $solution = shift; # If there's still more book to process, and we haven't alread +y surpassed the desired page count if ($cur and scalar(@$solution) <= $desired_page_count) { # Consider each page layout foreach my $page_object (@$pages) { my $page_object_page = [keys %$page_object]->[0]; my $page_object_solution = [values %$page_object]->[0] +; # If the book we have left starts with the page layout + we're considering, let's proceed if ($cur =~ /^$page_object_page/) { my $this_solution = $page_object_solution; my $local_solution = [@{$solution}]; push @{$local_solution}, $this_solution; my $rest_of_book = substr($cur,length($page_object +_page)); # If score SO FAR is lower than minimum, let's car +ry on with this book. # (This makes an assumption that the score will ne +ver lower as we add more pages.) if (score_solution($local_solution) < $best_score) + { do_rest_of_book($rest_of_book, $local_solution +); } } } # Otherwise, if there's no more book, we have a solution, of s +orts. } elsif (!($cur)) { if (scalar(@$solution) == $desired_page_count) { my $score = score_solution($solution); if ($score < $best_score) { $best_score = $score; $best_solution = [@$solution]; } print "SOLUTION FOUND! Score " . $score . "\n"; } } return; } sub score_solution { # Solution Scorer. 0 is perfect, higher is worse. my $solution = shift; my $score = 0; my %refs = (); my $lastsol = ""; foreach my $sol (@$solution) { # 20 Points For Using The Same Layout Twice In A Row if ($sol eq $lastsol) { $score += 20; } # 1 Point For Using The Same Layout Previously, 2 Points F +or Twice, 4 Points For Three Times, 8, 16 ... if ($refs{$sol}) { $score += $refs{$sol}^2; } $lastsol = $sol; $refs{$sol} ++; } return $score; }
P.S. This does beg the questions of how can one determine if a selection of layouts could deal with every possible selection of photos, and similarly, what is the smallest possible selection of layouts that are similarly "complete"? I don't need to know the answer, but it is intriguing!

Replies are listed 'Best First'.
Re: Algorithm to fit photos into spaces on pages
by Roy Johnson (Monsignor) on Nov 15, 2005 at 18:49 UTC
    Here's a regex solution. It will find a solution if one exists, and will run reasonably fast, all things considered.
    my $desired_page_count = 10; # Structure holding all the different page layouts. ('l' = Landsca +pe, 'p' = Portrait) my $pages = { ll => '01', pp => '02', lp => '03', pl => '04', lpp => '05', pll => '06', plp => '07', lpl => '08', lll => '09', ppp => '10', ppl => '11', llp => '12', ppll => '13', llpp => '14', }; my $regex = join '|', keys %$pages; print "Regex is $regex\n"; my $photos = "llppllpplpppllplpplpplpllplpll"; my $pages_left = $desired_page_count - 1; while ($photos =~ s/^($regex)(?=(?:$regex){$pages_left}$)//) { print "$1\n"; # Reorder the regex so that the most-recently-used is the last o +ption $regex = join '|', (grep {$_ ne $1} keys %$pages), $1; --$pages_left; } print "Failed to layout the whole set: $photos does not match $rege +x\n" if $photos;

    Caution: Contents may have been coded under pressure.

      Thanks for your reply, Roy.

      This is a really good solution, and I like it a lot. It's also _really_ fast.

      However, I was feeding a couple of "real life" books through it, and curiously, it does seem a little "keen" to use the same layouts consecutively.

      If I feed in a book of "llpllplppllpplplpplpplpllplplp" with $desired_page_count = 13, your regexp routine returns a book featuring layouts "12,01,07,06,02,08,02,03,04,04,03,03,03" - valid, certainly, but the same layouts repeated at the end.

      However, my sloooooow routine returns "01,04,03,05,01,02,08,11,02,03,01,07,03", without repeating layouts.

      I assume (guess alert!) it's because the regexp routine is happy to proceed along a certain route as long as solutions exist - not necessarily "good" (by my definition!) solutions. Also, as you get nearer the end of the book, there are fewer valid layout combinations.

      Does anyone have any suggestions for tweaking this behaviour?

      Thanks again,
      Neil.

        Couldn't think how to do this yesterday, but it came to me this morning. Generate a regex for the whole thing at once, with the insurance that there be no two-in-a-row. The code actually gets simpler and faster, but on some very long random layouts, it does get hung up (meaning it's doing a lot of backtracking and probably not going to find a solution).

        Update: The reason it's prone not to find a solution is that the no-two-in-a-row check is more strict than it needs to be. It will not allow 'pl' to be used if the next two chars are 'pl', even if you have a 'pll' layout available. The fix for this is to use an assertion ((??{ })) to generate each match as "any available pattern except the previous match". The 2nd code block below implements that.

        my $desired_page_count = 10; # Structure holding all the different page layouts. ('l' = Landscape, +'p' = Portrait) my $pages = { ll => '01', pp => '02', lp => '03', pl => '04', lpp => '05', pll => '06', plp => '07', lpl => '08', lll => '09', ppp => '10', ppl => '11', llp => '12', ppll => '13', llpp => '14', }; my $photos = "llppllpplpppllplpplpplpllplpll"; # Make a really long layout -- at much over 100, it tends to hang up $photos .= substr 'lp', rand 2 for 1..100; $desired_page_count = int( length($photos)/ 2.7 ); my $page_regex = join '|', keys %$pages; my $regex = "($page_regex)"; $regex .= "(?!\\$_)($page_regex)" for 1..($desired_page_count - 1); print "\$regex is $regex\n"; if (my @pages = $photos =~ /^$regex$/) { # Added the anchors as per S +mugX's reply print "$pages->{$_}\n" for @pages; } else { print "Could not come up with no-repeat layout in $desired_page_cou +nt pages\n"; }
        use warnings; use strict; my $desired_page_count = 10; # Structure holding all the different page layouts. ('l' = Landscape, +'p' = Portrait) my $pages = { ll => '01', pp => '02', lp => '03', pl => '04', lpp => '05', pll => '06', plp => '07', lpl => '08', lll => '09', ppp => '10', ppl => '11', llp => '12', ppll => '13', llpp => '14', }; # Construct a hash of regexen: each value is all the patterns except f +or the # key. This way, they don't need to be constructed repeatedly during t +he matching my @patterns = keys %$pages; my %all_except = map {my $k=$_; ($k => join '|', grep {$_ ne $k} @patt +erns)} @patterns; my $photos = "llppllpplpppllplpplpplpllplpll"; # Make a really long layout $photos .= substr 'lp', rand 2 for 1..100; $desired_page_count = int( length($photos)/ 2.7 ); my $page_regex = join '|', @patterns; my $regex = "($page_regex)"; $regex .= "((??{\$all_except{\$$_}}))" for 1..($desired_page_count - 1 +); print "\$regex is $regex\n"; use re 'eval'; if (my @pages = $photos =~ /$regex/) { print "$pages->{$_}\n" for @pages; } else { print "Could not come up with no-repeat layout in $desired_page_cou +nt pages\n"; }

        Caution: Contents may have been coded under pressure.
Re: Algorithm to fit photos into spaces on pages
by Fletch (Bishop) on Nov 15, 2005 at 17:54 UTC

    You might see Geometric Optimisation and Perl for a similar problem (though stated more generically than yours). And I want to say this is a kind of "knapsack problem" if you want to do more googling, but I vaguely remember doing that before and getting corrected that it's more of a "fooblesquznitz problem" (FSVO fooblesquznitz which I can't remember at the moment).

Re: Algorithm to fit photos into spaces on pages
by samtregar (Abbot) on Nov 15, 2005 at 17:37 UTC
    I don't have time to really work up a solution but your problem is screaming regex to me. I would look at the possibility of compiling a regex representing all the possible solutions. Then match that regex against the input string to obtain the shortest solution. If a regex isn't flexible enough you might look at a parser module like Parse::Yapp or Parse::RecDescent.

    -sam

Re: Algorithm to fit photos into spaces on pages
by blokhead (Monsignor) on Nov 15, 2005 at 18:52 UTC
    I'm still not sure about an efficient (polynomial-time) algorithm for your original question (minimizingmaximizing the number of distinct layouts used in representing a sequence of photos), but I can address your postscript:
    P.S. This does beg the questions of how can one determine if a selection of layouts could deal with every possible selection of photos,
    I'm in a formal-languages frame of mind right now, and this question is easily answerable using regular languages & automata. If your layouts are, say, {pll,lp,pp}, then the possible photo sequences you can represent are exactly those sequences which match /^(pll|lp|pp)*$/.

    If you want to know whether this allows you to get all possible photo sequences, then you are asking whether /^(pll|lp|pp)*$/ is equivalent to /^[pl]*$/. This is called the "regex universality" problem (does a given regex match all strings?), and it is PSPACE-complete in the general case. But for reasonable sized examples, you can check this by building a DFA from the regex, complementing it, and checking to see if it accepts any strings. I even happen to know of an upcoming Perl project that would make these regular expression & automata operations very easy (which happens to be why I'm in the formal-language frame of mind to begin with) {nudge nudge wink wink}.

    and similarly, what is the smallest possible selection of layouts that are similarly "complete"? I don't need to know the answer, but it is intriguing!
    Well, the smallest is {p,l} of course ;) To make the problem more interesting, if you are given a set of layouts (say, {pll,lp,pp} again), can you make this set smaller without losing expressivity? What you can do is calculate the minimal generating set of /^(pll|lp|pp)*$/. This can also be done through some manipulation of the automata for the associated language.

    Of course, this only address the question of whether or not you can represent a sequence of photos in a given basis of layouts. It does not address the optimality of the layouts according to your criteria (fewestmaximum number of distinct layouts used). This is where it gets a little harder....

    blokhead