Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery
 
PerlMonks  

"Divide" challenge

by grizzley (Chaplain)
on Mar 10, 2009 at 10:36 UTC ( [id://749548]=perlquestion: print w/replies, xml ) Need Help??

grizzley has asked for the wisdom of the Perl Monks concerning the following question:

Well, in a half an hour I have to go to some boring 3h meeting and have to have something to think about. If you want, you can think about it, too. And here goes the challenge:

Let's say we have 8 machines, each communicates with every other one. Amount of data, which is exchanged between every two machines is defined (

perl -le "for$x(0..7){for$y($x+1..7){$f[$x][$y]=$f[$y][$x]=int rand 10 +0}} $f[$_][$_]='-' for 0..7; print join' ', @$_ for @f"
):

 - 38 72 79 58 88 59 33
38  - 70 71 27 47 77 14
72 70  - 90 42 63 56 90
79 71 90  - 60 57 21 95
58 27 42 60  - 28 33 52
88 47 63 57 28  - 11 85
59 77 56 21 33 11  - 59
33 14 90 95 52 85 59  -

Now, we have to find such division of these 8 machines into two groups of 4 machines, that exchange of data between these two groups is minimal.

Original problem was, as my friend gave it to me, for 64 machines, and then result: two 32-machines groups had to be divided further to 4 16-machines groups, etc.

Brute force algorithm (C++) had problems even with 8 machines, now, after 6 years it should be little better. But it's not about brute force, is it? As I remember, we did something heuristic, we didn't found optimal solution. But as I have 6 years more experience and a meeting will be long, and nothing better to do, let's find some solution :)

Update 2009-03-12 08:47 GMT: "Divide" challenge app added

Replies are listed 'Best First'.
Re: "Divide" challenge
by moritz (Cardinal) on Mar 10, 2009 at 10:50 UTC
    Brute force algorithm (C++) had problems even with 8 machines
    That surprises me. Unless I'm very much mistaken there are 8!/(4! * 2!) == 840 possibilities, which doesn't look like a terribly high number, and looks very brute-forceable to me. Is there a deeper reason why it should be so hard to brute force?

      Yes, it seems you are right. Maybe it was about dividing 16-machines into two 8-machines groups, which is 16!/(8!*2!) = 259.459.200 possibilities (correct me if I am wrong). This, too, doesn't look like apocalypse, but hey, there is still 64-machines group :)

      Anyway, at the meeting, the speaker was talking so loud I couldn't focus on the problem! :) Only AI::Genetic came to my mind, what I tried yesterday, but this package doesn't work well with 'find best permutation' problems. I found only some C implementations on the web, which must be adopted to perl. When I develop working example, I'll paste it in monastery.

Re: "Divide" challenge
by bellaire (Hermit) on Mar 10, 2009 at 12:21 UTC
    moritz is right, brute forcing this is straightforward, unless we've completely misunderstood the challenge. Here's my code which, with the data you gave, finds the split to be groups 0, 1, 4, 6 and 2, 3, 5, 7. (Machine numbers starting with zero).
    use strict; use warnings; my @costs; push @costs, [split(/\s+/)] for (<DATA>); my @combos; gen_set([0..7],[],4,\@combos); my $min_cost = 999999; my $min_set_num; for (0..$#combos) { my $set_cost = get_cost(@{$combos[$_]}); if ($set_cost < $min_cost) { $min_cost = $set_cost; $min_set_num = $_; } } print "Set number $min_set_num, cost $min_cost: ". join(q{ },@{$combos[$min_set_num]}) . "\n"; sub get_cost { my @list = @_; my %list = map { $_ => 1 } @list; my $cost; for my $li (@list) { for my $oi (grep {not exists $list{$_}} (0..7)) { $cost += $costs[$li]->[$oi]; } } return $cost; } sub gen_set { my ($from, $to, $num, $results) = @_; for my $item (@$from) { my @new_to = @$to; push @new_to, $item; if ($num == 1) { push @$results, \@new_to; } else { my @new_from = grep { $_ != $item } @$from; gen_set(\@new_from,\@new_to,$num-1,$results); } } } __DATA__ - 38 72 79 58 88 59 33 38 - 70 71 27 47 77 14 72 70 - 90 42 63 56 90 79 71 90 - 60 57 21 95 58 27 42 60 - 28 33 52 88 47 63 57 28 - 11 85 59 77 56 21 33 11 - 59 33 14 90 95 52 85 59 -

      Brute forcing is not quite as straight forward as one might think :-)

      The solution above is working a lot harder than is really necessary, even for a brute force solution. If I understand the problem correctly, we are trying to minimize the data exchanged between the groups. Therefore what matters is the sum of data exchanged between the two groups, not the internal composition of each group.

      This lets us significantly reduce the potential solution space. The combination generator above generates every single permutation of 4 elements drawn from 8, not just every single combination. For example, this is the list of combinations produced by gen_set([0..3],[],2, \@combos);. There are 12 items, but if we disregard order only 6 would count.

      <0 1> <0 2> <0 3> <1 0> <1 2> <1 3> <2 0> <2 1> <2 3> <3 0> <3 1> <3 2>

      In actual fact, though we need only half of the combinations, even after ignoring order. This is because our second bag is always the complement of our first bag. Since sums are commutative, trying out the division (bag 1, complement of bag 1) is going to result in the exact same data flow sum as the division (complement of bag 1, bag 1)

      Best, beth

        Thanks, beth, I'll ++ tomorrow when I have more votes. :) This was definitely working harder than necessary to obtain a solution, and no attempt was made at optimization.

        However, even doing things the "dumb" way, the whole operation still runs in the blink of an eye, which detracts from the OP's assertion that this would be a difficult problem to brute force, in principle.
        Hmmm... I wonder why moritz mentioned 8! / (4! * 2!) == 840 possibilities, when we only need to traverse 8! / 2 * (4! * 4!) == 35 possibilities at minimum. Am I missing something?

        Output:
        Lowest cost (803) groups are: $VAR1 = [ 0, 1, 4, 6 ]; $VAR2 = [ 2, 3, 5, 7 ];
Re: "Divide" challenge
by ELISHEVA (Prior) on Mar 10, 2009 at 18:20 UTC

    I decided to check to see if a brute force search scales well, so I've written a little script that generates an arbitrary matrix of N elements, solves it by brute force, and spits out the lapsed time. The conclusion? Even on a modern machine, a brute force solution doesn't scale particularly well after about 20 nodes. Here are the results from the script for 4,8,16,20 and 32 nodes:

    Number of nodes: 4 Number of combinations tried: 3 solution: <0 2> <1 3> dataflow: 176 lapsed time=0 s -------------------------- Number of nodes: 8 Number of combinations tried: 35 solution: <0 2 3 6> <1 4 5 7> dataflow: 460 lapsed time=0.01 s -------------------------- Number of nodes: 16 Number of combinations tried: 6435 solution: <0 5 6 7 9 12 13 15> <1 2 3 4 8 10 11 14> dataflow: 2722 lapsed time=0.860013 s -------------------------- Number of nodes: 20 Number of combinations tried: 92378 solution: <0 1 2 4 5 8 10 11 12 18> <3 6 7 9 13 14 15 16 17 19> dataflow: 3658 lapsed time=30.240456 s -------------------------- Number of nodes: 32 Number of combinations tried: 300540195 solution: <0 3 4 6 9 12 14 18 19 22 23 25 26 27 28 31> <1 2 5 7 8 10 11 13 15 16 17 20 21 24 29 30> dataflow: 10701 lapsed time=116933.583916 s (32.48 hrs)

    To make this a fair comparison, I wrote a combo generator that takes advantage of the structural points (order doesn't matter, we only need to look at half the combos) discussed in my response to Bellaire above. I also eliminated all but one call to grep per combo and removed the recursion.

    The script used to generate this takes two parameters: scriptname node_count show_matrix where node_count is the number of nodes and show_matrix causes the matrix to be printed out if it is set to any true value. The code is

    Best, beth

      Well, it scales like N! which is exponential. Which doesn't scale well at all. It basically means that if your CPU speed doubles, you can increase your problem space by 1 to get an answer in the same time.

      And I don't think you can avoid a brute force approach. This sounds very much like an NP complete problem.

        I don't think you can avoid a brute force approach

        Although I agree that the general problem is likely NP complete, there are probably at least some things one might do,singly or in combination, to pick out more or less likely solutions (and hence improve the likelihood that the first of N solutions contains something close to the optimal solution - i.e. we limit he number of solutions tried to N since we don't have infinite computing resources). I don't have the time to really think these through (or relearn my linear algebra), but someone else might like to pick up the ball on these ideas:

        • There are probably some special groups of data-flow matrices that could be quickly solved analytically by simplifying the dataflow matrix using using eigenvectors and other linear algebra techniques.
        • It might help to calculate the distribution of data flow values and use the distribution to prioritize which combinations of rows and columns should be tried first.

        Best, beth

        Update: in response to JavaFan's "I don't see how that could help" note below, added clarification of reason for picking more or less likely solutions.

        Here's how it "scales". First column is the number of machines. Second column is the number of times you can divide that number of machines in two equally sized groups:
        use Math::BigInt; use 5.010; my $line = "+----+------------------------+"; say $line; printf "| %2d | %22s |\n", 2 * $_, Math::BigInt->new(2 * $_)->bfac / (2 * Math::BigInt->new($_)->bfac +) for 1 .. 16; say $line; __END__ +----+------------------------+ | 2 | 1 | | 4 | 6 | | 6 | 60 | | 8 | 840 | | 10 | 15120 | | 12 | 332640 | | 14 | 8648640 | | 16 | 259459200 | | 18 | 8821612800 | | 20 | 335221286400 | | 22 | 14079294028800 | | 24 | 647647525324800 | | 26 | 32382376266240000 | | 28 | 1748648318376960000 | | 30 | 101421602465863680000 | | 32 | 6288139352883548160000 | +----+------------------------+
      With respect to scaling, I've tried some simple analysis to see what we're dealing with in terms of optimization. Using ELISHEVA's matrix generation code (and her observation that my code was doing twice as much work as necessary), I ran many random iterations asking the question: How much of an improvement is the optimal solution over the average of all possible solutions?

      I then decided to try an utterly simplistic solution algorithm that should scale as N^2 with respect to the number of machines (N), or linearly with respect to the number of data flow paths. That is, I simply pick N^2 combinations completely at random, and select the minimum cost from that set. I observed the improvement of this approach over the average as well. Here's my output for 100 iterations for N from 8 to 14:
      N=8 Over 100 iterations, optimal is better than average by 21.07 percent random is better than average by 20.61 percent N=10 Over 100 iterations, optimal is better than average by 20.38 percent random is better than average by 18.64 percent N=12 Over 100 iterations, optimal is better than average by 19.35 percent random is better than average by 16.82 percent N=14 Over 100 iterations, optimal is better than average by 18.00 percent random is better than average by 15.08 percent
      From this data, there are two immediate assertions that I can make, even though my data set is still too small to be totally conclusive:
      • The improvement of the optimal solution over the average decreases as N increases.
      • The improvement of the random solution decreases as well, at a slightly higher rate than the optimal solution.
      However, if I had to go out on a limb, I think that the randomized solution, because it scales well, might be an acceptable substitute for brute force, especially as N grows larger and the brute force solution becomes prohibitive. At the very least, any heuristic approach would have to be at least as good as the random approach to be a serious candidate.
Re: "Divide" challenge
by Bloodnok (Vicar) on Mar 10, 2009 at 13:15 UTC
    IIRC, this sounds like a problem typically solved using combinatorial analysis (aka graph theory) ... but it's that long ago, I'd have to RTFM to work out how !?

    A user level that continues to overstate my experience :-))

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://749548]
Approved by ELISHEVA
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others surveying the Monastery: (6)
As of 2024-03-28 10:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found