SmugX has asked for the
wisdom of the Perl Monks concerning the following question:
Hello all.
If I have many people (sales reps, say), and many geographical locations, both represented by grid references (i.e. with a little bit of Pythagoras I can figure out the distance between any two), does anyone know a good way of figuring out the most efficient way of assigning them, resulting in the least overall movement of people?
This sounds to me like an AItype problem, but my AI skills are veeeeeery limited. However, the pseudocode that's popped into my head is something like:
 Assign people at random.
 Calculate overall distance moved.
 Swap two people at random.
 Recalculate overall distance moved.
 If it's improved, keep the change, and repeat swap again. If it's worse, discard the change, and swap again.
 After some vaguelydefined period, decide I can't improve on things and stop.
However, it occurs to me that this sounds like a common problem, and therefore there might be a name for this kind of thing, maybe a much better method than the one I've listed above, and, moreover, hopefully a CPAN module to do the actual work for me. :)
I've had a quick look through CPAN, but it seems I'm a bit stymied if I don't know any real AI theory (which I don't), and I don't know the name of the algorithm I'm searching for (which I certainly don't!)
Any advice will be much appreciated.
Many thanks,
SmugX
Re: Efficient Assignment of Many People To Many Locations? by Random_Walk (Parson) on Feb 25, 2005 at 09:46 UTC 
It sounds a bit like the knapsack problem which is sure to give plenty of search hits. The knapsack is about packing various volume articles efficiently into a set of knapsacks. merlyn was looking at a related problem recently (assigning classrooms, teachers and students to maximise the utility to each student). He was doing it with a genetic algorithm. Going supersearcing, back soon...
OK have a look at this thread Looking for help with AI::Genetic and classroom scheduling. Some of the ideas may be applicable to your problem.
Update
After a bit of googling I think I was confusing the knapsack problem with the Bin Packing problem. Another that may be closer to your problem is the Vehicle Routing problem.
Cheers, R.
Pereant, qui ante nos nostra dixerunt!
 [reply] 

Random_Walk was pretty much correct: this problem lies in the focus of Operations Research and it is called the assignment problem. Here is a nice recipe on how to solve it.
You have n people and m jobs and each person can do only one job and each job can be done by exactly one person. You can also (intuitively or by some other policy) assign a cost c(i,j) of person j doing job i. Now, we make an integer linear program (ILP) out of this problem formulation.
Let x(i,j) denote the event that person j does job i. x(i,j) is a binary variable  it can either be zero or 1. We have the following constraints for x(i,j):
 Exactly one person can do a job, that is, for each job i=1..m we have a distinct constraint: sum_{j=1..n} x(i,j) = 1.
 Each person can do no more than one job, so for each person j=1..n we have yet another constraint: sum_{i=1..m} x(i,j) <= 1.
 And finally, for each (i,j) pair we have to make sure that x(i,j) is either zero or 1.
If you think after this constraint set, you will be surprized to find out that any setting of x(i,j) that satisfies all the requirements above gives rise to a possible assignment of people to jobs, this is why it is called the feasible region. You can also minimize the total cost of the assignment (in terms of c(i,j)) by setting the objective function:
minimize: sum_{i=1..m} sum_{j=1..n} c(i,j) x(i,j)
What you obtained (a set of variables, a set of constraints applied to the variables and an objective function to minimize) constitutes an integer linear program. So, here is again a perfect chance for me to advertise my module, which I coded and maintain (unfortunately, out of CPAN) to solve ILPs: Math::GLPK . Math::GLPK is an objectoriented Perl wrapper module for the GNU Linear Programming Toolkit (GLPK). Just feed the ILP to Math::GLPK as a matrix (it is pretty easy to do and the documentation is extensive) and leave the job of solving it to GLPK...
Good luck with OR!
Update: correct typos
 [reply] 

I'm unfamiliar with ILPs, but this sounds interesting. Thank you, I will take a look into this approach.
 [reply] 
Re: Efficient Assignment of Many People To Many Locations? by kvale (Monsignor) on Feb 25, 2005 at 09:58 UTC 
I don't know if this is a problem in AI, but it certainly is a problem in operations research.
Off the top of my head, there are a few approaches you could take. Your stochastic search method seems reasonable and I would add to it that you should run it several times with differing initial conditions. Another approach would be a genetic algorithm with a permutationbased chromosome and crossover scheme (Goldberg talks about these methods in his book on genetic algorithms).
Often times, simple heuristics can get you a pretty good solution in a minimal amount of time. I think a greedy heurisitc would work well here. The idea is to create the N*M matrix of distances between N salespeople and M locations. Then find the smallest element in the matrix. At the smallest element, assign that salesperson to that location and delete the corresponding row and column. Then find the smallest element in the reduced matrix and make another assignment, etc. Continue until all salespeople are assigned.
 [reply] 
Re: Efficient Assignment of Many People To Many Locations? by heezy (Monk) on Feb 25, 2005 at 11:00 UTC 
The psuedocode you have listed above is the basic principle of a genetic algorithm (GA) but to really go down the GA route you need to initially constuct MANY random solutions as follows...
 Define a way of encoding your solutions into a data set e.g. Binary, hex, tree...
 Create 20 random candidate solutions using your encoding.
 Measure how well each of these work (the 'fitness')
 Mutate the populations at random.
 Crossover different candidate solutions. (many different methods can be applied here to choose which to cross)
 Measure the new fitness of the new 20 candidate colutions.
 Is this fitness satisfactory? If not go back to step 4 (or stop if you are on the Nth itteration)
 [reply] 
Re: Efficient Assignment of Many People To Many Locations? by CountZero (Bishop) on Feb 25, 2005 at 13:32 UTC 
This method does not guarantee that you will get the optimal solution. You may get a good solution but it can be at a local minimum.Also I think that if the set of salesmen and locations is sufficiently large, there is such a large number of solutions to try that your "swap, test and keep if better" method will do no better than picking a truly random combination, unless you can prove that once you have a better result after a swap all following solutions in the tree underneath this swap are always better than all other possibilities without this swap (in other words you avoid the "local minimum" trap).
CountZero "If you have four groups working on a compiler, you'll get a 4pass compiler."  Conway's Law
 [reply] 

 [reply] 

That is a very interesting comment. If you can live with "good enough" rather than "maximum" it is indeed worth trying.Is the number "30" somehow related to the number of elements in the set of all possibilities or is it a value which is valid over a large spread of sets? Perhaps someone more versed in statistics than me can look into the following: given a problem which has 1,000,000 solutions, generate 30 random numbers between 1 and 1,000,000. If the score of the solution is equal to the value chosen, what are the chances that one of these random values is within 10% of the maximum value (1,000,000)?
CountZero "If you have four groups working on a compiler, you'll get a 4pass compiler."  Conway's Law
 [reply] 



Re: Efficient Assignment of Many People To Many Locations? by Roy Johnson (Monsignor) on Feb 25, 2005 at 13:39 UTC 
Partitioning: divide the reps in half based on (say) eastwest location; divide the cities similarly. Continue subdividing the rep and city cells, keeping the same number of reps and cities in corresponding cells, until you have one rep corresponding to one city. Each division should cut across the longer dimension of the cell (or the dimension which has the greatest range of either cities or reps, but corresponding cells should be divided in the same direction).
This is based on Heckbert's "median cut" color quantization method.
Caution: Contents may have been coded under pressure.
 [reply] 
Re: Efficient Assignment of Many People To Many Locations? by dragonchild (Archbishop) on Feb 25, 2005 at 13:55 UTC 
I'm not quite sure what you're attempting to optimize. Do you have
 N people in M locations and they need to, instead, be dispersed to X locations? This sounds like the problem with assigning flight crews  there's a lot of literature on the topic. It's somewhat similar to the travelling salesman problem, but with enough constraints to make it merely hard, not NPcomplete.
 N people and M (>2N) locations and you need to assign coverage of said locations to said people? This sounds like a local minimization problem  you identify N groups of locations that are easy to cover, then assign people at random.
In addition, are you attempting to minimize the average distance or the mean distance? What I mean is that if you minimize the average distance, you will have someone who has a much greater distance relative to the average. But, if you minimize the mean distance, everyone will have roughly equal distances, but it's going to be higher for most than it could have been.
I think you need to define your problem better. There's a number of different problems you could be trying to solve and each requires a different algorithm.
Being right, does not endow the right to be rude; politeness costs nothing. Being unknowing, is not the same as being stupid. Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence. Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.
 [reply] 

Good questions, dragonchild. Thank you for being alert, and reminding me that I can never define my problems as well as I think I can ... :)
To (hopefully) answer:
I have more (or the same number of) locations than people. We (hopefully) wouldn't book more sales calls than we have sales people for a given day. Therefore, I guess this might equate to your flight crew situation, where M>=X.
I think I want to minimise the mean distance. (I guess the sales team won't be happy with one poor guy having to travel the length of the country, even if the overall distance travelled for the whole group is less.)
Presumably this latter item will be defined in my scoring/fitness routine  whether the score is per individual or for the whole group. Incidentally, it did occur to me that my "score" might not simply be the distance travelled (lower being obviously better). Perhaps I could consider returning the distance travelled squared, so that greater distances get penalised even more? (e.g. two reps travelling 50 miles each is better than one travelling 25 miles and one travelling 75.) I get a nagging feeling that this "offthetopofmyhead" solution may be troublesome, however. Perhaps instead I might introduce a large penalty if the distance travelled for one individual exceeds a certain threshold? Any thoughts?
To be honest, I was envisaging trying to get the core routine working, and then "playing" with the scoring routine later. (After, for example, I run it, and find one rep travelling the length of the country. :D)
I hope this explains things a little better.
SmugX
 [reply] 

Now, I think I'm understanding your problem better. The actual problem isn't so much the assignment of sales calls as the scheduling of sales calls. Let me rephrase it, if you don't mind:
 You have a set of resources, each with a starting location.
 You have a set of locations those resources have to cover, with both an expected arrival time and a duration at that location.
 You have a set of estimated travel costs between locations. (This is the hardest part.)
The question is how to optimize the assignment of the arrival times at each location. For example, let's say you have two sales calls in the same building, each expected to take an hour. You can assign them both to the same salesrep and schedule one for 9am and the other for 10.30 am.
Of course, I haven't seen a good definition for the cost of travel. For example, it takes longer to go from Albany to New York vs. Philadelphia to New York, even though Albany is closer. The reason is that there is excellent train service to/from Philly, but only highways to/from Albany. And, if you want to be useful, you need to take into account plane schedules (though you could assume that everyone takes the redeye, but you won't be popular after a few months).
You may also want to estimate the expected ROI for a sales call. For example, if you expect $1000 of business from a call, but the salesrep's time is worth $320 ($40/hr) and it costs $600 to get there, is that salescall worth it? But, I think we're getting too complicated.
Essentially, this is really starting to approach the travelling salesman problem in complexity.
Being right, does not endow the right to be rude; politeness costs nothing. Being unknowing, is not the same as being stupid. Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence. Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.
 [reply] 

Re: Efficient Assignment of Many People To Many Locations? by itub (Priest) on Feb 25, 2005 at 15:01 UTC 
If you modify "If it's worse, discard the change, and swap again" to sometimes accepting "unfavorable" moves depending on a random number and a parameter that decreases during the simulation (usually called "temperature"), you have the simulated annealing algorithm, which has been applied to this problem. See for example,
 http://en.wikipedia.org/wiki/Simulated_annealing
 S. Kirkpatrick and C. D. Gelatt and M. P. Vecchi, Optimization by Simulated Annealing, Science, Vol 220, Number 4598, pages 671680, 1983.
 V. Cerny, A thermodynamical approach to the travelling salesman problem: an efficient simulation algorithm. Journal of Optimization Theory and Applications, 45:4151, 1985
 [reply] 
Re: Efficient Assignment of Many People To Many Locations? by perlfan (Curate) on Feb 25, 2005 at 17:10 UTC 
You'll find the "best" answer only through a bruteforce search of all solutions. This is not an "AI" problem, though you can use some AI techniques to figure it out. At the end of the day, if you don't have too many people/places and need the optimal solution, I'd go with brute force. Using random methods won't help you arrive at a solution any more efficiently. Linear programming lends it self to these types of problems, but you may not want to have to learn it, and it will still need brute force to find the best soln :)...  [reply] 
Re: Efficient Assignment of Many People To Many Locations? by SmugX (Beadle) on Feb 25, 2005 at 17:33 UTC 
Thanks to all the contributors so far. What I seem to be sensing it that my idea wasn't too bad (hooray!), and if I try with different starting conditions, and occasionally allow "unfavourable" moves (as described by itub), I might be close to the right track. (Although see my response to dragonchild above, where I ponder whether simple distance travelled might be a good enough scoring mechanism for me.)
In my simplistic mind, the genetic algorithm solution sounds like a more complex method of roughly the same approach. (Although I need to investigate this further, really.)
I haven't really got my head around the suggested ILP approach, to be frank, but I can look into this further.
For the record, I'm talking of hundreds of sales reps in hundreds of locations, so I don't think the brute force method won't work with the computing power I have available.
Also for the record, I'm not required to find the optimum solution to a situation  merely a good one!
Thanks again, all,
SmugX
P.S. Incidentally, I'm by no means a regular on perlmonks  certainly not as much as I'd like to be  but I'm always so impressed by everyone's knowledge, helpfulness and courtesy. You're all great. Have a drink on me.
P.P.S. I'm just kidding about the drink.
 [reply] 

In my simplistic mind, the genetic algorithm solution sounds like a more complex method of roughly the same approach. (Although I need to investigate this further, really.)
It is not the same approach at all. The genetic algorthm tests combination for fitness, or in this case how much each schedule "costs"  obviously the less it costs, the better. Once you determine the "fittest" N solutions, you combine them in hopes they will retain the best traits and eliminate the bad traits in the "offspring". Check out blokhead's page  he has some good links to theyse types of approaches. In a nut shell, you are reducing your search space to find the local optima  you are not throwing darts.
For the record, I'm talking of hundreds of sales reps in hundreds of locations, so I don't think the brute force method won't work with the computing power I have available.
I guess that depends on how many are actually changing locations each time. If only a small percentage change locations, then this reduces your problem drastically.
Good luck with things, and if you end up using a genetic algorithm or neural net, let me know! :)
 [reply] 
Re: Efficient Assignment of Many People To Many Locations? by BrowserUk (Pope) on Feb 26, 2005 at 10:20 UTC 
 [reply] 
Re: Efficient Assignment of Many People To Many Locations? by johndageek (Hermit) on Feb 28, 2005 at 21:49 UTC 
Just a left handed approach.
I would select a point in a third dimension over the center of my sales rep grid, calculate the distance to each sales rep and store the results in an array as a cross reference.
e.g.
dist from 3rd dim  map coordinates
20  0,0
15  1,2
sort this array by distance.
Now any destination you choose on the grid can be have it's distance calculated from the 3rd dimension point. A binary search of the sorted cross reference for distances equal to, or just above and below your destination distance should give you your top choices.
This assumes of course you have enough data to make the set up worth the effort and that the sales reps are based from the same place repeatedly.
HTH!
 [reply] 

