laziness, impatience, and hubris PerlMonks

### Looking for help with AI::Genetic and classroom scheduling

by merlyn (Sage)
 on Feb 20, 2005 at 00:24 UTC Need Help??
merlyn has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to help "Captain Neil" out with scheduling the future geekcruises. I figure I can use AI::Genetic with it, but if someone has a better module, I'll use that instead. (If I can get this to work in time, we'll actually use it live for Perl Whirl 05, and I can give a talk about how it worked...)

The problem is that essentially a schedule consists of time slots with various rooms within the slots, and instructors (and hopefully students) to fill the rooms. The obvious constraints apply: rooms can't hold two things at once, and instructors can't be in two places at once. There's also some "better than others" solutions, like having the first and second place class requested by attendees both be available and in different time slots, and possibly some ordering of classes based on "part 1" "part 2" information. And, if I can get past all that, there's actually different classroom sizes that can bias the result as well.

So, where I'm stuck is how to best represent a particular schedule. Should the genes represent the timeslots for each classroom, with the values being the class for that slot? Or should they represent the classes, with the values representing the timeslot-classroom identification? Both of them seem to have advantages for mutation and crossover, but with the cross-constraints, it seems like nearly any crossover is going to generate a schedule that immediately gets stillborn as an impossible deal.

Does anyone have any experience with setting up this sort of thing?

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

update: It was asked in the CB about why I just don't search the entire solution set. I have 2.5 sea days on a typical cruise, which translates into 5 to 8 timeslots, with 2 to 4 classrooms available. Then there's evening slots for the larger events, usually on 3 or 4 nights. This means I have a potential 25 or so time-places to schedule (crossing time against place). Now, add to this that we're drawing from an offered class list of about 40 items or so (many of which will not be offered), and 150 to 300 attendees all giving "first choice" and "second choice" classes. It'd take "a while" to do this all by exhaustive search, I suspect. I'm trying to shortcut that a bit.
• Comment on Looking for help with AI::Genetic and classroom scheduling

Replies are listed 'Best First'.
Re: Looking for help with AI::Genetic and classroom scheduling
by blokhead (Monsignor) on Feb 20, 2005 at 05:22 UTC
Note: I started writing this before you posted your followup w/ code. It seems like you have a decent handle on a lot of this so far...
it seems like nearly any crossover is going to generate a schedule that immediately gets stillborn as an impossible deal.
Then don't make impossible/infeasible schedules "stillborn." Chances are you will start out with a population of impossible course schedules anyway. So if your fitness function just checks whether the schedule is feasible or not (and gives any non-feasible schedule zero fitness), you are just waiting for mutations to take you into the realm of valid schedules. That's even worse than a brute-force solution.

Your fitness function should have a good gradient over the entire search space. Some impossible schedules are more impossible than others. Take off X points for each instructor conflict, Y points for each room conflict, etc. You'll get continually closer to a valid schedule.

As for satisfying the students' course ranking, I would have each student give a satisfaction rating to each schedule. 1 point for each of his/her top 5 choices that he/she can attend (i.e, that are scheduled and don't overlap). Then take the average over all the students and add that to the fitness for the schedule (this will be an expensive fitness function to compute). Obviously weight the big room/teacher conflicts more than the student preferences (which it seems you have already done).

As for representation, my preference would be a function mapping classroom timeslots to courses (i.e, an array: course assignmed to 1st slot, course assignmed to 2nd slot, etc). This way you are always sure that all the classroom slots are filled. Then you just take off points for classes scheduled twice, or teachers with two slots at the same time. Crossover is a snap now, as you can have different population members specialize in a good schedule for the beginning/end of the week and combine them to get a really good full-week schedule.

but if someone has a better module, I'll use that instead.
I mentioned in the CB that I have written Algorithm::Evolve. You can be the judge of whether it's better (I have never looked at AI::Genetic) -- and I'm always open to suggestions. Check out the distribution's examples/ dir.

Your fitness function should have a good gradient over the entire search space. Some impossible schedules are more impossible than others. Take off X points for each instructor conflict, Y points for each room conflict, etc. You'll get continually closer to a valid schedule.
And that's also the "aha!" that I had when I wrote the code I pasted earlier. I was thinking too literally in terms of genetics. I would have considered "impossible" solutions as "stillborn" offspring that should never exist to roam around and mate, so I was stumped trying to figure out how to prevent such an individual from ever ending up in the gene pool.

Once I realized that I could just put them in the "undesirable" category by dinging them for unworkabilities, the GA did the right thing, and kept mixing up the "least undesirable", and within a dozen generations, I had a breeding population that was "in the black" (all hard constraints satisified, and thus now working on optimizing the soft constraints).

Cool.

Well, I learned something. This is always a good thing.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

Re: Looking for help with AI::Genetic and classroom scheduling
by merlyn (Sage) on Feb 20, 2005 at 04:29 UTC
Well, so far, this looks promising. After spending a couple of hours just blindly rushing into it, I'm getting some pretty fast convergence on the basic criteria (rooms as occupied as possible, no room overbooked, no teacher overbooked, rudimentary bias) with the following code. I'm generally seeing convergence in about 20 generations, or about 15 seconds of real time.
This is what I get for going a weekend without reading all of perlmonks - I miss out on discussions like this.

Note that another way to deal with the problem of disallowed configurations is to narrow the search space to exclude them by having the individuals in your population be things that generate an acceptable layout, rather than being a layout directly.

For example, instead of a listvector gene sequence based on slots, you could do a rangevector gene based on available courses - these genes would then represent priority values for the courses. In the scoring phase, you'd use the gene values to sort the courses and you'd then fill in the slots one after another, each time using the highest priority acceptable course.

Well that explanation was clear as mud. Let me attempt to code it up and I'll post a reply - the code is probably clearer.

Update: This approach won't work immediately, because of the perl bug described in 433559. There's a way to patch AI::Genetic to work around the perl bug, but I don't think that there's any way to work around it short of tweaking AI::Genetic. (It's a one-line patch, though)

--
@/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/;
map{y/X_/\n /;print}map{pop@$_}@/for@/ [download] This would make a great article for$magazine ;-)
Well, oddly enough, I was already thinking the same thing. Neil says that he'll cut me a deal on future cruises if I solve this problem for him, but then I can turn around and write up the experience as a class to teach on PerlWhirl, and also turn that into an article for which I get paid, and then take that paid-for article and put it on my website for free, and then the author of AI::Genetic can put that as an example in his distro.

Everybody wins.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

Okay, I've finally managed to get working the approach I mentioned in my other reply, that of having the genes be precedence vectors for class assignments. However, first this requires a hack to AI::Genetic to work around a perl bug.

The fix is to change sortIndividuals in AI/Genetic.pm to read: (this is tweaking one line)

sub sortIndividuals {
my ($self,$list) = @_;

return [sort {$b->score <=>$a->score} map {$_->score;$_} @$list]; } [download] Update: You also have to change the two sort commands in AI/Genetic/OpSelection.pm if you want this to work reliably for small populations. In both cases, change sort {$b->score <=> $a->score} [download] to sort {$b->score <=> $a->score} map {$_->score;$_} [download] An alternative to patching AI::Genetic internals is to get a version of perl that doesn't suffer from the bug described here: http://www.nntp.perl.org/group/perl.perl5.porters/37481. To my knowledge, such a version of perl does not yet exist, but hey, you are merlyn. Once that's done, you can try my approach. However, to make it easier to understand and compare side-by-side with your original approach, I rewrote your approach so that both ways of coding this had as much identical code as possible. This means that neither approach is implemented as cleanly or as efficiently as it could be - the goal was side-by-side comparison, not efficiency. In fact, the two scripts are now so similar that I'm just going to present yours as I recoded it and then explain how to transform it into mine: Now, to change that into my approach, where the genes represent class priorities, first change the setup code to: my$ga = AI::Genetic->new
(
-fitness => \&my_fitness,
-type => 'rangevector',
-terminate => \&my_terminate
);

$ga->init([map {[0,100]} @CLASSES]); [download] Then, change the make_config sub: sub make_config { my (@genes) = @_; # sort classes in order by genes: use Data::Dumper; my (@order) = sort {$genes[$a] <=>$genes[$b];} (0..$#CLASSES);
my (@sclasses) = @CLASSES[@order];
push(@sclasses, (qw{XXX}) x ($SLOTS *$ROOMS));
my ($config) = []; for my$s (0..$SLOTS-1) { push @$config, [];
for my $r (0..$ROOMS-1) {
my $i=0; while (!is_acceptable($sclasses[$i],$s,$r,$config)) {$i++;} push @{$config->[-1]}, splice(@sclasses,$i,1); } }$config;
}
[download]
And that's it.

The interesting thing is that this priority-driven approach converges almost instantly with the scoring function you supplied - the first or second generation usually has a configuration that scores at 20.5 or 21. (21 is the max. possible here)

I would be interested to know what more advanced scoring functions you come up with later and whether this priority-driven approach ends up being significantly better in practice than your initial approach of having a gene for each room/slot combination.

--
@/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/;
map{y/X_/\n /;print}map{pop@$_}@/for@/ [download] As halley pointed out to me in node 433726, there is in fact a way to avoid the necessity of changing AI::Genetic at all, though it requires a small use of prototyping, which I know you abhor, to make it work. In my version, you just have to write the make_config sub like this: sub make_config { my (@genes) = @_; # sort classes in order by genes: # stupid #!@#!#$! sort bug
my $sortsub = sub ($$) { my ($a,$b) = @_;$genes[$a] <=>$genes[$b]; }; my (@sclasses) = @CLASSES[sort$sortsub (0..$#CLASSES)]; push(@sclasses, (qw{XXX}) x ($SLOTS * $ROOMS)); my ($config) = [];
for my $s (0..$SLOTS-1) {
push @$config, []; for my$r (0..$ROOMS-1) { my$i=0;
while (!is_acceptable($sclasses[$i],$s,$r,$config)) {$i++;}
push @{$config->[-1]}, splice(@sclasses,$i,1);
}
}
$config; } [download] -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
[download]
Re: Looking for help with AI::Genetic and classroom scheduling
by rg0now (Chaplain) on Feb 20, 2005 at 13:37 UTC
After reading through the thread, I think its timely to call your attention that Genetic Algorithm (GA) is not the one and only choice when solving the classroom scheduling problem. While I am not proficient in the field, in my understanding of combinatorial optimization, your problem is a special form of the broader range of assignment problems, and as such, it quite naturally lends itself to be formulated as an integer linear program (ILP). Here you can read a pretty good introduction to the topic with the most important backgrounds.

Now, the only thing that remained to be done is to actually solve the resultant ILP. For this I coded and maintain (unfortunately, out of CPAN) a nice Perl module, Math::GLPK, which wraps the GNU Linear Programming Toolkit (GLPK). Based on my thorough experience with GLPK and ILPs, you have pretty good chance to obtain 100 percent optimal solution in all cases as long as the size of the problem remains of similar range as yours.

It is clear that both approaches have its pros and cons. However, for me at least, the ILP approach proved itself much more natural and concise than any other formulations of hard combinatorial optimization problems...

rg0now

One thing that may surprise most people is that my programming skills are self-taught. I know only enough programming and enough writing and enough math to have solved whatever problem was facing me in the trench that day. Luckily, I learn really fast, but I just have to have the right problem to chew on, and see the proper "next steps" so I don't get completely swirled.

So, those of you with letters after your names have an advantage, because you got some book learnin' that I haven't gotten around to doing yet. That's what I was hoping to leverage from here.

It occurred to me that the problem might be small enough that some sort of more direct (non-genetic) algorithm might do the job, but I knew that this "bear of very little brane" would get a faster result by applying things that I could learn in an afternoon, not a week. {grin}

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

I admit I was not specific enough. To serve as a real help, I would have come up with a concrete methodology to formulate your problem as an ILP, which I missed to do. The problem is that I am not familiar enough with timetabling and currently, I do not have time to do the essential research. Hopefully, some other monks will do it...:-) By all means, I pretty much recommend you to take the time and look into linear programming. You will be surprized, how generic problem solving technique it gives to your disposal and might very well open up a completely new way of thinking about optimization.

The only thing that I could find, which could get you started is this paper, or better to say, Section 2 of the paper... It gives the basic notation (which might look like a bit bloated for the first sight, and yes, for the second too), but once you get the point, it is much simpler than you might have expected.

rg0now

Re: Looking for help with AI::Genetic and classroom scheduling
by toma (Vicar) on Feb 20, 2005 at 08:00 UTC
Here is how I understand the terminology, so that we don't talk past each other. Please let me know if I get it wrong.

A 'gene' has a 'name' and a 'value'. A set of genes values is a 'person'. This person has a single number for 'fitness'. Fitness can be computed from the values of the genes. The end result of evolution process is the most fit person.

In this problem a person represents an entire schedule. The genes are the individual schedule entries. There are stringent restrictions on the data types for the genes. One hard part of using a genetic algorithms is the work needed to map the problem space into the limited data types of the GA package.

The fitness evaluator looks at the person and sees how well it satisfies the student sign-ups and other constraints. Efficienty of the fitness evaluator is important. Any tests that disqualify a particular person can be checked first. In these cases the fitness is zero and the routine can use a short-cut return, without evaluating all the other fitness criteria.

I tried your first suggested solution, where the genes represent the timeslots for each classroom, and the values are the class for that slot. This way of doing it moves the complexity away from the mapping problem into the fitness evaluation function, where I think it is much easier to deal with.

In my trivial example a single instructor sequentially teaches two different classes out of a possible three classes. One class will be cancelled due to 'lack of interest'. The students can sign up for one or two classes and can optionally specify a priority of 1 or 2. The three classes are called 'genetic', 'neural', and 'fuzzy'.

use strict;
use warnings;
use AI::Genetic;

my %signup;
$signup{bob}{genetic} = 2;$signup{ted}{genetic}   = 1;
$signup{alice}{genetic} = 1;$signup{carol}{genetic} = 0;
$signup{joe}{genetic} = 0;$signup{biff}{genetic}  = 0;
$signup{sneezy}{genetic}= 0;$signup{bob}{neural}   = 0;
$signup{ted}{neural} = 0;$signup{alice}{neural} = 1;
$signup{carol}{neural} = 2;$signup{joe}{neural}   = 1;
$signup{biff}{neural} = 1;$signup{sneezy}{neural}= 0;

$signup{bob}{fuzzy} = 0;$signup{ted}{fuzzy}   = 1;
$signup{alice}{fuzzy} = 0;$signup{carol}{fuzzy} = 1;
$signup{joe}{fuzzy} = 0;$signup{biff}{fuzzy}  = 0;
$signup{sneezy}{fuzzy}= 2; my$generations= 0;

sub fitness
{
my ($classes)= @_; my$class1= $classes->[0]; my$class2= $classes->[1]; return 0 if$class1 eq $class2; my$fitness=0;
foreach my $student_name (keys %signup) { if ($signup{$student_name}{$class1} > $signup{$student_name}{$clas +s2}) {$fitness += $signup{$student_name}{$class1}; } else {$fitness += $signup{$student_name}{$class2}; } } return$fitness;
}

my $ga = new AI::Genetic( -fitness => \&fitness, -type => 'listvector', -population => 500, -crossover => 0.9, -mutation => 0.01, -terminate => sub {$generations++;
return 1 if $generations > 100; return 0 }, );$ga->init([
[qw/genetic neural fuzzy/ ],
[qw/genetic neural fuzzy/ ],
]);
$ga->evolve('rouletteTwoPoint', 100); print "Best score = ",$ga->getFittest->score, ".\n";
print "Best genes = ", join(' ',$ga->getFittest->genes), ".\n"; [download] It should work perfectly the first time! - toma Re: Looking for help with AI::Genetic and classroom scheduling by salonmonk (Beadle) on Feb 20, 2005 at 14:55 UTC Yeah I did my dissertation on this subject - using Perl. Without getting into all the intricate details, a really nice representation is to use a 3D array.$array[ROOM_NUMBER][WEEKDAY][TIMESLOT] = CLASS You are susceptable to the label replacement problem (i.e. classes can be booked more than once, or not at all), but that can be easily fixed by a genetic repair operator which just books them again in the next available slot. To give you an example - take the room capacity violation, it's a hard contraint, the pseudo code may look something like this:
Procedure room_capacity_violation

For R in 0 .. totalRooms
For D in 0 .. totalDays
For H in 0 .. totalHours
If (chromosome[R][D][H] != 0)
If (chromosome[R][D][H]->CLASS_SIZE > room[R]->CLASSROOM_SIZE)
+            count := count + 1
End If
End If
End For
End For
End For

Return(count)
End room_capacity_violation
[download]
So you return the count which represents how many times that hard constraint has been violated. This allows you to perform selection based on the best individuals (those who violate the contraints least) - or whatever selection method you use, because sometimes it's good to keep some traits in your population for the sakes of genetic diversity. One benefit that comes from this encoding method is that one contraint is alreadt assured; it is impossible to double book a room. I hope that is of some help (direct or otherwise).
Re: Looking for help with AI::Genetic and classroom scheduling
by nmerriweather (Friar) on Feb 20, 2005 at 06:24 UTC
actual a quesiton, not a comment -- i've seen a few open source applications that do this sort of thing for college and high school class and room schedules over the past few years.

Did you look at the logic they use on any of those existing systems and are trying to improve on that, or are you setting out to develop your own solution/approach from scratch?
I googled a few times for "genetic classroom scheduling", and all I ever found were papers on how others had done it. I did get some inspiration fro reading those papers, but I was still stuck about how to specifically represent my problem. I had no idea that you could get healthy solutions by breeding sick ones though, simply by breeding the least sick. {grin} That was my mental block.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

Re: Looking for help with AI::Genetic and classroom scheduling
by Anonymous Monk on Feb 20, 2005 at 02:03 UTC
The problem is that the result is not simply a "valid" solution. I'm looking for a "best" solution. I know enough Prolog to code the search space for a valid solution, even using AI::Prolog (thanks Ovid!). What I need is for a way to move from a valid solution to a "better" solution, based on soft things like attendee preference, natural ordering of materials, and room size, after first matching hard things like a single-purpose room and a single-location instructor.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

Interestingly, I was just using AI::Prolog to try and solve this. Unfortunately, my attempted solution required math and I haven't built that in yet. Then I tried to build math logically for just the small problem set, but I hadn't yet defined the "is" primitive. Sigh. Unfortunately, the solutions at hand started getting difficult even when I used SWI-Prolog due to how exhaustive the search space is.

Cheers,
Ovid

New address of my CGI Course.

Re: Looking for help with AI::Genetic and classroom scheduling
by perlfan (Curate) on Feb 21, 2005 at 00:38 UTC
Interesting problem, though I think that no AI or genetic algorith will give you the optimal solution.

It could be done as a linear programming case (as already stated), but it would be easier to just do a brute force type of search - i.e., test all the possible combinations for the optimal schedule. This is not an elegent approach, but it will work, and I doubt you have enough variables to make your pc choke.

If you want just a "good" and not necessarily and the optimal solution, you could evolve something pretty easily.
Re: Looking for help with AI::Genetic and classroom scheduling
by digiryde (Pilgrim) on Feb 22, 2005 at 12:54 UTC

Of course, there is always the other answer for brute force approach - Grid Computing. Just get a bunch of the PMers to put the application on their computers running different parts of the data. Put all the findings together and voila. Answer! (tounge in cheek).

But, seriously, this would be neat to do with perl as well (CPAN module? - If I have the time. lol). I have already done this in a document creation shop. It was a very powerful tool. Creating millions of documents per day, some of them multiple times. By putting the work into a grid, we were able to cut down the end user seen latency from as much as 30 minutes to no more than a minute (except in rare cases where a person was needed). Since the whole thing was in perl, it made many people rethink their views on perl. The solution blew a java based solution away, and it was done so quickly that the C++ solution (scheduled to replace the Java solution) was never finished.

Slashdot poked my funny bone...
Visions of the future of grid computing

-digiryde

Create A New User
Node Status?
node history
Node Type: perlquestion [id://432801]
Approved by Corion
Front-paged by Ovid
help
Chatterbox?
 erix . o O( "what fools the french are, Jeeves" ) [choroba]: Also some Достое́вс кий [Discipulus]: mmh windows understand something like: cd c:\\\\path\\\\to weird..

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (9)
As of 2017-05-24 08:12 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
My favorite model of computation is ...

Results (183 votes). Check out past polls.