hossman has asked for the wisdom of the Perl Monks concerning the following question:
Does anyone know of a nice simple algorithm (Or better yet: a perl implimentation) for picking which students should be in which sections, based on the preferences of the students?
There's lots of discussions about "Matching Algorithms" out there, but most of them either address the "Stable Marriage" problem (a onetoone mapping where the men and women all express prefrences) or the "Hospitals/Residents" problem (a manytoone mapping where the doctors and hospitals all express prefrences)
What I'm looking for, is a manytoone solution where only the students express their prefrences, and the class sections specify a max number of allowed students (which may vary by section). Something like...
%sections = (sec1 => 2, sec2 => 4, sec3 => 4, sec4 => 3, ...);
%students = (studentA => [sec3, sec1, sec2],
studentB => [sec1],
studentC => [sec4, sec3, sec1],
studentD => [sec2, sec3],
studentE => [sec2, sec4, sec3, sec1],
...);
%matches = best_permutation(\%sections, \%students);
suggestions?
Re: [OT] simple algorithm for assigning students to class sections
by kvale (Monsignor) on Nov 30, 2004 at 09:37 UTC

A simple, fair algorithm would be to draw students from the pool, one at a time without replacement. For each student, put them into their first choice section, if it is not full. If the first choice is full, try the second, and so on. If all the choices are full, pick the section with the greatest number of empty spaces.
To do better than this simple lottery scheme, you must first decide what your optimality critera are. Do you want to maximize the number of first choices satisfied? The number of any choice satisfied?
 [reply] 

 [reply] 

 [reply] 

To do better than this simple lottery scheme, you must first decide what your optimality critera are. Do you want to maximize the number of first choices satisfied? The number of any choice satisfied?
See, that's part of my problem. I was asked to try and find a way to "get everyone in a section they can make" (ie: if they can't make it, they didn't list it in their prefrences) without any other critera. Which means a naive lottery like you described would work fine, assuming I just kept doing lots of iterations (using a different initial order of students) and picking the "best" based on which solution results in the fewest number of people left out of a section. But it seems to me that there are still two problems;
 An approach like that would give an advantage to people who only listed one choice, while other people who tried to be flexible would get penalized for the "greater good"
 Even discounting that point, there could be lots of solutions in which the same number of people get matched to a section, so what's a good tie breaker in those cases?
(I think the biggest part of my problem is that I'm not sure what the optimal criteria are ... which is why I'm hoping someone would have pointers to algorithms other people have used in problems like this  where one side of the match only cares about quantity, not quality)
UPDATE:
An idea just occured to me. I could score the solutions based on the sum of the "rank" of the choice each of section each student was assigned to. or to use an example, given the following prefrences...
%students = (studentA => [sec3, sec1, sec2],
studentB => [sec1],
studentC => [sec4, sec3, sec1],
studentD => [sec2, sec3],
studentE => [sec2, sec4, sec3, sec1],
...);
A solution in which studentA was put in sec3 would get 3 points, because he listed 3 prefrences and got his first one. putting studentB in sec1 would only be worth 1 point, because he only listed one choice. Putting studentE in sec1 would be worth 1 point (because it was his last choice) but putting him in sec2 would be worth 4 points because it was his first (of 4) choice. if someone doesn't get placed in a section at all, the solution doesn't earn a point for them.
That seems like it would give priority to finding every one a section, but would be somewhat fair to people who were flexible, right?
The only problem I see right now, is that if there were two solutions that differed only in the order of processing studentB and studentE, and sec2,sec3,sec4 were already full and sec1 only had one spot left, then there would still be a tie between the solutions, even though studentE had been more flexible and (in my mind) deserves the slot more.
Maybe a solution in which someone doesn't get into any section should have the k*n points removed from it's score (where k is some small constant and n is hte number of prefrences the student listed?)
 [reply] [d/l] 
Re: [OT] simple algorithm for assigning students to class sections
by BrowserUk (Pope) on Nov 30, 2004 at 17:20 UTC

A few questions:
 From your example, students only express a preference for those sections they are interested in. They do not rate all sections?
Can it be assumed that they have eual distaste for all unrated sections?
 Can a student be in more than one section?
It radically alters the complexity if they can.
 Do all sections have to have students?
In effect, this question really amounts to: "Do the numbers of places per section, total to the number of students?"
Also: "Less than?" and "More than?"
 What are the orders of magnitude you are dealing with?
'20 students split across 5 sections with 4 places each' is a whole different kettle of fish to '2000 students across 50 courses with 0 to 20 places each'.
One is easily solved by brute force with a 'judgement criteria', the other requires a heuristic.
Examine what is said, not who speaks.
"But you should never overestimate the ingenuity of the sceptics to come up with a counterargument." Myles Allen
"Think for yourself!"  Abigail
"Time is a poor substitute for thought"theorbtwo
"Efficiency is intelligent laziness." David Dunham
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side."  tachyon
 [reply] 

 [reply] 

#! perl slw
use strict;
use List::Util qw[ shuffle max min reduce ];
our $SECTIONS = 15;
our $STUDENTS = 50;
our $MAXSECT = 20;
#srand( 1 );
## Gen some test data
my %sections = map{;
sprintf( "Section_%02d", $_ )
=>
{available => 0 }
} 0 .. $SECTIONS  1;
my @sections = sort keys %sections;
my $n = $STUDENTS;
$sections{ $sections[ rand @sections ] }{ available }++
while $n;
printf "Sections: %d \n\t%s\n", scalar keys %sections, join "\n\t",
map{ join " $_ =>", %{ $sections{ $_ } } } @sections;
my %students = map{
my $prefs = 1+int( rand $SECTIONS );
sprintf( "Student_%02d", $_ )
=>
[ ( shuffle( 0 .. $SECTIONS1 ) )[ 0 .. $prefs1 ] ]
} 0 .. $STUDENTS1;
my @students = sort keys %students;
printf "Students: %d [%s\n]\n", scalar keys %students, join ', ',
map{ "\n\t$_\t[ @{ $students{ $_ } } ]" } @students;
my $maxChoices = max( map{ scalar @{ $students{ $_ } } } @students );
##
for my $choice ( 0 .. $maxChoices ) {
my $byChoice = reduce{
push @{ $a }, []
if defined $a>[ 1 ][ 0 ]
and ( $students{ $students[ $a>[ 1 ][ 0 ] ] }[ $choice ]
+1e99 )
!= ( $students{ $students[ $b ] }[ $choice ]
+1e99 ) ;
push @{ $a>[ 1 ] }, $b;
$a
} [[]], sort{
($students{ $students[ $a ] }[ $choice ]99999)
<=>
($students{ $students[ $b ] }[ $choice ]99999) ## By nth cho
+ice
or
@{ $students{ $students[ $a ] } }
<=>
@{ $students{ $students[ $b ] } } ## or number of
+choices
} 0 .. $#students;
my @allocated;
for my $chose ( @$byChoice ) {
next unless defined $students{ $students[ $chose>[ 1 ] ] }[
+$choice ];
my $section = sprintf "Section_%02d",
$students{ $students[ $chose>[ 1 ] ] }[ $choice ];
# print "Sect:$section; avail: $sections{ $section }{ available
+ }",
"\t[@{[ sort {$a<=>$b} @$chose ]}][@{[ sort {$a<=>$b} @alloc
+ated ]}]";
if( @$chose <= $sections{ $section }{ available } ) {
push @{ $sections{ $section }{ allocated } }, @students[ @
+$chose ];
$sections{ $section }{ available } = @$chose;
push @allocated, @$chose;
# print "Alloc1: \t\t\t[@{[ sort {$a<=>$b} @$chose ]}]",
"[@{[ sort {$a<=>$b} @allocated ]}]";
next;
}
my @lastChoice = grep{
$#{ $students{ $students[ $_ ] } } == $choice
} @$chose;
# print "lastchoice: \t\t[@lastChoice]";
if( @lastChoice
and @lastChoice <= $sections{ $section }{ available }
) {
push @{ $sections{ $section }{ allocated } },
@students[ @lastChoice ];
$sections{ $section }{ available } = @lastChoice;
@{ $chose } = grep{
my $allocated = $_;
!grep{ $_ == $allocated } @lastChoice
} @$chose;
push @allocated, @lastChoice;
# print "Alloc2: \t\t\t[@{[ sort {$a<=>$b} @$chose ]}]",
"[@{[ sort {$a<=>$b} @allocated ]}]";
}
if( @$chose and $sections{ $section }{ available } ) {
my @random = ( shuffle( @$chose ) )
[ 0 .. $sections{$section}{available} 1 ];
push @{ $sections{ $section }{ allocated } }, @students[ @
+random ];
$sections{ $section }{ available } = 0;
@{ $chose } = grep{
my $allocated = $_;
!grep{ $_ == $allocated } @random
} @$chose;
push @allocated, @random;
# print "Alloc3: \t\t\t[@{[ sort {$a<=>$b} @$chose ]}]",
"[@{[ sort {$a<=>$b} @allocated ]}]";
}
}
delete @students[ @allocated ];
@students = grep{ defined } @students;
# print "left: @students";
last unless @students;
}
print "$_($sections{ $_ }{ available }) => ", ref $sections{ $_ }{ all
+ocated }
? "[ @{ $sections{ $_ }{ allocated } } ]"
: "[EMPTY]"
for sort keys %sections;
print "\nUnallocated; [@students]";
Output: [16:10:52.01] P:\test>411129 STUDENTS=50 SECTIONS=15
Sections: 15
available Section_00 =>2
available Section_01 =>3
available Section_02 =>1
available Section_03 =>3
available Section_04 =>9
available Section_05 =>3
available Section_06 =>3
available Section_07 =>5
available Section_08 =>3
available Section_09 =>2
available Section_10 =>6
available Section_11 =>1
available Section_12 =>2
available Section_13 =>5
available Section_14 =>2
Students: 50 [
Student_00 [ 6 14 8 10 1 3 2 13 5 12 4 ],
Student_01 [ 6 0 7 13 4 9 14 1 3 2 8 ],
Student_02 [ 11 10 9 ],
Student_03 [ 13 14 4 2 12 3 8 11 5 7 9 6 1 0 ],
Student_04 [ 11 3 9 ],
Student_05 [ 4 13 0 1 9 11 6 8 7 14 5 12 3 ],
Student_06 [ 5 13 9 10 7 0 ],
Student_07 [ 0 9 11 6 5 12 ],
Student_08 [ 5 ],
Student_09 [ 4 ],
Student_10 [ 3 6 1 0 12 8 13 14 5 10 9 4 2 11 7 ],
Student_11 [ 5 14 ],
Student_12 [ 0 11 2 6 10 8 9 1 7 12 5 14 3 ],
Student_13 [ 4 2 9 14 3 7 12 ],
Student_14 [ 4 3 2 6 13 0 14 12 7 9 10 5 ],
Student_15 [ 2 8 10 13 12 7 3 0 6 ],
Student_16 [ 6 11 1 3 7 2 5 0 12 10 ],
Student_17 [ 6 3 13 10 7 0 5 9 14 1 11 2 ],
Student_18 [ 0 7 11 12 9 1 13 4 3 14 6 8 5 2 10 ],
Student_19 [ 14 9 13 12 3 2 7 1 10 8 4 5 ],
Student_20 [ 0 13 8 3 4 6 11 1 12 10 9 5 7 14 2 ],
Student_21 [ 6 0 ],
Student_22 [ 4 6 12 2 7 1 3 ],
Student_23 [ 0 ],
Student_24 [ 2 9 1 13 5 0 12 3 11 8 4 7 10 6 ],
Student_25 [ 9 6 11 12 ],
Student_26 [ 13 7 8 10 12 11 4 ],
Student_27 [ 9 11 8 10 13 ],
Student_28 [ 14 0 4 8 10 9 13 3 6 2 5 7 12 1 11 ],
Student_29 [ 3 10 0 8 4 ],
Student_30 [ 2 5 14 13 12 1 10 0 7 8 4 9 3 ],
Student_31 [ 3 ],
Student_32 [ 1 14 0 9 ],
Student_33 [ 2 3 8 13 4 12 5 10 7 6 0 14 1 11 ],
Student_34 [ 7 12 9 11 13 10 1 2 14 8 5 6 3 ],
Student_35 [ 6 11 ],
Student_36 [ 1 4 8 11 ],
Student_37 [ 12 13 1 11 0 ],
Student_38 [ 0 7 14 5 13 ],
Student_39 [ 14 6 2 11 0 12 4 3 5 8 13 10 7 9 1 ],
Student_40 [ 13 0 14 4 1 8 10 6 9 11 12 2 5 7 3 ],
Student_41 [ 9 0 13 ],
Student_42 [ 3 2 5 11 8 13 10 14 4 ],
Student_43 [ 1 3 2 13 8 10 4 11 0 ],
Student_44 [ 0 7 ],
Student_45 [ 8 9 12 10 4 6 2 11 13 0 ],
Student_46 [ 8 3 6 0 4 ],
Student_47 [ 7 5 4 9 13 ],
Student_48 [ 8 1 7 14 4 6 11 3 10 ],
Student_49 [ 6 0 5 13 12 8 ]
]
Section_00(0) => [ Student_23 Student_44 ]
Section_01(0) => [ Student_32 Student_36 Student_43 ]
Section_02(0) => [ Student_15 ]
Section_03(0) => [ Student_31 Student_29 Student_10 ]
Section_04(2) => [ Student_09 Student_13 Student_22 Student_14 Student
+_05 Student_39 Student_24 ]
Section_05(0) => [ Student_08 Student_11 Student_06 ]
Section_06(0) => [ Student_17 Student_49 Student_01 ]
Section_07(0) => [ Student_47 Student_34 Student_38 Student_18 Student
+_16 ]
Section_08(0) => [ Student_46 Student_48 Student_45 ]
Section_09(0) => [ Student_27 Student_41 ]
Section_10(2) => [ Student_00 Student_12 Student_42 Student_30 ]
Section_11(0) => [ Student_02 ]
Section_12(0) => [ Student_37 Student_25 ]
Section_13(0) => [ Student_26 Student_03 Student_40 Student_20 Student
+_33 ]
Section_14(0) => [ Student_19 Student_28 ]
Unallocated; [Student_04 Student_07 Student_21 Student_35]
Examine what is said, not who speaks.
"But you should never overestimate the ingenuity of the sceptics to come up with a counterargument." Myles Allen
"Think for yourself!"  Abigail
"Time is a poor substitute for thought"theorbtwo
"Efficiency is intelligent laziness." David Dunham
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side."  tachyon
 [reply] [d/l] [select] 




15X20 = 300 spots, for 750 students. If that is the case, a random lottery is your only option to make it fair.
 [reply] 
Re: [OT] simple algorithm for assigning students to class sections
by mattr (Curate) on Dec 01, 2004 at 15:17 UTC

Hi, two longish comments since I got excited googling. Found what one school uses.
1. I'm not sure it's ethical to give "flexible" people more weight. It is a lottery, and the choices are based on what classes they are physically able to attend. 2nd, 3rd choices are decisions based on the scenario on what if I fail to get my wish, which is different from being flexible.
So I think you should try to get everyone their first choices and see which courses are oversubscribed. Then run a lottery for the busy courses. People who lose the lottery get second choice if possible, and if they have no other choices they get either no course, a random choice, or a list of remaining courses for which they can apply.
Another possibility is you can leave a few seats in busy courses free and decide in a second round who to put into them based on other considerations.
I also think it might be a good idea to try and develop other considerations to reduce the number of people vying for a popular course, such as their year in college, their academic credentials, their major, etc. I am not at all for it but if you want female engineers there's that too.
2. A voice spoke to me and said "simulated annealing". Which I always thought would be interesting to investigate but never had a reason. It is another way to solve these things, and there is a related algorithm called the Metropolis algorithm (google annealing and scheduling). It is based on a slowly cooling metal that crystallizes with minimum energy disparities and tunnels out of local minima, and is useful for traveling salesman type problems. In other words if you can model this in terms of energy and geometrical relationships this might be a good one. But it seems hard to me to see what is going on and hard to ensure human considerations like the above.
Incidentally I was googling (school scheduling algorithm) and then added the term "annealing". Turns out that is what Syracuse University used, I just found out, for their NP hard timetabling problem. (See the PDF) They are mixing in some neurons too, it is pretty serious stuff. They are calling certain classes "high cost" and looking for scheduling conflicts, etc. which is more than what you need. So you will have to model differently or use a subset of what they do. As noted on p. 16 of the pdf, it includes student preferences.
Some good news perhaps is that the GNU Scientific Library has a simulated annealing function def called gsl_siman_solve which might be usable from the Perl module Math::GSL::Sf. There are other perl annealing modules (search cpan) and it turns out the Perl Mongers world map used an Inline::C annealing routine (until a year ago) (in Image::WorldMap to keep labels from overlapping) too! (See bottom for link about the algorithm used now,) Bioperl also uses it. You might like to check out Algorithm::Evolutionary which provides both annealing and roulette wheels. Anyway this looks like a very useful thing to have in Perl for a lot of things so I would like to learn how to use it better.
FWIW I also found an overview of scheduling algorithms, this is an exciting (to me anyway) area that is well researched, and aside from annealing I understand "listbased" algorithms are used to manage limited resources however this was in the context of chip development.
Anyway this may be overkill but I think if you have more restrictions then simulated annealing is a) the only way to stay sane, b) competitive with other heuristics (so they say), and c) going to give you close to the best possible results. Oh and it takes a looong time to run so you will be happy about that part maybe. The pdf mentions a comparison between graph coloring (as a fast heuristic) instead of a rulebased preprocessor, which sounds like what someone mentioned earlier.
Other links:
NIST Algorithm dictionary and entry on Fisher Yates Shuffle used now in Image::WorldMap pure perl replacement for Inline::C (so annealing or a simplification of it)
1 (quotation about annealing)
Nice explanation of Simulated Annealing with simple algorithm, found through NIST
2 (descriptions of algorithms for chips) 3 (more googling) 4 Tabu search in spanish school
 [reply] 
Re: [OT] simple algorithm for assigning students to class sections
by exussum0 (Vicar) on Nov 30, 2004 at 14:19 UTC

Simpelest and most complex running time solution, create a graph of all students and create vertexes of which ones should not be in the same section. Then try and colour the graph as best you can. If you try to colour the graph by descending order of the degree of their vertexes, you'll get a smaller amount of colours. Each colour represents a section. This method doesn't give the uber best results. There are other graph colouring schemes to produce the optimal number.
Update: as for limiting the amount of people with the same class/colour, you can do that progrematically. Once a colour is all used up, use a new one by force.

Then B.I. said, "Hov' remind yourself
nobody built like you, you designed yourself"
 [reply] 
Re: [OT] simple algorithm for assigning students to class sections
by Anonymous Monk on Nov 30, 2004 at 09:29 UTC

 [reply] 

