A while back I posted a node looking for good algorithm to use when assigning students to class sections. BrowserUk had a lot of good suggetions that got me thinking about all sorts of strategries, but in the end, some comments kvale made got me thinking about just trying a lottery based approach with lots of iterations to get the best score. That first pass then evolved.
What I finally came up with, is (I think) a really fair approach, that if allowed to run long enough (or on a small enough data set) winds up being a brute force algorithm to try giving the students their first pick in every possible permutation of students.
BUT: since trying *EVERY* permutation of all students might take a while in large datasets, it works it's way up to that by picking combinations of C "starter students" who get to go first (all permutations of those N are tried) followed by R random orderings of the remaining students. C is initially 1, and is incrimented once R orderings starting with every permutation of C have been tried. Once it's generated a solution with a score you are happy with, you can kill it. (using r 1 and c numberOfStudents it only tries the brute force approach of every permutation of students)
This kind of approach is similar to an Iteratively Deepening Depth First Search of a graph.
The Code...
#!/usr/local/bin/perl
# $Source: /home/hossman/cvs_archive/code/perl/sectionlottery.pl,v $
# $Id: sectionlottery.pl,v 1.7 2005/01/12 22:35:43 hossman Exp $
#
# see "help" for usage info
#
# Given a section CSV file and a prefrences CSV file (in that order)
# This tries a bunch of random orderings of students and gives each
# one the best choice it can.
#
# As it goes, it keeps track of solutions it's already found and won't
# generate a solution if:
# 1) this solution excludes more students then a previous one
# 2) this solution excludes an identicle list of students
#
# The output files have names indicating how many people have been lef
+t out,
# followed by a random number, followed by a timestamp indicating when
+ it
# generated.
#
# The contents of the CSV files contain 2 sections. The first lists
# all of the sections and how many "free spots" are still left. the s
+econd
# lists each student, their choice they recieved (ie: 1st, 2nd), the s
+ection
# they've been assigned to, and their orriginal prefrences in order.
# their section may be empty if they are unassigned, in which case the
+ir
# choice number will be really really big (to make sorting in excel ea
+sier)
#
# In an attempt to ensure "fairness" this script generates combination
+s of
# students of increasing sizes to be the "start" students who get "fir
+st
# pick". For each combination of starting students, all permutations
# of the starters are considered, and then some number of random
# permutations of the remaining students are tried. The initial start
# set combination size, and the number of random orderings to do using
# each starting permutation are configurable on the command line
use warnings;
use strict;
use Digest::MD5 qw(md5_hex);
use List::Util qw(shuffle);
use Math::Combinatorics;
use Text::CSV_XS;
use Data::Dumper;
use Getopt::Std;
use constant {
SECTIONS => 'sections',
PREFS => 'prefs',
ASSIGNED => 'assigned',
};
$Getopt::Std::STANDARD_HELP_VERSION = 1;
sub HELP_MESSAGE {
my $fh = shift;
print $fh
"Usage: $0 [m] [i num] [r num] [c num] s sec.csv p prefs
+.csv\n",
" m treat the csv files as mac files (optional)\
+n",
" i num max num of iterations to try (optional)\n",
" (default to infinite)\n",
" r num num of random attempts per combination (opti
+onal)\n",
" (defaults to sqrt number of students)\n",
" c num combination size to start at (optional)\n",
" (defaults to 1)\n",
" s sec.csv section availability file\n",
" p prefs.csv prefrences file\n";
}
{ # block scope because of stupid "globals"
my %opts;
(getopts('ms:p:i:r:c:', \%opts) and exists $opts{s} and exists $op
+ts{p})
or (HELP_MESSAGE(\*STDOUT) and exit());
my $state = State::parse_files($opts{s}, $opts{p},
(exists $opts{m} ? "\015" : "\n"));
# NOTE: for each number of unassigned, we are writting out solutio
+ns
# even if they don't have as good a score as previous solutions.
# this is so the user has the option of picking a higher scoring
# solution if they like the distrobution better.
my @students = sort $state>unassigned();
my $students = scalar(@students);
my $r = exists $opts{r} ? $opts{r} : int(1 + sqrt $students);
my $min = $students + 1;
# negative indicates run forever
my $count = exists $opts{i} ? $opts{i} : 1;
my %known = ();
print "Number of sections: " . keys(%{$state>{SECTIONS}}) . "\n";
print "Number of students: $students\n";
if (0 < $count) {
print "Doing $count iterations...\n";
} else {
print "Looping over all permutations (or untill a perfect one)
+...\n";
}
for (my $c = exists $opts{c} ? $opts{c} : 1; $c <= $students; $c++
+) {
# gradually incriment c, finding the list of all
# combinations of that size as we go.
# for each permutation of each combination, generate
# some number of random orderings that start with that
# combination/permutation
# write out any solutions we find that are better then
# (or as good as) the ones before in terms of unassigned
#
# stop if we get a perfect one
print STDERR "Trying combinations of size $c\n";
my $comb = Math::Combinatorics>new(count=> $c, data => \@stud
+ents);
while (my @start = $comb>next_combination()) {
my $perm = Math::Combinatorics>new(data => \@start);
while (my @p = $perm>next_permutation()) {
my $pclone = $state>clone();
$pclone>in_order(@p);
foreach (1..$r) {
my $clone = $pclone>clone();
$clone>in_order(shuffle($clone>unassigned()));
my @leftovers = $clone>unassigned();
my $cur = scalar @leftovers;
my $score = $clone>score();
my $md5 = md5_hex(join ' ', $score, sort @leftover
+s);
if ($cur <= $min and ! exists $known{$md5}) {
print "Found new solution with $cur leftout: $
+score\n";
$min = $cur;
$known{$md5}++;
$clone>write_new_file();
}
die "found a perfect solution\n"
if 0 == $min and 0 == $score;
die "exceeded max iterations\n" if 0 == $count;
} # foreach random ordering
} # foreach permutation of our combination
} # foreach combination of size $c
} # foreach $c
}
######################################################################
+##
package State;
# class method
sub new {
# basic empty constructor
my $self = bless { SECTIONS => {}, PREFS => {}, ASSIGNED => {} },
+shift;
return $self;
}
# class method
sub parse_files {
my ($sections, $prefs, $sep) = @_;
# sections is the sections and the number of spots
# prefs is the users and their prefrences in order
# sep is the input file record seperator
#
# returns a State object
local $/ = $sep;
my $s = State>new();
my ($sec_fh, $pref_fh);
open $sec_fh, $sections or die "can't open sections file";
open $pref_fh, $prefs or die "can't open prefs file";
my $csv = Text::CSV_XS>new();
while (<$sec_fh>) {
chomp;
$csv>parse($_)
or die "can't parse sec line: " . $csv>error_input() ." "
+;
my @cols = $csv>fields();
$s>{SECTIONS}>{$cols[0]} = $cols[1];
}
while (<$pref_fh>) {
chomp;
$csv>parse($_)
or die "can't parse prefs line: " . $csv>error_input() ."
+ ";
my @cols = $csv>fields();
my $stu = shift @cols;
$s>{PREFS}>{$stu} = [ grep { '' ne $_} @cols ];
}
return $s;
}
# instance method
sub assign {
my ($self, $student, $section) = @_;
# assign student to section, returns nothing
die "$section isn't a section" unless exists $self>{SECTIONS}>{$
+section};
die "$student isn't a student" unless exists $self>{PREFS}>{$stu
+dent};
die "$section is full" unless 0 < $self>{SECTIONS}>{$section};
warn "$student didn't pick $section"
unless grep { $section eq $_ } @{$self>{PREFS}>{$student}};
warn "$student was already in another section"
if exists $self>{ASSIGNED}>{$student};
$self>{SECTIONS}>{$section};
$self>{ASSIGNED}>{$student} = $section;
}
# instance method
sub best_avail {
my ($self, $student) = @_;
# given a student, return the best available section in their pref
+s
# returns undef if nothing is available
my @prefs = @{$self>{PREFS}>{$student}};
foreach (@prefs) {
if (0 < $self>{SECTIONS}>{$_}) {
return $_;
}
}
return undef;
}
# instance method
sub in_order {
my ($state, @list) = @_;
# given an ordered list of people
# give them their best available choice.
# if @list includes a person who is allready in a section, happily
+ ignore
foreach (@list) {
next if exists $state>{ASSIGNED}>{$_};
my $best = $state>best_avail($_);
$state>assign($_, $best) if defined $best;
}
}
# instance method
sub unassigned {
my $self = shift;
# returns a list of all the people not yet in a section
my @leftovers = ();
foreach (keys %{$self>{PREFS}}) {
push @leftovers, $_ unless exists $self>{ASSIGNED}>{$_};
}
return @leftovers;
}
# instance method
sub score {
my $self = shift;
# returns a numeric score for the current state
#
# when comparing two states with the same number of people unassig
+ned,
# the smaller the score, the better. (scores can't be compared be
+tween
# states with different numbers of unassigned students)
my $score = 0;
foreach my $student (keys %{$self>{ASSIGNED}}) {
my $sec = $self>{ASSIGNED}>{$student};
my @prefs = @{$self>{PREFS}>{$student}};
my ($rank) = grep $prefs[$_] eq $sec, 0..$#prefs;
$score += $rank;
}
return $score;
}
# instance method
sub write {
my ($self, $fh) = @_;
# writes out the current state as a CSV file to $fh
my $csv = Text::CSV_XS>new({ eol => "\n", always_quote => 1 });
# print all of hte sections and how many they have left
foreach (sort keys %{$self>{SECTIONS}}) {
$csv>combine($_, $self>{SECTIONS}>{$_})
or die "can't combine secs line: " . $csv>error_input() .
+" ";
print $fh $csv>string();
}
# print a "blank line" in CSV speak
print $fh qq{""\n};
foreach (keys %{$self>{PREFS}}) {
my ($assign, $rank) = ('',99999);
my @prefs = @{$self>{PREFS}>{$_}};
if (exists $self>{ASSIGNED}>{$_}) {
$assign = $self>{ASSIGNED}>{$_};
($rank) = grep $prefs[$_] eq $assign, 0..$#prefs;
$rank++;
}
$csv>combine($_, $rank, $assign, @prefs)
or die "can't combine assigned line: " . $csv>error_input
+() ." ";
print $fh $csv>string();
}
}
# instance method
sub write_new_file {
my $self = shift;
# writes to a new file with a name based on the number of
# unassigned people, the score, a random number, and a timestamp.
my $name = ($self>unassigned()) . "" . ($self>score()) . '' .
int(rand(10000)) . '' . time() . ".csv";
open my $fh, ">$name";
$self>write($fh);
close $fh;
}
# instance method
sub clone {
my $self = shift;
# returns a copy of itself with independant sections/assigned hash
+es
# (PREFS is shared since it shouldn't be changable)
my $that = State>new();
foreach (keys %{$self>{SECTIONS}}) {
$that>{SECTIONS}>{$_} = $self>{SECTIONS}>{$_};
}
foreach (keys %{$self>{ASSIGNED}}) {
$that>{ASSIGNED}>{$_} = $self>{ASSIGNED}>{$_};
}
foreach (keys %{$self>{PREFS}}) {
$that>{PREFS}>{$_} = [ @{$self>{PREFS}>{$_} } ];
}
#$that>{PREFS} = $self>{PREFS};
return $that;
}
__END__
And an example run...
radix:~/tmp/sectionlottery> cat data/sec.1.csv
sec1,1
sec2,45
sec3,9
sec4,0
sec5,0
"section six",3
section999,10
radix:~/tmp/sectionlottery> cat data/prefs.1.csv
1234,sec1,sec3,sec5
2345,"section six",sec2
5678,sec5,sec4,sec3,sec1,sec2,section999
4,sec1
5,sec1,"section six"
6,sec1
radix:~/tmp/sectionlottery> ~/cvs/working/code/perl/sectionlottery.p
+l s data/sec.1.csv p data/prefs.1.csv
Number of sections: 7
Number of students: 6
Looping over all permutations (or untill a perfect one)...
Trying combinations of size 1
Found new solution with 2 leftout: 3
Found new solution with 1 leftout: 4
Found new solution with 1 leftout: 4
Trying combinations of size 2
Trying combinations of size 3
Trying combinations of size 4
Trying combinations of size 5
Trying combinations of size 6
radix:~/tmp/sectionlottery> ls *.csv
1452011105570785.csv 1486971105570785.csv 2373131105570785.
+csv
radix:~/tmp/sectionlottery> cat 1452011105570785.csv
"sec1","0"
"sec2","45"
"sec3","7"
"sec4","0"
"sec5","0"
"section six","1"
"section999","10"
""
"6","99999","","sec1"
"2345","1","section six","section six","sec2"
"4","1","sec1","sec1"
"5678","3","sec3","sec5","sec4","sec3","sec1","sec2","section999"
"1234","2","sec3","sec1","sec3","sec5"
"5","2","section six","sec1","section six"
Re: Student Class Section Lottery by ff (Hermit) on Jan 19, 2005 at 02:08 UTC 
The "iterative graph" link may state this or you may have already considered this, but I seem to remember from an Operations Research class some 20 years ago something called the Law of Large Numbers: When you have to come up with a very good solution to a problem with a huge solution space, but not necessarily the optimal solution, generating roughly 30 solutions has something like a 90% chance of one of the solutions being among the top 5 or 10% of all possible solutions.
Thus, with your lottery approach and a consistent way of scoring the solution (weighted criteria), the highest scoring solution among 30 that you can generate will just (most likely) BE one of the very best solutions you could possibly generate. Not the optimal solution but an extremely good one. (Even if there are 436,357,219,651 possible solutions. A statistician could provide details. :)
 [reply] 

