A while back I posted a node looking for good algorithm to use when assigning students to class sections. BrowserUk had a lot of good suggetions that got me thinking about all sorts of strategries, but in the end, some comments kvale made got me thinking about just trying a lottery based approach with lots of iterations to get the best score. That first pass then evolved.

What I finally came up with, is (I think) a really fair approach, that if allowed to run long enough (or on a small enough data set) winds up being a brute force algorithm to try giving the students their first pick in every possible permutation of students.

BUT: since trying *EVERY* permutation of all students might take a while in large datasets, it works it's way up to that by picking combinations of C "starter students" who get to go first (all permutations of those N are tried) followed by R random orderings of the remaining students. C is initially 1, and is incrimented once R orderings starting with every permutation of C have been tried. Once it's generated a solution with a score you are happy with, you can kill it. (using -r 1 and -c *numberOfStudents* it only tries the brute force approach of every permutation of students)

This kind of approach is similar to an Iteratively Deepening Depth First Search of a graph.

The Code...

#!/usr/local/bin/perl # $Source: /home/hossman/cvs_archive/code/perl/section-lottery.pl,v $ # $Id: section-lottery.pl,v 1.7 2005/01/12 22:35:43 hossman Exp $ # # see "--help" for usage info # # Given a section CSV file and a prefrences CSV file (in that order) # This tries a bunch of random orderings of students and gives each # one the best choice it can. # # As it goes, it keeps track of solutions it's already found and won't # generate a solution if: # 1) this solution excludes more students then a previous one # 2) this solution excludes an identicle list of students # # The output files have names indicating how many people have been lef +t out, # followed by a random number, followed by a timestamp indicating when + it # generated. # # The contents of the CSV files contain 2 sections. The first lists # all of the sections and how many "free spots" are still left. the s +econd # lists each student, their choice they recieved (ie: 1st, 2nd), the s +ection # they've been assigned to, and their orriginal prefrences in order. # their section may be empty if they are unassigned, in which case the +ir # choice number will be really really big (to make sorting in excel ea +sier) # # In an attempt to ensure "fairness" this script generates combination +s of # students of increasing sizes to be the "start" students who get "fir +st # pick". For each combination of starting students, all permutations # of the starters are considered, and then some number of random # permutations of the remaining students are tried. The initial start # set combination size, and the number of random orderings to do using # each starting permutation are configurable on the command line use warnings; use strict; use Digest::MD5 qw(md5_hex); use List::Util qw(shuffle); use Math::Combinatorics; use Text::CSV_XS; use Data::Dumper; use Getopt::Std; use constant { SECTIONS => 'sections', PREFS => 'prefs', ASSIGNED => 'assigned', }; $Getopt::Std::STANDARD_HELP_VERSION = 1; sub HELP_MESSAGE { my $fh = shift; print $fh "Usage: $0 [-m] [-i num] [-r num] [-c num] -s sec.csv -p prefs +.csv\n", " -m treat the csv files as mac files (optional)\ +n", " -i num max num of iterations to try (optional)\n", " (default to infinite)\n", " -r num num of random attempts per combination (opti +onal)\n", " (defaults to sqrt number of students)\n", " -c num combination size to start at (optional)\n", " (defaults to 1)\n", " -s sec.csv section availability file\n", " -p prefs.csv prefrences file\n"; } { # block scope because of stupid "globals" my %opts; (getopts('ms:p:i:r:c:', \%opts) and exists $opts{s} and exists $op +ts{p}) or (HELP_MESSAGE(\*STDOUT) and exit()); my $state = State::parse_files($opts{s}, $opts{p}, (exists $opts{m} ? "\015" : "\n")); # NOTE: for each number of unassigned, we are writting out solutio +ns # even if they don't have as good a score as previous solutions. # this is so the user has the option of picking a higher scoring # solution if they like the distrobution better. my @students = sort $state->unassigned(); my $students = scalar(@students); my $r = exists $opts{r} ? $opts{r} : int(1 + sqrt $students); my $min = $students + 1; # negative indicates run forever my $count = exists $opts{i} ? $opts{i} : -1; my %known = (); print "Number of sections: " . keys(%{$state->{SECTIONS}}) . "\n"; print "Number of students: $students\n"; if (0 < $count) { print "Doing $count iterations...\n"; } else { print "Looping over all permutations (or untill a perfect one) +...\n"; } for (my $c = exists $opts{c} ? $opts{c} : 1; $c <= $students; $c++ +) { # gradually incriment c, finding the list of all # combinations of that size as we go. # for each permutation of each combination, generate # some number of random orderings that start with that # combination/permutation # write out any solutions we find that are better then # (or as good as) the ones before in terms of unassigned # # stop if we get a perfect one print STDERR "Trying combinations of size $c\n"; my $comb = Math::Combinatorics->new(count=> $c, data => \@stud +ents); while (my @start = $comb->next_combination()) { my $perm = Math::Combinatorics->new(data => \@start); while (my @p = $perm->next_permutation()) { my $pclone = $state->clone(); $pclone->in_order(@p); foreach (1..$r) { my $clone = $pclone->clone(); $clone->in_order(shuffle($clone->unassigned())); my @leftovers = $clone->unassigned(); my $cur = scalar @leftovers; my $score = $clone->score(); my $md5 = md5_hex(join ' ', $score, sort @leftover +s); if ($cur <= $min and ! exists $known{$md5}) { print "Found new solution with $cur leftout: $ +score\n"; $min = $cur; $known{$md5}++; $clone->write_new_file(); } die "found a perfect solution\n" if 0 == $min and 0 == $score; die "exceeded max iterations\n" if 0 == $count--; } # foreach random ordering } # foreach permutation of our combination } # foreach combination of size $c } # foreach $c } ###################################################################### +## package State; # class method sub new { # basic empty constructor my $self = bless { SECTIONS => {}, PREFS => {}, ASSIGNED => {} }, +shift; return $self; } # class method sub parse_files { my ($sections, $prefs, $sep) = @_; # sections is the sections and the number of spots # prefs is the users and their prefrences in order # sep is the input file record seperator # # returns a State object local $/ = $sep; my $s = State->new(); my ($sec_fh, $pref_fh); open $sec_fh, $sections or die "can't open sections file"; open $pref_fh, $prefs or die "can't open prefs file"; my $csv = Text::CSV_XS->new(); while (<$sec_fh>) { chomp; $csv->parse($_) or die "can't parse sec line: " . $csv->error_input() ." " +; my @cols = $csv->fields(); $s->{SECTIONS}->{$cols[0]} = $cols[1]; } while (<$pref_fh>) { chomp; $csv->parse($_) or die "can't parse prefs line: " . $csv->error_input() ." + "; my @cols = $csv->fields(); my $stu = shift @cols; $s->{PREFS}->{$stu} = [ grep { '' ne $_} @cols ]; } return $s; } # instance method sub assign { my ($self, $student, $section) = @_; # assign student to section, returns nothing die "$section isn't a section" unless exists $self->{SECTIONS}->{$ +section}; die "$student isn't a student" unless exists $self->{PREFS}->{$stu +dent}; die "$section is full" unless 0 < $self->{SECTIONS}->{$section}; warn "$student didn't pick $section" unless grep { $section eq $_ } @{$self->{PREFS}->{$student}}; warn "$student was already in another section" if exists $self->{ASSIGNED}->{$student}; $self->{SECTIONS}->{$section}--; $self->{ASSIGNED}->{$student} = $section; } # instance method sub best_avail { my ($self, $student) = @_; # given a student, return the best available section in their pref +s # returns undef if nothing is available my @prefs = @{$self->{PREFS}->{$student}}; foreach (@prefs) { if (0 < $self->{SECTIONS}->{$_}) { return $_; } } return undef; } # instance method sub in_order { my ($state, @list) = @_; # given an ordered list of people # give them their best available choice. # if @list includes a person who is allready in a section, happily + ignore foreach (@list) { next if exists $state->{ASSIGNED}->{$_}; my $best = $state->best_avail($_); $state->assign($_, $best) if defined $best; } } # instance method sub unassigned { my $self = shift; # returns a list of all the people not yet in a section my @leftovers = (); foreach (keys %{$self->{PREFS}}) { push @leftovers, $_ unless exists $self->{ASSIGNED}->{$_}; } return @leftovers; } # instance method sub score { my $self = shift; # returns a numeric score for the current state # # when comparing two states with the same number of people unassig +ned, # the smaller the score, the better. (scores can't be compared be +tween # states with different numbers of unassigned students) my $score = 0; foreach my $student (keys %{$self->{ASSIGNED}}) { my $sec = $self->{ASSIGNED}->{$student}; my @prefs = @{$self->{PREFS}->{$student}}; my ($rank) = grep $prefs[$_] eq $sec, 0..$#prefs; $score += $rank; } return $score; } # instance method sub write { my ($self, $fh) = @_; # writes out the current state as a CSV file to $fh my $csv = Text::CSV_XS->new({ eol => "\n", always_quote => 1 }); # print all of hte sections and how many they have left foreach (sort keys %{$self->{SECTIONS}}) { $csv->combine($_, $self->{SECTIONS}->{$_}) or die "can't combine secs line: " . $csv->error_input() . +" "; print $fh $csv->string(); } # print a "blank line" in CSV speak print $fh qq{""\n}; foreach (keys %{$self->{PREFS}}) { my ($assign, $rank) = ('',99999); my @prefs = @{$self->{PREFS}->{$_}}; if (exists $self->{ASSIGNED}->{$_}) { $assign = $self->{ASSIGNED}->{$_}; ($rank) = grep $prefs[$_] eq $assign, 0..$#prefs; $rank++; } $csv->combine($_, $rank, $assign, @prefs) or die "can't combine assigned line: " . $csv->error_input +() ." "; print $fh $csv->string(); } } # instance method sub write_new_file { my $self = shift; # writes to a new file with a name based on the number of # unassigned people, the score, a random number, and a timestamp. my $name = ($self->unassigned()) . "-" . ($self->score()) . '-' . int(rand(10000)) . '-' . time() . ".csv"; open my $fh, ">$name"; $self->write($fh); close $fh; } # instance method sub clone { my $self = shift; # returns a copy of itself with independant sections/assigned hash +es # (PREFS is shared since it shouldn't be changable) my $that = State->new(); foreach (keys %{$self->{SECTIONS}}) { $that->{SECTIONS}->{$_} = $self->{SECTIONS}->{$_}; } foreach (keys %{$self->{ASSIGNED}}) { $that->{ASSIGNED}->{$_} = $self->{ASSIGNED}->{$_}; } foreach (keys %{$self->{PREFS}}) { $that->{PREFS}->{$_} = [ @{$self->{PREFS}->{$_} } ]; } #$that->{PREFS} = $self->{PREFS}; return $that; } __END__

And an example run...

radix:~/tmp/section-lottery> cat data/sec.1.csv sec1,1 sec2,45 sec3,9 sec4,0 sec5,0 "section six",3 section999,10 radix:~/tmp/section-lottery> cat data/prefs.1.csv 1234,sec1,sec3,sec5 2345,"section six",sec2 5678,sec5,sec4,sec3,sec1,sec2,section999 4,sec1 5,sec1,"section six" 6,sec1 radix:~/tmp/section-lottery> ~/cvs/working/code/perl/section-lottery.p +l -s data/sec.1.csv -p data/prefs.1.csv Number of sections: 7 Number of students: 6 Looping over all permutations (or untill a perfect one)... Trying combinations of size 1 Found new solution with 2 leftout: 3 Found new solution with 1 leftout: 4 Found new solution with 1 leftout: 4 Trying combinations of size 2 Trying combinations of size 3 Trying combinations of size 4 Trying combinations of size 5 Trying combinations of size 6 radix:~/tmp/section-lottery> ls *.csv 1-4-5201-1105570785.csv 1-4-8697-1105570785.csv 2-3-7313-1105570785. +csv radix:~/tmp/section-lottery> cat 1-4-5201-1105570785.csv "sec1","0" "sec2","45" "sec3","7" "sec4","0" "sec5","0" "section six","1" "section999","10" "" "6","99999","","sec1" "2345","1","section six","section six","sec2" "4","1","sec1","sec1" "5678","3","sec3","sec5","sec4","sec3","sec1","sec2","section999" "1234","2","sec3","sec1","sec3","sec5" "5","2","section six","sec1","section six"