CUFP
hossman
<p>
A while back I posted a node looking for good algorithm to use when [id://411129|assigning students to class sections]. [BrowserUk] had a lot of good suggetions that got me [id://412851|thinking about all sorts of strategries], but in the end, [id://411140|some comments kvale made] got me thinking about just trying a lottery based approach with lots of iterations to get the best score. That first pass then evolved.
</p>
<p>
What I finally came up with, is (I think) a really fair approach, that if allowed to run long enough (or on a small enough data set) winds up being a brute force algorithm to try giving the students their first pick in every possible permutation of students.
</p>
<p>
BUT: since trying *EVERY* permutation of all students might take a while in large datasets, it works it's way up to that by picking combinations of C "starter students" who get to go first (all permutations of those N are tried) followed by R random orderings of the remaining students. C is initially 1, and is incrimented once R orderings starting with every permutation of C have been tried. Once it's generated a solution with a score you are happy with, you can kill it. (using -r 1 and -c <i>numberOfStudents</i> it only tries the brute force approach of every permutation of students)
</p>
<p>
This kind of approach is similar to an [http://en.wikipedia.org/wiki/Iterative_deepening_depth-first_search|Iteratively Deepening Depth First Search] of a graph.
</p>
<p>
The Code...
</p>
<readmore>
<code>
#!/usr/local/bin/perl
# $Source: /home/hossman/cvs_archive/code/perl/section-lottery.pl,v $
# $Id: section-lottery.pl,v 1.7 2005/01/12 22:35:43 hossman Exp $
#
# see "--help" for usage info
#
# Given a section CSV file and a prefrences CSV file (in that order)
# This tries a bunch of random orderings of students and gives each
# one the best choice it can.
#
# As it goes, it keeps track of solutions it's already found and won't
# generate a solution if:
# 1) this solution excludes more students then a previous one
# 2) this solution excludes an identicle list of students
#
# The output files have names indicating how many people have been left out,
# followed by a random number, followed by a timestamp indicating when it
# generated.
#
# The contents of the CSV files contain 2 sections. The first lists
# all of the sections and how many "free spots" are still left. the second
# lists each student, their choice they recieved (ie: 1st, 2nd), the section
# they've been assigned to, and their orriginal prefrences in order.
# their section may be empty if they are unassigned, in which case their
# choice number will be really really big (to make sorting in excel easier)
#
# In an attempt to ensure "fairness" this script generates combinations of
# students of increasing sizes to be the "start" students who get "first
# pick". For each combination of starting students, all permutations
# of the starters are considered, and then some number of random
# permutations of the remaining students are tried. The initial start
# set combination size, and the number of random orderings to do using
# each starting permutation are configurable on the command line
use warnings;
use strict;
use Digest::MD5 qw(md5_hex);
use List::Util qw(shuffle);
use Math::Combinatorics;
use Text::CSV_XS;
use Data::Dumper;
use Getopt::Std;
use constant {
SECTIONS => 'sections',
PREFS => 'prefs',
ASSIGNED => 'assigned',
};
$Getopt::Std::STANDARD_HELP_VERSION = 1;
sub HELP_MESSAGE {
my $fh = shift;
print $fh
"Usage: $0 [-m] [-i num] [-r num] [-c num] -s sec.csv -p prefs.csv\n",
" -m treat the csv files as mac files (optional)\n",
" -i num max num of iterations to try (optional)\n",
" (default to infinite)\n",
" -r num num of random attempts per combination (optional)\n",
" (defaults to sqrt number of students)\n",
" -c num combination size to start at (optional)\n",
" (defaults to 1)\n",
" -s sec.csv section availability file\n",
" -p prefs.csv prefrences file\n";
}
{ # block scope because of stupid "globals"
my %opts;
(getopts('ms:p:i:r:c:', \%opts) and exists $opts{s} and exists $opts{p})
or (HELP_MESSAGE(\*STDOUT) and exit());
my $state = State::parse_files($opts{s}, $opts{p},
(exists $opts{m} ? "\015" : "\n"));
# NOTE: for each number of unassigned, we are writting out solutions
# even if they don't have as good a score as previous solutions.
# this is so the user has the option of picking a higher scoring
# solution if they like the distrobution better.
my @students = sort $state->unassigned();
my $students = scalar(@students);
my $r = exists $opts{r} ? $opts{r} : int(1 + sqrt $students);
my $min = $students + 1;
# negative indicates run forever
my $count = exists $opts{i} ? $opts{i} : -1;
my %known = ();
print "Number of sections: " . keys(%{$state->{SECTIONS}}) . "\n";
print "Number of students: $students\n";
if (0 < $count) {
print "Doing $count iterations...\n";
} else {
print "Looping over all permutations (or untill a perfect one)...\n";
}
for (my $c = exists $opts{c} ? $opts{c} : 1; $c <= $students; $c++) {
# gradually incriment c, finding the list of all
# combinations of that size as we go.
# for each permutation of each combination, generate
# some number of random orderings that start with that
# combination/permutation
# write out any solutions we find that are better then
# (or as good as) the ones before in terms of unassigned
#
# stop if we get a perfect one
print STDERR "Trying combinations of size $c\n";
my $comb = Math::Combinatorics->new(count=> $c, data => \@students);
while (my @start = $comb->next_combination()) {
my $perm = Math::Combinatorics->new(data => \@start);
while (my @p = $perm->next_permutation()) {
my $pclone = $state->clone();
$pclone->in_order(@p);
foreach (1..$r) {
my $clone = $pclone->clone();
$clone->in_order(shuffle($clone->unassigned()));
my @leftovers = $clone->unassigned();
my $cur = scalar @leftovers;
my $score = $clone->score();
my $md5 = md5_hex(join ' ', $score, sort @leftovers);
if ($cur <= $min and ! exists $known{$md5}) {
print "Found new solution with $cur leftout: $score\n";
$min = $cur;
$known{$md5}++;
$clone->write_new_file();
}
die "found a perfect solution\n"
if 0 == $min and 0 == $score;
die "exceeded max iterations\n" if 0 == $count--;
} # foreach random ordering
} # foreach permutation of our combination
} # foreach combination of size $c
} # foreach $c
}
########################################################################
package State;
# class method
sub new {
# basic empty constructor
my $self = bless { SECTIONS => {}, PREFS => {}, ASSIGNED => {} }, shift;
return $self;
}
# class method
sub parse_files {
my ($sections, $prefs, $sep) = @_;
# sections is the sections and the number of spots
# prefs is the users and their prefrences in order
# sep is the input file record seperator
#
# returns a State object
local $/ = $sep;
my $s = State->new();
my ($sec_fh, $pref_fh);
open $sec_fh, $sections or die "can't open sections file";
open $pref_fh, $prefs or die "can't open prefs file";
my $csv = Text::CSV_XS->new();
while (<$sec_fh>) {
chomp;
$csv->parse($_)
or die "can't parse sec line: " . $csv->error_input() ." ";
my @cols = $csv->fields();
$s->{SECTIONS}->{$cols[0]} = $cols[1];
}
while (<$pref_fh>) {
chomp;
$csv->parse($_)
or die "can't parse prefs line: " . $csv->error_input() ." ";
my @cols = $csv->fields();
my $stu = shift @cols;
$s->{PREFS}->{$stu} = [ grep { '' ne $_} @cols ];
}
return $s;
}
# instance method
sub assign {
my ($self, $student, $section) = @_;
# assign student to section, returns nothing
die "$section isn't a section" unless exists $self->{SECTIONS}->{$section};
die "$student isn't a student" unless exists $self->{PREFS}->{$student};
die "$section is full" unless 0 < $self->{SECTIONS}->{$section};
warn "$student didn't pick $section"
unless grep { $section eq $_ } @{$self->{PREFS}->{$student}};
warn "$student was already in another section"
if exists $self->{ASSIGNED}->{$student};
$self->{SECTIONS}->{$section}--;
$self->{ASSIGNED}->{$student} = $section;
}
# instance method
sub best_avail {
my ($self, $student) = @_;
# given a student, return the best available section in their prefs
# returns undef if nothing is available
my @prefs = @{$self->{PREFS}->{$student}};
foreach (@prefs) {
if (0 < $self->{SECTIONS}->{$_}) {
return $_;
}
}
return undef;
}
# instance method
sub in_order {
my ($state, @list) = @_;
# given an ordered list of people
# give them their best available choice.
# if @list includes a person who is allready in a section, happily ignore
foreach (@list) {
next if exists $state->{ASSIGNED}->{$_};
my $best = $state->best_avail($_);
$state->assign($_, $best) if defined $best;
}
}
# instance method
sub unassigned {
my $self = shift;
# returns a list of all the people not yet in a section
my @leftovers = ();
foreach (keys %{$self->{PREFS}}) {
push @leftovers, $_ unless exists $self->{ASSIGNED}->{$_};
}
return @leftovers;
}
# instance method
sub score {
my $self = shift;
# returns a numeric score for the current state
#
# when comparing two states with the same number of people unassigned,
# the smaller the score, the better. (scores can't be compared between
# states with different numbers of unassigned students)
my $score = 0;
foreach my $student (keys %{$self->{ASSIGNED}}) {
my $sec = $self->{ASSIGNED}->{$student};
my @prefs = @{$self->{PREFS}->{$student}};
my ($rank) = grep $prefs[$_] eq $sec, 0..$#prefs;
$score += $rank;
}
return $score;
}
# instance method
sub write {
my ($self, $fh) = @_;
# writes out the current state as a CSV file to $fh
my $csv = Text::CSV_XS->new({ eol => "\n", always_quote => 1 });
# print all of hte sections and how many they have left
foreach (sort keys %{$self->{SECTIONS}}) {
$csv->combine($_, $self->{SECTIONS}->{$_})
or die "can't combine secs line: " . $csv->error_input() ." ";
print $fh $csv->string();
}
# print a "blank line" in CSV speak
print $fh qq{""\n};
foreach (keys %{$self->{PREFS}}) {
my ($assign, $rank) = ('',99999);
my @prefs = @{$self->{PREFS}->{$_}};
if (exists $self->{ASSIGNED}->{$_}) {
$assign = $self->{ASSIGNED}->{$_};
($rank) = grep $prefs[$_] eq $assign, 0..$#prefs;
$rank++;
}
$csv->combine($_, $rank, $assign, @prefs)
or die "can't combine assigned line: " . $csv->error_input() ." ";
print $fh $csv->string();
}
}
# instance method
sub write_new_file {
my $self = shift;
# writes to a new file with a name based on the number of
# unassigned people, the score, a random number, and a timestamp.
my $name = ($self->unassigned()) . "-" . ($self->score()) . '-' .
int(rand(10000)) . '-' . time() . ".csv";
open my $fh, ">$name";
$self->write($fh);
close $fh;
}
# instance method
sub clone {
my $self = shift;
# returns a copy of itself with independant sections/assigned hashes
# (PREFS is shared since it shouldn't be changable)
my $that = State->new();
foreach (keys %{$self->{SECTIONS}}) {
$that->{SECTIONS}->{$_} = $self->{SECTIONS}->{$_};
}
foreach (keys %{$self->{ASSIGNED}}) {
$that->{ASSIGNED}->{$_} = $self->{ASSIGNED}->{$_};
}
foreach (keys %{$self->{PREFS}}) {
$that->{PREFS}->{$_} = [ @{$self->{PREFS}->{$_} } ];
}
#$that->{PREFS} = $self->{PREFS};
return $that;
}
__END__
</code>
</readmore>
<p>
And an example run...
</p>
<readmore>
<code>
radix:~/tmp/section-lottery> cat data/sec.1.csv
sec1,1
sec2,45
sec3,9
sec4,0
sec5,0
"section six",3
section999,10
radix:~/tmp/section-lottery> cat data/prefs.1.csv
1234,sec1,sec3,sec5
2345,"section six",sec2
5678,sec5,sec4,sec3,sec1,sec2,section999
4,sec1
5,sec1,"section six"
6,sec1
radix:~/tmp/section-lottery> ~/cvs/working/code/perl/section-lottery.pl -s data/sec.1.csv -p data/prefs.1.csv
Number of sections: 7
Number of students: 6
Looping over all permutations (or untill a perfect one)...
Trying combinations of size 1
Found new solution with 2 leftout: 3
Found new solution with 1 leftout: 4
Found new solution with 1 leftout: 4
Trying combinations of size 2
Trying combinations of size 3
Trying combinations of size 4
Trying combinations of size 5
Trying combinations of size 6
radix:~/tmp/section-lottery> ls *.csv
1-4-5201-1105570785.csv 1-4-8697-1105570785.csv 2-3-7313-1105570785.csv
radix:~/tmp/section-lottery> cat 1-4-5201-1105570785.csv
"sec1","0"
"sec2","45"
"sec3","7"
"sec4","0"
"sec5","0"
"section six","1"
"section999","10"
""
"6","99999","","sec1"
"2345","1","section six","section six","sec2"
"4","1","sec1","sec1"
"5678","3","sec3","sec5","sec4","sec3","sec1","sec2","section999"
"1234","2","sec3","sec1","sec3","sec5"
"5","2","section six","sec1","section six"
</code>
</readmore>