### Sports Conference Rankings, Colley Matrix Style

by Zaxo (Archbishop) on Oct 25, 2006 at 04:49 UTC (#580484, CUFP)

It's (US) football season now, and arguments over Strength of Schedule and the iniquity of zebras are heard across the land.

This is Wesley Colley's elegant method of calculating a probability-like ranking from the results of contests. I won't go into the mathematical details or properties of the method; there is a paper at Colley's site which covers them.

Colley uses this method to rank all Division I-A teams as part of the "computer" segment of the all-important BCS ratings. It is distinguished by its simplicity and lack of mystery tweaks.
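
For readers who want the shape of the calculation without opening the paper, the system that the script below sets up can be summarized as follows. The symbols are labels chosen here, not the post's: $n_i$ is the number of games team $i$ has played, $n_{ij}$ the number of games between teams $i$ and $j$, $w_i$ and $l_i$ the wins and losses of team $i$, $N$ the number of teams, and $r$ the vector of ratings.

```latex
\begin{align*}
C_{ii} &= 2 + n_i, \qquad C_{ij} = -n_{ij} \quad (i \neq j), \\
b_i    &= 1 + \frac{w_i - l_i}{2}, \qquad C\,r = b, \\
\sum_j C_{ij} = 2 \;&\Rightarrow\; 2 \sum_i r_i = \sum_i b_i = N
        \;\Rightarrow\; \frac{1}{N} \sum_i r_i = \frac{1}{2}.
\end{align*}
```

The last line uses the symmetry of $C$ and the fact that total wins equal total losses. It gives a quick sanity check on any run: the eight ratings printed at the end of the script sum to 4.0000, which is $N/2$ for $N = 8$.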

The magnificent PDL module is ideal for carrying out these calculations. Here, I've applied them to intra-conference games only, to get a current conference ranking. The data is hard-coded in this simple version, with enough information in the comments to let you replace it with your own favorite conference's results. It would be more flexible and convenient to get the data from a database query or by web scraping.

```perl
#!/usr/bin/perl    # -*-EPerl-*-

use PDL;

my @becteams = (
    'Pittsburgh',
    'Louisville',
    'Rutgers',
    'West Virginia',
    'South Florida',
    'Connecticut',
    'Syracuse',
    'Cincinnati'
);

# $C is the Colley Matrix. It depends only on the schedule
# of games already played. Rows and columns are indexed in
# the same order, by teams. The diagonal elements are the
# number of games played plus two. Off-diagonals are zero
# for no game yet played for the indexed teams, or minus one
# for a game played. It contains nothing about the result of
# the games. It's obviously a symmetric matrix.
my $C = pdl([
# UP UL RU WV SF CT SU UC
[ 5, 0,-1, 0, 0, 0,-1,-1], # Pittsburgh
[ 0, 4, 0, 0, 0, 0,-1,-1], # Louisville
[-1, 0, 4, 0,-1, 0, 0, 0], # Rutgers
[ 0, 0, 0, 4, 0,-1,-1, 0], # West Virginia
[ 0, 0,-1, 0, 5,-1, 0,-1], # South Florida
[ 0, 0, 0,-1,-1, 4, 0, 0], # Connecticut
[-1,-1, 0,-1, 0, 0, 5, 0], # Syracuse
[-1,-1, 0, 0,-1, 0, 0, 5]  # Cincinnati
]);

# $wl is a column vector containing win and loss information.
# For each team in the same order as $C is indexed, the value
# is numerically 1 + (wins - losses)/2.
my $wl = pdl([[ 3/2],[ 2 ],[ 2 ],[ 2 ],[ 1/2],[ 0 ],[-1/2],[ 1/2]]);
#              Pitt   UL    Rut   WVU    USF  UConn   SU    Cincy

my $c = $C->inv;
my $r = $c x $wl;

my %rating;
@rating{@becteams} = list $r;

{
    my $ct = 1;
    for (sort {$rating{$b}<=>$rating{$a}} keys %rating) {
        my $out = pack 'A4 A20 A6', $ct++, $_, sprintf '%5.4f', $rating{$_};
        print $out, $/;
    }
}

__END__
1   Rutgers             0.7443
2   Louisville          0.6779
3   West Virginia       0.6339
4   Pittsburgh          0.5912
5   Cincinnati          0.4310
6   South Florida       0.3861
7   Syracuse            0.2806
8   Connecticut         0.2550
```

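The comments in the script spell out how the matrix and vector follow from the schedule and results, so the hard-coded data could instead be generated from a list of games. Here is a minimal sketch in plain Perl, so it runs without PDL. The helper name `colley_system` and the `@games` list of `[winner, loser]` pairs are hypothetical and not part of the original post. The ten games are reconstructed from the hard-coded `$C` and `$wl` above, which together determine every pairing and result; they are not taken from an independent schedule. A rematch between two teams would subtract one again from the off-diagonal entry, a case the post does not cover.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same team order as @becteams in the script above.
my @teams = (
    'Pittsburgh', 'Louisville', 'Rutgers', 'West Virginia',
    'South Florida', 'Connecticut', 'Syracuse', 'Cincinnati',
);

# [winner, loser] pairs, inferred from the post's matrix and vector.
my @games = (
    [ 'Rutgers',       'Pittsburgh'    ],
    [ 'Pittsburgh',    'Syracuse'      ],
    [ 'Pittsburgh',    'Cincinnati'    ],
    [ 'Louisville',    'Syracuse'      ],
    [ 'Louisville',    'Cincinnati'    ],
    [ 'Rutgers',       'South Florida' ],
    [ 'West Virginia', 'Connecticut'   ],
    [ 'West Virginia', 'Syracuse'      ],
    [ 'South Florida', 'Connecticut'   ],
    [ 'Cincinnati',    'South Florida' ],
);

# Hypothetical helper: returns (\@matrix, \@rhs), the Colley matrix as
# an array of rows and the vector 1 + (wins - losses)/2, in team order.
sub colley_system {
    my ($teams, $games) = @_;
    my $n   = @$teams;
    my %idx = map { $teams->[$_] => $_ } 0 .. $n - 1;

    # With no games played: 2 on the diagonal, 1 in every vector slot.
    my (@matrix, @rhs);
    for my $i (0 .. $n - 1) {
        $matrix[$i]     = [ (0) x $n ];
        $matrix[$i][$i] = 2;
        $rhs[$i]        = 1;
    }

    for my $g (@$games) {
        my ($w, $l) = @idx{@$g};
        die "Unknown team in game: @$g\n"
            unless defined $w && defined $l;
        $matrix[$w][$w]++;      # one more game played by the winner
        $matrix[$l][$l]++;      # and by the loser
        $matrix[$w][$l]--;      # minus one per meeting, kept symmetric
        $matrix[$l][$w]--;
        $rhs[$w] += 0.5;        # a win adds 1/2
        $rhs[$l] -= 0.5;        # a loss subtracts 1/2
    }
    return (\@matrix, \@rhs);
}

my ($mat, $rhs) = colley_system(\@teams, \@games);

# Print the rows in the same layout as the hard-coded matrix.
for my $i (0 .. $#teams) {
    printf "[%s]  # %s\n",
        join(',', map { sprintf '%2d', $_ } @{ $mat->[$i] }), $teams[$i];
}
print join(' ', @$rhs), "\n";
```

The two structures can then be handed to PDL as `pdl($mat)` and `pdl([ map { [$_] } @$rhs ])`, which gives the same square matrix and column vector that the script hard-codes.
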
Congratulations to Rutgers! Their higher rating, for the same record as Louisville and West Virginia, comes from having beaten tougher teams so far. In the Big East everybody plays everybody, so that advantage will level off by the end of the season. A conference too large to allow all-pairs play makes for a more interesting use of this method.

After Compline,
Zaxo

Replies are listed 'Best First'.
Re: Sports Conference Rankings, Colley Matrix Style
by zentara (Archbishop) on Oct 25, 2006 at 12:49 UTC
I remember reading an article in Sports Illustrated about a statistician who applied this concept to horse racing. He meticulously compiled statistics on every horse and every race: things like air temperature, time of day, dry or muddy track, etc. He could then predict the results of horse races with better than 50% accuracy. He was in Las Vegas, living off his bets. :-)

I'm not really a human, but I play one on earth. Cogito ergo sum a bum
