Sunday, August 22, 2010

(Yet Another) College Football Ranker

I love arguing about College Football, but I believe that we need a method for determining which teams actually deserve to be ranked higher than others. The two "best" teams get to play for the title at the end of the year, and we need an objective way to find those two best teams from the results of the regular season.

Thus, an algorithm. Instead of having humans bring all their bias to the table to vote on who deserves this chance, let's set the process beforehand so that it is independent of personal favorites and hatreds.

As it turns out, there are TONS of these algorithms. I am not familiar with many of these, but I can see that many of them do not fit the specifications I have.

Specifications? What are these?

- No human element. That means no using preseason rankings or the rankings of any other poll system. The downside to the lack of preseason seeds is the lack of a strong poll after the first few weeks. Until lots of games are played, there is little to go on.

- Ignore divisions/associations. Schools often get to choose which division they're in. There should be no bias placed on a team based on where they exist in the politics of athletic conferences.

- Actually, use as little information as possible. Humans can glean a lot of information about a team by watching just one or two games. No doubt this is useful, but how do we quantify it? The big news is: Who won the game? Next: How much did they win by? Let's try to get by using only that much information. Not average yards per carry, or number of sacks or anything. A team can look great, but lose by 14 points. That's all that really matters.

- Impose a max on the margin of victory. Some teams will avoid a complete blowout by using less experienced players once they are winning by a large margin. Thus, we will cap the number of points a team "wins by". Won the game 59 - 3? We're only going to count that as a two-or-three score victory. What is this cap? It is currently at 14 points (do the NCAA algorithms use this?) but that might change to 21 or so. Three touchdowns seems reasonable.

- Ignore the when. A game late in the season should not be worth more than a game early on. This is an arguable point, but I'm sticking with it.

- Openness. I will tell you exactly how I'm doing this. I will publish the algorithm, explain where I'm getting the data, etc. If there are any problems, I'll let you know! Want to suggest a change? Great! Want to try it out yourself and check my work? Great!

Okay, I think those are all my requirements. It's also important to note that I am biased. I teach at Wittenberg University in Ohio, but also am a fan of Michigan. This causes a cascading list of enemies, enemies of enemies, and so forth. I'll try not to inject this bias into an algorithm, but please check me!

With these in mind, here's a description of the algorithm.

All teams start with the same strength, and we will keep updating the strengths until they appear to converge, that is, until they change by only a small amount from one round to the next. All strengths will start at .5, and can range from 0 to 1 throughout the course of the algorithm. Each round of altering the strengths has two steps:

First, we'll look at the results of all games each team has played so far. For each of those games, determine the value of the match to that team by multiplying the strength of the opposing team (in the previous round) by the strength of the margin of victory (the margin is negative if the team lost that game). So, if Boston University played SUNY ESF, and won by 10 points, then the value of that match, for BU, would be:

f(10) * str(SUNY ESF)

where str is the strength of that team in the previous round, and f is a function from the integers to the range [0, 1].
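To make the BU example concrete, here is a minimal Python sketch of a single match value. The linear shape of `linear_f` and the names `MAX_MARGIN`, `linear_f`, and `match_value` are assumptions made for illustration; the post only fixes the 14-point cap and the .5 starting strength.

```python
MAX_MARGIN = 14  # cap from the post; it might later change to 21


def linear_f(margin, max_margin=MAX_MARGIN):
    """Placeholder f (an assumption): cap the margin, then map it linearly onto [0, 1]."""
    capped = max(-max_margin, min(max_margin, margin))
    return 0.5 + capped / (2 * max_margin)


def match_value(margin, opponent_strength, f=linear_f):
    """Value of one game to a team: f(margin) * str(opponent)."""
    return f(margin) * opponent_strength


# BU beats SUNY ESF by 10; in the first round every team has strength .5.
bu_value = match_value(10, 0.5)  # roughly 0.43 with this linear f
```

With this placeholder, a 59 - 3 win is capped at 14 points, so `linear_f(56)` is the same as `linear_f(14)`.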

For each team, calculate these products, then sum up the values from all matches played (so far) this season. This results in a new list of strengths for each team, but not necessarily in the range of 0 to 1.

The second step, then, is to either stretch or shrink that range so that it fits back exactly into the range [0, 1]. Thus, the lowest-ranked team will have value 0, the top will have value 1, and the ratio of differences between strengths is maintained.
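The stretch-or-shrink step can be sketched as a min-max rescale. The function name `rescale` and the sample numbers are hypothetical, and the handling of the case where every team has the same raw strength is an assumption, since the description does not cover it.

```python
def rescale(strengths):
    """Map raw strengths linearly so the lowest becomes 0 and the highest becomes 1."""
    lo = min(strengths.values())
    hi = max(strengths.values())
    if hi == lo:
        # Assumption: if every team is tied, fall back to the starting value of .5.
        return {team: 0.5 for team in strengths}
    return {team: (s - lo) / (hi - lo) for team, s in strengths.items()}


# Made-up raw sums for three teams, just to show the effect.
raw = {"A": 1.2, "B": 0.3, "C": 0.75}
scaled = rescale(raw)  # A maps to 1, B maps to 0, C lands halfway between
```

Because the map is linear, C sits halfway between A and B both before and after rescaling, which is the "ratio of differences" property.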

Finally, if the strengths have changed by any non-negligible amount, run the algorithm for another round.
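Putting the two steps and the stopping rule together, one possible reading of the whole procedure looks like the sketch below. This is not the code promised for a later post. The `(winner, loser, margin)` game format, the names `rank` and `linear_f`, the tolerance `tol`, and the `max_rounds` safety limit are all assumptions. The round limit is there because nothing in the description guarantees that the strengths settle down.

```python
def linear_f(margin, max_margin=14):
    """Placeholder f (an assumption): capped margin mapped linearly onto [0, 1]."""
    capped = max(-max_margin, min(max_margin, margin))
    return 0.5 + capped / (2 * max_margin)


def rank(games, f=linear_f, tol=1e-9, max_rounds=1000):
    """games is a list of (winner, loser, margin) tuples with margin > 0."""
    teams = {t for winner, loser, _ in games for t in (winner, loser)}
    strengths = {t: 0.5 for t in teams}  # everyone starts at .5
    for _ in range(max_rounds):
        # Step 1: sum f(margin) * str(opponent) over every game played.
        raw = {t: 0.0 for t in teams}
        for winner, loser, margin in games:
            raw[winner] += f(margin) * strengths[loser]
            raw[loser] += f(-margin) * strengths[winner]
        # Step 2: stretch or shrink back onto [0, 1].
        lo, hi = min(raw.values()), max(raw.values())
        if hi == lo:
            new = {t: 0.5 for t in teams}  # assumption: all tied, reset to .5
        else:
            new = {t: (v - lo) / (hi - lo) for t, v in raw.items()}
        # Stop once no strength moved by more than tol.
        change = max(abs(new[t] - strengths[t]) for t in teams)
        strengths = new
        if change < tol:
            break
    return strengths


# Hypothetical results, not real games.
example_games = [("A", "B", 7), ("B", "C", 3), ("A", "C", 20)]
first_round = rank(example_games, max_rounds=1)
final = rank(example_games)
```

On tiny schedules like this one the strengths can swing quite a bit from round to round, so it is worth printing a few rounds before trusting the final order.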

Note that this actually gives us a family of algorithms: one for each choice of function f. When running this algorithm last year, I used a number of different functions, f, to see which I liked more. What makes a good function?
  • It should be non-decreasing. f(10) should not be less than f(-10) or even f(7).

  • Use the whole range. If the maximum-counting margin of victory is 'max', then f(max) = 1 and f(-max) = 0.

  • If f(0) is defined, then f(0) = .5.

  • In fact, if f(x) = .5 + c, then f(-x) = .5 - c.

There are plenty of options here. Last year, I focused mostly on a sinusoidal curve, but I can't recall exactly why.
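As an illustration, here is one sinusoidal shape that meets all four conditions, along with a small checker. It is not necessarily the curve used last year. The formula and the names `sinusoidal_f` and `check_properties` are assumptions.

```python
import math


def sinusoidal_f(margin, max_margin=14):
    """One possible sinusoidal f: a quarter sine wave on each side of a tie."""
    capped = max(-max_margin, min(max_margin, margin))
    return 0.5 + 0.5 * math.sin(math.pi * capped / (2 * max_margin))


def check_properties(f, max_margin=14, eps=1e-12):
    """Return True if f meets the four conditions on integer margins in [-max, max]."""
    xs = range(-max_margin, max_margin + 1)
    non_decreasing = all(f(x) <= f(x + 1) + eps for x in xs if x < max_margin)
    full_range = abs(f(max_margin) - 1) < eps and abs(f(-max_margin)) < eps
    centered = abs(f(0) - 0.5) < eps
    symmetric = all(abs(f(x) + f(-x) - 1) < eps for x in xs)
    return non_decreasing and full_range and centered and symmetric


ok = check_properties(sinusoidal_f)
```

Compared with a straight line, this curve moves more per point in close games than near the cap, so the first few points of margin count for more than the last few.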

Alright, that's enough for this post. I'll put up more as the football season gets underway (including the code). Also, I will start to post rankings after each weekend.

Great!
