Yesterday was the last day of my semester and a great day to put one of my favorite models from linear algebra, the Massey Method, to good use. As I revisited my linear algebra book to look back at everything we’ve learned throughout the semester, I got curious about how NFL teams have ranked from day one until now. So I committed to finding an approximation using the Massey Method, and here I will share what I’ve found.
Finding the Data
I first checked NFL’s site to see how far back the schedules went and noticed they run from 1970 until now. I then searched Google for an API or dataset that could give me the scores of each game, but unfortunately I wasn’t able to find anything. Well, I lied. I found some resources, but they weren’t exactly what I was looking for. So I ended up writing a Python script (source code here) that crawled the week page of each season and scraped the scores and teams for all games.
How the Massey Method Works
The Massey Method works based on the number of games played between teams and the point differential in those games. The rating for each team is found using a least squares approximation. Here’s the general algorithm for the Massey Method:
- Write down a system of equations for every single match played. An equation that relates two teams r1 and r2 with a point differential d can be expressed as r1 – r2 = d.
- Convert that into a system of equations of the form Ar̄ = p̄.
- Multiply both sides by Aᵀ to reach the least squares system of the form AᵀAr̂ = Aᵀp̄.
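- Replace the last row of the left matrix of this system with a row consisting solely of 1s, and change the bottom entry of the new right vector to 0. Without this step the left matrix is singular, because the game equations only pin down differences between ratings; the new row forces the ratings to sum to 0.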
- Now solve the system to determine the ranks.
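The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script used for this post: the names `massey_ratings` and `solve_linear_system` and the three-team schedule are made up for the example, and it builds AᵀA and Aᵀp̄ directly rather than forming A first. It sticks to the standard library so it runs anywhere; with NumPy installed, `numpy.linalg.solve` would be the more usual choice for the final step.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest pivot to keep the arithmetic stable.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: the schedule may be disconnected")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def massey_ratings(teams, games):
    """teams: list of names; games: list of (team1, team2, team1_score - team2_score)."""
    n = len(teams)
    index = {name: i for i, name in enumerate(teams)}

    # Each game contributes a row of A with +1 for team1 and -1 for team2,
    # so A^T A and A^T p can be accumulated game by game.
    ata = [[0.0] * n for _ in range(n)]
    atp = [0.0] * n
    for team1, team2, diff in games:
        i, j = index[team1], index[team2]
        ata[i][i] += 1
        ata[j][j] += 1
        ata[i][j] -= 1
        ata[j][i] -= 1
        atp[i] += diff
        atp[j] -= diff

    # Massey's adjustment: last row becomes all ones, last right-hand entry 0.
    ata[-1] = [1.0] * n
    atp[-1] = 0.0

    return dict(zip(teams, solve_linear_system(ata, atp)))


# Hypothetical three-team schedule, invented for illustration.
example_games = [("A", "B", 7), ("B", "C", 3), ("A", "C", 10)]
for team, rating in massey_ratings(["A", "B", "C"], example_games).items():
    print(f"{team} {rating:.2f}")
```

In this toy schedule the three margins agree with each other, so the least squares solution fits every game exactly and the ratings sum to 0, as the adjusted last row requires. Real schedules are not consistent like this, which is where the least squares approximation earns its keep.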
In total, over the span of around 49 years of NFL history, 11,652 games were played across 32 teams. This counts only regular season and playoff games. I left out preseason games because they are skewed: coaches mainly use that time to test out rookies and evaluate their team as a whole unit. That being said, here are the results that my script returned:
| Team | Rating |
|---|---|
| New England Patriots | 2.666908177632458 |
| San Francisco 49ers | 2.190527900025692 |
| Kansas City Chiefs | 0.8187838466188585 |
| Los Angeles Chargers | 0.47777279702627246 |
| Los Angeles Rams | -0.2173506542743492 |
| New York Giants | -0.2642147178334522 |
| New York Jets | -1.3913297554590114 |
| New Orleans Saints | -1.4872441285045244 |
| Tampa Bay Buccaneers | -3.6587151519770678 |
Analysis of Results
As you may have already noticed, the table above is sorted in decreasing order: the highest-rated team is at the top and the lowest-rated team is at the bottom. But what does that really mean in terms of the question we asked at the start: which teams have the greatest point differential? The higher a team ranks, the greater its point differential tends to be. For example, the Pittsburgh Steelers tend to win games by a greater point differential than the Dallas Cowboys, the Dallas Cowboys tend to win games by a greater point differential than the Baltimore Ravens, and so on. It is important to note that a higher point differential ranking does not necessarily mean that one team is better than another.
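One way to read the ratings is through the model's own equation, r1 – r2 = d: the gap between two ratings is the point differential the fit assigns to that pairing. Here is a small sketch using two values from the table above; the helper name `implied_margin` is made up for this example, and the result is an average over decades of games, not a prediction for any single matchup.

```python
# Ratings copied from the results table above.
ratings = {
    "New England Patriots": 2.666908177632458,
    "Tampa Bay Buccaneers": -3.6587151519770678,
}


def implied_margin(ratings, team1, team2):
    """Point differential the fitted model assigns to team1 vs. team2 (r1 - r2)."""
    return ratings[team1] - ratings[team2]


margin = implied_margin(ratings, "New England Patriots", "Tampa Bay Buccaneers")
print(f"{margin:.2f}")  # roughly 6.33 points in New England's favor
```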
There are many ways this data could’ve been used for meaningful information, but I found this approach particularly interesting for two reasons: (1) I’m very fond of the Massey Method, specifically the least squares approximation aspect of it, and (2) I wanted to see which teams tend to outscore their opponents on average. At the very least, this project was a great way to test data approximation models on real data, and that is something I value. That being said, I hope you found this helpful or at least an interesting read. As usual, feel free to share your opinions and concerns below and I’d be glad to respond.