A Basic Econometric Analysis of NFL Football
As a long-time professional football fan, I have read countless articles analyzing games and teams, explaining why some teams are good and others are not. One key article of faith in NFL analysis is “Run The Ball, Stop The Run, Win,” meaning that the most important things a team can do to win are to run the ball and stop the run. A whole football statistics company was built to disprove a Boston Globe writer who said that the New England Patriots failed to make the postseason in 2002 “because they could not establish the run”.
This site uses a series of team statistics to determine which ones are significant to a team’s winning percentage. At the end of the season, the most successful teams go into the playoffs. By that time, each team will have run over 1,000 plays on offense and defense, so we will look at the estimated winning percentage for each playoff team and see how closely the point spread (the number of points that handicappers set to equalize betting on both sides) matches it, and whether the betting public improperly values teams compared to the regression model.
The dataset, taken from http://www.pro-football-reference.com, includes season statistics for 319 teams from 2001 to 2010. The game of football can be broken down into various elements for analysis. On any given play, a team is either playing offense or defense. On offense, a team can run the ball, pass the ball, give up a sack of the quarterback, or turn the ball over to the other team by either a fumble or an interception. To measure these, four statistics were calculated from the raw data:
- OPYard – The number of yards the offense gets every time they pass the ball
- ORYard – The number of yards the offense gets every time they run the ball
- OTOPlay – The percentage of plays in which the offense turns the ball over to the other team, either by fumbling the ball and the other team recovering it or the quarterback throwing an interception to a defender
- OSack – The percentage of plays in which the quarterback gets sacked
Similarly, on defense, a team wants to prevent the other team from gaining yards when they run the ball, prevent the other team from gaining yards when they pass the ball, take the ball away by forcing and recovering a fumble or intercepting a pass, and sack the opposing quarterback. To measure these, four statistics were calculated from the raw data:
- DPYard – The number of yards the opposing offense gets every time they pass the ball
- DRYard – The number of yards the opposing offense gets every time they run the ball
- DTOPlay – The percentage of plays in which the opposing offense turns the ball over, either by fumbling the ball and the defense recovering it or by the opposing quarterback throwing an interception to a defender
- DSack – The percentage of plays in which the opposing quarterback gets sacked
The dependent variable in this model is the season winning percentage. In professional football, each team plays 16 games, so the variable WinPct is computed by taking the number of wins and dividing it by 16.
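As a rough sketch of how these per-play statistics and WinPct could be derived from season totals: the raw field names below (pass_yards, rush_attempts, and so on) are hypothetical rather than the site's actual column names, and counting total plays as pass attempts plus rushing attempts plus sacks is an assumption, since the exact definition is not given here. The totals themselves are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SeasonTotals:
    """Raw season totals for one side of the ball (hypothetical field names)."""
    pass_yards: int
    pass_attempts: int
    rush_yards: int
    rush_attempts: int
    turnovers: int  # fumbles lost plus interceptions
    sacks: int


def per_play_stats(t: SeasonTotals) -> dict:
    """Return the four per-play statistics for one side of the ball."""
    # Assumption: a "play" is a pass attempt, a rushing attempt, or a sack.
    total_plays = t.pass_attempts + t.rush_attempts + t.sacks
    return {
        "PYard": t.pass_yards / t.pass_attempts,   # yards per pass attempt
        "RYard": t.rush_yards / t.rush_attempts,   # yards per rushing attempt
        "TOPlay": t.turnovers / total_plays,       # turnovers per play
        "Sack": t.sacks / total_plays,             # sacks per play
    }


def win_pct(wins: int, games: int = 16) -> float:
    """Season winning percentage: wins divided by games played."""
    return wins / games


# Made-up offensive totals, for illustration only.
offense = SeasonTotals(pass_yards=3600, pass_attempts=550, rush_yards=1800,
                       rush_attempts=440, turnovers=25, sacks=35)
stats = per_play_stats(offense)
print({name: round(value, 4) for name, value in stats.items()})
print(win_pct(10))
```

The same function can be applied to the totals a team allows on defense to get the four defensive statistics.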
The first regression uses WinPct as the dependent variable and OPYard, ORYard, OTOPlay, OSack, DPYard, DRYard, DTOPlay, and DSack as the independent variables.
| Variable | Coefficient | Standard Error | t-score | p-score |
| --- | --- | --- | --- | --- |
| Constant | .548 | .121 | 4.507 | .000 |
| Offense Pass Yards/Attempt | .081 | .008 | 10.211 | .000 |
| Offense Run Yards/Attempt | .030 | .013 | 2.369 | .018 |
| Offense Turnovers/Total Plays | -6.588 | .892 | -7.388 | .000 |
| Offense Sacks/Total Plays | -2.727 | .510 | -5.346 | .000 |
| Defense Pass Yards/Attempt | -.096 | .011 | -8.959 | .000 |
| Defense Run Yards/Attempt | -.019 | .013 | -1.440 | .151 |
| Defense Turnovers/Total Plays | 6.014 | .953 | 6.311 | .000 |
| Defense Sacks/Total Plays | 3.345 | .748 | 4.474 | .000 |

R = .876, R² = .768
In this model, the constant tells us that if a team doesn’t gain any yards per rushing or passing attempt, doesn’t give up any sacks or turn the ball over, doesn’t allow any yards per rushing or passing attempt, and doesn’t sack the quarterback or take away the ball, the team will win 54.8% of its games. This is a meaningless statistic because in no case would it make sense for all eight variables to have a value of zero over the course of a season. The model tells us that if a team increases its passing yards per attempt by one yard, its winning percentage will increase by .081, or 8.1 percentage points. If the opposing offense increases its passing yards per attempt by one yard, your team’s winning percentage will drop by .096, or 9.6 percentage points. Six of the eight variables have t-scores with an absolute value above 3, so those six are significant at better than the 99% level. The R² score tells us that these eight variables explain 76.8% of the variation in winning percentage of teams from 2001-2010.
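To make this reading of the regression output concrete, the short sketch below recomputes each t-score as the coefficient divided by its standard error and counts how many exceed 3 in absolute value. It uses the rounded figures from the table, so the recomputed t-scores differ slightly from the published ones. Converting a coefficient into wins over a 16-game season is an added illustration, not a figure from the original analysis.

```python
# (variable, coefficient, standard error), copied from the first regression table.
MODEL_1 = [
    ("Offense Pass Yards/Attempt", 0.081, 0.008),
    ("Offense Run Yards/Attempt", 0.030, 0.013),
    ("Offense Turnovers/Total Plays", -6.588, 0.892),
    ("Offense Sacks/Total Plays", -2.727, 0.510),
    ("Defense Pass Yards/Attempt", -0.096, 0.011),
    ("Defense Run Yards/Attempt", -0.019, 0.013),
    ("Defense Turnovers/Total Plays", 6.014, 0.953),
    ("Defense Sacks/Total Plays", 3.345, 0.748),
]
GAMES_PER_SEASON = 16


def t_score(coefficient: float, standard_error: float) -> float:
    """A t-score is the coefficient measured in units of its standard error."""
    return coefficient / standard_error


# Variables whose t-score is larger than 3 in absolute value.
significant = [name for name, coef, se in MODEL_1 if abs(t_score(coef, se)) > 3]

# Effect of one extra yard per pass attempt on offense, expressed in wins.
extra_wins = 0.081 * GAMES_PER_SEASON

for name, coef, se in MODEL_1:
    print(f"{name:32s} t = {t_score(coef, se):7.2f}")
print(len(significant), "variables with |t| > 3")
print(f"One more yard per pass attempt is worth about {extra_wins:.1f} wins")
```

The two variables that fall short of the threshold are the two running-game measures, which is the pattern the rest of the analysis builds on.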
Now to figure out which statistics are most predictive in determining winning percentage, these beta coefficients need to be standardized. The following table shows the means and standard deviations for each statistic:
| Variable | Mean | Standard Deviation | Max | Min | 95% Confidence Interval |
| --- | --- | --- | --- | --- | --- |
| Winning Percentage | .4996 | .1925 | 1.0000 | .0000 | 0.1145 to 0.8847 |
| Offense Pass Yards/Attempt | 6.4152 | .7939 | 8.7723 | 4.5224 | 4.8273 to 8.0030 |
| Offense Run Yards/Attempt | 4.1281 | .4326 | 5.4730 | 3.1442 | 3.2629 to 4.9932 |
| Offense Turnovers/Total Plays | .0279 | .0069 | .0485 | .0101 | 0.0141 to 0.0416 |
| Offense Sacks/Total Plays | .0357 | .0119 | .0803 | .0113 | 0.0119 to 0.0595 |
| Defense Pass Yards/Attempt | 6.4345 | .5957 | 8.3883 | 4.7111 | 5.2432 to 7.6259 |
| Defense Run Yards/Attempt | 4.1287 | .4279 | 5.3333 | 2.8305 | 3.2729 to 4.9844 |
| Defense Turnovers/Total Plays | .0279 | .0063 | .0477 | .0120 | 0.0154 to 0.0404 |
| Defense Sacks/Total Plays | .0356 | .0084 | .0641 | .0096 | 0.0189 to 0.0524 |
Each of these variables shows wide variation. To compute the standard coefficient for each independent variable, we multiply its unstandardized (beta) coefficient by its standard deviation and then divide by the standard deviation of the dependent variable (winning percentage). This results in:
| Variable | Beta Coefficient | Standard Deviation | Standard Coefficient |
| --- | --- | --- | --- |
| Offense Pass Yards/Attempt | .081 | .7939 | 0.334 |
| Defense Pass Yards/Attempt | -.096 | .5957 | -0.297 |
| Offense Turnovers/Total Plays | -6.588 | .0069 | -0.235 |
| Defense Turnovers/Total Plays | 6.014 | .0063 | 0.195 |
| Offense Sacks/Total Plays | -2.727 | .0119 | -0.169 |
| Defense Sacks/Total Plays | 3.345 | .0084 | 0.146 |
| Offense Run Yards/Attempt | .030 | .4326 | 0.067 |
| Defense Run Yards/Attempt | -.019 | .4279 | -0.043 |
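The standardization described above can be sketched in a few lines. The coefficients and standard deviations are the rounded values from the tables, so the results may differ from the published standard coefficients in the third decimal place.

```python
# Standard deviation of the dependent variable (winning percentage).
SD_WIN_PCT = 0.1925

# variable: (unstandardized coefficient, standard deviation of the variable)
PREDICTORS = {
    "Offense Pass Yards/Attempt": (0.081, 0.7939),
    "Offense Run Yards/Attempt": (0.030, 0.4326),
    "Offense Turnovers/Total Plays": (-6.588, 0.0069),
    "Offense Sacks/Total Plays": (-2.727, 0.0119),
    "Defense Pass Yards/Attempt": (-0.096, 0.5957),
    "Defense Run Yards/Attempt": (-0.019, 0.4279),
    "Defense Turnovers/Total Plays": (6.014, 0.0063),
    "Defense Sacks/Total Plays": (3.345, 0.0084),
}


def standard_coefficient(coef: float, sd_x: float, sd_y: float = SD_WIN_PCT) -> float:
    """Change in the outcome, in standard deviations, per one-SD change in x."""
    return coef * sd_x / sd_y


# Rank the variables by the absolute size of their standard coefficient.
ranked = sorted(
    ((name, standard_coefficient(coef, sd)) for name, (coef, sd) in PREDICTORS.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, value in ranked:
    print(f"{name:32s} {value:+.3f}")
```

Standardizing puts every variable on the same scale, which is what allows a per-yard statistic to be compared with a per-play percentage.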
This table is ranked by the absolute value of the standard coefficient to show which variables have the greatest positive or negative impact on winning percentage. The best way to improve a football team is to increase the number of passing yards per attempt on offense. The next best way is to reduce the number of passing yards per attempt allowed on defense. Improving passing yards per attempt on offense and defense is far more effective than improving yards per attempt running the ball or stopping the run: the standard coefficients for passing (.334 and -.297) are roughly five to seven times those for running (.067 and -.043). Next, it is better for a team to improve its ability to force turnovers on defense and avoid turning the ball over on offense than it is to sack the opposing quarterback and prevent sacks. Finally, in acquiring players, it makes more sense to spend money on a quarterback who can increase the yards per pass attempt by one yard than on a running back who will increase the yards per rushing attempt by one yard. Similarly on defense, defending the pass should be a higher priority than defending the run.
To refine this model further, how teams decide whether to run or pass should be considered. When a team starts an offensive play, it either passes the ball or runs the ball. Over the course of the season, do teams generally pursue the same mix of running and passing? To determine this, two variables were created. One is called ORunPct, which is the percentage of plays on which a team decides to run the ball, or rushing attempts divided by total offensive plays. The other is called DRunPct, which is the percentage of plays on which opposing teams decide to run the ball. We could instead express these as the percentage of plays passing the ball and being passed on, but both sets of variables cannot be included together because of collinearity: the sum of ORunPct and OPassPct is 1, so each is a function of the other. First we look at the distribution of the percentage of plays on which teams run the ball for all 319 teams to see how much it varies:
On average, a team runs the ball 44.3% of the time during a season, with a standard deviation of 4.8%. The chart above shows on the x-axis the percentage of plays on which a team runs the ball in a season, and the y-axis shows the number of teams that ran the ball that percentage of the time in a season. It is not a perfect normal curve, but there is sufficient variation to include it in the model.
| Variable | Coefficient | Standard Error | t-score | p-score | Standard Coefficient |
| --- | --- | --- | --- | --- | --- |
| Constant | .799 | .157 | 5.079 | .000 | |
| Offense Pass Yards/Attempt | .071 | .008 | 8.902 | .000 | .291 |
| Offense Run Yards/Attempt | .002 | .013 | .185 | .854 | .005 |
| Offense Turnovers/Total Plays | -5.189 | .871 | -5.955 | .000 | -.185 |
| Offense Sacks/Total Plays | -2.211 | .488 | -4.532 | .000 | -.137 |
| Defense Pass Yards/Attempt | -.088 | .010 | -8.524 | .000 | -.271 |
| Defense Run Yards/Attempt | .016 | .014 | 1.144 | .254 | .035 |
| Defense Turnovers/Total Plays | 4.619 | .925 | 4.994 | .000 | .150 |
| Defense Sacks/Total Plays | 2.415 | .726 | 3.327 | .001 | .105 |
| Offensive Run% | .398 | .124 | 3.215 | .001 | .100 |
| Defensive Run% | -.975 | .180 | -5.419 | .000 | -.203 |

R = .892, R² = .795
Adding these two variables increases the R squared score from .768 to .795, so this model explains a higher percentage of the variance. Also, with the run percentage variables in the model, offensive run yards per attempt loses significance (t-score = .185, p-score = .854) and defensive run yards allowed per attempt remains insignificant (t-score = 1.144, p-score = .254). Once play selection is accounted for, how many yards per attempt a team gains or allows on the ground has no statistically significant effect on whether it wins or loses. What matters is that the team attempts running plays and its opponent does not.
The higher the percentage of plays on which a team has to defend the run instead of the pass, the lower its winning percentage. This makes sense because when a team is ahead in a game, it runs more to take time off the clock. When a team is behind, it passes the ball almost exclusively since, on average, a team gains about 2.3 more yards per pass attempt than per run (6.42 versus 4.13 in this dataset) and the clock stops when a pass is incomplete, thus allowing for more plays. If a large share of a team’s defensive plays are spent defending the run, it is more likely to be losing the game. This can be seen in the table below, where the percentage of offensive and defensive plays that are runs is replaced by the percentage of offensive and defensive plays that are passes. The coefficient for defensive pass percentage is positive: if the opponent is throwing the ball more than running, the winning percentage for the team examined will go up. The more the team you are looking at throws the ball instead of running it, the lower its winning percentage. Therefore, while the most significant improvement an offense can make is to increase its yards per passing attempt, that benefit is diminished the more a team throws the ball instead of running it.
| Variable | Beta Coefficient | Standard Error | t-score | p-score | Standard Coefficient |
| --- | --- | --- | --- | --- | --- |
| Constant | .222 | .161 | 1.377 | .170 | |
| Offense Pass Yards/Attempt | .071 | .008 | 8.902 | .000 | .291 |
| Offense Run Yards/Attempt | .002 | .013 | .185 | .854 | .005 |
| Offense Turnovers/Total Plays | -5.189 | .871 | -5.955 | .000 | -.185 |
| Offense Sacks/Total Plays | -2.609 | .493 | -5.290 | .000 | -.161 |
| Defense Pass Yards/Attempt | -.088 | .010 | -8.524 | .000 | -.271 |
| Defense Run Yards/Attempt | .016 | .014 | 1.144 | .254 | .035 |
| Defense Turnovers/Total Plays | 4.619 | .925 | 4.994 | .000 | .150 |
| Defense Sacks/Total Plays | 3.391 | .705 | 4.809 | .000 | .148 |
| Offensive Pass% | -.398 | .124 | -3.216 | .001 | -.098 |
| Defensive Pass% | .975 | .180 | 5.419 | .000 | .187 |

R = .892, R² = .795
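Because a pass share is simply one minus the corresponding run share, swapping run shares for pass shares should leave the fit unchanged, flip the sign of the two share coefficients, and fold the difference into the constant. The sketch below checks that arithmetic using the constants and share coefficients from the two tables above. The name DPassPct is assumed by analogy with OPassPct, and the example team is hypothetical. The sack coefficients differ between the two tables, which an exact substitution would not produce, so the pass-share table may come from a slightly different specification; the check covers only the constant and the share terms.

```python
# Constant and share coefficients copied from the run-share and pass-share tables.
RUN_MODEL = {"const": 0.799, "ORunPct": 0.398, "DRunPct": -0.975}
PASS_MODEL = {"const": 0.222, "OPassPct": -0.398, "DPassPct": 0.975}


def share_terms_run(o_run: float, d_run: float) -> float:
    """Constant plus the two share terms, written in run shares."""
    return (RUN_MODEL["const"]
            + RUN_MODEL["ORunPct"] * o_run
            + RUN_MODEL["DRunPct"] * d_run)


def share_terms_pass(o_run: float, d_run: float) -> float:
    """The same quantity written in pass shares (pass share = 1 - run share)."""
    return (PASS_MODEL["const"]
            + PASS_MODEL["OPassPct"] * (1 - o_run)
            + PASS_MODEL["DPassPct"] * (1 - d_run))


# Substituting run = 1 - pass turns b * run into b - b * pass, so both share
# coefficients are absorbed into the constant and their signs flip.
implied_const = RUN_MODEL["const"] + RUN_MODEL["ORunPct"] + RUN_MODEL["DRunPct"]

# Hypothetical team: runs on 48% of its plays, opponents run on 40% of theirs.
from_run = share_terms_run(0.48, 0.40)
from_pass = share_terms_pass(0.48, 0.40)

print(round(implied_const, 3))
print(round(from_run, 5), round(from_pass, 5))
```

This is why only one of the two sets of variables can be in the model at a time: either set carries exactly the same information.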
Next, there is another element of the game of football that we have not accounted for in the model: special teams. The following four variables will be added:
- PuntRetG – The number of times per game a team returns a punt. In football, a team has four downs to gain 10 yards. After three downs, it either punts the ball away from the end zone it is defending or makes one more attempt to gain the 10 yards needed; if that attempt fails, the other team takes the ball at that spot instead of further down the field, where it would have received a punt. The more opportunities a team gets to return a punt, the better its defense is doing at stopping the opposing offense.
- PuntRetY – The number of yards gained per punt return.
- KickRetY – The number of yards gained per kick return. A team has the opportunity to return a kick after the other team has scored a touchdown or a field goal, and at the beginning of the game, the second half, or sometimes overtime.
- KickRetYD – The number of yards the opponent gains per kick return.
The website I downloaded the data from did not have a metric for evaluating punt return defense, so it will not be part of this model. Adding these four variables to the run percentage model above results in:
| Variable | Coefficient | Standard Error | t-score | p-score | Standard Coefficient |
| --- | --- | --- | --- | --- | --- |
| Constant | .664 | .193 | 3.345 | .001 | |
| Offense Pass Yards/Attempt | .074 | .008 | 8.969 | .000 | .305 |
| Offense Run Yards/Attempt | .003 | .013 | .259 | .796 | .008 |
| Offense Turnovers/Total Plays | -5.219 | .879 | -5.936 | .000 | -.186 |
| Offense Sacks/Total Plays | -2.195 | .490 | -4.481 | .000 | -.136 |
| Defense Pass Yards/Attempt | -.085 | .011 | -8.114 | .000 | -.264 |
| Defense Run Yards/Attempt | .016 | .014 | 1.148 | .252 | .036 |
| Defense Turnovers/Total Plays | 4.601 | .931 | 4.942 | .000 | .149 |
| Defense Sacks/Total Plays | 2.272 | .745 | 3.048 | .003 | .099 |
| Offensive Run% | .391 | .127 | 3.078 | .002 | .098 |
| Defensive Run% | -.947 | .182 | -5.197 | .000 | -.198 |
| Punt Returns/Game | .014 | .011 | 1.223 | .222 | .036 |
| Yards per Punt Return | .002 | .002 | .920 | .358 | .025 |
| Kick Return Yards/Return | .002 | .002 | .642 | .521 | .017 |
| Kick Return Yards/Return Defended | .000 | .003 | .021 | .983 | .001 |

R = .893, R² = .797
Adding these variables did little to explain the variance in winning percentage, as the R squared score rose only from .795 to .797. None of the four special teams variables has a significant impact on the model. Removing them along with offensive and defensive run yards per attempt results in the following model, which will be applied to all teams this year:
| Variable | Coefficient | Standard Error | t-score | p-score | Standard Coefficient |
| --- | --- | --- | --- | --- | --- |
| Constant | .821 | .153 | 5.351 | .000 | |
| Offense Pass Yards/Attempt | .073 | .008 | 9.480 | .000 | .301 |
| Offense Turnovers/Total Plays | -5.320 | .863 | -6.166 | .000 | -.190 |
| Offense Sacks/Total Plays | -2.252 | .486 | -4.635 | .000 | -.139 |
| Defense Pass Yards/Attempt | -.085 | .010 | -8.555 | .000 | -.263 |
| Defense Turnovers/Total Plays | 4.669 | .914 | 5.109 | .000 | .152 |
| Defense Sacks/Total Plays | 2.335 | .722 | 3.236 | .001 | .102 |
| Offensive Run% | .394 | .115 | 3.424 | .001 | .099 |
| Defensive Run% | -.904 | .165 | -5.485 | .000 | -.189 |

R = .891, R² = .794
The formula that will be used for 2011 is:
Projected Winning % = .821 + .073*OPYard – 5.320*OTOPlay – 2.252*OSack – .085*DPYard + 4.669*DTOPlay + 2.335*DSack + .394*ORunPct – .904*DRunPct
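The formula translates directly into a small function. In the sketch below the coefficients are those of the final model; the league-average inputs are the means reported earlier, except the average DRunPct, which is not reported and is assumed here to equal the offensive average of .443. Rounding the projection to whole wins over 16 games is likewise an assumption about how projected wins are derived.

```python
INTERCEPT = 0.821
COEFFICIENTS = {
    "OPYard": 0.073,    # offense pass yards per attempt
    "OTOPlay": -5.320,  # offense turnovers per play
    "OSack": -2.252,    # offense sacks per play
    "DPYard": -0.085,   # defense pass yards per attempt
    "DTOPlay": 4.669,   # defense turnovers per play
    "DSack": 2.335,     # defense sacks per play
    "ORunPct": 0.394,   # share of offensive plays that are runs
    "DRunPct": -0.904,  # share of opponents' plays that are runs
}
GAMES_PER_SEASON = 16


def projected_win_pct(stats: dict) -> float:
    """Apply the final regression formula to one team's season statistics."""
    return INTERCEPT + sum(coef * stats[name] for name, coef in COEFFICIENTS.items())


def projected_wins(stats: dict) -> int:
    """Projected winning percentage converted to whole wins (an assumption)."""
    return round(GAMES_PER_SEASON * projected_win_pct(stats))


# League-average inputs; the DRunPct value is assumed, not taken from the data.
league_average = {
    "OPYard": 6.4152, "OTOPlay": 0.0279, "OSack": 0.0357,
    "DPYard": 6.4345, "DTOPlay": 0.0279, "DSack": 0.0356,
    "ORunPct": 0.443, "DRunPct": 0.443,
}

average_pct = projected_win_pct(league_average)
print(round(average_pct, 3), projected_wins(league_average))
```

A team with league-average inputs projects to roughly a .500 record (about .501 with these rounded figures), which is a useful sanity check on the coefficients.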
| Team | Win% | Projected Win % | Actual Wins | Projected Wins | Actual-Projected |
| --- | --- | --- | --- | --- | --- |
| Atlanta Falcons | 0.813 | 0.652 | 13 | 10 | 3 |
| Chicago Bears | 0.688 | 0.527 | 11 | 8 | 3 |
| Jacksonville Jaguars | 0.500 | 0.323 | 8 | 5 | 3 |
| New Orleans Saints | 0.688 | 0.511 | 11 | 8 | 3 |
| Arizona Cardinals | 0.313 | 0.196 | 5 | 3 | 2 |
| Baltimore Ravens | 0.750 | 0.642 | 12 | 10 | 2 |
| Indianapolis Colts | 0.625 | 0.526 | 10 | 8 | 2 |
| New England Patriots | 0.875 | 0.767 | 14 | 12 | 2 |
| New York Jets | 0.688 | 0.652 | 11 | 10 | 1 |
| Philadelphia Eagles | 0.625 | 0.571 | 10 | 9 | 1 |
| Seattle Seahawks | 0.438 | 0.389 | 7 | 6 | 1 |
| Tampa Bay Buccaneers | 0.625 | 0.582 | 10 | 9 | 1 |
| Buffalo Bills | 0.250 | 0.254 | 4 | 4 | 0 |
| Miami Dolphins | 0.438 | 0.417 | 7 | 7 | 0 |
| Oakland Raiders | 0.500 | 0.500 | 8 | 8 | 0 |
| Washington Redskins | 0.375 | 0.352 | 6 | 6 | 0 |
| Cleveland Browns | 0.313 | 0.359 | 5 | 6 | -1 |
| Denver Broncos | 0.250 | 0.290 | 4 | 5 | -1 |
| Houston Texans | 0.375 | 0.447 | 6 | 7 | -1 |
| Kansas City Chiefs | 0.625 | 0.665 | 10 | 11 | -1 |
| Minnesota Vikings | 0.375 | 0.435 | 6 | 7 | -1 |
| New York Giants | 0.625 | 0.684 | 10 | 11 | -1 |
| San Francisco 49ers | 0.375 | 0.462 | 6 | 7 | -1 |
| Carolina Panthers | 0.125 | 0.239 | 2 | 4 | -2 |
| Dallas Cowboys | 0.375 | 0.520 | 6 | 8 | -2 |
| Detroit Lions | 0.375 | 0.492 | 6 | 8 | -2 |
| Green Bay Packers | 0.625 | 0.739 | 10 | 12 | -2 |
| Pittsburgh Steelers | 0.750 | 0.846 | 12 | 14 | -2 |
| San Diego Chargers | 0.563 | 0.717 | 9 | 11 | -2 |
| St. Louis Rams | 0.438 | 0.532 | 7 | 9 | -2 |
| Tennessee Titans | 0.375 | 0.499 | 6 | 8 | -2 |
| Cincinnati Bengals | 0.250 | 0.441 | 4 | 7 | -3 |
The model projected the actual number of wins to within two for 27 of the 32 teams. Teams that overperformed the model by three games were Atlanta, Jacksonville, New Orleans, and Chicago. Three of those teams made the playoffs, and two of them lost their first playoff game while favored (Atlanta, New Orleans). Chicago won a playoff game against the team that beat New Orleans (Seattle) and then lost to Green Bay at home. The Packers reached the Super Bowl by beating teams that had won as many or more games than they had (Philadelphia, Atlanta, Chicago). However, that shouldn’t have been a surprise given that they had a better projected winning percentage than all three of those teams. They then beat the Steelers in the Super Bowl, an upset according to this model, as Pittsburgh had a projected winning percentage of .846 versus Green Bay’s .739. The game went to the final play, showing how close these teams were.
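The accuracy figures quoted above can be checked directly from the table. The sketch below assumes that projected wins are the projected winning percentage times 16, rounded to the nearest whole number, which reproduces every row of the table.

```python
# (team, actual wins, projected winning percentage), copied from the table.
TEAMS = [
    ("Atlanta Falcons", 13, 0.652), ("Chicago Bears", 11, 0.527),
    ("Jacksonville Jaguars", 8, 0.323), ("New Orleans Saints", 11, 0.511),
    ("Arizona Cardinals", 5, 0.196), ("Baltimore Ravens", 12, 0.642),
    ("Indianapolis Colts", 10, 0.526), ("New England Patriots", 14, 0.767),
    ("New York Jets", 11, 0.652), ("Philadelphia Eagles", 10, 0.571),
    ("Seattle Seahawks", 7, 0.389), ("Tampa Bay Buccaneers", 10, 0.582),
    ("Buffalo Bills", 4, 0.254), ("Miami Dolphins", 7, 0.417),
    ("Oakland Raiders", 8, 0.500), ("Washington Redskins", 6, 0.352),
    ("Cleveland Browns", 5, 0.359), ("Denver Broncos", 4, 0.290),
    ("Houston Texans", 6, 0.447), ("Kansas City Chiefs", 10, 0.665),
    ("Minnesota Vikings", 6, 0.435), ("New York Giants", 10, 0.684),
    ("San Francisco 49ers", 6, 0.462), ("Carolina Panthers", 2, 0.239),
    ("Dallas Cowboys", 6, 0.520), ("Detroit Lions", 6, 0.492),
    ("Green Bay Packers", 10, 0.739), ("Pittsburgh Steelers", 12, 0.846),
    ("San Diego Chargers", 9, 0.717), ("St. Louis Rams", 7, 0.532),
    ("Tennessee Titans", 6, 0.499), ("Cincinnati Bengals", 4, 0.441),
]
GAMES_PER_SEASON = 16

# Actual wins minus projected wins for every team.
differences = {
    team: wins - round(GAMES_PER_SEASON * projected_pct)
    for team, wins, projected_pct in TEAMS
}

within_two = sum(1 for diff in differences.values() if abs(diff) <= 2)
overperformed = sorted(team for team, diff in differences.items() if diff >= 3)
underperformed = sorted(team for team, diff in differences.items() if diff <= -3)

print(f"{within_two} of {len(TEAMS)} teams within two wins")
print("Overperformed by three or more:", overperformed)
print("Underperformed by three or more:", underperformed)
```

Running the same comparison on a later season would show whether the model's accuracy holds up on data it was not fitted to.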