About the Projected Win Model

A Basic Econometric Analysis of NFL Football

As a long-time professional football fan, I have read countless articles analyzing games and teams and explaining why teams are good or why they are not. One key article of faith in NFL analysis is “Run The Ball, Stop The Run, Win,” meaning that the most important things a team can do to win are to run the ball and stop the run. A whole company of football statistics was built to disprove a Boston Globe writer who said that the New England Patriots failed to make the postseason in 2002 “because they could not establish the run”.

This site uses a series of team statistics to determine which ones are significant to a team’s winning percentage. At the end of the season, the most successful teams go into the playoffs. By that time, each team will have run over 1,000 plays on offense and on defense, so we will look at the estimated winning percentage for each playoff team and compare it with the point spread, the number of points that handicappers set to equalize betting on both sides. That comparison shows whether the betting public improperly values teams relative to the regression model.

The dataset, taken from the website http://www.pro-football-reference.com, includes team statistics for 319 team seasons from 2001 to 2010. The game of football can be broken down into various elements for analysis. On any given play, a team is either playing offense or defense. On offense, a team can run the ball, pass the ball, give up a sack of the quarterback, or turn the ball over to the other team by either a fumble or an interception. To measure these, four statistics were calculated from the raw data:

  • OPYard – The number of yards the offense gets every time they pass the ball
  • ORYard – The number of yards the offense gets every time they run the ball
  • OTOPlay – The percentage of plays in which the offense turns the ball over to the other team, either by fumbling the ball and the other team recovering it or the quarterback throwing an interception to a defender
  • OSack – The percentage of plays in which the quarterback gets sacked

Similarly, on defense, a team wants to prevent the other team from gaining yards when it runs the ball, prevent the other team from gaining yards when it passes the ball, take the ball away by forcing and recovering a fumble or intercepting a pass, and sack the opposing quarterback. To measure these, four statistics were calculated from the raw data:

  • DPYard – The number of yards the opposing offense gets every time they pass the ball
  • DRYard – The number of yards the opposing offense gets every time they run the ball
  • DTOPlay – The percentage of plays in which the defense takes the ball away from the opposing offense, either by recovering a fumble or by intercepting a pass
  • DSack – The percentage of plays in which the defense sacks the opposing quarterback

The dependent variable in this model is the season winning percentage. In professional football, each team plays 16 games, so the variable WinPct is computed by taking the number of wins and dividing it by 16.
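The derived statistics described above can be sketched in a few lines of Python. This is only an illustration: the field names, the sample numbers, and the definition of total plays as pass attempts plus rushing attempts plus sacks are assumptions, since the raw columns are not listed here. OPYard is treated as the passing statistic and ORYard as the rushing statistic, matching the "Offense Pass Yards/Attempt" and "Offense Run Yards/Attempt" labels used in the regression tables.

```python
from dataclasses import dataclass


@dataclass
class RawSeason:
    """Hypothetical raw season totals for one team (field names are illustrative)."""
    wins: int
    pass_attempts: int
    pass_yards: int
    rush_attempts: int
    rush_yards: int
    sacks_allowed: int
    turnovers: int  # fumbles lost plus interceptions thrown


def offense_stats(s: RawSeason, games: int = 16) -> dict:
    # Assumption: total plays = pass attempts + rushing attempts + sacks.
    total_plays = s.pass_attempts + s.rush_attempts + s.sacks_allowed
    return {
        "WinPct": s.wins / games,
        "OPYard": s.pass_yards / s.pass_attempts,
        "ORYard": s.rush_yards / s.rush_attempts,
        "OTOPlay": s.turnovers / total_plays,
        "OSack": s.sacks_allowed / total_plays,
    }


# Made-up season totals, chosen only so the arithmetic is easy to follow.
example = offense_stats(RawSeason(wins=10, pass_attempts=500, pass_yards=3500,
                                  rush_attempts=450, rush_yards=1800,
                                  sacks_allowed=50, turnovers=25))
print(example)
```

The defensive statistics would follow the same pattern, using the opponents' totals in place of the team's own.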

The first regression uses WinPct as the dependent variable and OPYard, ORYard, OTOPlay, OSack, DPYard, DRYard, DTOPlay, and DSack as the independent variables.

Variable                      | Coefficient | Standard Error | t-score | p Score
------------------------------+-------------+----------------+---------+--------
Constant                      | .548        | .121           | 4.507   | .000
Offense Pass Yards/Attempt    | .081        | .008           | 10.211  | .000
Offense Run Yards/Attempt     | .030        | .013           | 2.369   | .018
Offense Turnovers/Total Plays | -6.588      | .892           | -7.388  | .000
Offense Sacks/Total Plays     | -2.727      | .510           | -5.346  | .000
Defense Pass Yards/Attempt    | -.096       | .011           | -8.959  | .000
Defense Run Yards/Attempt     | -.019       | .013           | -1.440  | .151
Defense Turnovers/Total Plays | 6.014       | .953           | 6.311   | .000
Defense Sacks/Total Plays     | 3.345       | .748           | 4.474   | .000

R = .876

R² = .768

In this model, the constant tells us that if a team gains no yards per rushing or passing attempt, gives up no sacks or turnovers, allows no yards per rushing or passing attempt, and records no sacks or takeaways, it will win 54.8% of its games. This is a meaningless statistic because in no case would it make sense for all eight variables to be zero over the course of a season. The model tells us that if a team increases its yards per passing attempt by one, its winning percentage will increase by .081, or 8.1 percentage points (about 1.3 wins in a 16-game season). If the opposing offense increases its passing yards per attempt by one yard, your team’s winning percentage will drop by .096, or 9.6 percentage points. Six of the eight variables have t-scores above 4 in absolute value, so they are significant at over the 99% level. Offense run yards per attempt (t-score = 2.369, p-score = .018) is significant at the 95% level but not the 99% level, and defense run yards per attempt (t-score = -1.440, p-score = .151) is not significant at conventional levels. The R² score tells us that these eight variables explain 76.8% of the variation in winning percentage of teams from 2001-2010.
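For readers who want to reproduce this kind of table, the following is a minimal sketch of an ordinary least squares fit that returns coefficients, standard errors, t-scores, and R². It assumes NumPy is available; the function name `ols_summary` is made up for this illustration, and this is not necessarily the software that produced the numbers above.

```python
import numpy as np


def ols_summary(X, y):
    """Ordinary least squares with a constant term (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])      # prepend the constant column
    k = A.shape[1]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sse = float(resid @ resid)                # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())  # total sum of squares
    sigma2 = sse / (n - k)                    # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return {"coef": coef, "se": se, "t": coef / se, "r2": 1.0 - sse / sst}
```

Here `X` would hold one row per team season and one column per statistic, and `y` the season winning percentages. Each t-score is the coefficient divided by its standard error, which is why the t-scores in the table roughly equal those ratios after rounding.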

Now to figure out which statistics are most predictive in determining winning percentage, these beta coefficients need to be standardized. The following table shows the means and standard deviations for each statistic:

Statistic                     | Mean   | Standard Deviation | Max    | Min    | 95% Confidence Interval
------------------------------+--------+--------------------+--------+--------+------------------------
Winning Percentage            | .4996  | .1925              | 1.0000 | .0000  | 0.1145 to 0.8847
Offense Pass Yards/Attempt    | 6.4152 | .7939              | 8.7723 | 4.5224 | 4.8273 to 8.0030
Offense Run Yards/Attempt     | 4.1281 | .4326              | 5.4730 | 3.1442 | 3.2629 to 4.9932
Offense Turnovers/Total Plays | .0279  | .0069              | .0485  | .0101  | 0.0141 to 0.0416
Offense Sacks/Total Plays     | .0357  | .0119              | .0803  | .0113  | 0.0119 to 0.0595
Defense Pass Yards/Attempt    | 6.4345 | .5957              | 8.3883 | 4.7111 | 5.2432 to 7.6259
Defense Run Yards/Attempt     | 4.1287 | .4279              | 5.3333 | 2.8305 | 3.2729 to 4.9844
Defense Turnovers/Total Plays | .0279  | .0063              | .0477  | .0120  | 0.0154 to 0.0404
Defense Sacks/Total Plays     | .0356  | .0084              | .0641  | .0096  | 0.0189 to 0.0524

Each of these variables has a wide variation in results. To compute the standard coefficients for each independent variable, we will multiply the unstandardized or beta coefficient by the standard deviation and then divide that by the standard deviation for the dependent (winning percentage) variable. This results in:

Variable                      | Beta Coefficient | Standard Deviation | Standard Coefficient
------------------------------+------------------+--------------------+---------------------
Offense Pass Yards/Attempt    | .081             | .7939              | 0.334
Defense Pass Yards/Attempt    | -.096            | .5957              | -0.297
Offense Turnovers/Total Plays | -6.588           | .0069              | -0.235
Defense Turnovers/Total Plays | 6.014            | .0063              | 0.195
Offense Sacks/Total Plays     | -2.727           | .0119              | -0.169
Defense Sacks/Total Plays     | 3.345            | .0084              | 0.146
Offense Run Yards/Attempt     | .030             | .4326              | 0.067
Defense Run Yards/Attempt     | -.019            | .4279              | -0.043
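The standardization step can be checked with a short sketch. The coefficients and standard deviations below are copied from the tables in this post; the function and variable names are made up for illustration. Because the inputs are rounded, the results can differ from the published standard coefficients in the third decimal place.

```python
SD_WIN_PCT = 0.1925  # standard deviation of winning percentage

# name: (unstandardized coefficient, standard deviation of the statistic)
ROWS = {
    "Offense Pass Yards/Attempt": (0.081, 0.7939),
    "Offense Run Yards/Attempt": (0.030, 0.4326),
    "Offense Turnovers/Total Plays": (-6.588, 0.0069),
    "Offense Sacks/Total Plays": (-2.727, 0.0119),
    "Defense Pass Yards/Attempt": (-0.096, 0.5957),
    "Defense Run Yards/Attempt": (-0.019, 0.4279),
    "Defense Turnovers/Total Plays": (6.014, 0.0063),
    "Defense Sacks/Total Plays": (3.345, 0.0084),
}


def standardized(coef, sd_x, sd_y=SD_WIN_PCT):
    # Standard coefficient = coefficient * SD of the statistic / SD of winning %.
    return coef * sd_x / sd_y


# Rank by absolute size, largest effect first.
ranked = sorted(ROWS, key=lambda name: abs(standardized(*ROWS[name])), reverse=True)
for name in ranked:
    print(f"{name:30s} {standardized(*ROWS[name]):+.3f}")
```

Sorting by absolute value reproduces the ordering of the table, with the two passing statistics at the top and the two rushing statistics at the bottom.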

This table is ranked by the absolute value of the standard coefficient to show which variables have the greatest positive or negative impact on winning percentage. The best way to improve a football team is to increase the number of passing yards per attempt on offense. The next best way is to reduce the number of passing yards per attempt allowed on defense. Improving passing yards per attempt on offense and defense is far more effective, by a factor of roughly five to seven, than improving yards per attempt running the ball or stopping the run. Next, it is better for a team to improve its ability to force turnovers on defense and avoid turnovers on offense than it is to sack the opposing quarterback and prevent sacks. Finally, in acquiring players, it makes more sense to spend money on a quarterback who can increase yards per pass attempt by one yard than on a running back who will increase yards per rushing attempt by one yard. Similarly on defense, defending the pass should be a higher priority than defending the run.

To refine this model further, how teams decide whether to run or pass should be considered. When a team starts an offensive play, it either passes the ball or runs the ball. Over the course of the season, do teams generally pursue the same mix of running and passing? To determine this, two variables were created. One is called ORunPct, which is the percentage of plays on which a team decides to run the ball, or rushing attempts divided by total offensive plays. The other is called DRunPct, which is the percentage of plays on which opposing teams decide to run the ball. We can also look at the percentages in terms of passing the ball and being passed on, but both sets of variables cannot be run together because of collinearity: the sum of ORunPct and OPassPct is 1, so each is a function of the other. First we look at the distribution of the percentage of time teams run the ball for all 319 teams to see how much it varies:

On average, a team runs the ball 44.3% of the time during a season, with a standard deviation of 4.8%. The chart above shows on the x-axis the percentage of times a team runs the ball in a season, and the y-axis shows the total number of teams that ran the ball that percentage of times in a season. It is not a perfect normal curve, but there is sufficient variation to include it in the model.
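The collinearity problem mentioned above can be illustrated with a small sketch, assuming NumPy is available. The four run percentages are made-up values, not taken from the dataset. Because the pass percentage is one minus the run percentage, a design matrix holding a constant, the run percentage, and the pass percentage has only two independent columns, so a regression cannot separate their effects.

```python
import numpy as np

# Made-up run percentages for four hypothetical team seasons.
o_run_pct = np.array([0.40, 0.44, 0.48, 0.52])
o_pass_pct = 1.0 - o_run_pct  # every play is either a run or a pass

constant = np.ones_like(o_run_pct)
with_both = np.column_stack([constant, o_run_pct, o_pass_pct])
run_only = np.column_stack([constant, o_run_pct])

# Three columns, but the third is the constant minus the second.
rank_both = np.linalg.matrix_rank(with_both)
rank_run_only = np.linalg.matrix_rank(run_only)
print(rank_both, rank_run_only)
```

Both matrices have the same rank, which is why only one of the two percentage variables can enter the model at a time.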

Variable                      | Coefficient | Standard Error | t-score | p Score | Standard Coefficient
------------------------------+-------------+----------------+---------+---------+---------------------
Constant                      | .799        | .157           | 5.079   | .000    |
Offense Pass Yards/Attempt    | .071        | .008           | 8.902   | .000    | .291
Offense Run Yards/Attempt     | .002        | .013           | .185    | .854    | .005
Offense Turnovers/Total Plays | -5.189      | .871           | -5.955  | .000    | -.185
Offense Sacks/Total Plays     | -2.211      | .488           | -4.532  | .000    | -.137
Defense Pass Yards/Attempt    | -.088       | .010           | -8.524  | .000    | -.271
Defense Run Yards/Attempt     | .016        | .014           | 1.144   | .254    | .035
Defense Turnovers/Total Plays | 4.619       | .925           | 4.994   | .000    | .150
Defense Sacks/Total Plays     | 2.415       | .726           | 3.327   | .001    | .105
Offensive Run%                | .398        | .124           | 3.215   | .001    | .100
Defensive Run%                | -.975       | .180           | -5.419  | .000    | -.203

R = .892

R² = .795

Adding these two variables increases the R squared score from .768 to .795, so this model explains a higher percentage of the variance. Also, once the run percentage variables are included, offense run yards per attempt (t-score = .185, p-score = .854) and defense run yards allowed per attempt (t-score = 1.144, p-score = .254) lose significance. In this model, how well a team runs or defends the run has no statistically significant effect on whether it wins or loses. What matters is that the team attempts running plays and its opponent does not.

The higher the percentage of plays on which a team has to defend the run instead of the pass, the lower its winning percentage. This makes sense: when a team is ahead in a game, it runs more to take time off the clock. When a team is behind, it passes the ball almost exclusively since, on average, a team gains about 2.3 more yards per pass than per run (6.42 versus 4.13 in the table of means) and the clock stops when a pass is incomplete, thus allowing for more plays. If a large share of a team’s defensive plays are spent defending the run, it is more likely to be losing the game. This can be seen in the table below, where the percentage of offensive and defensive plays that are runs is replaced by the percentage that are passes. The coefficient for defensive pass percentage is positive: if the opponent is throwing the ball more than running, the winning percentage for the team examined goes up. The more the team you are looking at throws the ball instead of running, the lower its winning percentage. Therefore, while increasing yards per passing attempt is the most significant improvement an offense can make, that benefit is diminished the more a team throws the ball instead of running it.

Variable                      | Beta Coefficient | Standard Error | t-score | p Score | Standard Coefficient
------------------------------+------------------+----------------+---------+---------+---------------------
Constant                      | .222             | .161           | 1.377   | .170    |
Offense Pass Yards/Attempt    | .071             | .008           | 8.902   | .000    | .291
Offense Run Yards/Attempt     | .002             | .013           | .185    | .854    | .005
Offense Turnovers/Total Plays | -5.189           | .871           | -5.955  | .000    | -.185
Offense Sacks/Total Plays     | -2.609           | .493           | -5.290  | .000    | -.161
Defense Pass Yards/Attempt    | -.088            | .010           | -8.524  | .000    | -.271
Defense Run Yards/Attempt     | .016             | .014           | 1.144   | .254    | .035
Defense Turnovers/Total Plays | 4.619            | .925           | 4.994   | .000    | .150
Defense Sacks/Total Plays     | 3.391            | .705           | 4.809   | .000    | .148
Offensive Pass%               | -.398            | .124           | -3.216  | .001    | -.098
Defensive Pass%               | .975             | .180           | 5.419   | .000    | .187

R = .892

R² = .795
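The link between the run-percentage and pass-percentage versions of the model can be checked with simple arithmetic. If each pass percentage is exactly one minus the matching run percentage, substituting into the run-percentage model flips the signs of the two coefficients and folds them into the constant. The sketch below uses the coefficients from the run-percentage table; the function name is made up for illustration.

```python
def reparameterize(constant, b_orun, b_drun):
    """Rewrite constant + b_orun*ORunPct + b_drun*DRunPct in terms of pass
    percentages, assuming OPassPct = 1 - ORunPct and DPassPct = 1 - DRunPct."""
    return {
        "constant": constant + b_orun + b_drun,
        "OPassPct": -b_orun,
        "DPassPct": -b_drun,
    }


# Coefficients from the run-percentage model: .799, .398, -.975
converted = reparameterize(0.799, 0.398, -0.975)
print({name: round(value, 3) for name, value in converted.items()})
```

The converted constant and pass-percentage coefficients agree with the pass-percentage table. The sack coefficients differ between the two published tables, which an exact substitution alone would not produce, so the published pass-percentage variables may not be exact complements of the run-percentage variables.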

Next, there is another element of the game of football that we have not accounted for in the model: special teams. Four variables will be added:

  • PuntRetG – The number of times per game a team returns a punt. In football, a team has four downs to gain 10 yards. After three downs, it either punts the ball away from the end zone it is defending or makes a fourth attempt to gain the yards needed; if that attempt fails, the other team takes over at that spot rather than further down the field, as it would after a punt. The more often a team gets the opportunity to return a punt, the better its defense is doing at stopping the opposing offense.
  • PuntRetY – The number of yards gained per punt return.
  • KickRetY – The number of yards gained per kick return. A team has the opportunity to return a kick after the other team has scored a touchdown or a field goal, and at the beginning of the game, the second half, and sometimes overtime.
  • KickRetYD – The number of yards the opponent gains per kick return.

The website I downloaded the data from did not have a metric for evaluating punt return defense, so it will not be part of this model. Adding these four variables to the model above results in:

Variable                          | Coefficient | Standard Error | t-score | p Score | Standard Coefficient
----------------------------------+-------------+----------------+---------+---------+---------------------
Constant                          | .664        | .193           | 3.345   | .001    |
Offense Pass Yards/Attempt        | .074        | .008           | 8.969   | .000    | .305
Offense Run Yards/Attempt         | .003        | .013           | .259    | .796    | .008
Offense Turnovers/Total Plays     | -5.219      | .879           | -5.936  | .000    | -.186
Offense Sacks/Total Plays         | -2.195      | .490           | -4.481  | .000    | -.136
Defense Pass Yards/Attempt        | -.085       | .011           | -8.114  | .000    | -.264
Defense Run Yards/Attempt         | .016        | .014           | 1.148   | .252    | .036
Defense Turnovers/Total Plays     | 4.601       | .931           | 4.942   | .000    | .149
Defense Sacks/Total Plays         | 2.272       | .745           | 3.048   | .003    | .099
Offensive Run%                    | .391        | .127           | 3.078   | .002    | .098
Defensive Run%                    | -.947       | .182           | -5.197  | .000    | -.198
Punt Returns/Game                 | .014        | .011           | 1.223   | .222    | .036
Yards per Punt Return             | .002        | .002           | .920    | .358    | .025
Kick Return Yards/Return          | .002        | .002           | .642    | .521    | .017
Kick Return Yards/Return Defended | .000        | .003           | .021    | .983    | .001

R = .893

R² = .797

Adding these variables did little to explain the variance in winning percentage, as the R squared score rose only from .795 to .797. None of the four new variables has a significant impact on the model. Removing them, along with offensive and defensive run yards per attempt, results in the following model that will be applied to all teams this year:

Variable                      | Coefficient | Standard Error | t-score | p Score | Standard Coefficient
------------------------------+-------------+----------------+---------+---------+---------------------
Constant                      | .821        | .153           | 5.351   | .000    |
Offense Pass Yards/Attempt    | .073        | .008           | 9.480   | .000    | .301
Offense Turnovers/Total Plays | -5.320      | .863           | -6.166  | .000    | -.190
Offense Sacks/Total Plays     | -2.252      | .486           | -4.635  | .000    | -.139
Defense Pass Yards/Attempt    | -.085       | .010           | -8.555  | .000    | -.263
Defense Turnovers/Total Plays | 4.669       | .914           | 5.109   | .000    | .152
Defense Sacks/Total Plays     | 2.335       | .722           | 3.236   | .001    | .102
Offensive Run%                | .394        | .115           | 3.424   | .001    | .099
Defensive Run%                | -.904       | .165           | -5.485  | .000    | -.189

R = .891

R² = .794

The formula that will be used for 2011 is:

Projected Winning % = .821 + .073*OPYard – 5.320*OTOPlay – 2.252*OSack – .085*DPYard + 4.669*DTOPlay + 2.335*DSack + .394*ORunPct – .904*DRunPct
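As a sketch, the formula can be written as a small Python function. The coefficients are copied from the formula above; the dictionary layout and function name are illustrative. The sample input uses the league means reported earlier in this post, except for DRunPct, whose mean is not reported and is assumed here to equal the 44.3% offensive figure.

```python
CONSTANT = 0.821
COEFFICIENTS = {
    "OPYard": 0.073,
    "OTOPlay": -5.320,
    "OSack": -2.252,
    "DPYard": -0.085,
    "DTOPlay": 4.669,
    "DSack": 2.335,
    "ORunPct": 0.394,
    "DRunPct": -0.904,
}


def projected_win_pct(stats):
    """Apply the final model to a dict holding one value per statistic."""
    return CONSTANT + sum(COEFFICIENTS[name] * stats[name] for name in COEFFICIENTS)


league_average = {
    "OPYard": 6.4152,
    "OTOPlay": 0.0279,
    "OSack": 0.0357,
    "DPYard": 6.4345,
    "DTOPlay": 0.0279,
    "DSack": 0.0356,
    "ORunPct": 0.443,
    "DRunPct": 0.443,  # assumption: not reported in the text, set equal to ORunPct
}
average_projection = projected_win_pct(league_average)
print(round(average_projection, 3))
```

A team that is average in every category comes out very close to a .500 projection, which is a useful sanity check on the coefficients.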

Team                 | Win%  | Projected Win % | Actual Wins | Projected Wins | Actual-Projected
---------------------+-------+-----------------+-------------+----------------+-----------------
Atlanta Falcons      | 0.813 | 0.652           | 13          | 10             | 3
Chicago Bears        | 0.688 | 0.527           | 11          | 8              | 3
Jacksonville Jaguars | 0.500 | 0.323           | 8           | 5              | 3
New Orleans Saints   | 0.688 | 0.511           | 11          | 8              | 3
Arizona Cardinals    | 0.313 | 0.196           | 5           | 3              | 2
Baltimore Ravens     | 0.750 | 0.642           | 12          | 10             | 2
Indianapolis Colts   | 0.625 | 0.526           | 10          | 8              | 2
New England Patriots | 0.875 | 0.767           | 14          | 12             | 2
New York Jets        | 0.688 | 0.652           | 11          | 10             | 1
Philadelphia Eagles  | 0.625 | 0.571           | 10          | 9              | 1
Seattle Seahawks     | 0.438 | 0.389           | 7           | 6              | 1
Tampa Bay Buccaneers | 0.625 | 0.582           | 10          | 9              | 1
Buffalo Bills        | 0.250 | 0.254           | 4           | 4              | 0
Miami Dolphins       | 0.438 | 0.417           | 7           | 7              | 0
Oakland Raiders      | 0.500 | 0.500           | 8           | 8              | 0
Washington Redskins  | 0.375 | 0.352           | 6           | 6              | 0
Cleveland Browns     | 0.313 | 0.359           | 5           | 6              | -1
Denver Broncos       | 0.250 | 0.290           | 4           | 5              | -1
Houston Texans       | 0.375 | 0.447           | 6           | 7              | -1
Kansas City Chiefs   | 0.625 | 0.665           | 10          | 11             | -1
Minnesota Vikings    | 0.375 | 0.435           | 6           | 7              | -1
New York Giants      | 0.625 | 0.684           | 10          | 11             | -1
San Francisco 49ers  | 0.375 | 0.462           | 6           | 7              | -1
Carolina Panthers    | 0.125 | 0.239           | 2           | 4              | -2
Dallas Cowboys       | 0.375 | 0.520           | 6           | 8              | -2
Detroit Lions        | 0.375 | 0.492           | 6           | 8              | -2
Green Bay Packers    | 0.625 | 0.739           | 10          | 12             | -2
Pittsburgh Steelers  | 0.750 | 0.846           | 12          | 14             | -2
San Diego Chargers   | 0.563 | 0.717           | 9           | 11             | -2
St. Louis Rams       | 0.438 | 0.532           | 7           | 9              | -2
Tennessee Titans     | 0.375 | 0.499           | 6           | 8              | -2
Cincinnati Bengals   | 0.250 | 0.441           | 4           | 7              | -3

The model projected the actual number of wins to within two for 27 of the 32 teams. Teams that overperformed the model by three games include Atlanta, Jacksonville, New Orleans, and Chicago. Three of those teams made the playoffs, and two of them lost their first playoff game while favored (Atlanta, New Orleans). Chicago won a playoff game against the team that beat New Orleans (Seattle) and then lost to Green Bay at home. The Packers won the Super Bowl by beating teams that had won as many or more games than they had (Philadelphia, Atlanta, Chicago). However, that shouldn’t have been a surprise given that they had a better projected winning percentage than those three teams. They also beat the Steelers in the Super Bowl, an upset by this model, as Pittsburgh had a projected winning percentage of .846 versus Green Bay’s .739. The game went to the final play, showing how close these teams were.
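The projected win totals and the count of teams within two wins can be reproduced from the table with a short sketch. The actual wins and projected winning percentages are copied from the table; rounding the projected percentage times 16 games to the nearest whole win is an assumption about how the projected wins column was produced, although it matches every row.

```python
GAMES = 16

# (team, actual wins, projected winning percentage), copied from the table
RESULTS = [
    ("Atlanta Falcons", 13, 0.652), ("Chicago Bears", 11, 0.527),
    ("Jacksonville Jaguars", 8, 0.323), ("New Orleans Saints", 11, 0.511),
    ("Arizona Cardinals", 5, 0.196), ("Baltimore Ravens", 12, 0.642),
    ("Indianapolis Colts", 10, 0.526), ("New England Patriots", 14, 0.767),
    ("New York Jets", 11, 0.652), ("Philadelphia Eagles", 10, 0.571),
    ("Seattle Seahawks", 7, 0.389), ("Tampa Bay Buccaneers", 10, 0.582),
    ("Buffalo Bills", 4, 0.254), ("Miami Dolphins", 7, 0.417),
    ("Oakland Raiders", 8, 0.500), ("Washington Redskins", 6, 0.352),
    ("Cleveland Browns", 5, 0.359), ("Denver Broncos", 4, 0.290),
    ("Houston Texans", 6, 0.447), ("Kansas City Chiefs", 10, 0.665),
    ("Minnesota Vikings", 6, 0.435), ("New York Giants", 10, 0.684),
    ("San Francisco 49ers", 6, 0.462), ("Carolina Panthers", 2, 0.239),
    ("Dallas Cowboys", 6, 0.520), ("Detroit Lions", 6, 0.492),
    ("Green Bay Packers", 10, 0.739), ("Pittsburgh Steelers", 12, 0.846),
    ("San Diego Chargers", 9, 0.717), ("St. Louis Rams", 7, 0.532),
    ("Tennessee Titans", 6, 0.499), ("Cincinnati Bengals", 4, 0.441),
]


def projected_wins(win_pct, games=GAMES):
    # Assumption: projected wins are the projected percentage times games, rounded.
    return round(win_pct * games)


# Actual wins minus projected wins for each team.
diffs = {team: wins - projected_wins(pct) for team, wins, pct in RESULTS}
within_two = sum(1 for d in diffs.values() if abs(d) <= 2)
print(within_two, "of", len(RESULTS), "teams within two wins")
```

Teams with the largest positive differences won more games than their underlying statistics suggest, and teams with the largest negative differences won fewer.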
