Comparing shots across the divisions of English football

Watching the highlights of each week’s Football League action, it can sometimes feel like there are more spectacular goals in the lower divisions, particularly shots from range. There is the obvious counter-argument that, because highlights for leagues other than the Premier League tend to be much more condensed, we quickly forget the average goals as they are passed over in favour of goals from the likes of Lee Trundle. Still, this raises a serious question: do teams shoot differently at different levels of the Premier League and Football League? Another way of putting it is to ask whether the quality of both attacking and defending increases evenly as you ascend the levels of the footballing pyramid.

While I do not intend to look at this from every possible angle (being able to see how different skills transfer from one league to another would be one of the most valuable commodities in football – just look at the clichés about Eredivisie strikers), I will look briefly at the location and mean conversion of shots in different zones in the top three divisions of English football.

Shot locations

These maps show the proportion of shots originating from each area of the pitch in each of the divisions.

[Figure: proportion of shots originating from each pitch zone, by division]

Conversion (Goals per Shot)

(NB: the sample size for headed shots from outside the box is understandably very small, so the differences there are unlikely to be significant.)

[Figure: conversion (goals per shot) by pitch zone, by division]

Interestingly, both shot locations and conversion rates are distributed very similarly across these divisions of English football, at least. Perhaps this suggests that a large degree of attacking and defending quality scales up and down the leagues. These numbers are aggregated from whole leagues, of course, and so do not reflect the distribution within each league. Still, I think this is a noteworthy result, if not a particularly earth-shattering one.

Approximating game paths and the defensive shell hypothesis

Recently, I looked at game states in the Championship; however, by borrowing an idea presented by Dan Altman, namely that score can be interpreted as path-dependent rather than as a state variable, we can shed more light on the effect that goals have on shot ratios. In short, path-dependence simply means that how a game state is reached is significant. In my previous post, I treated all the time spent at +1 as equivalent. However, a team can reach the +1 game state either by scoring a goal at +0, or by conceding from +2. A path-oriented approach treats these two instances separately rather than lumping them together. By taking into account which team scored the previous goal (i.e. whether the game state is increasing or decreasing), we can further investigate how goals change games.
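To make the distinction concrete, here is a minimal sketch in R of how the split might be computed. The data frame and column names (goals, match_id, minute, team) are illustrative, not from my actual data set:

```r
library(dplyr)

# One row per goal; reconstruct the running score, then label the spell of
# play that follows each goal by whether the home side's game state has
# just increased (they scored) or decreased (they conceded).
goal_paths <- goals %>%
  arrange(match_id, minute) %>%
  group_by(match_id) %>%
  mutate(
    home_score = cumsum(team == "home"),
    away_score = cumsum(team == "away"),
    game_state = home_score - away_score,  # from the home side's view
    path = ifelse(team == "home", "increasing", "decreasing")
  ) %>%
  ungroup()
```

Shots can then be attributed to a (game state, path) pair rather than to the state alone.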

Total Shot Share

[Figure: total shot share by game state, split by which team scored the last goal]

Expected Goals Ratio

[Figure: expected goals ratio by game state, split by which team scored the last goal]

Together with the effect of time on score effects (see Garry Gelade’s excellent piece and this gif), this would appear to suggest that defensive shelling occurs when the leading team is under more pressure: having just scored, teams are more likely to be outshot, especially towards the end of matches. The position of the blue line (teams having just scored) relative to the pink in these two plots agrees with the defensive shell rationalisation touched upon in the previous post on game states. Having just scored, teams take fewer shots (blue lower than pink on total shots); however, the shots that they do take tend to be of better quality (blue higher than, or similar to, pink on xG).

What a difference a goal makes: Score effects in the 14/15 Championship

One of the most useful and widespread tools in football ‘fanalytics’ is the shot ratio. Both Total Shots Ratio (TSR) and Shots on Target Ratio (SoTR) derive their utility from their high repeatability and predictability; though they by no means tell the full story, we are more likely to be correct in predicting a team’s future performance within a season by looking at their TSR than by simply looking at their points or goal difference.
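For reference, both ratios are simple shares of the shots in a team’s matches. A minimal sketch in R (the function and argument names are mine, purely for illustration):

```r
# Total Shots Ratio: the share of all shots in a team's matches that the
# team itself took; SoTR is the same idea restricted to shots on target.
tsr  <- function(shots_for, shots_against) shots_for / (shots_for + shots_against)
sotr <- function(sot_for, sot_against) sot_for / (sot_for + sot_against)

tsr(500, 400)  # 0.556: this team took roughly 56% of the shots in its matches
```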

Shot ratios are also useful because they allow us to quantify how much a team dominates the shot tally. Again, despite not telling the full story (just look at Derby County this season), we know intuitively that teams that take more shots than their opponents will generally score more points than those that concede a greater share of the shots.

However, as the old cliché goes, goals change games. In fact, the change that takes place on the scoreboard can be seen on the shots chart, too. As Ben Pugsley puts it in his primer on score effects in the Premier League: “The team that takes the lead in any given fixture is likely to sit a little deeper and take fewer shots. The team that is trailing will attack more and take more shots – especially as time begins to tick down.”

So do we see the same effect in the Championship? Using 2014/15 data, we can look at the shot ratios at each game state, where game state is simply the relative score (i.e. a team winning by one goal is at a game state of +1, while a team losing by four goals would be at -4). N.B. The vast majority of playing time and shots occur at the ‘close’ game states (-1, 0, and +1), so given the reduced sample sizes at more extreme scores, it is best to focus most of our attention on trends within the close states.
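Computing a shot ratio per game state is straightforward once each shot is tagged with the score difference for the shooting team at the moment it was taken: a shot taken at +1 by one team is, from the other side’s perspective, a shot conceded at -1. A sketch in R, with illustrative column names:

```r
library(dplyr)

# `shots` has one row per shot, with game_state the score difference for
# the shooting team when the shot was taken.
counts <- count(shots, game_state, name = "n_for")

# Shots conceded at state s are exactly the shots taken at state -s, so
# the ratio at each state can be read off the mirrored counts.
tsr_by_state <- counts %>%
  mutate(n_against = n_for[match(-game_state, game_state)],
         tsr = n_for / (n_for + n_against))
```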

Total Shots Ratio

[Figure: Total Shots Ratio by game state]

Mirroring Ben Pugsley’s Premier League findings, we see that teams at +1 tend to be slightly outshot. The effect is not huge: their shot share drops from 50% to a little over 48%. Still, it can seem like strange behaviour. If teams are winning by one goal, why not push to extend the lead to a more comfortable 2-0, especially when they’ve proven that they can score?

The most popular explanation offered is that, when teams take the lead, it can be advantageous to sit back and form a defensive shell. By sitting deeper and sacrificing some of your attack in favour of defence, you protect your lead. Furthermore, as the opponent struggles to break down the leading team and takes more shots from poorer situations, opportunities open up for the counter. Teams at +1 can therefore afford to take fewer risks and be more selective, taking fewer shots but from situations from which they are more likely to score.

So does the defensive shell hypothesis match the data? Well, we can test this by looking at expected goals by game state.

Expected Goals Ratio / Expected Goals per Shot

In short, expected goals (or xG for short) models attempt to weight each shot according to its likelihood of being scored. For instance, a shot from 40 yards is generally not as valuable as one from the centre of the penalty box, and will therefore have a lower expected goals value.
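The mechanics are simple even if the models themselves are not: each shot receives a scoring probability, and a team’s xG is the sum of those probabilities. The weights below are invented purely for illustration; real models are fitted on large historical shot data sets:

```r
# Three shots with made-up probabilities of being scored.
shot_xg <- c(0.02,  # hopeful punt from 40 yards
             0.09,  # strike from the edge of the box
             0.35)  # close-range chance in the centre of the box

sum(shot_xg)  # 0.46 expected goals from these three shots
```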

[Figure: expected goals ratio by game state]

We can see here that despite taking 48.4% of the shots, teams at +1 have an xG share of 54.4%, fitting nicely with our defensive shell explanation. The bump in shot quality for teams at +1 and above can be seen even more clearly by looking at the average xG per shot (a measure of shot quality) and conversion by game state:

[Figures: average xG per shot by game state; conversion rate by game state]
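For completeness, the xG share and xG per shot by game state can be computed with the same mirroring trick used for the shot counts. Again, the column names (xg in particular) are illustrative:

```r
library(dplyr)

# `shots` has one row per shot with game_state and a per-shot xG value.
xg_by_state <- shots %>%
  group_by(game_state) %>%
  summarise(xg_for = sum(xg), n_shots = n(),
            xg_per_shot = xg_for / n_shots) %>%
  mutate(xg_against = xg_for[match(-game_state, game_state)],
         xg_share = xg_for / (xg_for + xg_against))
```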

An alternative explanation

There is, of course, a competing explanation for these trends that I haven’t yet confronted. The sample of teams at +1 is likely to be biased: good teams with good players are more likely to spend time in a winning game state than bad teams, and it would be unsurprising for these teams with better players to be taking better shots and converting at higher rates.

If this were the case, we would expect the trend in xG share to disappear when teams of similar quality played each other.

To test this, I split the league into four groups of six teams, ranked by their total xG ratio, and looked at the trends for score effects only in games in which teams from the same group played each other. This resulted in the following plot:
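The grouping and filtering steps might look something like this in R. The team_xg and matches data frames and their column names are illustrative, not my actual code:

```r
library(dplyr)

# Rank the 24 Championship teams by season xG ratio and cut them into
# four groups of six.
team_groups <- team_xg %>%
  arrange(desc(xg_ratio)) %>%
  mutate(group = rep(c("green", "red", "blue", "black"), each = 6))

# Keep only the matches in which both sides come from the same group.
same_group <- matches %>%
  left_join(team_groups, by = c("home_team" = "team")) %>%
  left_join(team_groups, by = c("away_team" = "team"),
            suffix = c("_h", "_a")) %>%
  filter(group_h == group_a)
```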

[Figure: xG share by game state, within each quality group]

  • Green (Top 6 xG): Bournemouth, Middlesbrough, Derby, Norwich City, Ipswich Town, Brighton & Hove Albion
  • Red: Watford, Nottingham Forest, Blackburn Rovers, Brentford, Reading, Sheffield Wednesday
  • Blue: Wolves, Cardiff City, Wigan Athletic, Millwall, Huddersfield Town, Rotherham United
  • Black (Bottom 6 xG): Bolton Wanderers, Birmingham City, Fulham, Charlton Athletic, Leeds United, Blackpool

As we can see, despite the teams being grouped by model ranking, the effect remains. This evidence is more suggestive of the defensive shell effect; however, I would expect that the alternative explanation of sample bias also plays a part, though perhaps a smaller one.

Closing point

With all this in mind, it is incredibly important to get the first goal in the Championship. If we look at this chart of goal share by game state, we can see just how hard it is for teams to clamber out of a losing position:

[Figure: goal share by game state]

Teams at -1 (losing by one goal) get back level just 12% of the time. Perhaps this is part of what makes the Championship such a volatile and exciting league. Even the best teams can be flipped onto their backs by a goal against and have difficulty coming back.
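For the record, one way to operationalise ‘getting back level’ (the trailing team scores the next goal) is sketched below. The data frame and column names are illustrative; note that spells with no further goal drop out here, and counting them as failed comebacks would be the stricter choice:

```r
library(dplyr)

# `spells` has one row per goal: state_after is the game state it creates
# for the trailing team, next_goal_by records who scores next in the match.
comeback_rate <- spells %>%
  filter(state_after == -1) %>%
  summarise(back_level = mean(next_goal_by == "trailing", na.rm = TRUE))
```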

EDIT: After this post went up, Ronnie (@NotAllGlsEqual) shared an image of score effects in different time bands for the Premier League. This inspired me to do something similar, showing how score effects change over the course of a Championship match:

[Figure: score effects by time band over the course of a Championship match]

Thoughts on shot ratios

I was thinking the other day about the metrics we use to evaluate team performance in football/soccer, and I thought I’d write a little about it. Definitionally, SoTR, TSR and other such metrics (definitions here) are comparative; it’s easy to forget this when discussing them as a proxy for team quality, but it’s important to remember that when we quote a TSR or SoTR we are looking at a team’s ability to generate shots relative to their opponents. TSR and SoTR are by no means absolute measures, because the teams that make up each division change each year.

I think this is in part why the various shot ratios are much less repeatable from year to year in the Championship than in the Premier League: the Premier League is a more rigid league, changing only three teams each year, and its financial structure helps to maintain the status quo (i.e. there is more disparity in team quality by default).

This brings me on to my second thought. Given that we expect a degree of noise around the mean, the relationship between the various shot ratios and points ought to be clearer when the league is more skewed (so long as the noise increases at a lower rate than the range in SoTR does, which I think is reasonable). The following gif shows the plots of SoTR vs Points for each season from 2004/05 to 2013/14, along with the least-squares trend line fitted to the combined data (i.e. the data from all years). The number at the bottom is the r-squared (a measure of how well the model matches the data: explanation here) between that year’s data and the line fitted to the combined data.
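Because the trend line is fitted to the pooled data rather than refitted each season, a single season’s r-squared can go negative: the pooled line can fit that season worse than a flat line at the season’s mean. A sketch of the calculation in R, with illustrative data frame and column names (all_seasons, sotr, points):

```r
# Fit the trend line once, on all seasons pooled together.
pooled <- lm(points ~ sotr, data = all_seasons)

# R-squared of one season's data against the fixed pooled line; this can
# fall below zero when the line fits worse than the season's mean points.
r2_vs_pooled <- function(season) {
  pred   <- predict(pooled, newdata = season)
  ss_res <- sum((season$points - pred)^2)
  ss_tot <- sum((season$points - mean(season$points))^2)
  1 - ss_res / ss_tot
}
```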

[Animated plot: SoTR vs Points, 2004/05 to 2013/14]

There are perhaps a couple of things we can take from this.

  • 2012-2013 was weird. With such a tight range of SoTRs, the relationship between SoTR and Points becomes difficult to see. The negative r-squared tells us that, for this specific data set, a flat line (i.e. one where SoTR has no effect on Points) fits the data better than our overall trend line. 2012-13’s weirdness may be worth a closer look at some point.
  • It does appear that my initial hypothesis holds: the relationship is clearer when the league is more stretched (I suppose r-squared could be interpreted as a measure of signal to noise). Moreover, I have another graph that appears to back this up:
[Figure: linear fit of league SoTR standard deviation vs r-squared, by season… so meta. R-squared = 0.618; p-value = 0.0070]

While I don’t want to fall into the trap of simply throwing out statistical measures without thought, I think this plot looks pretty encouraging: there appears to be a clear linear relationship between how well SoTR correlates with Points over a season and how uneven the league is that year.

I started this post by recalling that shot ratio-based metrics are comparative. As a result, a team which stays at the same quality from one year to the next will not necessarily post a similar SoTR the next year. Likewise, I should be wary of directly comparing two teams across different years by SoTR or TSR without context (although in truth any statistical analysis without contextual awareness is pernicious). Finally, it does appear that SoTR and TSR are more powerful metrics for evaluating relative team quality in years where the league is more uneven.