Quantifying attacking player reliance

Back in 2012, Brendan Rodgers remarked that he had “always thought that if you have three-and-a-half goalscorers in your team, you have got a chance”. This raises a question: are good teams more likely to have their shots and goals spread out amongst the team, or localised in a handful of players? And what does this tell us about the nature of the sport?

To test this, we need a metric that quantifies the distribution of shots within a team. One option would be the Gini coefficient, a common measure of inequality. However, this would not entirely account for the fact that teams with fewer shots will naturally tend to have a more equal distribution of them. Instead, I have chosen to use the coefficient of variation of each team’s players’ simple expected goals, i.e. the standard deviation of the players’ xG totals divided by their mean (expected goals being a model for weighting shots based on their likelihood of resulting in a goal). By this measure, a team with a lot of players contributing more equally to their expected goals tally will have a lower coefficient of variation than a team whose shots come from only a couple of players.
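As a concrete illustration, here is a minimal R sketch of the calculation on a toy per-player xG table (all names and numbers invented):

```r
# Toy per-player season xG for two teams (numbers purely illustrative)
player_xg <- data.frame(
  team = c("A", "A", "A", "B", "B", "B"),
  xg   = c(10, 9, 11, 24, 3, 3)
)

# Coefficient of variation (sd / mean) of player xG within each team
cv_by_team <- aggregate(xg ~ team, data = player_xg,
                        FUN = function(x) sd(x) / mean(x))
cv_by_team  # the xg column now holds each team's coefficient of variation
# Team A spreads its xG evenly and gets a low CV;
# team B leans on one shooter and gets a high CV.
```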

When we plot this measure against goals scored in the league, we can see a weak correlation:

[Figure: coefficient of variation of players’ xG vs goals scored in the league]

So that’s it, focusing your attack around fewer players is more effective? Well, not necessarily. Firstly, attacking systems will tend to focus themselves around star players, and if we look at where teams are placed along the x axis, it seems sensible to suggest that teams with high-quality strikers will tend to have a higher coefficient of variation. For instance, Blackburn have both Gestede and Rhodes, Watford have Deeney and Ighalo (and Vydra), and Ipswich have Murphy scoring goals for fun. If we can expect top strikers to score goals somewhat independently of the quality of their teammates, then a correlation such as the one seen here would be expected.

So what conclusions can we draw from this, if any? Well, there are obvious limitations in the method used here. For one thing, a simple shots-based evaluation of a team’s attack will fail to capture all the subtleties of attacking contribution and variation. Given this, and the diffuse nature of the correlation, it would be unwise to try to draw any large or spectacular conclusions. However, we have at least derived a useful method for determining the spread of xG around a team.

Lawro vs the machines: A statistical analysis of Mark Lawrenson’s PL predictions

EDIT: It turns out that Lawro’s predictions have been subject to analysis before by the excellent @We_R_PL, which can be found here. His is a more even-handed analysis, which also features a nice summary of the benefits of statistically-based analytics.

The BBC football department has long stretched the definition of the word ‘expert’. From regularly publishing Garth Crooks’ (“football analyst”) team of the week to being responsible for the reintroduction of Robbie Savage to our TV screens, the BBC’s punditry and analysis has often felt more Jamie Redknapp than Gary Neville. One regular BBC column is Mark Lawrenson’s (“football expert”) weekly predictions. In these pieces, he predicts the score of each of the weekend’s Premier League matches. Upon completion of the games, he (and a celebrity guest who has also submitted a set of predictions) receives a score based on the accuracy of his predictions: 3 points for the correct scoreline, 1 point for just the correct outcome (win/lose/draw), and no points if he is wrong. So how expert is Lawrenson’s opinion? To find out, I decided to test some simple models against Lawro’s predictions for this season. In the first round of testing, I used 3 ‘dumb’ models (in other words, models that use no information about the teams aside from whether they are playing at home or away); a rough sketch of the simulation follows the list:

  1. (Red) – This model predicted the home and away goals as random numbers from zero to eight. These numbers were not weighted, so the model was just as likely to pick 8-8 as its prediction as a more reasonable scoreline like 0-0 or 2-1.
  2. (Green) – This model predicted the home and away goals as random numbers from zero to eight. However, this time, the probability of picking each number was weighted using historical averages, so 0, 1 or 2 goals are by far the most likely results.
  3. (Blue) – This model predicted the home and away goals as random numbers from zero to two, weighted by whether a team was playing at home or away. In short, this meant that there was a 4% chance of an away win (0-1), a 14% chance of a draw (0-0, 1-1) and an 82% chance of a home win (2-1, 2-0, 1-0).
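For the curious, here is roughly how such a simulation can be put together in R. The scoring function implements the 3/1/0 rules above; the goal weights for the green model are illustrative stand-ins rather than the exact historical averages used, and `results` stands for an assumed matrix of the season’s actual scorelines:

```r
# Lawro-style scoring: 3 points for the exact scoreline,
# 1 for the right outcome, 0 otherwise
score_prediction <- function(pred, actual) {
  if (all(pred == actual)) return(3)
  if (sign(pred[1] - pred[2]) == sign(actual[1] - actual[2])) return(1)
  0
}

# Model 2 (green): goals 0-8, weighted; these weights are stand-ins
# loosely shaped like historical scoring rates, not the exact ones used
goal_weights <- c(0.30, 0.33, 0.20, 0.10, 0.04, 0.02, 0.005, 0.004, 0.001)
predict_green <- function() sample(0:8, 2, replace = TRUE, prob = goal_weights)

# One simulated season, given `results`: an assumed n-by-2 matrix of
# actual (home, away) goals; repeat ~10,000 times to get a distribution
season_score <- function(results) {
  sum(apply(results, 1, function(actual) score_prediction(predict_green(), actual)))
}
```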

By simulating the predictions 10,000 times, we can get an idea of how well each model is likely to do and compare that with Lawro’s score for the season, which at the time of writing is 217.

[Figure: simulated score distributions for the three ‘dumb’ models vs Lawro’s score]

As we can see, Lawro beats these models pretty comfortably, although he has a roughly 0.8% chance of being beaten by the 3rd model (blue). So what happens when we use a slightly more sophisticated model? The 4th model (purple) is very similar to the 3rd one (blue), except that instead of biasing towards the home team, it favours the team with the higher Shots on Target Ratio (shots on target taken divided by the sum of shots on target taken and shots on target conceded).

[Figure: simulated score distribution for the Shots on Target Ratio model vs Lawro’s score]

This new model beats Lawro’s score for the season 55% of the time. In other words, the average score for this model is very similar to Lawro’s. So what is it about this model that makes it similar to Lawro? Well, if we think about what the model is doing, it is essentially picking the better team (i.e. the favourite) to win the vast majority of the time. In fact, if we reduce the luck element and alter the model to simply pick whichever team has the higher Shots on Target Ratio to win 2-0, the model scores marginally higher than Lawrenson’s own score. The same model comes out very similarly to Lawro for previous seasons, too. For instance, here is the same plot as above for 2013/14:

[Figure: the same comparison for the 2013/14 season]

In other words, we can get a very good estimate of Lawro’s total score for the season simply by backing the favourite to win each game. And herein lies the true nature of Mark Lawrenson’s expertise. What good is an expert who fails to give more insight than simply “the best team will win”? This might be a good strategy to take if you knew little about football, but one would expect better from someone whose employment is based upon their access to a higher level of understanding of the game. Perhaps this is harsh; after all, football is a low-scoring sport and, as we are so often reminded, underdogs can nab a goal and come out victorious, all of which makes football inherently difficult to predict. I will also concede that there is an argument that the predictions are meant to be a bit of fun and not to be taken too seriously, which is fair. However, all of this is part of a broader malaise in football coverage, especially within the BBC. As other media outlets like Sky invest more time and energy into providing increasingly sophisticated analysis, the BBC, content with Alan Hansen’s declarations of “schoolboy defending” and endless discussions of refereeing mistakes, runs the risk of falling further behind.

Approximating game paths and the defensive shell hypothesis

Recently, I looked at game states in the Championship; however, by taking some of the ideas presented by Dan Altman, namely the idea that score can be interpreted as path-dependent rather than as a state variable, we can shed more light on the effect that goals have on shot ratios. In short, being path-dependent simply means that how a game state is reached is significant. In my previous post, I treated all the time spent at +1 as equivalent. However, a team can reach the +1 game state either by scoring a goal at +0, or by conceding from +2. A path-oriented approach treats these two instances as separate rather than lumping them together. By taking into account which team scored the previous goal (i.e. whether the game state is decreasing or increasing), we can further investigate how goals change games.
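As a small sketch of the idea in R: given one team’s goal events in time order, each game state it passes through can be tagged by whether it was reached by scoring (an increasing path) or by conceding (a decreasing path):

```r
# goal_events: +1 for a goal scored, -1 for one conceded, in time order
label_paths <- function(goal_events) {
  data.frame(
    state = cumsum(goal_events),
    path  = ifelse(goal_events == 1, "increasing", "decreasing")
  )
}

label_paths(c(1, 1, -1))
#   state       path
# 1     1 increasing   <- +1 reached by scoring from 0
# 2     2 increasing
# 3     1 decreasing   <- +1 reached by conceding from +2
```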

Total Shot Share

[Figure: total shot share by game state path]

Expected Goals Ratio

[Figure: expected goals ratio by game state path]

This would appear to suggest, along with the effect of time on score effects (see Garry Gelade’s excellent piece and this gif), that defensive shelling occurs when the leading team is under more pressure: having just scored, teams are more likely to be outshot, especially towards the end of matches. The position of the blue line (teams having just scored) relative to the pink one in these two plots agrees with the defensive shell rationalisation touched upon in the previous post on game states. Having just scored, teams take fewer shots (blue lower than pink on total shots); however, the shots that they do take tend to be of better quality (blue higher than, or similar to, pink on xG).

What a difference a goal makes: Score effects in the 14/15 Championship

One of the most useful and widespread tools in football ‘fanalytics’ is the shot ratio. Both Total Shots Ratio (TSR) and Shots on Target Ratio (SoTR) derive their utility from their high repeatability and predictability; though they by no means tell the full story, we are more likely to be correct in predicting a team’s future performance within a season by looking at their TSR than by simply looking at their points or goal difference.

Shot ratios are also useful because they allow us to quantify how much a team dominates the shots tally. Again, despite not telling the full story (just look at Derby County this season), we know intuitively that teams that take more shots than their opponents will generally score more points than those that concede a greater share of the shots.
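For reference, both ratios are simple shares, sketched here in R:

```r
# Both ratios are a team's share of the relevant shot count:
#   TSR  = shots for / (shots for + shots against)
#   SoTR = shots on target for / (SoT for + SoT against)
tsr  <- function(shots_for, shots_against) shots_for / (shots_for + shots_against)
sotr <- function(sot_for, sot_against) sot_for / (sot_for + sot_against)

tsr(450, 390)  # a shot-dominant team: ~0.54 (illustrative totals)
```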

However, as the old cliché goes, goals change games. In fact, the change that takes place on the scoreboard can be seen on the shots chart, too. As Ben Pugsley puts it in his primer on score effects in the Premier League: “The team that takes the lead in any given fixture is likely to sit a little deeper and take fewer shots. The team that is trailing will attack more and take more shots – especially as time begins to tick down.”

So do we see the same effect in the Championship? Using 2014/15 data, we can look at the shot ratios at each game state, where game state is simply the relative score (i.e. a team winning by one goal is at a game state of +1, while a team losing by four goals would be at -4). N.B. The vast majority of playing time and shots occur at the ‘close’ game states (-1, 0, and +1), so given the reduced sample sizes of more extreme scores, it is best to focus most of our attention on trends within close states.
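As a quick R sketch, a team’s game state at any minute is just its running goal difference, which is how the time spent at each state can be tallied:

```r
# Game state per minute for one team, given the minutes of goals
# scored (for) and conceded (against)
game_state_by_minute <- function(for_mins, against_mins, match_len = 90) {
  sapply(seq_len(match_len), function(m) sum(for_mins <= m) - sum(against_mins <= m))
}

# A team scoring at 10' and 70' and conceding at 30'
gs <- game_state_by_minute(c(10, 70), c(30))
table(gs)  # minutes spent at each game state (0 and +1 here)
```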

Total Shots Ratio

[Figure: Total Shots Ratio by game state]

Mirroring Ben Pugsley’s Premier League findings, we see that teams at +1 tend to be slightly outshot. The effect is not huge; their shot share drops from 50% to a little over 48%. Still, this can seem like strange behaviour. If teams are winning by one goal, why not push to extend the lead to a more comfortable 2-0, especially when they’ve proven that they can score?

The most popular explanation offered is that, when teams take the lead, it can be advantageous to sit back and form a defensive shell. By sitting deeper and sacrificing some of your attack in favour of defence, you protect your lead. Furthermore, as the opponent struggles to break down the leading team and takes more shots from poorer situations, opportunities open up for the counter. Teams at +1 can therefore afford to take fewer risks and be more selective with their shots, taking fewer of them but from situations from which they are more likely to score.

So does the defensive shell hypothesis match the data? Well, we can test this by looking at expected goals by game state.

Expected Goals Ratio / Expected Goals per Shot

In short, expected goals (or xG for short) models are an attempt to weight different shots according to their likelihood of being scored. For instance, a shot from 40 yards is generally not as valuable as one from the centre of the penalty box, and will therefore have a lower expected goals value.
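As a toy illustration only (real xG models are fitted to large shot datasets, and this is not the model behind these charts), a crude distance-based weighting in R shows the basic idea:

```r
# Toy only: a logistic decay in shot distance stands in for a fitted model,
# just to show shots being weighted by their likelihood of being scored
toy_xg <- function(distance_yards) 1 / (1 + exp(0.2 * (distance_yards - 10)))

toy_xg(6)   # close to goal: ~0.69
toy_xg(40)  # long range:    ~0.002
```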

[Figure: expected goals ratio by game state]

We can see here that despite taking only 48.4% of the shots, teams at +1 have an xG share of 54.4%, fitting nicely with our defensive shell explanation. The bump in shot quality for teams at +1 and above can be seen even more clearly by looking at the average xG per shot (a measure of shot quality) and conversion rate by game state:

[Figures: average xG per shot and conversion rate by game state]

An alternative explanation

There is, of course, a competing explanation for these trends that I haven’t yet confronted. The sample of teams at +1 is likely to be biased; good teams with good players are more likely to spend time in a winning game state than bad teams, and it would be unsurprising for these teams with better players to be taking better shots and converting at higher rates.

If this were the case, we would expect the trend in xG share to disappear when teams of similar quality played each other.

To test this, I split the league into 4 groups of six teams, ranked by their total xG ratio, and looked at the trends for score effects only in games in which teams from the same group played each other (a sketch of the grouping follows the team list below). This resulted in the following plot:

[Figure: score effects in games between teams from the same xG group]

  • Green (Top 6 xG): Bournemouth, Middlesbrough, Derby, Norwich City, Ipswich Town, Brighton and Hove Albion
  • Red: Watford, Nottingham Forest, Blackburn Rovers, Brentford, Reading, Sheffield Wednesday
  • Blue: Wolves, Cardiff City, Wigan Athletic, Millwall, Huddersfield Town, Rotherham United
  • Black (Bottom 6 xG): Bolton Wanderers, Birmingham City, Fulham, Charlton Athletic, Leeds United and Blackpool
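Here is a sketch in R of how that grouping can be done; `team_xgr` is an illustrative stand-in for the real table of season xG ratios, not the actual code behind the plot:

```r
# Illustrative stand-in: 24 teams with random season xG ratios
set.seed(1)
team_xgr <- data.frame(team = paste("Team", 1:24),
                       xg_ratio = runif(24, 0.35, 0.65))

# Rank by xG ratio and cut into four groups of six
team_xgr <- team_xgr[order(-team_xgr$xg_ratio), ]
team_xgr$group <- rep(c("Green", "Red", "Blue", "Black"), each = 6)

# Score effects are then recomputed using only fixtures in which
# both sides belong to the same group
```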

As we can see, despite the teams being grouped by model ranking, the effect remains. This evidence is more suggestive of the defensive shell effect; however, I would expect that the alternative explanation of sample bias also plays a part, though perhaps a smaller one.

Closing point

With all this in mind, it is incredibly important to get the first goal in the Championship. If we look at this chart of goal share by game state, we can see just how hard it is for teams to clamber out of a losing position:

[Figure: goal share by game state]

Teams at -1 (losing by one goal) get back level just 12% of the time. Perhaps this is part of what makes the Championship such a volatile and exciting league. Even the best teams can be flipped onto their backs by a goal against and have difficulty coming back.

EDIT: After I shared this piece, Ronnie (@NotAllGlsEqual) sent over an image of score effects in different time bands for the Premier League. This inspired me to do something similar, showing how score effects change over the course of a Championship match:

[Figure: score effects over the course of a Championship match, by time band]

Attack, defence and Middlesbrough’s title hopes

Earlier today, Jonathan Taylor looked at whether a strong attack or a strong defence is more likely to win you the league. He concluded that over the past 10 seasons, being the league’s top team in either scoring or conceding alone was not enough to guarantee automatic promotion. This makes sense; no matter how many goals you score, if you’re conceding just as many, you are unlikely to win a huge number of your matches, and vice versa.

The relationship between final position and goals scored and conceded is shown in the two charts below, along with Middlesbrough’s projected finishing values, assuming they continue at the same rate as they have so far. While there is clearly a relationship in each case, it isn’t especially strong.

Attack

[Figure: goals scored vs final league position, with Middlesbrough’s projection]

  • As we can see, Karanka’s side are by no means tearing up the league with their attack; Watford’s 2010/11 side scored more goals (77) than Boro are projected to, and finished 14th. However, Middlesbrough are clearly in the right ballpark and would not be out of place in the top 2.

Defence

[Figure: goals conceded vs final league position, with Middlesbrough’s projection]

  • This is where Boro come into their own; they clearly have an elite defence, worthy of any title-winning side.

Combined attack and defence

So where does that leave us? Well, we can combine attack and defence into one metric, Goal Ratio (often abbreviated to GR). Goal Ratio is calculated by dividing goals scored by the sum of goals scored and goals conceded. As a result, it gives a number from 0 to 1, which is the proportion of goals scored by a team in its matches. This allows us to account both for teams like 2010/11 Watford who, despite scoring a lot, concede a lot too, and for teams who don’t score as much but have a tight defence. On this metric, Boro’s current performance matches up extremely well against teams of seasons past. Note also how much less variability there is in each position than when we looked at goals for or goals against alone; this suggests that it is a much more effective way to evaluate current performance.
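The calculation itself is a one-liner; here it is as a small R sketch with illustrative numbers:

```r
# Goal Ratio: the share of the goals in a team's matches that it scored
goal_ratio <- function(goals_for, goals_against) goals_for / (goals_for + goals_against)

goal_ratio(77, 71)  # a high-scoring but leaky side: ~0.52 (invented totals)
```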

[Figure: Goal Ratio vs final league position, with Middlesbrough’s projection]

Can this be sustained?

The final question to ask is whether Middlesbrough can maintain their current Goal Ratio over the rest of the season. To answer this, I will look at some simple shot ratios.

Shot ratios

The plot below shows Middlesbrough’s share of shots on target (SoTR/Shots on Target Ratio, calculated in the same way as Goal Ratio) alongside their Goal Ratio. There is a fuller explanation of what the plot shows here, but in short it is useful because teams whose Goal Ratio is much higher than their Shots on Target Ratio tend to be unlikely to continue scoring at the same rate in the future. We saw this early in the season with Charlton, who have since dropped down the table. Likewise, Norwich have been dominating the shot counts in their games all season and are now rising up the table.

[Figure: Middlesbrough’s rolling Shots on Target Ratio and Goal Ratio]

Fortunately for Karanka, the black line and dashed line remain fairly close together, which suggests that Middlesbrough’s Goal Ratio is unlikely to change hugely before the season ends.

While recent results and performances have not been excellent, Middlesbrough’s underlying numbers remain strong and the team is well placed to challenge for an automatic promotion spot towards the end of the season. This is somewhat tempered by the fact that in their remaining 13 games they play each of the rest of the top 6, with 4 of those games away. But they can take encouragement from the fact that their promotion hopes are in their own hands, and that Karanka’s tactical nous has served them well in games against their promotion rivals so far this season, with their underlying numbers and results staying strong even in tough periods of games:

[Figure: Middlesbrough’s underlying numbers against their promotion rivals]

Visualising the Championship: Historical context charts

Last week, I wrote a post on Millwall, mentioning that their last 5 games had been historically bad. In fact, at the time of writing, Millwall had recorded the third-worst share of shots on target over a 5-game stretch since the 2004/05 season. But is there a way to easily show just how bad a team’s recent form has been compared to that of past teams?

Historical context charts

In order to quickly compare a team’s performance to that of past Championship teams, I have constructed a set of charts showing the proportion of historical teams with a superior score (shaded orange), along with where on the bell curve the team’s performance lies (the black/orange boundary), for four metrics: Total Shots Ratio, Shots on Target Ratio, Goal Ratio and Points Per Game.
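A minimal R sketch of the idea behind these charts, assuming a per-game series for some metric and a pool of historical 5-game streak totals (all names and numbers invented):

```r
# Rolling 5-game totals of some metric for one team
rolling5 <- function(x) sapply(5:length(x), function(i) sum(x[(i - 4):i]))

# Share of all historical 5-game streaks with a superior score
# (the orange area in the charts)
superior_share <- function(current, historical_streaks) mean(historical_streaks > current)

# e.g. with made-up historical streak totals:
streaks <- rnorm(10000, mean = 0.5, sd = 0.08)  # stand-in for historical streaks
superior_share(0.30, streaks)                   # ~0.99: nearly everyone did better
```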

For instance, the graphic for the same Millwall streak mentioned earlier comes out like this:

[Figure: historical context chart for Millwall’s last 5 games]

As we can see, Millwall’s performance in each of these metrics is at the lower extreme. Their performance has been terrible; almost every team in the Championship since its rebranding has performed better over 5 games (across all of their 5-game streaks) than Millwall have here, as shown by the almost completely orange histograms.

For comparison, this is what a good team looks like:

[Figure: historical context chart for Bournemouth’s last 5 games]

This shows Bournemouth’s most recent 5 games in their historical context. As we can see, despite their accumulation of points being merely “good”, their domination of Shots, Shots on Target and Goals tallies over recent weeks has been excellent.

Hopefully these charts are fairly intuitive and a useful tool for putting extreme performances into some sort of context.

Fear and fouling in the Championship

I think in football, and sports in general, it’s very easy to get by without questioning the conventional wisdom and intuition that tends to be ingrained in punditry and commentary. However, this does not mean that we should not question these ideas; it is important to challenge and analyse the status quo, even if the questions you ask initially appear simple, obvious and even stupid.

So in this post, I’ll be looking a little at fouls and the factors that influence them. All data used is from the Championship seasons 04/05 to 13/14.

1. Teams that foul more get fouled more

[Figure: seasonal fouls committed vs fouls committed by opponents]

This is a plot of seasonal totals of fouls committed (x axis) against fouls committed by opponents (y axis). The data suggests that 41% of the variation in fouls committed against your team over the course of a season can be explained by variation in fouls committed by your team over the same period. There is a clear correlation between the two variables, which is probably unsurprising, but nice to see nonetheless. As for the reasons, I would suggest that teams facing “rougher” opponents perhaps make a conscious choice to foul more in return.
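For what it’s worth, the 41% figure corresponds to the R² of a simple linear fit, sketched here in R on made-up team-season data (the column names and numbers are illustrative, not the real dataset):

```r
# Illustrative stand-in for the real data: one row per team-season
set.seed(1)
fouls_for <- rnorm(240, mean = 550, sd = 50)
seasons <- data.frame(
  fouls_for     = fouls_for,
  fouls_against = 0.6 * fouls_for + rnorm(240, 220, 35)
)

# Share of variance in fouls suffered explained by fouls committed
fit <- lm(fouls_against ~ fouls_for, data = seasons)
summary(fit)$r.squared  # ~0.4 with these made-up numbers
```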

Interestingly, although again unsurprisingly, the same effect is not nearly as visible on a game-by-game basis:

[Figure: game-by-game fouls committed vs opponents’ fouls]

2. Away teams foul more…

[Figure: Foul Ratio distributions for home and away teams]

This plot shows the Foul Ratio (fouls committed as a fraction of total fouls in a game) of home and away teams in the Championship on a game-by-game basis. We can see that the FR for away teams is slightly higher than that of home teams (i.e. the away team tends to commit a larger share of the fouls). However, this effect is pretty small. Although it falls within the bounds of statistical significance, and we can be pretty confident that there is an effect here, it takes a very large sample to become apparent (10 seasons’ worth of games). Furthermore, the means of the two groups are incredibly close to each other: Home = 0.496, Away = 0.504. Still, though the effect is small, it could point towards a slight bias in refereeing against away teams (amongst myriad other explanations), which brings me on to…

3. … and appear to get punished more harshly.

[Figure: yellow cards per foul for home and away teams]

Were referee bias a real effect, we would probably expect away teams not only to get called up more frequently (which they do), but also to be punished more harshly for the fouls they do commit. As you can see from the plot above (Yellow Ratio = Yellows/Fouls), any effect is likely to be very small. In fact, the raw data above suggests there is no significant difference in Yellow Ratio between home and away teams (p = 0.16). However, when we take into account an additional bit of information latent in the data, namely that each away team can be compared directly with the home team in the very same game, a large amount of noise is removed. Consequently, the effect becomes far more statistically significant (p = 0.0019). As with the fouls, though, the effect remains very small. This suggests that as well as being judged to have fouled more, away teams receive more yellow cards per foul committed than home sides.
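Here is an R sketch of why pairing helps, on simulated data (all names and numbers invented; the per-game “referee strictness” term is the shared noise that the paired test removes):

```r
# Illustrative simulation: each game's referee strictness shifts both sides'
# yellow-per-foul ratios, plus a tiny extra penalty for the away side
set.seed(1)
n <- 5520  # roughly ten Championship seasons of matches
ref_strictness <- rnorm(n, 0.15, 0.08)
games <- data.frame(
  home_yellow_ratio = ref_strictness + rnorm(n, 0, 0.04),
  away_yellow_ratio = ref_strictness + 0.0025 + rnorm(n, 0, 0.04)
)

# Unpaired: the shared per-game noise buries the small effect
t.test(games$home_yellow_ratio, games$away_yellow_ratio)$p.value          # typically ~0.15

# Paired: comparing the two sides within the same game removes that noise
t.test(games$home_yellow_ratio, games$away_yellow_ratio, paired = TRUE)$p.value  # typically ~0.001
```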

It is at this point I feel obliged to note a couple of things. Firstly, I am aware that I have written in a way that perhaps implies that the higher incidence of fouls and yellow cards among away sides is a consequence of referee bias. This is an easy and intuitive explanation, but there is no evidence here linking referee bias to this effect; the data only shows that there is an effect. Secondly, this effect is very small and even if it is due to referee bias, I would suggest that said bias is unlikely to significantly affect games in favour of the home team on a regular basis, given other, far more important factors.

4. Everybody fouls

[Figure: seasonal Total Shots Ratio vs Foul Ratio]

As we can see from this plot of seasonal Total Shots Ratio (a decent, if imprecise, proxy for team quality) against Foul Ratio, there is no correlation: good teams and bad teams do not foul more or less than one another.

Furthermore, we don’t see an effect in teams that are dominated on a game-by-game basis either:

[Figure: game-by-game Total Shots Ratio vs Foul Ratio]

It appears that teams getting outshot by their opponents don’t foul significantly more than their opponents do. Had there been a correlation, it could have helped explain the home/away bias; however, that now seems unlikely. If I had the data to hand, I would also like to see Foul Ratio as a function of game state (i.e. do teams foul more or less when winning, drawing or losing?).

So there we have it: there’s karmic balance in the distribution of seasonal fouls, referees could be a little biased, but it probably doesn’t matter, and bad teams aren’t necessarily dirtier. If I have the time, I may look at this topic again to see if average attendance has an effect on game fouls, as well as geographic proximity between teams. Anyway, thanks for reading.