Expected Goals (xG) helps us to understand the quality of a shot regardless of whether it ends up as a goal, is saved or is missed completely. Each shot is given a probability (between 0 and 1) of ending up as a goal based on a number of variables. Typically, xG is a combination of variables such as angle, distance from goal, assist type, and shot type. There are different xG models available. The data presented in this article is sourced from understat.com.
In short, the tighter the angle and the further away from goal, the lower the xG value. The graphic on the right shows the average xG value across different areas on the pitch. For instance, a shot from 40 yards out typically has an xG value of <0.01 (or <1%). Rounding up to the nearest percent, for simplicity, means that on average we would expect a goal roughly once in a hundred attempts. In practice it works really well too. For example, it took Newcastle 121 attempts to score from outside the box this season.
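To make the "once in a hundred attempts" reading concrete, here is a minimal sketch. It assumes every attempt has the same xG and that attempts are independent of each other, which is a simplification. The function names are made up for illustration.

```python
def expected_attempts_per_goal(xg):
    """Average number of attempts needed per goal at a fixed xG per shot."""
    return 1 / xg


def prob_at_least_one_goal(xg, attempts):
    """Chance of scoring at least once, assuming independent, identical shots."""
    return 1 - (1 - xg) ** attempts


# A 0.01 xG shot (the rounded-up 40-yard example from the text).
print(expected_attempts_per_goal(0.01))
print(round(prob_at_least_one_goal(0.01, 100), 3))
```

The second figure is well below 1: "once in a hundred" is an average, not a guarantee that the hundredth attempt goes in.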
Since xG has become available, it has been widely used to assess the quality of chances a team or player has over the course of a single match or a sequence of matches. For instance, let’s take the recent match between Tottenham and Manchester United. Ole Gunnar Solskjær’s side won 1-0 away from home (although home advantage has become less important). However, in terms of total xG (the sum of the probabilities of all shots taken by each side), Tottenham's 1.80 was almost double Manchester United's 0.96. Tottenham created better quality chances but De Gea’s heroics kept United in the game.
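As a rough sketch of how a match total is built up from individual shots, the snippet below sums per-shot probabilities and also estimates the chance of a side failing to score at all. The shot values are invented for illustration and are not the real shot data from any match. Treating shots as independent is an assumption.

```python
from math import prod


def total_xg(shot_probs):
    """Total xG for a side: the sum of its per-shot goal probabilities."""
    return sum(shot_probs)


def prob_no_goal(shot_probs):
    """Chance that none of the shots goes in, assuming independent shots."""
    return prod(1 - p for p in shot_probs)


# Illustrative values only (hypothetical shots, not taken from understat.com).
shots = [0.40, 0.30, 0.10, 0.10, 0.05, 0.05]
print(round(total_xg(shots), 2))
print(round(prob_no_goal(shots), 3))
```

Even with a total xG of about one goal, this hypothetical side would fail to score in roughly three matches out of ten. That is one reason a single result can diverge from the xG scoreline.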
The same principle applies to individual players too. Andros Townsend’s goal against Man City in December, a volley from around 30 yards, came from a shot that is scored less than 1% of the time (<0.01 xG). In other words, the average player would score from there at most once in a hundred attempts. Does this make Townsend a good finisher?
In both examples, over the course of a short period of time (such as one game), it is possible to ‘outperform’ xG. The key is to look at a number of matches rather than a single match, to reduce as much as possible the influence of luck and one-off factors such as red cards, injuries to key players, mistakes (by referees and players), tactics and even confidence. Missing an open goal once is bad but missing twice... - you get the idea.
We know that xG measures quality of shots. Therefore, one way to measure a player’s finishing ability is to compare their actual goals scored against their xG over a period of n matches. Over a sequence of multiple matches, one would expect only the best finishers to consistently outperform xG.
Finishing ability = Actual goals - xG
Scoring consistently from low xG chances (i.e. from far away, tight angles etc.) - difficult chances that the average player would miss - would indicate a strong finishing ability. By contrast, a player who consistently misses high xG chances could be classified as a poor finisher. If the margin is around zero, then the player is scoring goals as expected. Once we have the figures, we can look at the running (cumulative) difference between actual goals and xG since the start of the 2014/15 season (earliest available xG data).
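The running difference described above can be sketched in a few lines. The per-match numbers here are hypothetical and only show the mechanics. The function name is an assumption for this example, not part of any data provider's tooling.

```python
from itertools import accumulate


def running_finishing_margin(goals_per_match, xg_per_match):
    """Cumulative (actual goals - xG) after each match, in match order."""
    diffs = [g - x for g, x in zip(goals_per_match, xg_per_match)]
    return list(accumulate(diffs))


# Hypothetical five-match spell for one player (non-penalty goals and xG).
goals = [1, 0, 2, 0, 1]
xg = [0.6, 0.8, 0.9, 0.3, 0.4]

margin = running_finishing_margin(goals, xg)
print([round(m, 2) for m in margin])
```

A line that keeps climbing points to a player scoring more than the chances suggest. A line hovering around zero means goals are arriving roughly as expected.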
It’s important to note that this analysis is not a measure for the quantity of goal scoring but for quality. The data is only for league matches and excludes penalties.
Messi scored 28.6 more goals than the expected goals tally, clearly making him number one in Europe. What is particularly impressive is perhaps his form over the last 40 matches in which he scored 13.4 goals more than we would expect the average player to score, almost the same amount he accumulated in the previous 120 matches (+15.2) - barely missing any clear chances (as indicated by the steep slope between matches 122 and 165). Messi really does score when others can’t. This along with the quantity of goal scoring (146 non-penalty league goals) quite possibly makes him Europe’s best finisher.
The latest arrival to the Premier League, Gonzalo Higuain, arrives with an impressive 96 goals (+15.5). Arguably his best spell came under Sarri at Napoli (+8.4 goals) and continued with Juventus (+6.5 goals). Perhaps a slight worry is that he is only up +0.7 since leaving Juventus - then again, scoring the chances we expect is still good. Agüero, rated as one of the top strikers in the Premier League, is perhaps somewhat surprisingly further down the list at only +2.7. Arsenal’s strikeforce have 178 goals between them but it is Lacazette who is the more clinical of the two (+13.6). However, the data does include his time at Lyon, when he was in fine goalscoring form, and he is only now starting to pick things up again after a short period of adjustment at Arsenal. In contrast, Aubameyang scored 6 goals fewer than his expected goals tally in the time period and, since joining Arsenal, he has not really managed to improve on that score. Son (+16.4) and Hazard (+13.8), both excellent finishers of difficult chances, are in Europe’s top 10, taking 4th and 8th position respectively.
In the context of xG, you may often hear that a team or player has been performing at an unsustainable level (scoring drastically more or fewer goals than expected). As such, conclusions are often drawn that ‘performances should regress’.
So, is Messi just performing at an unsustainable level? Probably not. This is largely because Messi has been performing at this level for almost the entirety of league matches since the 2014/15 season.
When referring to the idea of results evening out in the long term, there is a key distinction to be made between regression to the mean and the Gambler’s Fallacy*. The former says that an extreme result tends to be followed by results closer to the average. The latter is centered around the false idea that if something happens more frequently than normal, it will occur less often in the future to compensate.
In the context of football, one may hear that a team or player is ‘due a goal’ because they haven’t scored in the last 5 matches, as if results had to balance out in the long term. Take Vardy’s record-breaking 11-match goalscoring run during the 2015/16 season as an example. It took 12 years for someone to break Ruud van Nistelrooy’s record and as such one might assume that following this event, Vardy’s goalscoring would likely regress. If we were then to look at the next 10 matches, the Gambler’s Fallacy would assume that the probability of Vardy not scoring any goals in that set of matches is higher than normal.
Regression to the mean would assume that following this rare event, the next results are more likely to be moderate, that is, more in line with the mean of Vardy’s goalscoring ability. In the next 10 matches, Vardy ended up scoring four goals. To compare, in that season, Vardy scored 24 goals in 36 games. On average, that would be equivalent to 6.7 goals in 10 matches. The four goals that Vardy actually scored in the subsequent matches, although fewer, are much closer to Vardy’s average.
So, are some of the elite strikers performing at unsustainable levels? Mostly not. On occasion, following a hat-trick, or in the case of Kane, back-to-back hat-tricks, they may return or regress to their usual goalscoring ability, which is usually above average anyway.
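The per-10-match baseline above is simple arithmetic, sketched here using only the figures quoted in the text (24 goals in 36 games, then four goals in the following 10 matches). The helper name is made up for this example.

```python
def goals_per_n_matches(season_goals, season_matches, n_matches):
    """Scale a season scoring rate to a window of n matches."""
    return season_goals / season_matches * n_matches


baseline = goals_per_n_matches(24, 36, 10)  # season average per 10 matches
actual_next_10 = 4                          # goals in the 10 matches after the run

print(round(baseline, 1))
print(round(baseline - actual_next_10, 1))  # shortfall versus the season average
```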
* Since the Gambler’s Fallacy is based on the probabilities of independent events, we have to ask, how independent are consecutive football matches? If we assume static outside factors such as consistent teams, no injuries and suspensions and all teams being equal in ability, we would say football matches are independent events. To what extent do injuries to key players, confidence, memory and other psychological (or outside) factors make football matches dependent events?