
Expected Pass Success Model - Just how good is Trent Alexander-Arnold at passing the ball?

  • Emmeran
  • Sep 22, 2020
  • 8 min read


Trent Alexander-Arnold has had an incredible rise to the top of world football. An essential part of his game is passing, a quality that defines many of the game's best players. But how can we actually quantify how good a passer of the ball a player is? The obvious starting point is passing accuracy. The table below shows the 10 players (among those who attempted over 100 passes) with the best passing accuracy in the English Premier League during the 2017/18 season. What we immediately notice is that centre backs appear to be the "best" passers by this measure. As much as I loved Ragnar Klavan, we can all agree that he probably does not possess better passing quality than the likes of Kevin De Bruyne or David Silva. This is a consequence of many teams passing the ball around the back: defenders play many "easy" passes, which inflates their passing accuracy.
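For readers who want to reproduce a table like this, here is a minimal sketch of the calculation. It assumes each pass has already been reduced to a hypothetical `(player, was_successful)` pair; in the Wyscout data, success would come from the "accurate" tag on each pass event. The minimum-attempts filter mirrors the "over 100 passes" cut-off above.

```python
from collections import defaultdict


def passing_accuracy(passes, min_attempts=100):
    """Return {player: accuracy} for players with more than min_attempts passes.

    passes: iterable of (player, was_successful) pairs (hypothetical format).
    """
    attempts = defaultdict(int)
    completed = defaultdict(int)
    for player, success in passes:
        attempts[player] += 1
        completed[player] += int(success)
    return {
        player: completed[player] / attempts[player]
        for player in attempts
        if attempts[player] > min_attempts  # "over 100 passes" cut-off
    }


def top_passers(passes, n=10, min_attempts=100):
    """The n most accurate passers, best first."""
    acc = passing_accuracy(passes, min_attempts)
    return sorted(acc.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Note that nothing here looks at how hard each pass was, which is exactly the weakness described above.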



Passing accuracy is an interesting metric, but it is not enough by itself to quantify the passing quality of a player. Good passers of the ball create chances to score goals through their passing. That means taking risks: playing difficult passes in behind or through the defence, passes that break the lines. Such passes are naturally very hard to complete, so creative players who attempt them will often fail and hence have a lower passing accuracy.


So how do we actually find such players? We create a model that, given certain parameters, tells us the probability that an attempted pass will reach its intended target. By probability, we mean how likely it is that an average player would successfully make that pass.


We are using the event data for the 2017/18 English Premier League season that Wyscout has made publicly available online. The parameters we have access to are the coordinates of the origin and destination of the pass, the type of pass and the time in the match at which it happened. From these we calculate the distance of the pass as well as the polar coordinates of the player's location when attempting the pass, i.e. the angle and the distance from the centre of the pitch. We will also use these as parameters. There is a certain level of subjectivity when it comes to the type of pass, with some classified as 'Smart pass'. We won't use those as labels, but we will use, for example, 'High pass', 'Cross' and other labels that are less ambiguous and less open to interpretation.
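As an illustration, the sketch below turns one pass event into these parameters. Several details are assumptions rather than something stated in this post: the event is assumed to follow the public Wyscout JSON layout (`positions`, `eventSec`, `matchPeriod`, `subEventName`, `tags`), coordinates are treated as percentages of pitch length and width, and a 105 x 68 m pitch is assumed for converting to metres. The angle convention (`atan2` around the centre spot) is one reasonable choice, not necessarily the one used for the model here.

```python
import math

# Assumed pitch size for converting Wyscout percentage coordinates to metres.
PITCH_LENGTH_M = 105.0
PITCH_WIDTH_M = 68.0
ACCURATE_TAG = 1801  # Wyscout tag id marking an accurate (successful) pass


def pass_features(event):
    """Turn one Wyscout-style pass event into a dict of model inputs."""
    start, end = event["positions"][0], event["positions"][1]
    # Percentage coordinates -> metres.
    x0 = start["x"] / 100 * PITCH_LENGTH_M
    y0 = start["y"] / 100 * PITCH_WIDTH_M
    x1 = end["x"] / 100 * PITCH_LENGTH_M
    y1 = end["y"] / 100 * PITCH_WIDTH_M
    # Polar coordinates of the pass origin relative to the centre spot.
    dx, dy = x0 - PITCH_LENGTH_M / 2, y0 - PITCH_WIDTH_M / 2
    # eventSec counts from the start of each half; add 45 minutes in the
    # second half (extra time is ignored in this sketch).
    minute = event["eventSec"] / 60 + (45 if event.get("matchPeriod") == "2H" else 0)
    tag_ids = {tag["id"] for tag in event.get("tags", [])}
    return {
        "x": x0,
        "y": y0,
        "length": math.hypot(x1 - x0, y1 - y0),  # distance of the pass
        "r": math.hypot(dx, dy),                  # distance from centre
        "theta": math.atan2(dy, dx),              # angle around centre
        "minute": minute,
        "type": event["subEventName"],
        "success": ACCURATE_TAG in tag_ids,       # the label we predict
    }
```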


Before we start creating models, let's check whether some of our parameters are of any use for the problem at hand. The figure below is a heat map, built from all the passes of the season, showing the proportion of passes attempted from each section of the pitch that were successful. The likelihood of a pass reaching its intended target does seem to depend on where it is taken from: passes from the corners of the opponent's half fail more often than passes from the centre of the pitch. This agrees with what we know about football, as crosses are typically attempted from those areas and are certainly among the passes that least often reach their target.
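A heat map like this boils down to binning passes by their origin and taking the success rate in each cell. Here is a minimal sketch, assuming each pass is a hypothetical `(x, y, success)` tuple with coordinates on a 0-100 scale; the 6 x 4 grid is an arbitrary choice, not the one used for the figure.

```python
from collections import defaultdict


def success_heatmap(passes, n_x=6, n_y=4):
    """Proportion of successful passes per grid cell.

    passes: iterable of (x, y, success) with x, y on a 0-100 scale.
    Returns {(col, row): proportion} for cells containing at least one pass.
    """
    attempts = defaultdict(int)
    completed = defaultdict(int)
    for x, y, success in passes:
        # min() keeps x == 100 or y == 100 inside the last cell.
        col = min(int(x / 100 * n_x), n_x - 1)
        row = min(int(y / 100 * n_y), n_y - 1)
        attempts[(col, row)] += 1
        completed[(col, row)] += int(success)
    return {cell: completed[cell] / attempts[cell] for cell in attempts}
```

Cells with very few passes give noisy proportions, so a finer grid needs more data.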



So let's fit our model! We are using a logistic regression model. We won't go into the details of what that is exactly or how it works mathematically, but there is plenty of documentation online if you are unfamiliar and interested (https://rb.gy/eilcx1 gives a nice simple explanation). It works in the same way as a typical expected goals (xG) model, if you know what that is. xG is increasingly shown on TV when a player takes a shot or scores a goal: it gives the probability that the shot would have resulted in a goal.


The model is trained on a set of passes, some successful and some not, together with all of our parameters. It then tunes the coefficients of those parameters, which feed into a logistic function that gives us the probability of the pass being completed. Naturally, there are passes with very similar parameters but different outcomes, because there is a random element to why a pass fails. This is why the model does not give us a certain answer such as "this pass will definitely succeed". Instead, it fits the data as well as possible and then says something like "in 90% of cases, this particular pass is successful".
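To make the "tuning coefficients" step concrete, here is a minimal from-scratch sketch of logistic regression fitted by batch gradient descent on the log-loss. It is only an illustration of the technique; the post does not say which software was used, and in practice a library such as scikit-learn (`LogisticRegression`) or statsmodels (`Logit`) would normally do the fitting. The learning rate and number of epochs are arbitrary assumptions.

```python
import math


def sigmoid(z):
    """Logistic function: maps a linear score (log-odds) to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # numerically stable branch for large negative z
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression coefficients by batch gradient descent.

    X: list of feature lists; y: list of 0/1 outcomes (1 = pass completed).
    Returns (intercept, weights) that approximately minimise the mean log-loss.
    """
    n, k = len(X), len(X[0])
    b, w = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # derivative of the log-loss w.r.t. the linear score
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(b, w, x):
    """Model probability that a pass with features x is completed."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```

For example, fitting on a toy feature such as pass length (in tens of metres) yields a negative weight, so longer passes get lower completion probabilities, which is the kind of "in 90% of cases" statement described above.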


Let's start with a simple model to get to grips with what we are doing. We will use only where on the pitch the pass is taken from, that is, the Cartesian and polar coordinates of the pass's origin (plus some interaction terms). Including two equivalent sets of coordinates is redundant, but because the model treats these parameters linearly before applying the logistic function, it allows us to identify more interesting trends. This model gives us the figure below, showing the likelihood of completing a pass depending on where it is taken from. The contours are (almost) circles (from the polar coordinates) that are slightly shifted towards the team's goal (from the Cartesian coordinates). We find a similar trend to the one in our first heat map: a pass is less likely to be accurate as you progress into the opponent's half, particularly in the corners.
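A sketch of what the design vector for this location-only model could look like is below. The post does not say which interaction terms were used, so `x * y` and `x * r` are purely hypothetical examples, and the 105 x 68 m pitch is an assumption.

```python
import math

# Assumed pitch size in metres; the interaction terms below are hypothetical.
PITCH_LENGTH_M, PITCH_WIDTH_M = 105.0, 68.0


def location_features(x, y):
    """Design vector for the location-only model, from origin coordinates in metres."""
    dx, dy = x - PITCH_LENGTH_M / 2, y - PITCH_WIDTH_M / 2
    r = math.hypot(dx, dy)       # distance from the centre spot
    theta = math.atan2(dy, dx)   # angle around the centre spot, in (-pi, pi]
    # Cartesian + polar coordinates, then two example interaction terms.
    return [x, y, r, theta, x * y, x * r]
```

One way to read the figure: a strong weight on `r` on its own would produce circular contours around the centre spot, and the weights on `x` and `y` then shift them. Note that `theta` jumps from about pi to about -pi directly behind the centre spot; using `sin(theta)` and `cos(theta)` instead is a common way to avoid that discontinuity.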

This relatively simple model is interesting because it is easy to visualise. It is not, however, particularly insightful. Two passes taken from the same place on the pitch can be completely different. From the centre of the pitch, a simple 5 yard pass is not the same as a 40 yard ball over the defence to send your striker through on goal. Robin van Persie's goal against Spain at the 2014 World Cup is a perfect example: that pass is a lot harder to execute than a ball back to your central defender. Far more influences the likelihood of a pass being successful than just where it is taken from. So let's fit the full model with all the parameters. At this point, we should spend some time analysing the fit of the model and checking which variables are truly statistically significant. However, in the interest of simplicity, we relax the rigour of our approach slightly and use this full model for further analysis.
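The pass type is a category rather than a number, so a logistic regression usually takes it in as one-hot "dummy" columns, with one category left out as the reference so the columns are not collinear with the intercept. The sketch below shows that encoding. The list of labels and the choice of 'Simple pass' as the reference are assumptions; so is mapping excluded labels such as 'Smart pass' to the reference, which is just one possible way of not using them as labels.

```python
# Hypothetical set of pass-type labels kept in the model.
PASS_TYPES = ["Simple pass", "High pass", "Cross", "Head pass", "Launch", "Hand pass"]
REFERENCE = "Simple pass"  # left out so the dummies don't duplicate the intercept


def one_hot_pass_type(sub_event_name, categories=PASS_TYPES, reference=REFERENCE):
    """Dummy columns for every kept category except the reference.

    Labels not in `categories` (e.g. 'Smart pass') get all zeros, i.e. they
    are treated like the reference category in this sketch.
    """
    return [1.0 if sub_event_name == c else 0.0 for c in categories if c != reference]
```

With this encoding, each dummy's coefficient is the change in log-odds relative to the reference pass type, holding the other parameters fixed.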



Let's look at an example of the model in practice. Consider Trent Alexander-Arnold's assist for Mohamed Salah's goal in Liverpool's 3-0 win against Bournemouth in April 2018 (video below). It's a pretty exceptional pass and goal. But what does our model have to say about it? According to our model, Alexander-Arnold's pass had a 20% chance of reaching Salah. This is low, but not as remarkably low as we might expect given the quality of the pass. What is happening? This is what the pass looks like in the data:

The tags specify attributes of the pass: 1801 indicates it was successful, 901 that it was a through ball and 301 that it was an assist. There is not much ambiguity here; it is usually pretty clear whether a pass is a through ball or not. Under 'subEventName', however, we find 'High pass'. So the person who recorded the data classified it as a 'High pass', which arguably it is. There is also a case to be made for considering it a 'Cross', though. In that case, the probability of success drops to 8%, which perhaps captures the difficulty of this pass more appropriately. Either way, this is quite low for a pass, as we are usually dealing with values in the 70-80% range. Hence, our model agrees that the pass was of real quality. One could then continue analysing the sequence of events by using an expected goals model to assess how likely Salah was to score his header. I presume it would give a rather low goal probability, as Salah did exceptionally well to score a header from just inside the penalty area.
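The two quoted probabilities also tell us roughly how much the 'Cross' label costs in the model. Assuming the pass type enters only as an additive dummy (no interaction with the other parameters), relabelling changes the log-odds by the difference of the two pass-type coefficients, whatever the rest of the pass looks like. A quick check of the arithmetic, using the rounded 20% and 8% figures (so the result is approximate):

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


# Probabilities quoted above for the same pass under two labels.
p_high_pass = 0.20
p_cross = 0.08

shift = logit(p_cross) - logit(p_high_pass)  # implied coefficient difference
odds_ratio = math.exp(shift)                 # multiplicative change in the odds
print(f"log-odds shift: {shift:.3f}, odds ratio: {odds_ratio:.3f}")
# prints: log-odds shift: -1.056, odds ratio: 0.348
```

In words: calling the pass a 'Cross' instead of a 'High pass' multiplies its odds of success by about 0.35.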



Finally, we can use the model to assess how good Alexander-Arnold's (TAA's) passing was throughout the season. We do this by running every pass TAA attempted that season through our model and summing the resulting probabilities to get his expected number of successful passes. Comparing this with his actual number of accurate passes gives us an idea of his quality as a passer of the ball.
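The comparison itself is simple arithmetic. Below is a minimal sketch, assuming we already have the model's probability for each of a player's attempted passes and a 0/1 outcome for each one (both hypothetical inputs here).

```python
def passing_over_expected(probabilities, outcomes):
    """Compare actual completions with the model's expected completions.

    probabilities: model probability of success for each attempted pass.
    outcomes: 1 if that pass was completed, else 0.
    """
    if len(probabilities) != len(outcomes):
        raise ValueError("need one outcome per probability")
    expected = sum(probabilities)  # expected number of successful passes
    actual = sum(outcomes)
    return {
        "attempts": len(outcomes),
        "expected": expected,
        "actual": actual,
        "over_expected": actual - expected,
        "ratio": actual / expected if expected else float("nan"),
    }
```

Plugging in the season totals quoted below (793 expected and 797 completed from 1029 attempts) gives +4 passes, roughly 0.4% of his attempts. A useful property for interpreting this: a logistic regression with an intercept, fitted by maximum likelihood, has predicted probabilities that sum to the observed number of successes over its training data, so across the whole league the totals balance and individual players deviate either side.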


We find that TAA was expected to complete 793 of his 1029 attempted passes that season, and he completed 797. So he completed roughly as many passes as an average Premier League player would be expected to complete when attempting the same passes. Looking at the breakdown by pass type, he seemingly does better with his crosses. The difference is not highly significant, but it may become more notable if he starts crossing the ball more often.
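One rough way to put a number on "not highly significant" (not necessarily how it was judged here) is a z-score under a normal approximation. It assumes each pass is an independent Bernoulli trial with the model's probability, so the variance of the number of completions is the sum of p(1 - p), and it is only trustworthy if the model is well calibrated and the subset is not tiny.

```python
import math


def completion_z_score(probabilities, outcomes):
    """Approximate z-score for actual minus expected completions.

    Treats each pass as an independent Bernoulli trial with the model's
    probability, so Var(completions) = sum of p * (1 - p). The normal
    approximation is rough for small subsets (e.g. a handful of crosses).
    """
    expected = sum(probabilities)
    variance = sum(p * (1 - p) for p in probabilities)
    if variance == 0:
        raise ValueError("variance is zero; z-score undefined")
    return (sum(outcomes) - expected) / math.sqrt(variance)
```

As a conventional rule of thumb, |z| below about 1.96 would not count as significant at the two-sided 5% level.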


It is important to note that this was Alexander-Arnold's first season playing regular first team football. The fact that the model rates his passing as slightly above average is still a notable achievement for a player just starting out. It was in the following seasons that he truly became a starter for Liverpool and collected assist after assist. In the 2017/18 season, he actually managed only one assist, the one we looked at earlier. It would be of real interest to analyse his passing in more recent seasons, but unfortunately that data is not publicly accessible.


We can take from this analysis that passing does seem to be a real quality in Trent Alexander-Arnold's game. However, in his first Premier League season it was not head and shoulders above everyone else's. That being said, it has most likely improved since then; his assist tally certainly has.


To wrap up, we have developed an interesting Expected Pass Success model to analyse Trent Alexander-Arnold's passing. However, this model is far from perfect. If we look at the players who complete many more passes than our model expects, many goalkeepers come out on top. This results from them making long yet not particularly difficult passes: the distance of a pass is an important parameter in our model, but it does not always accurately reflect how likely the pass is to succeed. More generally, there are lots of factors we are not considering: the position of the opposing players, whether the pass was played with the player's stronger foot, the speed of the ball as the player is getting ready to strike it. These are among the many variables that could be worth including to improve the model in the future. In fact, as we improve the model further, the somewhat arbitrary decision of whether a pass is a 'High pass' or a 'Cross' might lose its influence, as the data might implicitly account for it. We could then use that model to determine more accurately what (if anything) distinguishes those two types of passes. Of course, we would need more detailed data, such as tracking data, to develop all of this.


I hope this has been an accessible, insightful and interesting read about football analytics. I would be delighted to have some feedback and hear what you think. Thanks for reading :)

 
 
 


©2020 by StatsBall. Proudly created with Wix.com
