It sounds simple on paper.
“Say we had the historical data on Bayern Munich – which we do – you could then create the model for when Bayern Munich won the Champions League.”
Paul Power, an AI scientist at STATS, is talking about a technique called ‘ghosting’. In layman’s terms, this involves feeding data in at one end so that the computer learns how a football team would respond in any given scenario. You can adjust the settings to reveal what a hypothetical ‘average’ team would do, or discover how a really good side – with really good players – would react. And, in theory, you can compare it to your club’s youth players.
“So, we have [Philipp] Lahm, we have [Bastian] Schweinsteiger, we have [Thomas] Müller,” Power continues. “OK, now when we look at our young players playing a game, how similar is their defensive behaviour or attacking behaviour to these guys?
“The whole idea should be that you can start to identify ‘well actually this guy has the same movements as Lahm, he has the same defensive decision-making as Lahm, and he can recover or attack in the same way’.”
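One simple way to picture the comparison Power describes is to measure how far a young player’s tracked positions sit from the positions a model of the reference player would have taken in the same frames. The sketch below is only an illustration, not STATS’ method: the function name, the made-up coordinates (in metres) and the choice of average distance as the similarity measure are all assumptions.

```python
import math

def mean_ghost_distance(actual, ghost):
    """Average distance (metres) between a player's tracked positions and the
    positions a hypothetical reference ('ghost') model suggests, frame by frame."""
    if len(actual) != len(ghost) or not actual:
        raise ValueError("need two non-empty tracks of equal length")
    return sum(math.dist(a, g) for a, g in zip(actual, ghost)) / len(actual)

# Made-up data: three frames of a youth full-back and a hypothetical ghost.
youth = [(10.0, 5.0), (12.0, 6.0), (14.0, 8.0)]
ghost = [(10.0, 5.0), (12.0, 9.0), (14.0, 12.0)]
print(mean_ghost_distance(youth, ghost))
```

A smaller number would suggest the youngster is taking up positions closer to those of the reference player. A real system would need to compare far more than raw position.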
That’s not to say computers will replace humans by deciding which academy prospects make the cut, but the potential for identifying subtle aspects of a youngster’s game could be huge. But before that, there’s an extraordinary amount of brain and computing power required.
Football is awash with experts. An entire industry of advanced technology exists behind the scenes, supplying clubs with expertise in statistical modelling, artificial intelligence and even particle physics. There’s a surprising amount of cross-over between different sports too.
‘Ghosting’ has been applied in both football and basketball, two sports where “a lot of the approaches can be very similar, [and] a lot of the questions are the same too.” Those are the words of Jennifer Hobbs, another AI scientist at STATS.
“Obviously at the algorithm level you’re making changes to account for the different sports,” she says. “But they’re similar to the extent that certain principles are still there, like spacing and timing. That’s why you can re-use a lot of the principles that are driving the analysis in both.”
It seems intuitive that sports such as basketball and football share some transferable qualities, but Formula One? SBG Sports Software is a company which started out in the motorsport industry, before realising their expertise in machine learning could also be applied to football.
It took a couple of years to complete the transition. The first task was designing the product and collating different data sources into a tool that could be used by analysts, before polishing it into something analysts would want to use.
The individuals working on this technology don’t necessarily come from footballing backgrounds, but they do talk to football people and consider football-specific problems. As SBG Sports Software product specialist Jack Heggie says: “It’s very much analyst-led.”
“It’s not us dictating to the clubs. They come to us and say ‘would it be possible to do this?’ and we go ‘right, let’s have a look’. We investigate it and we’ll build something as close to what they want or as far as we can go.”
Canadian company SPORTLOGiQ have used a similar approach. Pressing is an important part of the modern game and feedback from coaches and analysts helped SPORTLOGiQ refine their definitions.
“If you only look at movement towards the ball-carrier then you might have people too far away,” says Head of Soccer Development Daniel Nahmias-Léonard, listing situations that could fall in the grey area between ‘pressing’ and ‘not pressing’.
“You might have a centre-back stepping up to close down the space to a forward who receives the ball with their back to the goal. Or containment, where you’re going to get forward and then stop and kind of leave the ball-carrier. If it doesn’t speak the football language then it’s not doing the job properly.”
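The grey areas Nahmias-Léonard lists can be made concrete with a toy rule. The sketch below assumes a player only counts as pressing if they are both close enough to the ball-carrier and closing the gap fast enough. The function, its parameters and the threshold values are illustrative assumptions, not SPORTLOGiQ’s definition.

```python
import math

def is_pressing(defender_pos, defender_vel, carrier_pos,
                max_distance=10.0, min_closing_speed=2.0):
    """Toy pressing rule. Positions in metres, velocities in metres per second.
    Thresholds are made-up values for illustration only."""
    dx = carrier_pos[0] - defender_pos[0]
    dy = carrier_pos[1] - defender_pos[1]
    distance = math.hypot(dx, dy)
    if distance > max_distance:
        return False  # moving towards the ball, but too far away to count
    if distance == 0:
        return True   # already on top of the ball-carrier
    # Component of the defender's velocity pointing at the ball-carrier.
    closing_speed = (defender_vel[0] * dx + defender_vel[1] * dy) / distance
    return closing_speed >= min_closing_speed

print(is_pressing((0, 0), (4, 0), (5, 0)))   # close and closing quickly
print(is_pressing((0, 0), (4, 0), (30, 0)))  # closing, but far away
print(is_pressing((0, 0), (0, 0), (5, 0)))   # close but standing off (containment)
```

A rule like this would label the centre-back stepping up as pressing and the containing player as not pressing, which is exactly the kind of judgement that needs checking against what coaches mean by the word.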
The company, like SBG Sports Software, began life in another sport – hockey – before recently moving to football.
“I was just very clear about ‘I don’t care what technology exists today’,” says co-founder Craig Buntin, a former Olympic figure skater. “‘I don’t care what’s possible [or] what’s not possible, let’s talk about what needs to exist in the long run and let’s put together a long-term plan to invest in research, to bring in as many researchers as we need to actually build that’.”
Build it they did. SPORTLOGiQ now have a system that can gather ‘event data’ (think passes and shots in football) as well as ‘tracking data’ (the movement of players) from video. And it doesn’t stop there. The model can recognise the position of players’ hands, feet, torsos and heads at any stage of the match.
“We’re really only starting to skim the surface of what we can actually do,” says Buntin. “But at the end of the day, what we have built is essentially a 3D motion capture system for the entire pitch at any time.”
For some, the absence of footballing baggage can bring a fresh perspective to proceedings. Will Spearman is a former particle physicist who moved into football after spending a brief time working at the European Organization for Nuclear Research.
“A lot of the work has involved the kinematics of the ball, the time to intercept of the players – kind of solving very simple physics equations,” he says.
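To give a flavour of the “very simple physics” involved, the sketch below compares how long a player and the ball each take to reach the same spot. It is a deliberately simplified illustration, not Spearman’s actual equations: the constant speeds, the reaction delay and the function names are all assumptions.

```python
import math

def time_to_intercept(player_pos, target_pos, max_speed=5.0, reaction_time=0.7):
    """Seconds for a player to reach a target, assuming a fixed reaction delay
    and then a straight-line run at constant speed (illustrative values)."""
    return reaction_time + math.dist(player_pos, target_pos) / max_speed

def ball_travel_time(start, target, ball_speed=15.0):
    """Seconds for the ball to reach a target at constant speed, ignoring drag."""
    return math.dist(start, target) / ball_speed

target = (10.0, 0.0)
player_time = time_to_intercept((0.0, 0.0), target)
ball_time = ball_travel_time((10.0, 30.0), target)
print(player_time, ball_time)
print("player arrives first" if player_time <= ball_time else "ball arrives first")
```

With these made-up numbers the ball reaches the target before the player does, so a pass into that space could not be cut out by this player.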
At Hudl, a software company which helps provide and analyse video in sports, Spearman helped produce work that quantified pitch control and scoring opportunities for players who are off the ball and in space.
Below is an image taken from his recent paper from the MIT Sloan Sports Analytics Conference. It builds up a sequence of statistical models that culminates in an ‘Off-Ball Scoring Opportunity’ model (OBSO).
First comes an Expected Goals diagram, followed by areas of the pitch that the possession team are likely to ‘control’, and then the likelihood of where the next on-ball action will take place. The final image shows the OBSO.
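One way to read that layering is as a cell-by-cell combination: for each zone of the pitch, multiply the chance of scoring from there, the chance of controlling the ball there and the chance that the next action happens there, then add the zones up. The sketch below assumes exactly that on a tiny 2x2 grid with made-up numbers. It is a simplification for illustration rather than the paper’s implementation.

```python
def off_ball_scoring_opportunity(score_prob, control_prob, next_action_prob):
    """Combine three same-shaped grids of probabilities cell by cell.
    Returns the combined grid and its total across the pitch."""
    obso = [
        [s * c * t for s, c, t in zip(s_row, c_row, t_row)]
        for s_row, c_row, t_row in zip(score_prob, control_prob, next_action_prob)
    ]
    return obso, sum(sum(row) for row in obso)

# Made-up 2x2 grids; a real model would use a much finer grid of the pitch.
score = [[0.02, 0.10], [0.05, 0.30]]       # chance of scoring from each zone
control = [[0.9, 0.5], [0.8, 0.4]]         # chance of controlling the ball there
next_action = [[0.4, 0.3], [0.2, 0.1]]     # chance the next action happens there

grid, total = off_ball_scoring_opportunity(score, control, next_action)
print(grid)
print(total)
```

Zones that score highly on all three layers at once stand out in the combined grid, which is why a player in space near goal can register as a threat without touching the ball.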
Just like the ‘ghosting’ mentioned at the start of the article, this model uses tracking data to capture each player’s position and current trajectory. The data can even be fed into virtual reality systems, allowing players to experience a sequence of play in pre- or post-match analysis. There are, however, some limits to tracking data.
“You have to understand biometrically how the body is moving in order to begin defining the way that certain players play,” Buntin says.
“Tell me how this was happening, how was that pass received, what was going on when this player or the other player jumped a bit higher and managed to head the ball. Describe that to me without using body joints.”
It’s a big effort to collect such information (158 million data points are extracted from a single hockey game), but it provides insight – such as the way a player is facing at any given moment – that tracking data cannot.
According to Ted Knutson, co-founder of Statsbomb, tracking data isn’t very scalable, which can make it difficult to work with. That’s why Statsbomb have started collecting their own data, a hybrid between tracking and traditional event data.
“A lot of arguments that people have [about data], coaches are like ‘well, this doesn’t have where the defenders are, it doesn’t have where the goalkeeper’s at [when shots are taken]’ and we were like ‘well that’s solvable, let’s see what we can do’,” Knutson explains.
As well as the traditional event data, Statsbomb are now also collecting ‘pressures’ – simply, whenever a player pressures an opponent. While work has been done on pressing before, Knutson says he’s “never seen anybody really invert that and say ‘what happens to players that are being pressed?’”
“At the top levels you need players who are at least somewhat pressure resistant, and we’re going to be able to use that data and information to help find those players. And then what happens to your passing profile when you’re under pressure versus when you’re not and we’re going to be able to find that too.”
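The comparison Knutson has in mind can be pictured as splitting a player’s passes by whether they were under pressure and comparing completion rates. The sketch below assumes each pass is a record with hypothetical `under_pressure` and `completed` fields. These names and the sample data are invented for illustration and are not Statsbomb’s data format.

```python
from collections import defaultdict

def completion_by_pressure(passes):
    """Pass completion rate split by whether the passer was under pressure."""
    counts = defaultdict(lambda: [0, 0])  # under_pressure -> [completed, attempted]
    for p in passes:
        tally = counts[p["under_pressure"]]
        tally[0] += int(p["completed"])
        tally[1] += 1
    return {key: done / tried for key, (done, tried) in counts.items()}

# Made-up passes for one player.
passes = [
    {"under_pressure": False, "completed": True},
    {"under_pressure": False, "completed": True},
    {"under_pressure": False, "completed": True},
    {"under_pressure": False, "completed": False},
    {"under_pressure": True, "completed": True},
    {"under_pressure": True, "completed": False},
    {"under_pressure": True, "completed": True},
    {"under_pressure": True, "completed": False},
]
rates = completion_by_pressure(passes)
print(rates)
```

A player whose completion rate barely drops when pressed would be a candidate for the “pressure resistant” label, although a fair comparison would also need to account for the difficulty of the passes attempted.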
It’s not just players who are under pressure in football. The sport moves quickly, and so does the fixture schedule. Data processing can’t always keep up, particularly when you’re dealing with the millions of data points that tracking data throws into the mix.
“Because there’s so much data, if you’re not careful it can take a very long time to actually run everything,” says STATS’ Jennifer Hobbs. “If you actually want teams to use it, if you want it to be production-worthy, it’s got to run in a reasonable fashion.”
However, the work of club analysts has to be of the same high quality every time, “no matter if it’s a Saturday-to-Wednesday or a Saturday-to-Saturday fixture,” says Jack Heggie, who moved towards the data side of football while working as an analyst himself.
“You can do all this fancy analysis [on a longer Saturday-Saturday turnaround], but if you can’t replicate that on a Saturday-Wednesday then the coaching staff will go: ‘I need this now’.”
Automating certain processes is therefore a big attraction. At Hudl, the aim was to be a ‘force multiplier’ for their clients’ existing personnel – Spearman estimates off-hand that his work on off-ball scoring opportunities “might be something that means [club analysts] won’t have to watch 10 hours of opposition footage in order to identify how they might create those opportunities.”
Automating processes to quickly highlight certain features of a match – say, line-breaking passes or dangerous chances – would change the way analysts work, explains SBG’s Commercial Director Simon Cuff.
“The workflows change from saying ‘what I’ll do is I’ll manually analyse five games’ to ‘I’ll use automation to analyse 20 games and get more insight out of it because you can see patterns of play’,” he says.
With such a quick turnaround for their match analysis, it’s no surprise club analysts have little time for anything else. As such, much of the advanced work is currently being done by external companies rather than in-house at clubs, although there are signs that this is beginning to change.
“Maybe it’s a trend,” says Cuff, “and you’ll find in a year’s time there’s 10 clubs going ‘look we’ve got a development department now, as well as a data scientist and we’ve got a C++ programmer’.”
The German football federation (DFB), for one, are taking matters into their own hands, with a planned €110m football academy – a ‘Silicon Valley’ of football based near Frankfurt – set to use as much video and data as possible.
What the landscape will look like in 10 years’ time is anyone’s guess. It’s difficult enough to predict where it will be in two years, but there’s a palpable sense of excitement among those working with such technology. With competitive edges always being sought, it shouldn’t be at all surprising that top clubs are investing in this area.
Competition is also fierce between the companies involved, even though there are several similarities between them: the high-level expertise, the constant refinements and the desire to mould the technology around football knowledge, rather than the other way around.
Unspoken amongst all of this is the conflict, to whatever degree it exists, between ‘football people’ and ‘data people’.
“Being pigeon-holed as a company that’s an analytics firm is really the wrong way to look at this,” says Buntin, who explains how SPORTLOGiQ previously tried to make their data ‘speak the language’ of hockey to maximise its impact.
“I know that’s sort of an abstract concept,” he continues. “We’ve got to a point now where the data and the AI algorithms are essentially turning this into something that can see and understand and describe the games the way a person does. It’s not an analytics company.”
He’s right. It’s not an analytics company, it’s a football company.