Saturday 23 April 2016

Why the mystery of F1's greatest ever hasn't been answered - not even by science

So, that's that settled then. The sport's biggest and most ubiquitous bone of contention. Done. That one of who is the best driver ever, over and above the equipment they had access to. That one we thought near unsolvable given everything. All by a team of academics from the Universities of Sheffield and Bristol, and using statistical analysis.

Juan Manuel Fangio is, according to the statistical study, F1's best of all time
By Unknown - Museo Juan Manuel Fangio, reimpreso en "La fotografía en la historia argentina", Tomo I, Clarín, ISBN 950-782-643-2, Public Domain, via Wikimedia Commons
And when the news release announcing this was published just over a week ago it caused quite the stir. It all seemed rather monumental indeed. Underlining the point, even the (not necessarily always science-loving) Daily Mail proclaimed in its headline that what we had revealed before us was "the best Formula One driver of all time according to SCIENCE". Yes, it actually capitalised the word. It was as if we were getting something irrefutable.

The first few on the study's all-time driver ranking - Juan Manuel Fangio top, followed by Alain Prost, Fernando Alonso, Jim Clark and Jackie Stewart - are hardly hideous. Michael Schumacher appeared low in ninth, but when only his career up to his first retirement in 2006 was considered he shot up to third, which again looked fair enough. (As an aside, a curiosity about the reporting of this study is that there is more than one list floating about: in addition to the pre/post Schumi lists, the one presented in the academic paper has Alonso sixth rather than his widely-reported placing of third, and it's apparently not explained by the Schumi shift, as the Daily Mail article at least shows Alonso fourth even when only Schumi's career up to 2006 is considered.)

But then it gets patchy. Stirling Moss is but 35th in the ranking (some 12 places behind Marc Surer) while the likes of Niki Lauda, Nigel Mansell, Alberto Ascari, Jochen Rindt and Gilles Villeneuve are simply nowhere to be seen in the top 50. It all gets, um, a little more interesting too, as Christian Fittipaldi is in the elevated position of 12th best driver ever while the luminary that is Louis Rosier is placed 19th. And unless this pair were, against just about all assessments, in fact secret F1 geniuses never given their break in a good car - and the drivers listed above actually vastly over-rated, again contrary to most assessments - it would seem the study has some shortcomings.

So what's going on? Having read the headlines, and heard the ire in response, I sought out the fuller academic paper on this study, called Formula for success: Multilevel modelling of Formula One Driver and Constructor performance, 1950-2014 - which is available online. Back in my university days I did for a brief spell similar sorts of statistical modelling (though about politics, rather than about F1 - and as with many things done at uni it seems rather distant now some years on) and this I thought combined with my F1 knowledge (stop laughing at the back) would give me a good chance of understanding what they had actually done and therefore by extension how they had come up with their ranking.

In the paper's own words the scientific model aims to "find out which driver, controlling for the team that they drive for, is the greatest of all time". It also seeks to judge how the influence of the team as opposed to the driver alone in F1 success has varied over time (it's grown, unsurprisingly) as well as in a slightly curious detour looks at who is the best wet-weather driver (they conclude it's still Fangio) and the best on specific types of track such as permanent and street circuits.

Yet the first thing to keep in mind in all of this is that for all of its merits, and as outlined sometimes sheer reverence, science cannot do magic. Any scientific model is only as good as the data fed into it, and only as good as the assumptions applied. Garbage in; garbage out, as the phrase goes.

And there's some of it here. You don't have to read too long into the paper to find our first potential sticking point. If you thought the study had access to some mysterious data we'd never clapped eyes on before then you're wrong: the 'outcome' they use as their measure of greatness in their model - the 'dependent variable' as scientists call it - is the championship points each driver has scored in F1.

The scoring system used from the start of the 1991 season is used as an equaliser
By Stuart Seeger from College Station, Texas, USA - Grand Prix Start, CC BY 2.0, via Wikimedia Commons
Not much wrong here, you might think. After all it's used to decide the ultimate prize of who wins the F1 world championship, and ultimately F1's a results business. As an equaliser they apply the same points system throughout (that used between 1991 and 2002: 10-6-4-3-2-1), with fractions used for finishing positions lower than sixth, and the size of the overall field, which has varied over time, is controlled for. All of which is fair enough. Using points and finishes as your measure also has the benefit of completeness, given all F1 world championship results since the start of the championship in 1950 are easily accessed.
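As a sketch of that equaliser, here's roughly how the rescoring might look in code. The paper's exact rule for fractional points below sixth isn't quoted here, so the 1/position tail is my assumption, as are the function and variable names.

```python
# Every season is rescored with the 1991-2002 points system (10-6-4-3-2-1).
POINTS_1991_2002 = {1: 10.0, 2: 6.0, 3: 4.0, 4: 3.0, 5: 2.0, 6: 1.0}

def equalised_points(finishing_position: int) -> float:
    """Return 1991-2002 points for a classified finishing position."""
    if finishing_position in POINTS_1991_2002:
        return POINTS_1991_2002[finishing_position]
    # Hypothetical fractional tail for seventh place and below; the paper's
    # actual fraction scheme isn't spelled out in this article.
    return 1.0 / finishing_position

print(equalised_points(1))   # 10.0
print(equalised_points(10))  # 0.1
```

Whatever the precise tail, the key point is that every finisher down the order gets some non-zero credit, which matters for the reliability discussion below.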

Yet some thinking about it all uncovers problems. As I outlined in a recent article for Grand Prix Times, even finishing and points totals can be a crude, perhaps misleading, measure of driving quality, given the influence of dumb luck. As I demonstrated too in an article for Vital F1 last year, the proportion of F1 cars that reach the end of races has increased over time, given cars' reliability and general robustness have improved, while perhaps too modern circuits don't punish error as ruthlessly. So it's not a consistent measure on this front either.

And the paper explains also that failures to finish are treated pitilessly by the model, in effect as low finishing places. If you retired first from a race for the purposes of this study you finished last.
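In code terms that treatment might look something like this minimal sketch (the names are mine, not the paper's):

```python
# A retirement is scored as if the driver finished at the back of that
# race's field - "if you retired first, you finished last".
def effective_position(classified_position, field_size: int) -> int:
    """Map a race result to the position used for scoring; None means a DNF."""
    if classified_position is None:   # retired: treated as finishing last
        return field_size
    return classified_position

# A driver who retires from a 22-car race scores as 22nd:
print(effective_position(None, 22))  # 22
print(effective_position(5, 22))     # 5
```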

The academics in the paper claim to account for this though. "We do not need to treat driver failures and team/car failures differently" it says, "the model will automatically apportion the latter into the team or team-year levels so they will not unfairly penalise a driver who suffers such failing".

I'll explain more of this later, but here, in other words, it seems to assume that all cars within the same team will have the same reliability, so the model will only punish a driver if they have more DNFs than their stable mate, and if that's the case that'll be because of something the driver's doing wrong. It's better than nothing I suppose, but it doesn't necessarily cover everything, given for example that for much of the sport's past it was demonstrable that the 'number 2' machine in a team would get less care and attention and therefore would make the finish less often. This applies especially to teams with lower budgets, but even at the front, and even in the modern age, some of us have been given cause to muse "why does it always happen to Rubens? To Felipe? To Kimi?" There's the sheer random chance of mechanical woe we've mentioned too. Whatever the case, the cruise-and-collect pilots in this study are rather generously rewarded.

The unlikely figure of Louis Rosier is 19th in the ranking
By Noske, J.D. / Anefo [CC BY-SA 3.0 nl], via Wikimedia Commons
Indeed, as Mark Hughes explained in response to Rosier's place in the ranking, he "used to drive privateer Ferraris in the early '50s at relatively sedate endurance-like pace and consequently had a good finishing record". At a time when, as noted, non-finishes were relatively common, Rosier has apparently been much elevated by this model. Perhaps it explains too why more win-or-bust drivers such as Villeneuve are apparently so ill served. The paper also explains James Hunt's non-appearance in the top 50 by his comparatively poor finishing record.

Other factors considered important in judging a driver, such as lap times, qualifying speed and the like, are not factored in. In a way it's understandable as the points/finishing outcome has as noted the benefit of completeness as well as that getting hold of all race lap times, certainly from F1 races way back in the day pre-electronic timing, would be a mammoth undertaking. Again though whatever is the explanation it seems rather a shortcoming, certainly when compared with what F1 observers tend to take into account when judging the best drivers. The paper indeed acknowledges as much, saying "it could also be interesting to see how these results differ when qualifying positions, or fastest lap times, are used as the response variables". Quite.

But in terms of what the model does: statistical models like these work on the basis of the ability of a certain piece of data to predict an outcome of interest. You take the outcome (in this case, a driver's points) and throw the various types of data, or 'variables', you think are related to that outcome into the model. Then you measure the significance of each individual variable by measuring its ability to predict the outcome from just knowing that data, holding all other types of data in the model constant.

Still with me? Good. And from what I can tell the main 'types of data' in this model are the driver's points, the team's points (i.e. what the team has done in F1 since the dawn of time, and we can argue too as to the extent that's a helpful measure) and what the authors of this study call the 'team-year', which is the points the team got in the particular years that the driver was there. So in this case to take the example of Fernando Alonso, the model measures how many points Alonso 'should' have in his F1 career just from knowing the teams he drove for as well as what those teams did generally in the years he drove for them. And what Nando's actually got in reality over and above that is taken as Nando's personal contribution. And therefore his measure of greatness.
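A crude, plain-Python stand-in for that idea, with invented illustrative numbers, might look like this. The real study fits driver, team and team-year levels jointly in a multilevel model; this toy version merely subtracts team-year means and calls the leftover the driver's contribution.

```python
from collections import defaultdict

# (driver, team, year, average points per race) - invented illustrative numbers
results = [
    ("Alonso", "Renault", 2005, 7.3), ("Fisichella", "Renault", 2005, 3.1),
    ("Alonso", "Ferrari", 2012, 6.9), ("Massa", "Ferrari", 2012, 3.0),
]

# Group per-race averages by team-year, the unit the model compares within.
team_year_points = defaultdict(list)
for driver, team, year, pts in results:
    team_year_points[(team, year)].append(pts)

def driver_residual(name: str) -> float:
    """Average of (driver's points minus the team-year mean) across seasons."""
    residuals = [
        pts - sum(team_year_points[(team, year)]) / len(team_year_points[(team, year)])
        for driver, team, year, pts in results
        if driver == name
    ]
    return sum(residuals) / len(residuals)

print(driver_residual("Alonso") > 0)  # True: beat his team mates
print(driver_residual("Massa") > 0)   # False: beaten by his
```

Even in this toy form you can see the limitation coming: with two cars per team, each residual is really just the gap to one other driver.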

All sounds reasonable enough, but there's a problem. Which is the rather titchy base size of all this. The model would likely be more valuable if several people drove any given F1 car in a season, but as we know for the most part there are only two, and the 'team-year' part therefore is in effect a comparison of just two pilots. And as the paper acknowledges almost apologetically, "the model really tells us how drivers perform against their team mates".

The study measures essentially how good drivers such as Fernando Alonso are at beating their team mates
Photo: Octane Photography
So there we have it. It's a study essentially of who most conclusively whipped the guy across the garage in their F1 careers, and by the crude measure of their finishing place and by extension of bringing the car home. And you've probably worked out a few of the myriad problems with this. One is that the quality of team mates will vary - Alonso could tell you that beating Lewis Hamilton is rather different to beating Tarso Marques. Complicating matters generally your team mate will be of a better quality the better car you're in - which again the paper acknowledges, saying "team mates are not randomly selected since good drivers will self-select into good teams".

Further complicating matters, sometimes team mates are chosen pretty much explicitly on the grounds that they won't challenge the top driver - reflected in the sport's maxim about the follies of putting two roosters in the same hen-house. It all rather muddies the waters given this study purports to control for the effect of the team. In other words, those paired up with an idiot, perhaps at a tail-end team, start with an advantage as far as this study's rankings are concerned.

The paper suggests indeed this is why Christian Fittipaldi ended up in the lofty position that he did in their ranking, as he "consistently outperformed his team mates, and because he never raced for a 'good' team, the standard required to get a high ranking is lower. More specifically C. Fittipaldi's team mates had relatively high rates of retirement: he gains his high ranking by being able to successfully keep a relatively poor car on the track". They might have added, as I outlined above, that teams towards the back may be less likely to produce two cars to the same level of mechanical preparation, which may have aided Fittipaldi to "outperform his team mates" and "keep a relatively poor car on the track", working on the premise that he often was his team's 'number one'.

Another problem is that the playing field between you and your team mate will not necessarily be level - plenty of F1 drivers, most great ones, have benefited from strict number one treatment after all.

Moss's low positioning perhaps indicates another of the study's flaws. I dare say that being paired with Fangio for a year in 1955 didn't help him in this, nor that he later spent two years in fast but unreliable Vanwalls with strong team mates. But there's a bigger possible problem. Indeed the paper hints at it ever so gently when explaining its auxiliary study of how the relative influence of driver and team has changed over time, in which it used data from 1979 onwards only. "The reason for this" it says, "is that, prior to this date, the team-structure of F1 was less clearly defined". But it hasn't taken this to its logical conclusion.

Stirling Moss has not been well rewarded in this study
By AngMoKio - Own work, CC BY-SA
And Moss's case with teams is less straightforward than most, given that from 1959 onwards rather than be at a classic two-car constructor as we know it in the modern age he drove a series of privately-entered machines for Rob Walker, which leads us to the question of how exactly this was treated. Was Moss considered in 1960 and 1961 by the model as if he was driving a works Lotus, or for a separate team? The former would be rather unfair on him but the paper doesn't explain either way. Which brings me to another point. What happens in teams with only one driver, of which there were plenty in the sport's past including Moss with Rob Walker?

We may also be able to get the beginnings of our explanation of why some such as Lauda and Mansell, routinely considered great, haven't shown up well in this particular study. In addition to having long spells in their careers when they weren't trouncing their team mates, I'd imagine the overall team legacy variable hit them given they both drove for teams at hardly the most auspicious periods of their existences (in Nigel's case Lotus, in Niki's BRM and Brabham).

There is too a general problem of low bases, which may explain the rather volatile outcomes. As mentioned already, drivers for the most part have only one team mate at a time, and perhaps too there's the problem that Grand Prix careers aren't sufficiently long to give us a robust base size in our data. After all, our man at the top Fangio started only 51 Grands Prix; Alberto Ascari, who missed out on the top 50 altogether, only 32. Even the very longest F1 career so far has only 300-odd races. I dare say there aren't many statistical analyses published that are based on only 51 cases. Or even on only 300.
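The scale of that problem can be sketched quickly, using the textbook result that the standard error of an average shrinks only with the square root of the sample size. The per-race spread figure of 3.0 points below is invented purely for illustration.

```python
import math

def standard_error(per_race_spread: float, races: int) -> float:
    """Uncertainty of a per-race points average after a given number of races."""
    return per_race_spread / math.sqrt(races)

# Nearly six times more races buys you less than a two-and-a-half-fold
# reduction in noise:
print(round(standard_error(3.0, 51), 2))   # 0.42 - a Fangio-length career
print(round(standard_error(3.0, 300), 2))  # 0.17 - even the longest career
```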

As for team-year, as my recent Grand Prix Times article outlines a single season isn't enough time for bad luck with mechanical unreliability and otherwise to even out. Bear in mind too that the first F1 season had just six races in it...

Perhaps acknowledging the various limitations, the academics warn in the paper indeed that their "claim of who is the 'best driver' should be treated with an appropriate degree of circumspection" as well as that "there is substantial uncertainty around each of the drivers' residuals". It seems a few of those who wrote the headlines didn't read this part though.

While Hughes summed up the study's ultimate shortcoming on the Motor Sport Magazine website. And it is one of genesis. "The basic premise that it is possible to accurately model the miraculous mix of neurons, psychology, hydrocarbons and elastomers that is this sport is deeply flawed" he said. "Just like telemetry defining driving technique, it is tail wagging the dog. Our understanding of the processes involved is so incomplete, we have an imperfect idea of how the picture is formed but a visceral appreciation of the quality of the picture. We can know the level of performance of great drivers without resorting to statistics - and if working the other way, the model does not support that picture, then it is the modelling that is clearly wrong, not the picture".

Indeed. The biggest problem this study has is that its subject matter of F1 is extremely complicated, and there are plenty of things influencing it - known and unknown - that for the most part we cannot possibly begin to hope to measure. The further you delve into the past the harder it becomes too. Given all this, when judging quality we rely on qualitative assessments rather than only on the quantitative.

We're probably best off sticking to that too.
