When analysing batsmen, the notion of "consistency" is frequently discussed - the mark of a player who scores well and does it frequently enough. A batsman with a wide variation in scores, with a hundred in one game and a failure in the next, is colloquially deemed erratic, and thus "inconsistent". But how exactly does one quantify this notion of consistency? Is there a rigorous measure that can pin down this abstract but often bandied-about criterion?
There have been multiple forays into measuring consistency. Ananth Narayanan has used the concept of the median - which is the value that lies at the midpoint if all the scores of a batsman are arranged in an ascending or descending order - to measure how lopsided a batsman's distribution of scores is. In another analysis, he broke batting careers into ten-Test chunks, and measured how many of these phases were low-scoring, as a proxy for consistency.
More simply, though, let's go back to the definition of consistency I outlined above: someone who scores "well enough" frequently enough is a consistent player. Let us begin our analysis at this first step.
We must define a score that is "good", and look at how often players cross that barrier. For a start, let's say 50 runs or more makes for a good innings, and look at when batsmen cross that score. For instance, here is Virat Kohli's Test career, with his 50-plus scores in blue.
In 145 innings, Kohli has scored 50 or more runs 49 times. If we look at the grey gaps between his consecutive crossings of 50, the average "wait" between two such good innings is 1.9 innings. This means he scores 50 or more every 2.9th innings on average. We shall use this number to compare players.
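For readers who want to reproduce this counting, here is a minimal Python sketch of how such gaps could be computed. The function names and the list of scores are made up for illustration (they are not Kohli's actual innings). The sketch assumes that a gap is the number of innings strictly between two crossings, and that innings before the first crossing and after the last one are ignored.

```python
def gaps_between_crossings(scores, barrier=50):
    """Count the innings between each pair of consecutive scores >= barrier."""
    # Positions (0-based) of the innings in which the barrier was crossed.
    crossings = [i for i, runs in enumerate(scores) if runs >= barrier]
    # Two crossings in successive innings give a gap of 0.
    return [later - earlier - 1 for earlier, later in zip(crossings, crossings[1:])]


def mean_gap(scores, barrier=50):
    """Average wait, in innings, between two consecutive crossings."""
    gaps = gaps_between_crossings(scores, barrier)
    if not gaps:
        raise ValueError("need at least two crossings to measure a gap")
    return sum(gaps) / len(gaps)


# Hypothetical innings-by-innings scores, purely for illustration.
scores = [12, 73, 4, 0, 55, 101, 8, 33, 19, 64]

print(gaps_between_crossings(scores))      # gaps between 50-plus scores
print(gaps_between_crossings(scores, 20))  # gaps between scores of 20 or more
print(mean_gap(scores))
```

Lowering or raising `barrier` is all it takes to switch between starts, fifties and hundreds.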
Although the average gap between consecutive crossings of any barrier is correlated with the batting average, it is an enlightening way of looking at things: for how long do good batsmen "fail" on average before making a good score again?
Let's first look at the average gap between starts, which I define as crossing 20 runs. Here are the best batsmen by this measure, considering players with 5000 or more Test runs. Just to clarify again, a "mean gap" of 2.0 means the player reaches 20 every second innings in his career on average.
As we can see, the best players get starts about every 1.5 innings on average. Don Bradman, as usual, is in a league of his own, at 1.34. Test players with the highest averages dominate the top of this table, as is expected. Michael Hussey, despite having a lower average than the table toppers, scores high on this measure.
What is this gap when we raise the barrier to 50 runs, which can be considered a successful innings across situations and conditions?
Two notable new entrants in this table are Misbah-ul-Haq and Joe Root, both pillars of their middle orders. Despite having lower runs-per-innings figures than others on the list, they cross 50 every 2.45 and 2.63 innings respectively, which is a testament to their reliability. Doug Walters went past 50 in close to 40% of his knocks, and it shows in his high rating here. Brian Lara and Kohli are placed towards the bottom of this top-30 list, though their averages are high; that indicates their erratic scoring, with high peaks.
However, just looking at the average gap between good innings does not tell the full story. The actual spread of these high scores tells us more: how regularly a batsman has scored well. The average gap tells us the frequency of good scores, but what if those scores occur irregularly in a career? A batsman who crosses 50 with regularity is more consistent than one who alternates patches of very frequent scoring with lean stretches of low scores.
To illustrate this, let's look at two simple hypothetical careers of 20 innings each: Pauli and Erwin. Each has crossed 50 ten times, but the patterns of their scoring are very different. In the graph below, a 50-plus score is denoted by a blue bar, and all other innings are blank. We can see how Erwin's scores occur more haphazardly than Pauli's, who has crossed fifty with perfect regularity.
Both careers have the same average gap between crossings, which is every second innings. However, if we look at the sizes of the gaps themselves, we see a difference in the players' spread of good scores.
For Erwin, the gap between the first two fifties is two innings (he did not cross that barrier in innings three and four). His subsequent gaps are 0, 2, 0, 0, 5, 0, 0, 0 innings. On the other hand, the gaps for Pauli are all one innings each. Visually, it is simple to see how Pauli's career is more consistent.
Is there a way to quantify this? To boil it down to one number, we calculate the standard deviation of the gaps. We square the distance of each gap from the average gap, and take the mean of those squares. Remember, the average gap is one innings for both, since both cross 50 every second innings.
For Erwin, this would look like:
( (2 - 1)^2 + (0 - 1)^2 + (2 - 1)^2 + (0 - 1)^2 + (0 - 1)^2 + (5 - 1)^2 + (0 - 1)^2 + (0 - 1)^2 + (0 - 1)^2 ) / 9
We then take the square root of that. For Erwin, this gives us 1.6. For Pauli, it gives us 0.
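The arithmetic can be checked with a few lines of Python. This is only a sketch: `population_std` is a name I have made up, and it divides by the number of gaps (nine), as the calculation above does. The gap lists are the ones described for Erwin and Pauli.

```python
import math


def population_std(values):
    """Square root of the mean squared distance from the average."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


# Gaps between consecutive fifties, as listed in the text.
erwin_gaps = [2, 0, 2, 0, 0, 5, 0, 0, 0]
pauli_gaps = [1] * 9

print(sum(erwin_gaps) / len(erwin_gaps))     # 1.0, the shared average gap
print(round(population_std(erwin_gaps), 2))  # 1.63
print(population_std(pauli_gaps))            # 0.0
```

Python's built-in `statistics.pstdev` computes the same population standard deviation, so it could be used in place of the hand-written function.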
Simply put, the "spread" of these gaps is higher for a more inconsistently constructed career, given the same average gap.
This example tells us that the spread of gaps divided by the average gap is a good measure of the regularity of scores in a career.
To summarise the method: we first define a barrier and label the crossing of that barrier as a good innings. We take the gaps between these and see how much they fluctuate. Let's plot these two quantities for all Test batsmen with more than 5000 runs, taking 50 runs as our barrier.
This shows the spread of these gaps is highly correlated with the average gap. To account for how much a batsman's scoring pattern actually deviates from this relationship, we divide the spread by the mean gap. This tells us how evenly scores are spread relative to a player's own frequency of scoring.
Remember that in our Pauli-Erwin example as well, it was important that they both had the same average gap. So we have to divide this spread by the average gap if we want to meaningfully compare batsmen.
Let us call this measure the Spread Index (SI), which is obtained by dividing the standard deviation of gaps by the mean gap.
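As a rough sketch, the index could be computed as below. The function name `spread_index` is my own label. The code follows the gap convention of the Pauli-Erwin example (innings strictly between crossings) and divides by the number of gaps when taking the standard deviation. A different convention for the mean gap would give different values.

```python
import math


def spread_index(gaps):
    """Standard deviation of the gaps divided by the mean gap."""
    if not gaps:
        raise ValueError("need at least one gap")
    mean = sum(gaps) / len(gaps)
    if mean == 0:
        # Every crossing came in consecutive innings; the ratio is undefined.
        raise ValueError("mean gap is zero")
    std = math.sqrt(sum((g - mean) ** 2 for g in gaps) / len(gaps))
    return std / mean


# The two hypothetical careers from the example above.
print(round(spread_index([2, 0, 2, 0, 0, 5, 0, 0, 0]), 2))  # Erwin: 1.63
print(spread_index([1] * 9))                                # Pauli: 0.0
```

Because both hypothetical careers have a mean gap of exactly one innings, their SI equals their standard deviation here; that will not be true in general.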
For the 96 batsmen under consideration, here is a table of those with the best SI, considering the barrier for a good innings to be 50 runs.
After Bradman, Greg Chappell scores more than 50 most uniformly. Rohan Kanhai shines with a low mean gap coupled with a low SI; his consistency of scoring brings him up to the third spot. Younis Khan, who has an average of more than three innings between crossings of fifty, scores them with relatively high regularity.
A lower SI is generally better. However, since the SI is the ratio of the spread of gaps to the mean gap, a player with high values of both can end up with the same SI as a player with low values of both. The comparison between Younis and Root makes this clear: Root has a much smaller mean gap between fifties, and a smaller spread, but his SI is the same as Younis'. Hence, for the full picture, we need to show the mean gap as well.
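A small numerical illustration of this point, using two invented sets of gaps rather than any real player's record: doubling every gap doubles both the mean and the spread, so the ratio does not change. The helper `mean_std_si` is a hypothetical name for this sketch.

```python
import math


def mean_std_si(gaps):
    """Return (mean gap, standard deviation of gaps, their ratio)."""
    mean = sum(gaps) / len(gaps)
    std = math.sqrt(sum((g - mean) ** 2 for g in gaps) / len(gaps))
    return mean, std, std / mean


# Invented gap patterns: the second is the first with every gap doubled.
frequent_scorer = [1, 3, 1, 3]
infrequent_scorer = [2, 6, 2, 6]

print(mean_std_si(frequent_scorer))    # (2.0, 1.0, 0.5)
print(mean_std_si(infrequent_scorer))  # (4.0, 2.0, 0.5)
```

The two SI values are identical even though the second batsman waits twice as long between good scores, which is why the mean gap has to be reported alongside the index.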
A batsman with low values of both crosses 50 often, while also doing it evenly, and a scatter plot of the two lets us compare the two facets of batting between players.
Kane Williamson and Root are very similar in the pattern of their 50-scoring, and although Steven Smith's relative regularity of scoring is similar to theirs, he crosses 50 more often. On the other hand, Kohli, despite having a high average, goes through highs and lows, which places him north-east of the other three. Cheteshwar Pujara is less prone to patches of good and bad form than Kohli, but his scoring is also slightly less frequent.
Surprisingly, Sachin Tendulkar and Lara are almost at the same frequency, but Tendulkar scored a little less regularly than Lara, which is expected in such a long career.
Finally, let's look at the same plot with a barrier of a century. When it comes to making hundreds, the most noticeable change is Kohli shifting into the elite, and Ken Barrington scoring hundreds highly irregularly.
In fact, we can use his case to illustrate the utility of the Spread Index: Barrington has scored hundreds as often as other elite batsmen, but they have come in bunches, separated by long century-less streaks. This bunching is displayed by his place on the plot: a low mean gap, but a very high SI.