What gets measured gets managed. This is an age-old saying that I have heard many a time. But is everything that gets measured worth managing? Do the costs of building the tools to measure outweigh the benefits? In the article “The rule is simple: be careful what you measure”[1], Simon Caulkin summarised it best:
What gets measured gets managed — even when it’s pointless to measure and manage it, and even if it harms the purpose of the organisation to do so.
He summarises the original paper, “Dysfunctional Consequences of Performance Measurements”, published in 1956[2]. Its author, V. F. Ridgway, examined the different measures used to evaluate performance in organisations. He studied single, multiple, and composite criteria for measurement and concluded that these quantitative performance measurements, regardless of kind, have undesirable consequences for overall organisational performance.
Single-criterion measurements are particularly problematic: choosing the right criterion is hard, and a single measure is easy to game. We have seen this in software engineering in the form of lines of source code written, or the number of tickets closed. Organisations are complex, and to make the best use of the people available, managers need a deeper understanding of organisational behaviour.
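To make the gaming concrete, here is a deliberately silly sketch (the function and the `Item` type are made up for illustration): two versions of the same behaviour, one of which scores several times higher if lines of code is the yardstick.

```python
from dataclasses import dataclass

@dataclass
class Item:
    price: float
    quantity: int

# Two lines.
def total_price(items):
    return sum(i.price * i.quantity for i in items)

# Eight lines of identical behaviour.
def total_price_padded(items):
    total = 0
    for item in items:
        price = item.price
        quantity = item.quantity
        subtotal = price * quantity
        total = total + subtotal
    return total

# Same result, very different "performance" by the LOC measure.
assert total_price([Item(2.0, 3)]) == total_price_padded([Item(2.0, 3)])
```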
We try to measure everything and interpret the results, sometimes to the detriment of the organisation. Not everything that matters can be measured, and not everything that we can measure matters.
The ease of measuring the performance of software has misled us into thinking we can evaluate the performance of our software teams with the numerous metrics available. The performance of a system is relatively easy to understand and evaluate. Does the database perform well? Look at how long queries take. SLAs tell us what to measure and what to optimise. This has fooled us into thinking we can do the same for software teams.
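The contrast is easy to demonstrate. A minimal sketch, with the SLA threshold, table, and query all invented for illustration: measuring a system is just timing an operation and comparing it against a target.

```python
import sqlite3
import time

SLA_SECONDS = 0.2  # hypothetical target: queries should complete within 200 ms

# A throwaway in-memory database, just to have something to measure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(10_000)])

start = time.perf_counter()
conn.execute("SELECT SUM(total) FROM orders").fetchone()
elapsed = time.perf_counter() - start

# The measure and the action are both unambiguous; team metrics are not.
status = "met" if elapsed <= SLA_SECONDS else "violated"
print(f"query took {elapsed:.4f}s, SLA {status}")
```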
Some of the younger generation of managers would laugh, but there was a time when we as an industry thought it was good to count the number of lines written as a measure of performance[3]. Why, even today story points are used to measure velocity.
Dave Nicolette[4], putting it humorously, explains the absurdity of using story points as a measure for anything:
How many Elephant Points are there in the veldt? Let’s conduct a poll of the herds. Herd A reports 50,000 kg. Herd B reports 84 legs. Herd C reports 92,000 lb. Herd D reports 24 head. Herd E reports 546 elephant sounds per day. Herd F reports elephant skin rgb values of (192, 192, 192). Herd G reports an average height of 11 ft. So, there are 50,000 + 84 + 92,000 + 24 + 546 + 192 + 11 = 142,857 Elephant Points in the veldt. The average herd has 20,408.142857143 Elephant Points. We know this is a useful number because there is a decimal point in it.
But we are not the only industry focussed on performance. A search on Google Scholar turns up numerous papers on measuring performance; there is even one titled “Improving the performance of a performance measure”[5]. People are, in general, concerned with how to improve the performance of their teams.
You might ask: so are you recommending not to measure anything? Definitely not. We do need to look at the data we have, and sometimes instrument new things to measure. In his book[6], Robert Austin recounts a story told by a software measurement expert in an interview:
I was in a meeting last week… and a woman in that meeting, a program manager, said, “Do you have any papers that discuss return on investment of software measurement programs?”… I just sort of looked at her. I said, “Well, no, but it doesn’t matter if this is software we are talking about or anything else we are talking about… How is it that you have insight into how your programs are going? Is something on schedule?… How are we doing with respect to baselines or budgets or schedules?… How big is something?”… Whatever you are doing, you want some quantification of these things.
The question “Why quantify?” is not seriously addressed. Instead, a rhetorical parry suggests that the reason for measuring is obvious.
We should rationalise whether it is necessary to measure something, and what we are going to do once we have the measures. The authors of “Software Engineering at Google”[7] suggest triaging whether something is worth measuring at all, because “the measurement itself is expensive: it takes people to measure the process, analyse the results, and disseminate them to the rest of the company.” We need to ask ourselves:
- What result are you expecting, and why?
- If the data supports your expected result, what action will be taken?
- If we get a negative result, will appropriate action be taken?
- Who is going to decide to take action on the result, and when would they do it?
None of this tells us what we should measure. Maybe we should measure progress towards 42: the further you move from that number, the worse you perform; direction doesn’t matter. Jesting aside, there is no silver bullet here. Some measures can be gamed easily, and some are quite helpful. There are research-based measures that can help with a specific question you are trying to answer. One example is the “State of DevOps report”[8]; you could use its measures to see how your teams perform relative to the ones surveyed in the report.
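For instance, two of the measures behind that report, lead time for changes and deployment frequency, can be computed from records most teams already keep. A minimal sketch, assuming a hypothetical deployment log (the data and field names are made up; the median is one reasonable aggregation):

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical deployment log: when each change was committed and deployed.
deployments = [
    {"committed": datetime(2023, 5, 1, 9, 0), "deployed": datetime(2023, 5, 1, 15, 0)},
    {"committed": datetime(2023, 5, 2, 10, 0), "deployed": datetime(2023, 5, 4, 11, 0)},
    {"committed": datetime(2023, 5, 5, 8, 0), "deployed": datetime(2023, 5, 5, 9, 30)},
]

# Lead time for changes: commit-to-deploy duration.
lead_times = [d["deployed"] - d["committed"] for d in deployments]
print("median lead time:", median(lead_times))

# Deployment frequency: deploys per day over the observed window.
window = max(d["deployed"] for d in deployments) - min(d["deployed"] for d in deployments)
per_day = len(deployments) / max(window / timedelta(days=1), 1)
print(f"deployment frequency: {per_day:.2f} per day")
```

Even then, numbers like these describe how work flows through a team, not how well any individual performs.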
A manager would ask: how do we determine when a project will be delivered? Is there some measure we should look at? Trust your engineers; they have a better understanding of the system.
A group of managers were given the assignment to measure the height of a flagpole. So they go out to the flagpole with ladders and tape measures, and they’re falling off the ladders, dropping the tape measures — the whole thing is just a mess. An engineer comes along and sees what they’re trying to do, walks over, pulls the flagpole out of the ground, lays it flat, measures it from end to end, gives the measurement to one of the managers and walks away.
After the engineer has gone, one manager turns to another and laughs. “Isn’t that just like an engineer, we’re looking for the height and he gives us the length.”
1. Caulkin, S. (2008). “The rule is simple: be careful what you measure”. The Guardian. https://www.theguardian.com/business/2008/feb/10/businesscomment1
2. Ridgway, V. F. (1956). “Dysfunctional Consequences of Performance Measurements”. Administrative Science Quarterly, 1(2), 240–247. https://doi.org/10.2307/2390989
3. https://www.ifpug.org/content/documents/Jones-LinesofCodeMetricV6.pdf
4. https://twitter.com/davenicolette: the example is attributed to him, but I couldn’t find the original source.
5. Tangen, S. (2005). “Improving the performance of a performance measure”. Measuring Business Excellence, 9(2), 4–11. https://doi.org/10.1108/13683040510602830
6. Austin, R. D. Measuring and Managing Performance in Organizations.
7. Winters, T., Manshreck, T., & Wright, H. Software Engineering at Google. O’Reilly Media. ISBN: 9781492082798.
8. https://cloud.google.com/devops/