How I started believing in Cycle Time over Estimation

norberhuis.nl

86 points by snorberhuis 5 years ago · 35 comments

judofyr 5 years ago

The gist of this article is good. Fixing the code is often the smallest part of rolling out a bug fix. That's extremely important to be aware of. Using historical data can be a great way of seeing how fast you can actually roll out changes.
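To make "historical data" concrete: the Cycle Time statistics the article relies on boil down to something like this sketch (made-up ticket dates, Python):

```python
from datetime import date
from statistics import mean, stdev

# Hypothetical tickets: (work started, deployed to production) dates.
tickets = [
    (date(2020, 1, 6), date(2020, 1, 10)),
    (date(2020, 1, 7), date(2020, 1, 8)),
    (date(2020, 1, 13), date(2020, 2, 3)),  # outlier: blocked release
    (date(2020, 1, 14), date(2020, 1, 17)),
    (date(2020, 1, 20), date(2020, 1, 24)),
]

# Cycle time = elapsed days from start to production, per ticket.
cycle_days = [(done - start).days for start, done in tickets]
print(f"mean cycle time: {mean(cycle_days):.1f} days")
print(f"std deviation:   {stdev(cycle_days):.1f} days")
```

Note how a single blocked release dominates the standard deviation, which is exactly the asymmetry discussed further down the thread.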

But I don't get how they went from that to "estimations are bad". It seems to me that they're just doing better estimations now by looking at historical data?

> I could tell confusion from the look of my team lead. How could I say it would take over five days for a change that we estimate to take 10 minutes!

Isn't this like comparing apples and oranges? The estimation of 10 minutes is clearly "the time it takes to flip the flag", but what they actually want to estimate is "the time it takes to make it into production in a safe manner".

Maybe the team leader is confused because they thought this was a man-hour estimation? And they're now afraid that one developer will work full-time five days on it?

> On Monday, we started with the stand-up and discussed picking up this task. Another engineer proposed fixing a bug on another component first before turning on the flag.

Doesn't this imply that their 10 minute estimation was done completely without asking the team?

> As humans, it is impossible to consider all the potential complexity. […] We did not consider a legacy system with bugs and scaling problems. […] We did not consider a combined release process with a Release Manager.

Really? You couldn't consider the complexity of a release process you designed yourself? You honestly thought this was a 10 minute task even though you knew you were dealing with a legacy system?

  • brightball 5 years ago

    Fwiw, there are some pretty prominent authors that suggest you are better off just counting the number of tasks in the queue and tracking the average time per task, rather than estimating at all.

    It’s just as reliable since everything averages out over time and the queue size is a leading indicator while cycle time is a trailing indicator.
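    This queue-counting approach is essentially Little's Law; a minimal sketch with made-up numbers:

```python
# Flow-based forecast: no per-task estimates, just queue size and throughput.
# Little's Law: avg items in system = arrival rate * avg time in system,
# so the expected wait for a new task is roughly queue_size / throughput.

queue_size = 30               # tasks currently queued or in progress
completed_last_30_days = 20   # historical throughput over a 30-day window

throughput_per_day = completed_last_30_days / 30
forecast_days = queue_size / throughput_per_day
print(f"a task entering the queue now finishes in ~{forecast_days:.0f} days")
```

    No individual task is ever estimated; the forecast moves only when the queue or the throughput does.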

    • geoelectric 5 years ago

      It also auto-calibrates to the typical granularity of your tasks, assuming you decompose with similar concerns across the board.

  • snorberhuisOP 5 years ago

    Thank you for your comment!

    > But I don't get how they went from that to "estimations are bad". It seems to me that they're just doing better estimations now by looking at historical data?

    For me personally, I came to believe that estimations are just really poor even for the most simple things, and not worth the effort.

    It is correct that it can still be seen as migrating to a better form of estimation. In his conference talk[1], Allen Holub draws the distinction that a prediction, unlike an estimation, is purely data driven.

    > Maybe the team leader is confused because they thought this was a man-hour estimation? And they're now afraid that one developer will work full-time five days on it?

    That could be the case. I never looked at it that way.

    > Doesn't this imply that their 10 minute estimation was done completely without asking the team?

    Yes, it was done with just the team lead and me. We had migrated away from Scrum at that point and did not do full-team refinement with estimation sessions pre-sprint. We would collaborate on the technical solution for any story when we picked it up; this could be anywhere from the full team down to two engineers. The engineer who knew about the bug would probably have brought it up then.

    > Really? You couldn't consider the complexity of a release process you designed yourself? You honestly thought this was a 10 minute task even though you knew you were dealing with a legacy system?

    Yes, I honestly did! It was just a simple code change in our terraform code to set the config from false to true and cycle the machines. I should clarify this more in the story.

    [1] https://www.youtube.com/watch?v=QVBlnCTu9Ms&t=1s

    • judofyr 5 years ago

      And thanks for replying!

      > It is correct that is can still be seen as migrating to a better form of estimation.

      Well, from my perspective there is still estimation going on here: A project manager came to you asking for a timeline for a feature and you replied with a concrete number of days. I don't think it makes any difference to the project manager whether you call it an "estimate" or "prediction".

      > For me personally, I went from estimations are just really poor even for the most simple things and not worth the effort.

      The example you're presenting isn't showing this because it doesn't seem like you made any actual effort estimating it. It would have been better if you showed that you spent hours talking with various people for input and then in the end the Cycle Lead was more precise. In this example you did an estimate without even talking to the team! (Yes, you talked to the team leader, but apparently they didn't know that there were performance issues with enabling the flag, so there's some communication lacking here.)

      If you wanted to properly estimate it you would at least (1) ask in the stand-up if there were any concerns about flipping the flag and (2) communicate with the release manager to check whether it would be possible to do a release. And it looks like, had you actually done this, you would have ended up with an estimate that is closer to the actual time. It might also have uncovered that there was another release process happening at the same time.

      Considering that the standard deviation is two weeks (!) it seems more that you were lucky in this example that there wasn't more unplanned work. What if the performance issue required three days of work instead of one? What if the release manager told you that they were upgrading database servers and couldn't do a release for two days?

      > Yes, it was done with just the team lead and me. We migrated away from Scrum at that point and we did not do full team refinement with estimations sessions pre-sprint.

      There's a middle way between "full team refinement with estimation" and "only team leader and me estimates". You can ask around and double check before giving out estimates to project managers.

      • eternalban 5 years ago

        The approach OP described was new for me. What I got from the article was that a statistical tool allows for fairly accurate estimates without the need to "[spend] hours talking with various people for input". Further, it is not a certainty that an elaborate exploration of the proposed changes with various stakeholders will address potential blindspots or simply unexpected events (e.g. some infra burning). If OP is correct, the statistical estimate has all these factors baked into it and will only get better over time.

        This is the value of the approach, imo. Takes the 'subjective' element out of the picture. [p.s. And also, it 'scales' with project size, likely O(1). The 'ask everyone involved' approach is at best O(N).]

        Your point regarding the standard deviation is quite fair but doesn't indicate a failure of the approach: the approach would only be broken if they ever exceeded the sd, so it is still a nice, definite limit on the task completion timeline. Also I assume that as more data is gathered, the sd will shrink.

  • SideburnsOfDoom 5 years ago

    > But I don't get how they went from that to "estimations are bad".

    If "estimations" means the usual, i.e. asking "Ok everyone, how long do you think this task will take to complete?" then yes, that is largely a waste of everyone's time. Those estimations are bad.

    > they're just doing better estimations now by looking at historical data

    Yes, better estimates are coming from data. The people aren't "doing estimation" any more, it's being automatically calculated. Doing estimation takes time and gives wildly wrong results.

  • qznc 5 years ago

    The distinction between effort (man hours) and lead time often gets lost. Clueless managers ask for the former and take the answer for the latter.

  • ser0 5 years ago

    This is why definition of done is quite important, to get everyone on the same page about what it means to complete a task.

pointyfence 5 years ago

We had a much more primitive version of this for our product roadmap. After working with mostly the same dev team for a few years, we felt that in a given year we seem to consistently be able to complete X “big” projects, Y “mediums” and Z “smalls” despite all the random stuff that invariably pops up.

So that list was the spine of our annual planning, the stuff that we really prioritized. The rest of the stuff that people wanted to do would have to fall in between the gaps of that master set.

It worked surprisingly well for us, but we had to build this group experience first. After a few years with another dev team at another company, I brought it up as an alternative to over ambitious roadmaps and it worked well there too.

How much of this was just due to more team familiarity and experience or this oversimplified process? Don’t know. But it felt like the roadmap trade offs were more thoughtful, devs felt more relaxed and had reserves if we had to red line a bit, and fewer missed commitments.

  • ohthehugemanate 5 years ago

    This is the recommended way to plan time in Scrum and many other agile implementations. You're just using "T-shirt sizes" rather than "story points". It's recommended because it's quite accurate, as you've found!

eyelidlessness 5 years ago

The detachment of individual deliverables from estimates is definitely an improvement but I honestly think the notion of even hand waving estimation is a bad idea with harmful consequences.

Measure velocity for sure. Keep track of what kinds of things harm or improve it. But the absolute worst thing you can do is find an abstraction like T-shirt size or rolling average and superimpose it on discrete work. You’re going to be wrong. And probably disappointed or disappoint someone else.

My strategy for estimation is:

- It’s tiny and I’m doing it now, I can give you an estimate within an hour margin.

- It’s small, it’ll be done in a reasonable enough amount of time that no further estimation is required.

- It’s not small and more research/design/planning is required. Any estimate you extrapolate from this is dishonest.

That’s it.

Edit: that wasn’t quite all. “It’s tiny” is reserved for urgent situations that require rapid response and not something I offer outside that situation. For daily work, it’s small or it’s too big to say. Tiny things are too easy to abuse (by managers or engineers) if they’re part of the normal flow.

ohthehugemanate 5 years ago

The benefits described here are achieved by ANY abstracted estimation system. Ask people to estimate in units of difficulty, complexity, cups of coffee, whatever. Keep those units consistent relative to each other (a 4 cup task looks roughly twice as big as a 2 cup task, in advance). Track the relationship between units and hour/days over several sprints, and your average automatically factors in all the external things like team collisions, illness, developer downtime, refactoring, bugfixes, release process, etc. The research shows that the law of large numbers makes this MUCH more accurate than pure time estimation. (time estimation is extremely difficult to get correct more than about 35% of the time; averages of consistent units are correct enough to run a casino budget).

The scrum people will tell you to call this unit "story points", but it doesn't matter what you call it, as long as it's based on something connected to task duration. Difficulty, complexity, risk, whatever.

Note that if you're tracking the conversion rate between developer estimated hours and real hours, you're already doing this. But you'll get an increased accuracy by using units that are explicitly not time-related, just because of quirks in how human brains think about time.
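Tracking that conversion rate is mechanical; a minimal sketch, with hypothetical sprint history and an arbitrary unit ("points", but cups of coffee would work the same):

```python
from statistics import mean

# Hypothetical history: (points completed, elapsed working days) per sprint.
sprints = [(21, 10), (18, 10), (25, 10), (16, 10)]

# Average conversion rate between the abstract unit and calendar time.
days_per_point = mean(days / points for points, days in sprints)

# Forecast a new 8-point story without anyone estimating hours directly.
story_points = 8
print(f"~{story_points * days_per_point:.1f} working days")
```

The unit never has to mean anything in hours; the historical ratio carries all the calibration, illness and release overhead included.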

choeger 5 years ago

In my experience, you need both numbers. A customer naturally wants to know when a change arrives in production. I call this "latency", you call it cycle time, but maybe there is a small difference because I also try to include the time it takes to even start development. But a customer also wants to know about effort, often. Because usually effort is what gets billed. So a 10min (realistically, the bill will be about an hour, won't it ;) ) change has a five day latency, which is actually a relatively fast pace for a legacy system with many different stakeholders.

  • snorberhuisOP 5 years ago

    Good point, I agree that you need both numbers. This story is only about the latency.

snorberhuisOP 5 years ago

I am the author of the story. I appreciate any feedback so that I learn from you! I will be participating in the discussion.

jacques_chester 5 years ago

I feel like I'm fulfilling a stereotyped HN commenter role by asking this, but isn't "prediction based on historical data" actually, you know, an estimate?

  • sitkack 5 years ago

    I couldn't read all the text, but I believe the metapoint is that the need for accurate estimates drops as the cycle time is reduced.

    It is all about feedback.

    My analogy is an old analog style joystick that positions a simulated robot arm on a screen. If the update rate is high enough I can track a rapidly moving dot, but if it drops below some threshold, and I needed to accurately position the arm at some time in the future, then I would need to construct a huge model of the system, know the force, stiction, friction, mass, moment and thermal expansion. (edit, I have another analogy: anyone can spray a moving target with a hose, but using a bow and arrow requires skill and practice)

    Feedback allows us to use unpredictable components to make predictable systems. Those systems are nearly always amplifiers. Systems that use direct feedback don't have to have the same reductionist model as something that needs better prediction (estimates).

    This is why Lisp was a super power in the 80s, it had a repl. Same as Smalltalk, the IDE and repl and the universe were all the same thing. It makes total sense that agile came out of a system based around repls and instant feedback. Arduino did it for embedded dev. Hypercard for programming, the spreadsheet before that.

    High-bandwidth feedback allows us to be less skilled. Good estimators need to be highly skilled to make those estimates. Hose vs arrow. That reminds me, have you seen a really skilled FPS player on a predictable but high ping connection? They are almost timeless in how they predict the future, and to everyone else they dance between every 10th frame. Amazing predictors!

    You can't agile a martian probe (yet). As new ways are discovered to reduce cycle times, the time between cause and effect, each preceding structure of feedback is replaced with an even higher-bandwidth one. Robust DFU is a metarepl as hardware manufacturers race to ship products that are literally not finished and require a firmware update on boot to even function.

    • jacques_chester 5 years ago

      I just wanted to reply to say this was a great explanation and that I broadly agree.

      • sitkack 5 years ago

        Thanks for the feedback.

        Quip aside, I also learned a lot writing it. Always bet on the Scientific Method and if we know something works, we should have to justify not using it.

        • sitkack 5 years ago

          This aligned perfectly with the thread, https://youtu.be/S1nc_chrNQk?t=120

          I went on a tour of Blue Origin, I learned next to nothing. It looked like the rich kid's house that had a Coleco and a Neo Geo.

          The picture they paint of spacex is one of being in continuous flow vs cautious pessimism.

  • whtrbt 5 years ago

    I think the type of estimation meant here is the agile type, where the team looks at a feature/requirement and estimates the time or relative effort based on the nature of the requirement + whatever other factors they choose. It's usually more intuitive and not based on historical data.

  • tekkno89 5 years ago

    he also links to a conference talk about this. "you can make a claim that estimates are based on past behavior, but the fact is that what you're implementing is something that hasn't been implemented before. So any kind of measurement that you've made of something that has happened in the past is not going to impact what you're doing now" https://www.youtube.com/watch?v=QVBlnCTu9Ms

    • rualca 5 years ago

      I'm rather skeptical of this idea that just because a specific feature was never implemented, or just because a specific bug was never fixed, estimates based on past behavior don't work. That assertion doesn't have any basis in reality. I mean, implementing a feature or fixing a bug is not an isolated event performed with improvised approaches starting from scratch. Teams have processes and procedures that are standardized and take time, and need to be performed sequentially, which means that the improvisation part at best represents a small subset of the time invested working on a ticket.

      For a concrete example, let's imagine a team which has a continuous delivery pipeline which involves a code review step and manual acceptance tests. Let's say that the code review can stay in a queue for a couple of hours, or even slip into the next day, and that the manual acceptance tests require the feature to be deployed to a preprod stage after passing through all unit and integration tests, and that it might take a day to run.

      With this process alone, the ticket already takes at least 2 or 3 days between being assigned to someone and being marked as done.

      Now, let's say that the coding bit of a random ticket might take 5 minutes or 3 days. This means that the overall time between the start and end of a ticket is about 4 days ± 2 days, which means that in the worst case scenario, it takes 6 days to close a ticket.

      How is this sort of estimate not possible?

      The problem of providing estimates is not one of predicting the amount of time it takes to close a ticket. The problem of providing estimates is a problem of processes, and how to adequately organize, structure, and classify work. If you don't know what you're doing then you don't know when you're done.
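      The back-of-envelope arithmetic above can be written out directly (figures are the comment's own, the midpoint choice is an assumption):

```python
# Fixed process overhead per ticket (code review queue + preprod acceptance):
# "2 or 3 days" of pipeline, so take the midpoint.
process_days = 2.5

# Coding effort varies: "5 minutes or 3 days", expressed in 8-hour working days.
coding_min, coding_max = 5 / (8 * 60), 3.0

low = process_days + coding_min
high = process_days + coding_max
mid = (low + high) / 2
print(f"estimate: {mid:.1f} days ± {(high - low) / 2:.1f} days")
```

      The process overhead is the dominant, stable term; the uncertain coding part only moves the interval, not the floor.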

      • snorberhuisOP 5 years ago

        Good point, you hardly ever start with a clean slate. Using historical data or asking engineers to estimate how long it will take will always be based upon this past performance.

        But the point I try to make is that it is hard to take into account all the factors you have to deal with in a complex situation. As a human, you tend to ignore irregular influences. With tracking tools, you get this data right out of the box and it is more accurate in my opinion.

      • xchaotic 5 years ago

        Yeah, I think the problem the OP overlooked is that most of the cycle time was taken by various internal company processes and these are really not novel every time therefore it takes a similar amount of time to deliver different features. This in my mind is simply called estimation.

        • sitkack 5 years ago

          The external clock is going to be some multiple of the internal clock and tasks were piling up behind contended locks. The Ferrari and the skateboard make it through rush hour traffic at the same speed.

  • snorberhuisOP 5 years ago

    Thank you for your question!

    Yes, I try to make the distinction between prediction looking at historical data and estimation asking the engineers how long it will take.

fn1 5 years ago

What works well for me:

1. Split the project into tasks. (a task taking at least 3-5 PTs)

2. Have a POC / prerelease phase

3. For complex tasks introduce a second polishing task, like "Basic User management 1/2" and later "Basic User management 2/2". If the second one isn't needed: Great!

4. For all tasks/areas, estimate a lower and an upper bound in hours or PT.

5. Revisit your estimation afterwards using this process:

5.1. Imagine that someone gives you the following choice: you get $1000 if the true value lies within your estimation 90% of the time. Or you can roll a die and in 9 out of 10 cases you get the $1000.

5.2 If you confidently pick your estimation ("I'm 100% sure the true value is within my bounds") then it's too wide.

5.3 If you rather pick the dice-roll, then you don't trust your estimation, it's too narrow.

5.4 Adjust your estimation until both options seem equally probable to you: dice-roll or your estimate being true 90% of the time.
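Once you have a few of these ranged estimates behind you, you can also score them against actuals; a minimal sketch with made-up numbers:

```python
# Score past ranged estimates: a well-calibrated 90% interval should
# contain the true value about 9 times out of 10, like the die-roll bet.
history = [
    # (low, high, actual) in hours -- hypothetical values
    (4, 10, 8),
    (2, 6, 7),    # miss: range was too narrow
    (8, 20, 15),
    (1, 3, 2),
    (5, 16, 12),
]

hits = sum(low <= actual <= high for low, high, actual in history)
hit_rate = hits / len(history)
print(f"hit rate: {hit_rate:.0%} (target: 90%)")
# Well below 90%: your ranges are too narrow. At ~100%: probably too wide.
```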

groby_b 5 years ago

I am still not clear at all how "cycle time" is a meaningful measure without correcting for size of the feature.

I'm also confused as to how you can look at "mean of one week, with a standard deviation of two weeks" and not wonder about the asymmetry of the distribution, and if mean/sd are really the metrics of choice here. (I'd think 1st/3rd quartile are a better choice, because it gives some clue as to skewedness)
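The skew problem is easy to see on synthetic data; a sketch with a made-up, right-skewed sample:

```python
from statistics import mean, stdev, quantiles

# Right-skewed cycle times (days): mostly quick tickets, one huge outlier.
cycle_days = [2, 3, 3, 4, 4, 5, 5, 6, 7, 30]

q1, q2, q3 = quantiles(cycle_days, n=4)  # default 'exclusive' method
print(f"mean {mean(cycle_days):.1f}, sd {stdev(cycle_days):.1f}")
print("quartiles:", q1, q2, q3)
```

One outlier drags the mean above the third quartile and inflates the sd past the mean itself, while the quartiles still describe what a typical ticket looks like.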

  • judofyr 5 years ago

    I'm guessing it's a good metric here because most of their tasks are small tasks that only take a few hours to complete (in code) and then the average Cycle Time becomes the average overhead for getting a fix into production.

  • lmm 5 years ago

    I'd say SD is a much better measure here, precisely because that big standard deviation jumps out at you. My guess would be that the 3rd quartile is some sensible-sounding number like 1.2 weeks but there are some huge outliers pulling up the SD.

    • snorberhuisOP 5 years ago

      I agree that the Standard Deviation is a good measurement because of the big outliers.

      We also focus on these during retro. What made us remarkably slow? You want to drive down these issues.

      This investigation can help you in future work. I worked with another team that was consistently slow if they would need to build new pipelines or get connections up and running to third parties (heavily regulated environment). This insight helped us in planning.

cheschire 5 years ago

Sounds more like a case for using cycle time as a checksum to make sure the team’s estimates are fully considered.

They’re clearly missing 4 days of standard requirements in their estimates.
