Karhatsu Blog: August 2013

This is my first blog post, welcome! I would like to share with you my real-life experience related to #NoEstimates. I've been following the discussion on Twitter this year but learned most of what I know by reading blogs. So maybe you can learn something from my post as well. The post tells the story of a team that was spending plenty of time on estimation but gradually moved towards #NoEstimates. Let's see how that happened.

Initial estimation method: hour estimates

The development team I joined as a Scrum Master had a long history. However, during the previous year about half of the team had changed. I was told that it had once been a great Scrum team, but after talking with people in the team it was clear that they were struggling. I don't know if it had been like that a year or two earlier but at least at that time they were far from well-performing.

During the first sprint with them I just observed how they worked. Among many other things I noticed:

The sprint planning meeting was neither very efficient nor effective. The group of 12 people was divided into two teams and both teams spent typically 5-7 hours on sprint planning.
During the sprint planning meeting the Scrum Masters were using an electronic tool that contained user stories with initial story point estimates. The team was discussing the stories and the Scrum Master wrote tasks based on the discussion. For each task the team provided an hour estimate using the planning poker method.
The atmosphere during the meetings was, how should I put it, not very energized. They weren't events where the team would eagerly try to find the best possible solution to the problem at hand. I noticed, for example, that some people were so bored and frustrated that every once in a while they would just ignore the discussion and spend time on Facebook or similar.
One of the Scrum Masters' tasks was to print out the user stories and the tasks from the electronic tool. During the sprint the developers of course noticed new things to do but you couldn't see that on the Scrum boards since nobody wanted to write new tasks into the tool and print them out. It was thus difficult to follow the actual progress during the sprint.
Even though the planning was very detailed, the teams weren't able to finish the user stories during the sprint. One team finished half of its stories completely while the other finished none. The teams created burndown charts based on tasks and their hour estimates. This meant that if they had 80% of the original tasks done, they had a pretty “successful” sprint. It didn't matter if the initial tasks were irrelevant or if none of the stories were completely done.

The first sprint ended with a retrospective where many of the team members pointed out the problems I listed above. The team decided to try out something new.

Transition from hour estimates to story points

The next sprint planning was quite different from the previous ones. We stopped doing hour estimates. We threw away the electronic tool. We stepped away from the pressing meeting rooms and used the team space instead. We didn't try to do all of the work with the whole team but instead did some of it in groups of 2-3 people. And although we had printed the user stories, we wrote the tasks by hand.

First we checked the product backlog and picked the top four user stories and discussed them briefly all together. Then we split the team into four small groups and each group was responsible for providing the tasks for the story. As a detail I remember how someone suggested that we should write a couple of tasks together so that everyone would see what it is like to write them, how to pick them from the discussion. This was an interesting detail since I realized afterwards how the “Scrum Master uses the tool” approach had made them passive also in this sense. After 15 minutes or so we gathered together and each group explained what they had done. Others made comments and asked some questions. Based on these the team fine-tuned the tasks.

The same was repeated until finally we had about ten stories planned. The only thing we were missing were the estimates. I asked the team which one of the stories is the smallest. It was easy to find and that story got one story point. Then I took a random story and asked if it was the same size and if not, how many times bigger. That way we got story point estimates for each of the stories.

The team had been using story points also before but they were based on hours with some formula that I don't recall. Since we now had a new meaning for one story point, we didn't have comparable data from the previous sprints. Instead I asked the team: do you think that you can completely finish all the stories during the sprint? Although they were not very confident, they decided to commit to all of them. So we were done. We had spent about three hours, went for lunch, and started writing some code.

Story points era

One of the changes we made was that we stopped drawing burndown charts based on tasks. Instead, we used completely finished stories. Below you can see how it looked in the new sprint #1.

This was something I had witnessed before. It goes like this: In the beginning everybody can choose what they start to work on. Since it is the most efficient way (right?), almost everyone picks a story of their own. In the middle of the sprint none of the stories are completely done. At the end of the sprint magic may or may not happen. In this particular case they got pretty close but from an earlier team I remember how there were five developers, five user stories, all of the stories work in progress, and only one of them completely finished on the last day of the sprint.

So during the first sprints we had a lot more to improve than just making the sprint planning more effective and efficient. One thing was to start working more in pairs or small groups. Another important thing was that the developers tried to get something to the tester sooner instead of waiting for the whole story to be coded. This way the user stories were ready sooner. It also made the tester happier since he didn't have to wait until the end of the sprint to get something new for testing.

However, that wasn't enough. The team wasn't able to reach their goal during the first couple of sprints. At the end of one sprint planning one of the team members asked how many points the team had completed in the previous sprint. I said about 30. Then he asked the team: If we have managed to do 30, why should we commit to 40 again? A good question, I would say. So they decided to drop a couple of stories.

Little by little the team learned to commit to a reasonable amount of work and also get the work completely done in the sprint. After 2-3 months the charts started to look like this (we changed from burndown to burnup at some stage):

An important thing that the team learned was that if they commit to stories that are too big, there is a high risk that they won't be able to finish them. The team created a rule that if a user story is estimated to be more than five points, they have to split it into smaller pieces. I believe this was a crucial lesson towards the next step.

S/M/L estimating

The duration of a typical planning session had dropped from 5-7 hours to 2 hours or even less. The team was able to finish the sprint goal almost every time. But I still felt that we could do even better.

I remember that sometimes we were using too much energy on discussing if a story was one or two points. I even remember a case when time was spent arguing whether a story was zero or one points.

We also discussed if it made sense to estimate bugs and include finished bugs in the burnup chart. It felt like cheating: what if you finish a 3-point story in sprint n, find three bugs in sprint n+1, and fix 1+1+1 points in sprint n+2? From the commitment perspective (how much we'll be able to do) it made sense but from the value perspective it didn't.

There were also situations where we couldn't know beforehand whether we would be able to start working on a certain story since it was blocked by an external party. Or we didn't know exactly what we needed to do since we first needed to find that out by doing another story. However, since those were important tasks that should be done if possible, we reserved space for them in the sprint backlog: “These are the stories we have selected and besides them we have 3 points for these unknown stories.”

Since all of that felt kind of like waste, I proposed the next step for the team. Let's drop the story points and instead use sizes S, M, and L. S means 1-3 old points, M means 5, and L is bigger than that. If a story was S, it required no further discussion about its size. If it was M, it was a warning that further discussion might be needed - can we really complete the story or could we perhaps split it? If it was L, we had to split it. The sprint commitment was made based on the gut feeling using the question: from 1 to 5, how confident are you that we will be able to complete all the stories we have chosen?
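As a rough sketch, the S/M/L rule above could be written down like this. The function names are made up for illustration, and treating 4 old points as M is my assumption, since the post only mentions 1-3 for S and 5 for M.

```python
def size_bucket(old_points: int) -> str:
    """Map an old story point estimate to S, M, or L.

    Assumption: a 4-point story (not mentioned in the post) counts as M.
    """
    if old_points <= 3:
        return "S"
    if old_points <= 5:
        return "M"
    return "L"


def next_step(size: str) -> str:
    """What the team does with a story of the given size."""
    steps = {
        "S": "take it, no further discussion about size",
        "M": "warning: discuss whether we can complete it or should split it",
        "L": "split it before it goes into the sprint",
    }
    return steps[size]


for points in (1, 3, 5, 8):
    size = size_bucket(points)
    print(f"{points} old points -> {size}: {next_step(size)}")
```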

An interesting thing was that we never actually used those sizes. The team had learned to split stories so small that all of them were of size S. At that time our typical process was such that we had enough stories on the whiteboard waiting for the next sprint. We spent 10 minutes on them on the last day of the sprint. We started the next sprint with about an hour-long sprint planning meeting where we made sure that the whole team knew what we were going to do and checked if there was something important that was missing from the backlog. The developers wrote the tasks when they picked a story and rewrote them whenever needed. It felt like we were getting closer and closer to a nice flow.

#NoEstimates

At some stage we decided to split the team into two. The reason for this was that even though there was one code base, there were two clearly distinct businesses using it. This caused a major challenge of how to prioritize stories. So one component team became two feature teams, each business having its own.

The team I was in decided to take the next step towards #NoEstimates, although at that time I hadn't heard of such a thing. We decided not to have sprints anymore but instead to choose the next most important thing every time. Of course this meant that we tried to keep the amount of work in progress as low as possible, although we didn't have explicit WIP limits written on our board. It was important to have as small stories as possible but we didn't spend any time on estimating them (well, intuitively perhaps). We just considered whether the story made sense and whether we should and could split it. Sometimes we noticed during the development that it made sense to split the story and then we just wrote a new story.

Instead of sprint plannings we started to have weekly meetings with all the relevant people from this business area in the company. That included of course the development team and the so-called business people. We didn't have a Product Owner anymore since there was no need for one. In the weekly meetings we as a group talked about the big picture, checked what was going on, and decided together what we should do next. We used another whiteboard that showed the work at a higher level than the development team's board.

Instead of calculating velocity based on story points we started to count finished stories per week. Below you can see how our throughput statistics looked during the first 20 weeks. Notice especially the last eleven weeks: every week 2 or 3 finished stories. When the throughput is so stable, why would you need any size estimates?
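Counting finished stories per week needs nothing more than the date each story was finished. Here is a minimal sketch, assuming you have those dates in a list; the dates below are hypothetical and the function name is my own.

```python
from collections import Counter
from datetime import date


def weekly_throughput(finished_on):
    """Count finished stories per ISO (year, week)."""
    return Counter(tuple(day.isocalendar())[:2] for day in finished_on)


# Hypothetical completion dates, not data from the actual team.
finished_on = [
    date(2013, 8, 5),
    date(2013, 8, 7),
    date(2013, 8, 9),
    date(2013, 8, 13),
    date(2013, 8, 15),
]

for (year, week), count in sorted(weekly_throughput(finished_on).items()):
    print(f"{year} week {week}: {count} finished stories")
```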

It was week 20 or so when we realized that we needed to do a major refactoring in order to meet a certain important business need. It was the first time in this new team that we needed to do estimation of some kind. Our approach was the following: Try to understand what needs to be done. Split the work into user stories or similar. Count the stories. Use the throughput statistics to forecast either the probability of having everything done before date X or the date by which all of the stories would be done with decent certainty.
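One way to turn throughput statistics into such a forecast is to simulate many possible futures by randomly re-drawing past weeks. This is only a sketch of that idea, not necessarily the calculation we did back then. The weekly numbers are hypothetical (the only assumption taken from the story is that each week had 2 or 3 finished stories), and all function names are made up.

```python
import math
import random

# Hypothetical weekly throughput: eleven weeks with 2 or 3 finished stories.
# The exact values are invented for illustration.
HISTORY = [2, 3, 2, 2, 3, 3, 2, 3, 2, 2, 3]


def simulate_weeks(history, story_count, rng):
    """One simulated future: draw a past week's throughput at random,
    week after week, until story_count stories are finished."""
    done = 0
    weeks = 0
    while done < story_count:
        done += rng.choice(history)
        weeks += 1
    return weeks


def run_forecast(history, story_count, runs=10_000, seed=1):
    """Return the sorted week counts from many simulated futures."""
    if max(history) <= 0:
        raise ValueError("history needs at least one week with finished stories")
    rng = random.Random(seed)
    return sorted(simulate_weeks(history, story_count, rng) for _ in range(runs))


def probability_done_within(results, weeks):
    """Share of simulated futures that finished in at most `weeks` weeks."""
    return sum(1 for r in results if r <= weeks) / len(results)


def weeks_for_confidence(results, confidence):
    """Smallest week count that covers the given share of simulated futures."""
    index = max(0, math.ceil(confidence * len(results)) - 1)
    return results[index]


results = run_forecast(HISTORY, story_count=12)
print("P(done within 5 weeks):", probability_done_within(results, 5))
print("Weeks needed for 85% confidence:", weeks_for_confidence(results, 0.85))
```

With a throughput this stable the spread is narrow: a dozen stories can never take fewer than four weeks or more than six, so the simulation mostly tells you how likely the middle case is.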

We were a bit skeptical about how the business owner would react to our non-traditional approach of forecasting when the project would be ready and in production instead of estimating in man-days. Luckily we worked with a smart guy and after asking a couple of questions he just said: ok, go for it.

What really happened was that the required changes were in production pretty much when we expected them to be. However, we didn't finish all of the dozen stories we had planned initially. Instead we realized that half of them could be done later and replaced those with other, more important tasks. The throughput was as expected but the content was something different, more valuable.

#estwaste and euros

Before the #NoEstimates hashtag I remember that at least Vasco Duarte was using #estwaste in his tweets. I like the word waste since it is an easy word to throw out on many occasions but let me provide you with some numbers that should make the word more concrete in this case.

If you read the whole story, you noticed that we started with sprint planning sessions that lasted about 6 hours and in the end we didn't have them at all. If we assume that there are 22 sprints per year and the team has an average of ten members, it means 1320 saved hours per year. I really don't know what the average hourly cost of the team members was but let's pick two numbers: 50 or 100 EUR/hour. On a yearly level this means savings of 66,000 or 132,000 euros. Besides that you probably noticed that we didn't need the Product Owner anymore. So you can add the cost of one manager above that.
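If you want to plug in your own numbers, the arithmetic is simple enough to sketch in a few lines. The function name is made up; the inputs are the ones from the paragraph above (22 sprints, ten people, about 6 hours per planning).

```python
def yearly_planning_hours(sprints_per_year, team_size, hours_per_planning):
    """Person-hours spent on sprint planning in one year."""
    return sprints_per_year * team_size * hours_per_planning


hours = yearly_planning_hours(sprints_per_year=22, team_size=10, hours_per_planning=6)
print(f"Saved hours per year: {hours}")

# The two hourly costs are guesses, as said above.
for rate_eur in (50, 100):
    print(f"At {rate_eur} EUR/hour: {hours * rate_eur:,} EUR per year")
```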

I guess you are now saying that I forgot the value part of those sprint plannings or that I forgot the cost of the one-hour weekly meeting. Well, I didn't. First of all, the old sprint plannings produced very little or even negative value. Surely the developers discussed the upcoming work there but I would say that the discussion wasn't very useful. One of the purposes of the plannings was to provide visibility for the Product Owner but it was hard to see such an effect. And the use of the electronic tool caused problems during the sprints since the team had difficulty acting on the new information they learned while working. With negative value I refer to the drop in people's motivation.

Instead, the weekly meetings really produced value. They helped us to share information very efficiently and make useful prioritization decisions. So the cost calculations above really refer to the waste (=no value added), although they even ignore things like opportunity cost, cost of delay, and so on.

Lessons learned

Let me choose the two most important #NoEstimates lessons that I learned during this journey. The first one is that at least in this kind of context the #NoEstimates approach is perfectly valid and can bring huge improvements for the organization. With “this kind of context” I mean ongoing product development. Unfortunately I don't have experience of making business decisions before starting to develop a large-scale product. I would love to read your post about that topic.

The second one is that if you start from the situation described above, you cannot just jump to #NoEstimates. Instead, you have to find your own path and take small and sometimes bigger steps towards it. Vasco Duarte claims that story points are harmful. I understand what he means but that statement depends on the context as well. In this post I described how we gradually moved from hour estimates to story points, to S/M/L sizes, and finally to #NoEstimates. The story points helped the team to learn how big stories cause problems and split the stories into smaller ones. I feel it was a necessary step to take.

I think that working without estimates requires that the team has a certain maturity level. If your team doesn't have that yet, you need to work hard (smart) in order to get there and enjoy the benefits of #NoEstimates. That is what we did and I recommend it for you as well.

Saturday, August 24, 2013

From hour estimates gradually to #NoEstimates