Don’t track bugs, fix them

You do not need a bug tracking system. In fact, a bug tracking system is a symptom of a deeper problem—insufficient focus on quality.

In general, I fix bugs the moment they appear. Not in a drop-everything sense, but as soon as I can get to them: usually within a few hours, but sometimes in a day or two when I’m done with the current story. Of course, if I’m working in the code when I notice a bug, I just fix it right then. As a consequence, the software I write has no known bugs in it (and yes, I always use TDD and I always add real tests as I work; these are two different things). I can trust the code as I’m working, and I can verify that I haven’t broken anything by running the tests every few minutes. That takes all the pressure off and radically improves development speed.

I don’t use a bug tracker because I don’t need to. There are no bugs to track.

I know that people who haven’t tried it will be skeptical that writing a bunch of tests increases your speed, but programming has never been about the speed at which you type. Time spent typing a test is nothing when compared to overall development time, which is mostly thinking. Having the tests in place eliminates all the staring-at-the-code-wondering-if-it-will-work time and lets you focus on the problem you’re solving rather than whether the code works or not.

If you have so many bugs that you need to track them, it seems to me that you have serious problems in the way you do development. Fix that first. You can start by throwing away every bug report older than a month. If it’s important, it will come back. Then distribute the remaining bugs out to the teams. In anything other than a very small organization, the teams will need to work out some way to route bug reports fairly. Throwing dice works pretty well: assign a number to each team, then throw the dice. You can round robin, too. The team can fix a bug whenever it’s convenient, but there should be an upper limit on that (a few days, maybe). If your stories are properly sized, that means you can finish the current story before you move to the bug fix. If you’re interrupting people, you’re doing it wrong. A bug is rarely a crisis.
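
As a rough illustration of the two routing schemes just mentioned, here is a minimal Python sketch. The team names and function names are hypothetical, invented for the example; the only point is that both "throw the dice" and "round robin" take a few lines and need no tracking tool.

```python
import itertools
import random


def route_by_dice(teams, rng=random):
    """Pick a team uniformly at random, like a die with one face per team."""
    return rng.choice(teams)


def round_robin(teams):
    """Return an endless iterator that hands out teams in a fixed rotation."""
    return itertools.cycle(teams)


# Hypothetical team names, used only for illustration.
TEAMS = ["checkout", "search", "billing"]

# Round robin: each incoming bug report goes to the next team in the rotation.
rotation = round_robin(TEAMS)
assignments = [next(rotation) for _ in range(5)]

# Dice: each incoming bug report goes to a randomly chosen team.
# A seeded generator is passed in here only to make the example repeatable.
dice_pick = route_by_dice(TEAMS, random.Random(42))
```

Round robin spreads the load exactly evenly; dice only do so on average, but they need no shared state between whoever receives the reports.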

Once the existing bugs are dealt with, fix your process so that the problem doesn’t arise again. Learn TDD and testing (two very different subjects), and make a rule that nothing gets pushed unless it passes all tests. Every bug is the absence of a test. When you find a bug, don’t write a report, write a failing test, then get it to pass. It’s not rocket science.
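
To make "write a failing test, then get it to pass" concrete, here is a small sketch using Python's standard `unittest` module. The `parse_price` function and the bug it once had are hypothetical, made up for the example: assume a user reported that prices with a thousands separator crashed the parser.

```python
import unittest


def parse_price(text):
    """Convert a price string such as "$1,299.50" into a float.

    The (hypothetical) buggy version did float(text.strip("$")), which raised
    ValueError on thousands separators. Removing the commas is the fix.
    """
    return float(text.strip().lstrip("$").replace(",", ""))


class ParsePriceRegressionTest(unittest.TestCase):
    def test_plain_price(self):
        # Behavior that already worked before the bug was reported.
        self.assertEqual(parse_price("$12.50"), 12.5)

    def test_thousands_separator(self):
        # Written first, while the bug was still present, and watched to fail.
        # It stays in the suite so the bug cannot quietly come back.
        self.assertEqual(parse_price("$1,299.50"), 1299.5)
```

Running `python -m unittest` on the file executes both tests. The test takes the place of the bug report: it records the exact failing input, and it keeps checking the fix on every run.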

Finally, put aside the canard that you don’t have time/money to fix bugs and write the tests you need to prevent them. The higher the quality of your code, the faster you’ll work and the easier it is to make changes. (You cannot refactor safely without tests.) Defects are waste. The easiest way to increase profit is to remove waste. That bug database is just a black hole where essential work goes to die. Dump the database and do the work.

Addendum

I brought this notion up over on Twitter, and there was an impressive amount of pushback from people saying that “in the real world” you can’t possibly do this because there are so many bugs and you have to prioritize the work and you need a tracking system for that.

Putting aside the fact that there are real companies in the real world who successfully do exactly what I’m suggesting (e.g. Hunter Industries), I’d argue that the solution to the too-many-bugs problem is to change your process to one that doesn’t churn out such buggy code. When you take agility seriously (TDD, test-first development, continuous code review through ensemble/mob programming, merciless refactoring, plenty of automated tests, a CI/CD pipeline that runs them on every check-in, a no-known-bugs-on-release policy), the code won’t have very many bugs in it. As I said earlier, my experience as a CTO is that not doing all that is way more expensive than doing it. More to the point, when bugs come up once every few months instead of once every few minutes, a dedicated tracking system isn’t really required to manage them. Just fix them when you get a chance.

Also, basic Lean theory tells us that you need about 30% slack time in your schedule to handle the unpredictable. Working at “100% capacity” slows you way down. Think of a road at 100% capacity. We call that a traffic jam. Fixing an occasional bug is exactly what that slack time is for. When a bug comes up only every few months, finding time to fix it is not a problem.
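
The traffic-jam intuition can be put into numbers with a textbook queueing model. The sketch below assumes the simplest one, a single-server M/M/1 queue; that model is an assumption introduced here for illustration, not something taken from Lean theory or from this post, and a real team is only loosely approximated by it. In that model the average wait before work starts, measured in multiples of the time the work itself takes, is utilization / (1 - utilization).

```python
def queue_wait_factor(utilization):
    """Mean wait in an M/M/1 queue, in multiples of the mean service time.

    utilization is the fraction of time the server (here, the team) is busy.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be at least 0 and less than 1")
    return utilization / (1 - utilization)


# Compare a schedule with slack against schedules that are nearly full.
for u in (0.5, 0.7, 0.9, 0.99):
    factor = queue_wait_factor(u)
    print(f"{u:.0%} busy: new work waits about {factor:.1f}x its own duration")
```

Under these assumptions the wait is a bit over twice the job's duration at 70% utilization, nine times at 90%, and it grows without bound as utilization approaches 100%, which is the traffic jam.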

Addendum to the Addendum

Most, but not all, bugs are amenable to a story approach. If the bug prevents a user from doing some bit of domain-level work, a description of that domain-level work is a story. You can put that story in the backlog in the normal way, sort it by end-customer value, and flesh out the details just before implementation, all like any other story. No need for a bug tracking system, which will get that workflow backwards and not show you the other stories in waiting. The trackers collect details up front when the bug is reported, way before you’ll be working on the code. Of course, changes to the code that happen while the report gradually rots in the tracker could render those details (or the entire bug report) irrelevant. Best to delay capturing details until the last responsible moment, as you would with any story.

And of course, if a bug isn’t stopping a user from doing domain-level work, I would wonder if it’s worth fixing, so maybe those sorts of bugs don’t need to be tracked, either. Solve those problems by just removing the code that contains the bug from the system entirely, a standard refactoring to remove unnecessary code.

19 Comments

  1. Scott on February 26, 2022 at 12:14 am

    I generally concur with the notion of test-driven development, and “go slow to go smooth, go smooth to go fast”. I go further than you do (tests without requirements offer little assurance that you’re verifying the right things), but let’s take a look at items in your writing that contradict or undermine your thesis.

    Are tests your only method of verification? What about formal verification? Were you to use that, we might have evidence to support the claim of “no known bugs” and there’d be reduced need to run your tests “every few minutes.”

    This statement is a bug, and you should remove it from your text immediately, “You can start by throwing away every bug report older than a month. If it’s important, it will come back.” A defect doesn’t “go away” just because you ignore it. If the bug report comes from a world-class security researcher, the defect not only won’t go away, but it may well be exploited without your knowledge and to the detriment of your users.

    I suppose you could argue that no such report would ever be left to age so badly (a month? heaven forfend!). Maybe you haven’t had the joy of researchers discovering insidiously subtle, nuanced problems at a kernel boundary or even a memory cell level. Changing kernel mechanisms, or managing the impact on the kernel of a machine’s physical memory characteristics (the unwanted movement of electrons from one cell to another, which can be driven by JavaScript running in a virtual machine), well, it’s like brain surgery. Unplanned brain surgery. Your demand of “fix it now” seems not to take into account this kind of real world experience.

    [Don’t mind discussion, but I’ve edited this post to remove the gratuitous insults. There are advantages to having one’s own blog 🙂 –Allen]

  2. Scott on February 26, 2022 at 12:18 am

    LOL, did Berkeley change timezones, or is there a bug in your server-side code? I made a post Fri Feb 25 at 4:14PM PST. Your blog comment code shows it as February 26, 2022 at 12:14 am. Nothing more dastardly than timezone code, am I right?

    • Allen Holub on February 26, 2022 at 2:24 am

      It’s WordPress, but it looks like it’s defaulting to GMT. Not sure that that’s a bug, given that I (the customer) couldn’t care less about the time. I definitely wouldn’t track this one—not worth fixing. 🙂

  3. Jason Frazzano on February 26, 2022 at 6:26 am

    I agree strongly in the general case, especially when building a system using only my own code. That said, I have built some systems in cloud environments where sometimes you throw an error for no known reason just because the cloud environment puked. My general approach to these sorts of issues, however, is again in strong accord with your “fix it when you see it” mentality. I run timed loops that reattempt the failed segment of the code, with the same state it had at the time of failure, on the next loop through the system. I have been told this is a bad model and that they should just throw… I have no idea why the try-again approach is bad. I also do some tracking… meaning I log fixes for bug types that happen in the cloud environment and then place those fix approaches in the retry loops if the code can glean the error types from the cloud environment.

    In systems purely my own, I often use setters to set values… The descriptors used in the setters have evolved over time to account for the various traditional errors caused by “bad” data inputs provided by the user. As a result, I now just use the descriptors as my default tests. In a sense, they have just become part of the “native language”. These sorts of adaptive systems, I imagine, could be fully and well automated–and likely are in large corps–as part of a language that is self-hosting and that learns from past mistakes/changes made to fix errors.

    In my wildest aspirations, I would love to create such a system/language. 🙂

    Do you believe the use of descriptor/setter logic could work as a reasonable replacement for a large swath of TDDs?

  4. Daniel Dunn on February 26, 2022 at 1:29 pm

    Throwing out old bugs is a terrible thing…. Most of them are real bugs that are still there, and people have given up on trying to get you to fix them.

    • Allen Holub on February 26, 2022 at 9:33 pm

      If you have so many “old” bugs that you’ll never get around to fixing them, then I don’t see how keeping them around does anything other than obscure the more important bugs that you need to fix now. Fix your process (see the Addendum I added to the post). If the bug was important, it will certainly come up again.

  5. Taylor Holliday on February 26, 2022 at 10:43 pm

    Reminds me of the time I spent most of a week coming up with a repro case for a bug related to Apple’s GPU shader compiler and submitted it to them. Should they have thrown out my detailed bug report if they couldn’t have gotten to it fast enough, leaving some other poor soul to come up with the tricky repro case?

    • Allen Holub on February 26, 2022 at 11:59 pm

      Great question. My response would be that your bug report is a story: you couldn’t do some critical work because of the bug. The actual story is a description of work you needed to do. Stories start out as a sentence or two. That’s what should be submitted. The week’s work you did was the first step of implementing the story. To me, that shouldn’t have happened until the story got pulled by the team that would implement it. That is, the story should have been prioritized by value to the entire user community (not just you), and then, if it was high-enough priority, Apple should have collaborated with you on the implementation. As it stands, if they put off building the solution for any length of time, all that work you did might be rendered useless by subsequent work that changed the behavior of the OS. In other words, the way bug reports are done now effectively asks the users to do a massive amount of up-front speculative work for Apple’s benefit with no guarantee of a payment (getting the bug fixed). They can then arrogantly choose to discard your work (i.e., not pay you) if they choose not to fix the bug. It would be better to defer the work until they decide that the bug deserves to be fixed, I think.

  6. Pete Cacioppi on February 27, 2022 at 12:11 am

    (Also tweeted at you)
    I like that you are passionate about testing and TDD. That said, even on solo projects, I need bug tracking because “fix the bug right away” will destroy my focus. “This looks fishy, get to the bottom of it” is often how an issue starts, observed while doing something else.

    • Allen Holub on February 27, 2022 at 12:19 am

      When I’m working solo, if the fishiness has nothing to do with the code that’s right in front of me, I put it on a sticky and tack it up on my whiteboard. When I get to a convenient place, I’ll take another look at it. This is just a to-do list to prevent a context swap. I’ll retire the item very quickly. It’s in no way tracking—at least not in the way the word is typically used. That to-do list is the way I work generally. Lots of things—not just bugs—come up while I’m working. I put them on the list and keep going. When I’m done with what I’m doing, I go look at the list to see what to do next. I retire those items immediately, not let them putrefy in Jira or some other “tracking system.” I learned to do that when I learned TDD—it’s part of the process. If the bug has nothing at all to do with the current story, then the worst case is a day or two until I finish the story. I’ll give it another look, then, and decide if it’s worth pursuing. If the sticky stays on the board for more than a day or two, I throw it out.

      Actually, rereading the above, I do the same when I’m working in an ensemble, so this isn’t just for solo work.

      • Pete Cacioppi on February 27, 2022 at 5:38 pm

        I don’t want to write anything down. I want all my notes to be online, accessed from anywhere, and shared as needed. Hence, issue tracking.
        Somebody else can see the “this is fishy” issue and get to the bottom of it. Might take a good chunk of the day, but now you at least understand what’s going on. What if you don’t have time to fix it? What if you want to mull it over for a while before trying to fix it? What if you want to summarize a proposed fix so that you have some clear notes to work from when you do get a chance to fix it? These are all wonderful uses for issue tracking.
        Sorry, issue tracking is mandatory for me. But I do like the idea that code should be close-to-defect free for each release.

      • Dave Alvarado on March 2, 2022 at 4:37 pm

        So you do have a bug tracking system, you just use sticky notes. Bug tracking systems are just your sticky note system scaled up to where you can find bugs in other teams’ code, or have your customers write sticky notes for you or whatever.

        When people are giving you “in the real world” pushback, I think what they’re saying is “above a given size”. “Don’t track bugs, fix them” certainly works at a certain size–we work that same way. But we don’t have tens of thousands of software engineers either.

        • Allen Holub on March 3, 2022 at 1:16 am

          If you have “tens of thousands of software engineers” on a single product or project, bugs are the least of your troubles. You need to learn how to scale the work down and how to work effectively. This many people is Parkinson’s Law in action. Lots of people spinning their wheels.

          I’ll add that my sticky notes are a to-do list, with items retired after, at most, a day or two. It’s a way to avoid context swaps. I’m not tracking anything.

      • John on March 2, 2022 at 5:36 pm

        So your bug tracking system is Post-it notes on a whiteboard instead of an electronic system.

        Revolutionary!

        • Allen Holub on March 3, 2022 at 1:13 am

          Sarcasm aside, Post-it notes are a to-do list, not a bug-tracking system. The notes live for a day or two. To-do lists can be a useful tool. For example, when I do TDD, things occur to me that are important but that I don’t want to do this instant. They go on a to-do list and are dealt with the instant I’m done with the current thing. The point is to avoid context swaps, not to track anything.

  7. Tei on February 28, 2022 at 2:26 pm

    Reading this post, I feel like I have entered the world of politics. It’s the software equivalent of “You have money problems? The solution is simple: don’t be poor.” I guess there’s a market somewhere for people who want to believe you can lower the number of bugs in non-trivial software to zero, and that people have the money to spend.

    • Allen Holub on February 28, 2022 at 5:20 pm

      This sort of simplistic rebuttal to the concepts I’ve presented popped up a lot on Twitter. First, the comparison is ridiculous. The problems of eliminating poverty are way larger (and more intractable) than eliminating bugs. There is some similarity, however. The root problem is largely cultural. Poverty—at least in a wealthy country—results from a culture of cruelty and selfishness (and, here in the US, racism). In a poor country, it often stems from corruption and a culture of power. There are obvious exceptions, of course.

      All of these cultural traits are evident in many companies, and they are equally destructive. A culture that incentivizes poor quality creates bugs. There are many companies that have instituted a zero-known-bugs-on-release policy (none of which have a large number of bugs thereafter). They have a culture of quality. Companies with defective cultures that discount quality for whatever reason cannot eliminate bugs. It’s that simple. Fix that.

      I should also point out that the implied point that it’s impossible to eliminate bugs is disproven by the many companies who have done just that. Every one of those orgs that I’ve had conversations with has told me that productivity goes way up after the bugs are gone, and that they are doing more and better work than they ever had. The argument that quality is expensive is disproven by their experience. It’s pushed by people who have no data to back up that claim. (Tei, if you have data, please give it to me.)

  8. Amos on March 8, 2022 at 1:07 am

    Hunter Industries is an irrigation company and doesn’t require complex coding solutions outside of maybe a board that calculates what time it is in order to automatically water at an appropriate time. They also need a website, and HTML, XML, and PHP as systems have back doors and day-ones just available to anyone who bothers to look for an opening. It’s extremely hard for you to say that your code is “bug free” when the systems it’s built upon are riddled with bugs and openings. You also can’t tell what bugs will arise after completion of a patch because of the variety of hardware between computer systems. Whatever you’re coding is likely for such specific hardware that vulnerabilities and bugs don’t occur to you because there’s no actual weight or importance to incentivize breaking your code.

    • Allen Holub on March 8, 2022 at 11:48 pm

      Put simply, you don’t know what you’re talking about, and clearly, you haven’t paid attention to irrigation systems recently. They do everything from predicting the amount of water you need from aggregated weather reports and sensors to accurately metering water to single plants, and if you’re talking about, say, a multi-acre university campus, the complexity is nontrivial. And they have a sales website, of course, but the main use of the website is to remotely program and adjust the system. It’s part of the control system. They base their “no known bugs” on customer bug reports and internal testing, and of course, security is part of that. $1M of industrial landscaping dying because your system fails is hardly “no actual weight.” The fact that they do that with 7 or so teams is pretty impressive.
