I was in a project retrospective meeting today, and I’ve got another one tomorrow. I think we had a fairly low rate of defects compared to the complexity of the code, but we had some definite inefficiency in the amount of time it took us to detect and resolve issues. Some of the defects were clearly preventable, so it’s time to reflect on our development approaches.
Defects are an inevitable fact in any nontrivial project. If you think about it, a lot of the activities in a software project are dedicated to the detection and elimination of defects. The problem with defects is the inherent overhead associated with the administrative process of tracking defects. The efficiency of your project can be greatly affected by excessive churn as issues bounce back and forth between developers and testers. Not to mention decreased team morale (and increased management stress) as the bug count rises.
Much of our project retrospective has involved finding ways to stanch the flow of defects by preventing them. The first step is to think about the sources of our defects. The second step is to determine which practices can be applied to cut them down.
Another topic dear to my heart is optimizing the total time it takes to fix a bug.
Our approach is based on Scrum and XP, so I’m naturally thinking about how and where Agile practices can both help reduce the number of defects and make defect fixes more efficient. Agile development isn’t a silver bullet for defects, but I’ve observed a quality of “smoothness” in the disciplined Agile projects I’ve been on that clearly wasn’t there on previous waterfall projects (or sloppy Agile projects for that matter).
Finding Defects Early
It’s generally accepted in the software industry that defects are easier to correct the sooner they are detected. One of the best things about Agile development is involving the testers early, within rapid iterations. It’s so much simpler to fix a defect in code that you wrote a couple of days ago than it is to spelunk through code that’s 4-5 months old. A bug is also much harder to fix if it’s intertwined with a great deal of later code (I do think that TDD alleviates this to some degree by more or less forcing developers to write loosely coupled code).
Developer Mistakes
Most defects are simply a result of developer error, and most of these are simple in nature. Disciplined Test Driven Development goes a long way towards eliminating a lot of these bugs. Writing the tests first in the TDD manner gives us much more confidence that our code behaves the way we intended. I feel pretty confident in saying that TDD drastically reduces the number of bugs due to simple developer error. A close look at the bugs from our last project showed an obvious correlation between the areas of code with poor unit test coverage and the defects that I felt were mostly attributable to developer error.
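To make the test-first rhythm concrete, here's a minimal sketch in Python. The `parse_quantity` function and its rules are purely hypothetical (they aren't from our project); the point is only that the tests pin down the intended behavior before the code exists, and then stay behind as a regression check.

```python
import unittest


def parse_quantity(text):
    """Hypothetical example: parse an order quantity typed in by a user."""
    value = int(text.strip())
    if value <= 0:
        raise ValueError("quantity must be positive")
    return value


class ParseQuantityTests(unittest.TestCase):
    # In TDD these tests are written first and fail until
    # parse_quantity is implemented to satisfy them.

    def test_parses_plain_number(self):
        self.assertEqual(parse_quantity("3"), 3)

    def test_ignores_surrounding_whitespace(self):
        self.assertEqual(parse_quantity("  12 "), 12)

    def test_rejects_zero(self):
        with self.assertRaises(ValueError):
            parse_quantity("0")

    def test_rejects_non_numeric_input(self):
        with self.assertRaises(ValueError):
            parse_quantity("a dozen")
```

Saved to a file, this can be run with `python -m unittest`. The tests for zero and non-numeric input are exactly the kind of simple mistakes that tend to slip through in code with thin unit test coverage.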
If you’re suffering a rash of defects, it’s worth your time to think about how you’re doing unit testing and find a remedy.
Of course it’s not that hard to create incorrect unit tests, but that’s where Pair Programming should come into play. Having another developer actively involved with both the unit testing and the coding should act as a continuous code review to correct the unit tests.
There is also the issue of whether a developer fully understands the code they’re writing. Having a second mind engaged on the coding problem at hand should increase the total understanding. The simple act of talking about a coding problem with another developer can lead to a better understanding of the code.
Not Understanding Requirements
We think requirements defects have clearly been our Achilles’ heel so far. Consistently using TDD and CI means that our code mostly works the way we developers intended, but it doesn’t guarantee that we’re creating the correct functionality. As I see it, there are three issues here:
1. Determining the requirements
2. Communicating the requirements in an unambiguous manner to the developers to create a shared understanding between the analysts, developers, and testers
3. Automating the verification of conformance to the requirements to stop the “Ping Pong” iterations of fixing defects
Our thinking right now is to utilize FitNesse as the primary mechanism to solve these three issues by doing Acceptance Test Driven Development. I’ll blog much more on this later, because we’re still figuring out how this impacts our iteration management and who’s responsible for what work and when.
In the meantime, I’d strongly recommend picking up a copy of Fit for Developing Software: Framework for Integrated Tests by Ward Cunningham and Rick Mugridge for background on using FIT for acceptance testing. My first experience with FIT wasn’t all that positive, but I’m rapidly changing my mind after reading the book. I’m optimistic about FitNesse so far.
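To give a flavor of the table-driven style, here's a rough sketch in plain Python. It is not FitNesse or the FIT API, just the underlying idea: the requirement is written down as a table of examples that analysts, developers, and testers can all read, and the same table is executed against the code. The discount rules, the `discount_for` function, and every number in the table are made up for illustration.

```python
from decimal import Decimal

# A requirements table an analyst might write (all values hypothetical).
ACCEPTANCE_TABLE = """
| order total | customer type | expected discount |
| 50.00       | regular       | 0.00              |
| 200.00      | regular       | 10.00             |
| 200.00      | preferred     | 20.00             |
"""


def discount_for(order_total, customer_type):
    """Hypothetical system under test."""
    if order_total < Decimal("100"):
        return Decimal("0.00")
    rate = Decimal("0.10") if customer_type == "preferred" else Decimal("0.05")
    return (order_total * rate).quantize(Decimal("0.01"))


def parse_rows(table_text):
    """Split a pipe-delimited table into rows of cells, dropping the header."""
    rows = []
    for line in table_text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        rows.append(cells)
    return rows[1:]


def run_table(table_text):
    """Run every row against the code and return descriptions of failing rows."""
    failures = []
    for total, customer_type, expected in parse_rows(table_text):
        actual = discount_for(Decimal(total), customer_type)
        if actual != Decimal(expected):
            failures.append(
                f"{customer_type} order of {total}: "
                f"expected {expected}, got {actual}"
            )
    return failures
```

An empty list from `run_table(ACCEPTANCE_TABLE)` means the code agrees with every example in the table. When a tester and a developer disagree about a behavior, the argument becomes "which row is wrong?" instead of another round of bug report ping pong.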
Edge Cases
Lately I’ve been dealing with about a dozen defects that can only be described as “edge cases.” These bugs are a combination of inputs or actions that nobody anticipated. Some of these bugs might just be from some missed analysis, but a lot of these bugs are never going to be caught until later in the project when the team has a much better understanding of the project domain. Either way, I think the appropriate action is to turn to the tester and just say “Good catch, I’ll get right on it.”
I think that Agile practices indirectly contribute to catching and eliminating these kinds of defects. By more quickly eliminating the defects in the mainline code logic with TDD and Acceptance Testing, testers *should* have more time to do the kind of intensive exploratory testing that finds problems like the one Jonathan Kohl talks about here. There’s also the very real benefit of the automated test suites acting as a safety net to guard against new regression bugs.
This section would be a lot longer, but I think Charles Miller sums up the subject better anyway right here.
Invalid Testing Environment
Occasionally something will go wrong in the testing environment that basically invalidates any and all test runs. Maybe a testing database isn’t available, a URL to a web service is configured incorrectly (can you say scar tissue?), or a Windows service isn’t correctly installed on a test server. All of these things lead to testers either sitting on their hands waiting for you to get the test environment fixed, or reporting a batch of bugs that aren’t necessarily due to coding mistakes. On previous projects I’ve often been saddled with bugs that arose because the database stored procedures were updated through a different process than the middle tier code. A particularly irksome situation is when the new version doesn’t get correctly installed before the tester tries to re-test (we had an issue with this last week with an MSI installer created with WiX).
This kind of project friction needs to be eliminated. One of the best tools is an automated build script chained to a Continuous Integration practice. At this point, I would unequivocally say that any project team that doesn’t have a dependable automated build of some sort is amateurish, period.
A good automated build can greatly reduce the chances of an invalid testing environment. Using CI should keep the testers from wasting their finite time on obviously incorrect builds. CI also reduces the time between checking in bug fixes and pushing the code to the testing environment, while simultaneously improving the reliability of the code pushes.
I’d also recommend creating a small battery of environment tests that run in your testing environment after code moves just to validate that all the moving pieces (databases, web services, Windows services, etc.) are accessible from the test application. My team just inherited a product with quite a few external dependencies that are hard to troubleshoot from the integration tests. We’ll be writing some automated tests just to diagnose environment issues before the integration tests run in the automated builds.
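As a rough idea of what such a battery might look like, here's a small Python sketch. Everything specific in it is an assumption for illustration: the host names and ports in `example_checks` are placeholders, and a real suite would probably go beyond a TCP connection (run a trivial query, call a health endpoint, ask the service manager whether a service is running).

```python
import socket


def check_tcp(host, port, timeout=3.0):
    """Return (ok, detail) for whether a TCP connection to host:port opens."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "reachable"
    except OSError as error:
        return False, str(error)


def run_environment_checks(checks):
    """Run every named check and return a list of (name, ok, detail)."""
    results = []
    for name, check in checks:
        try:
            ok, detail = check()
        except Exception as error:  # a crashing check counts as a failure
            ok, detail = False, repr(error)
        results.append((name, ok, detail))
    return results


def failed_checks(results):
    """Names of the checks that did not pass."""
    return [name for name, ok, _ in results if not ok]


def example_checks():
    """Hypothetical dependencies; the hosts and ports are placeholders."""
    return [
        ("test database", lambda: check_tcp("testdb.example.internal", 1433)),
        ("pricing web service", lambda: check_tcp("pricing.example.internal", 443)),
    ]
```

The build script would call `run_environment_checks(example_checks())` right after the code push and stop with a clear message if `failed_checks` returns anything, so that an unreachable database shows up as "test database: connection refused" rather than as a pile of mysterious integration test failures.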
I’ve developed a relevant strategy for self-validating configuration based on my StructureMap tool. I’ll blog on this soon.