Why Shooting for 100% Test Coverage is Important

100% test coverage may be impossible, but this week I’ve found that striving for it has many benefits.

I’ve been working on a new project that’s based on the Pyramid framework. The framework creators claim 99% test coverage. The other developer and I thought that was pretty great, and decided that setting a similar goal for our project would be worth the trouble.

I don’t think I need to go into intimate details about why tests of the various forms (unit, functional, integration, systems, load, stress) are important (I will if somebody asks :)). I think it can be summed up in a simple truth:

In the long run, writing tests saves you time, and makes your code stronger.

I’ve spent most of this week refactoring tests in an effort to get to that mythical 100% goal, and I’ve found the experience to be extremely fruitful.

Note: we’re using Nose to run the tests and generate the coverage report.
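
For the curious, the coverage run looks roughly like this (myapp here is a stand-in for your own package name):

    # "myapp" is a placeholder; point --cover-package at your own package
    nosetests --with-coverage --cover-package=myapp --cover-erase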

I’ve learned a lot.

I know much more about how Pyramid works (and it’s been refreshingly straightforward). I’ve become a fan of WebTest, and intimately familiar with WebOb. All of this was a side effect of needing to break the tests down into units that could be run adequately at the various levels, so that all of the code got touched. I had to dig into the APIs for each package to figure out how to test certain things at each level. Luckily, these particular packages are very well documented, and their code is relatively clean and easy to understand, so the process was much less arduous than it might have been with a different toolset.
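
To give a flavor of it, a functional test with WebTest looks roughly like this; the application factory, route, and expected text are invented for the example:

    import unittest

    from webtest import TestApp


    class FunctionalTests(unittest.TestCase):

        def setUp(self):
            # main() is assumed to be the project's Pyramid application
            # factory; it returns a WSGI app that TestApp can wrap.
            from myapp import main
            self.testapp = TestApp(main({}))

        def test_front_page(self):
            # Exercises the whole WSGI stack: routing, the view, the renderer.
            res = self.testapp.get('/', status=200)
            self.assertIn('Welcome', res.text)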

I had to dig into protocol-level aspects of the application. Some of the more opaque parts of Pyramid, WebOb, WSGI and HTTP required closer inspection. After this week, I know so much more about multipart form encoding (MIME encoding technically, RFC 2046), file uploads (RFC 1867), and perhaps more importantly, how Pyramid and WebOb implement these concepts.
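
WebTest, for instance, builds the multipart/form-data body for you, so a file upload can be exercised without a real browser. Continuing the made-up functional test case above (the route, field names, and redirect are assumptions):

        def test_file_upload(self):
            # upload_files takes (fieldname, filename, content) tuples and
            # handles the multipart encoding of the request body.
            self.testapp.post(
                '/documents/add',
                params={'title': 'Example'},
                upload_files=[('attachment', 'hello.txt', b'Hello, world!')],
                status=302,  # assuming the view redirects after a good upload
            )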

The separation between unit and functional testing became more obvious, and more than ever, I appreciate the need for them to be separate.
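
To make the distinction concrete, the unit-level counterpart of the functional test sketched above calls the view callable directly, using Pyramid's testing helpers; the view name and the dict it returns are again invented:

    import unittest

    from pyramid import testing


    class FrontPageUnitTests(unittest.TestCase):

        def setUp(self):
            # Sets up a minimal Pyramid registry; no WSGI stack is involved.
            self.config = testing.setUp()

        def tearDown(self):
            testing.tearDown()

        def test_front_page_view(self):
            # Call the view callable directly with a dummy request, rather
            # than going through routing, middleware, and the renderer.
            from myapp.views import front_page
            request = testing.DummyRequest()
            result = front_page(request)
            self.assertEqual(result['project'], 'myapp')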

The code’s gotten better.

When faced with a line of code that wasn’t covered, I had to figure out why. The answer to ‘why’ forced me to re-evaluate the way the tests were written and think critically about how the application is structured.

This is, of course, in addition to the expected benefit of catching untested code. That untested code manifested in several ways:

  • I found a handful of use cases that just weren’t being tested.
  • I found some edge cases that were not being tested (accounted for in the code, but never executed by the tests).
  • I found API interfaces that were implemented but never exercised.
  • I found tests that were passing, but not testing what they were supposed to test (see the sketch after this list).
  • I found one specific edge case that was accounted for in the code, but that turned out to be effectively impossible to hit.
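
Tests that pass without verifying anything useful (the fourth item above) deserve a made-up illustration. A test like this stays green as long as the view returns its usual redirect, but it never checks the thing it exists to check:

        def test_delete_document(self):
            # Looks like a test and passes reliably, but proves nothing:
            # the response comes back, yet nobody asks whether the
            # document is actually gone afterwards.
            res = self.testapp.post('/documents/1/delete', status=302)
            self.assertTrue(res)  # a TestResponse object is always truthy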

What this amounts to is the exposure of false assumptions, bad tests, and some potential design flaws. In fixing and/or addressing them, the code has attained a new level of solidity. And even the tests themselves are a lot better than they were last week.

In summation…

I’d have to say working toward 100% coverage, even if you can’t get there, is more than worth the effort. Just exposing one or two bad tests or use cases that were neglected is worth it, and I gained a lot more than that from this endeavor.

Epilogue

I started writing this yesterday morning, when I had one or two stubborn lines of code that kept eluding coverage. As of about 6 o’clock last night, I’ve actually achieved my goal of 100% test coverage. I need to do a bit more digging to make sure this isn’t a misleading number, but given what’s gone into getting here, I feel pretty confident that even if it’s not as impressive an achievement as it sounds, it’s a reflection of a very good test suite and a well-put-together application.

3 Responses to Why Shooting for 100% Test Coverage is Important

  1. Cliff Dyer says:

    So you talk about 100% coverage being an unachievable (but worthy) goal. What do you see as major impediments to reaching that goal? Concretely, what is hard/impossible to test, and what makes it so?

    • Cliff Dyer says:

      Oops, I hadn’t seen the Epilogue. So not quite unachievable, eh? I guess the general thrust of the question still stands, though.

    • jjmojojjmojo says:

      The problem is that sometimes, depending on the way your application is set up, it can be very hard to simulate certain cases so that the code gets covered. I had that trouble myself. Mock objects are helpful (I haven’t used them much yet), but sometimes you have to move the test from the unit realm up to functional or even systems or integration tests to get the code covered (or down from functional to unit). That can mean more overhead (you were happy with just a simple browser simulator, now you need to create systems deployments and get Selenium going), refactoring (you need to wrap code in a separate method so you can write a unit test for it), or giving up on that 100% goal.
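
      For what it’s worth, the mock approach looks roughly like this, continuing the sort of WebTest-based test case sketched in the post; the patched helper and the resulting status code are made up for the example:

          from unittest import mock  # the standalone "mock" package on Python 2

          def test_storage_failure(self):
              # Patch a (hypothetical) storage helper so it raises,
              # simulating a failure that's nearly impossible to provoke
              # through the real stack.
              with mock.patch('myapp.storage.save_file', side_effect=IOError):
                  self.testapp.post(
                      '/documents/add',
                      upload_files=[('attachment', 'hello.txt', b'data')],
                      status=500,  # assuming the app turns the error into a 500
                  )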

      Personally, I think it’s worth the trouble to try to refactor; build the application so it’s as testable as it can be, tear down the layers whenever you can. But again, by doing that with nothing but 100% coverage in mind, you can miss contingencies, design flaws and real bugs. You also have to remember to put the layers back together and test them as well.

      So maybe it’s not impossible… but when I got there, I think I got lucky. Our app in its current state is still pretty simple; it doesn’t have a high level of user interaction. What we have at the moment is similar to the Django Admin: model-level CRUD forms. We’ve also got authentication middleware and a database-backed implementation of Pyramid’s ACL system, but not much beyond that. Also, Pyramid is very testable. WebOb is very easy to use for request/response simulation (with the added bonus that it’s what Pyramid uses under the hood). WSGI in general is a very test-friendly gateway interface, and WebTest filled in the gaps. We’re using SQLAlchemy and FormAlchemy as well, so they sort of encapsulate the model and some of the controller code in a way that can be teased out and put through its paces without the UX stuff in the mix.

      I did come very close to giving up right around the time I started writing up the post. There were some quirks with the underlying libraries (the built-in cgi module, to be exact); to get it all straightened out, I had to trace back through WebTest, then WebOb, and eventually down to the official Python source tree. It didn’t take too long, but if time had been scarcer, I might have had to just live with 99% coverage. I was faced with a conundrum: is one line of code worth an extra half day (it could have been longer!) of work? Do I refactor around the line that just won’t execute? Comment it out? (boo!)

      In my case it dealt with file uploads and code I had in place to catch malformed POSTs; this is a core part of the functionality, so I felt it was justified. But I can see times when it might not be.
