One of my favorite phone screen questions goes something like this:
You’ve just joined a team that has a large test automation suite, say around 3000 tests. These tests are run every night against the latest build. You’ve noticed that over the past month the pass rate for the runs varies anywhere between 70% and 90%. In other words, one day the pass rate will be 72%, the next day 84%, the next day 75%, the next day 71%, and so on: different every day. How do you go about analyzing the test suite stability to get the pass rate up?
Depending on the experience of the tester, this question can go in several directions. Naive testers will just start digging in and debugging the first failure without any context, without prioritization, and without understanding what they are doing. This is the most common interviewee, unfortunately. Excellent, well-established “senior” testers, or those who are currently active in the test community, may question why we even care about pass rate; then we’ll have a discussion about what a test suite actually is and why so many tests are run every night (e.g. why not use code coverage to determine which tests to run based on what code changed). In the 2 years that I’ve asked this question, I’ve never had this discussion. 😦 Good testers will start by asking lots of questions to understand the test suite better: what is getting run, whether the builds change each night or contain the same code, whether the tests are prioritized, etc. Good testers will break the problem down to get data and methodically analyze that data in order to make sense of the chaos.
I ask this interview question because I keep running into this scenario: I keep joining teams that have large automation suites with “some amount” of test instability. I say “some amount” in quotes because, too often, the people who own the tests don’t take the time (or don’t know how) to understand what specifically is failing and why. Sometimes, the test suites are huge (10,000 tests). But even with a suite of 500 tests, when you see about 20% of your tests failing with every run, it’s human nature to throw up your hands and move on to something more exciting, because you’ve got other shit piling up that needs to get done for this sprint. Sure, one mitigation would be to schedule time in the sprint to address the tests. But often, people don’t know where to start in order to figure out how much time needs to be allocated. This gives me sad face.
Let’s assume for the purposes of this post that we have a test suite of regression scenarios, all of equivalent priority (e.g. Priority 1 tests), and that, for whatever reason (lack of sophistication, tooling, end-of-sprint full regression test, etc.), we need to run all of these tests.
So what do you do? You know you have a big test suite. You know that some number of tests are always failing, some number are sometimes failing, some are always passing (and some are inconclusive/timing out). Where do you begin with your analysis? How do you prioritize your work in order to budget time for fixing stuff?
Step 1: Get the Data
If you don’t have data, you don’t have shit. If you’re not using a test runner or automation system that automatically captures test result data and stores them somewhere (SQL is a fine choice), then build one. On my current team, we use SpecFlow and run the tests via MSTest in Visual Studio. We parse the TRX files and import the data into a custom SQL database that captures some very simple data. Here’s a quick and dirty schema we whipped up to capture the basic data we need (yes, this could be greatly improved, but we wanted something simple and fast):
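To be clear, what follows is not our schema or our importer. It's a minimal Python sketch of the same idea, assuming SQLite instead of our SQL database, a made-up `test_results` table, and a tiny hand-written TRX sample. The function names (`parse_trx`, `store_results`) are hypothetical, and it only reads a handful of attributes from each `UnitTestResult` element.

```python
import sqlite3
import xml.etree.ElementTree as ET

# XML namespace used by Visual Studio TRX result files.
TRX_NS = {"t": "http://microsoft.com/schemas/VisualStudio/TeamTest/2010"}

# Hypothetical minimal table, NOT the schema from this post.
SCHEMA = """
CREATE TABLE IF NOT EXISTS test_results (
    run_id     TEXT NOT NULL,
    test_name  TEXT NOT NULL,
    outcome    TEXT NOT NULL,
    duration   TEXT,
    machine    TEXT,
    started_at TEXT
)
"""

# Hand-written sample for illustration; real TRX files carry much more.
SAMPLE_TRX = """
<TestRun id="run-001" xmlns="http://microsoft.com/schemas/VisualStudio/TeamTest/2010">
  <Results>
    <UnitTestResult testName="Login_Succeeds" outcome="Passed"
        duration="00:00:01.5000000" computerName="AGENT1"
        startTime="2015-01-05T01:00:00" />
    <UnitTestResult testName="Checkout_Works" outcome="Failed"
        duration="00:00:09.2000000" computerName="AGENT1"
        startTime="2015-01-05T01:00:02" />
    <UnitTestResult testName="Search_Returns" outcome="Timeout"
        duration="00:01:00.0000000" computerName="AGENT2"
        startTime="2015-01-05T01:00:11" />
  </Results>
</TestRun>
"""


def parse_trx(trx_text):
    """Return one tuple per UnitTestResult element in the TRX text."""
    root = ET.fromstring(trx_text.strip())
    run_id = root.get("id", "")
    rows = []
    for result in root.iterfind(".//t:UnitTestResult", TRX_NS):
        rows.append((
            run_id,
            result.get("testName"),
            result.get("outcome"),
            result.get("duration"),
            result.get("computerName"),
            result.get("startTime"),
        ))
    return rows


def store_results(conn, rows):
    """Create the table if needed and append the parsed rows."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO test_results VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_results(conn, parse_trx(SAMPLE_TRX))
    for row in conn.execute("SELECT test_name, outcome FROM test_results"):
        print(row)
```

The point is less the exact columns than having one row per test per run, so every later question becomes a query instead of a guess.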
Step 2: Analyze Data
Now you get to figure out where you want to begin. All things being equal, attack the worst tests first. If you have tests that are higher priority for acceptance, attack those first. Figure out the criteria that you need in order to get the biggest bang for the buck. In many cases, the Pareto principle applies: 80% of the failures come from 20% of the tests. Here’s a [sanitized] graph of the data we’re currently seeing in a certain test suite:
This isn’t the first time I’ve seen this graph. I’ve seen this on literally every single team I’ve worked on in my 15-year career.
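If you want to check the Pareto claim against your own results, a rough sketch like this will do. It assumes you can pull `(test_name, outcome)` pairs out of your store and that anything other than `"Passed"` counts as a failure (so timeouts and inconclusive results are lumped in). The names `failure_counts` and `pareto` and the sample data are made up for illustration.

```python
from collections import Counter


def failure_counts(results):
    """Count non-passing results per test name."""
    fails = Counter()
    for test_name, outcome in results:
        if outcome != "Passed":  # assumption: non-Passed means failure
            fails[test_name] += 1
    return fails


def pareto(results):
    """Return (test_name, failures, cumulative_share) rows, worst first."""
    fails = failure_counts(results)
    total = sum(fails.values())
    table = []
    running = 0
    for test_name, count in fails.most_common():
        running += count
        table.append((test_name, count, running / total))
    return table


# Invented sample: test A accounts for most of the failures.
SAMPLE_RESULTS = (
    [("A", "Failed")] * 8
    + [("A", "Passed")] * 2
    + [("B", "Failed")] * 1
    + [("C", "Timeout")] * 1
    + [("D", "Passed")] * 10
)

if __name__ == "__main__":
    for test_name, count, share in pareto(SAMPLE_RESULTS):
        print(f"{test_name}: {count} failures, {share:.0%} cumulative")
```

Reading down the cumulative column tells you how few tests you need to fix to remove most of the red.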
Step 3: Get Off Your Ass and Fix It
You have the data. You have the graphs. You know what tests suck. So do something about it! The graph above pretty clearly shows that about 20% of the tests in this dataset suck, with about 5% sucking hard (i.e. failing all the time). And then there’s a long tail of tests that fail about 20% of the time. That long tail may require more analysis time.
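One way to turn that into a work list is to bucket tests by how often they fail. This is a sketch only: the 95% cutoff for "always failing" is an arbitrary assumption, the function names are hypothetical, and anything that isn't `"Passed"` is treated as a failure.

```python
from collections import defaultdict


def failure_rates(results):
    """Map each test name to its fraction of non-passing runs."""
    runs = defaultdict(int)
    fails = defaultdict(int)
    for test_name, outcome in results:
        runs[test_name] += 1
        if outcome != "Passed":  # assumption: non-Passed means failure
            fails[test_name] += 1
    return {name: fails[name] / runs[name] for name in runs}


def bucket_for(rate, always_threshold=0.95):
    """Pick a bucket name; the threshold is an arbitrary assumption."""
    if rate >= always_threshold:
        return "always failing"
    if rate == 0:
        return "stable"
    return "intermittent"


def classify(results):
    """Group test names by bucket, sorted within each bucket."""
    buckets = defaultdict(list)
    for name, rate in failure_rates(results).items():
        buckets[bucket_for(rate)].append(name)
    return {bucket: sorted(names) for bucket, names in buckets.items()}


# Invented sample covering four nightly runs.
SAMPLE_RESULTS = (
    [("Checkout_Works", "Failed")] * 4
    + [("Search_Returns", "Failed")]
    + [("Search_Returns", "Passed")] * 3
    + [("Login_Succeeds", "Passed")] * 4
)

if __name__ == "__main__":
    for bucket, names in classify(SAMPLE_RESULTS).items():
        print(bucket, names)
```

The "always failing" bucket is usually the cheap win: those tests are either broken or pointing at a real bug. The "intermittent" bucket is where the long tail lives.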
Tips for Analysis:
1. Collect metadata about the test runs. Any sort of metadata about the test run, tests, environment, or configuration can help in your analysis. For example, I was on a team where we were down to analyzing failures in the ‘long tail’ of the graph. A particular set of tests would always fail on the 2nd Tuesday of the month. After digging into the test run properties, we determined that the tests were run on a German build of Windows 2008 R2 only on the 2nd Tuesday of the month. Bingo! Sure enough, when we manually ran the test on German Win2K8R2, it would fail. Every. Time.
2. Make the time to fix the tests. If your team is spending the time to automate regression tests, then having them sit around constantly failing for unknown reasons is a complete waste of the team’s time. That’s time that could have been spent doing more exploratory testing, perf testing, scalability testing, etc. Stop wasting time and money.
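For the first tip, the kind of digging that found the German build can be roughed out as a group-by over run metadata. This sketch assumes each result is a dict with an `outcome` key plus whatever metadata you captured; the `os_language` key, the `failure_rate_by` name, and the sample records are all made up to mirror that story.

```python
from collections import defaultdict


def failure_rate_by(records, key):
    """Map each value of a metadata key to its fraction of failed runs."""
    totals = defaultdict(lambda: [0, 0])  # value -> [failures, runs]
    for record in records:
        tally = totals[record.get(key)]
        tally[1] += 1
        if record["outcome"] != "Passed":  # assumption: non-Passed is a failure
            tally[0] += 1
    return {value: fails / runs for value, (fails, runs) in totals.items()}


# Invented records for one flaky test across several nightly runs.
SAMPLE_RECORDS = [
    {"test": "Export_Report", "outcome": "Passed", "os_language": "en-US"},
    {"test": "Export_Report", "outcome": "Passed", "os_language": "en-US"},
    {"test": "Export_Report", "outcome": "Failed", "os_language": "de-DE"},
    {"test": "Export_Report", "outcome": "Passed", "os_language": "en-US"},
    {"test": "Export_Report", "outcome": "Failed", "os_language": "de-DE"},
]

if __name__ == "__main__":
    for value, rate in failure_rate_by(SAMPLE_RECORDS, "os_language").items():
        print(f"{value}: {rate:.0%} failing")
```

Run it once per metadata field you have (machine, OS, build flavor, day of week). A field where one value fails all the time and the others never do is your lead.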
Now the uber question is why does this all matter? What do test pass rates really tell you anyway? Well, it depends on how your team tests. You could fix all of the automated tests and make sure they pass 100% of the time, making the techniques above moot. Great. But what scenarios are you missing? What code isn’t getting covered by your tests? How much time are you spending exploring the codebase for issues? How’s the perf? The automated tests by themselves are completely pointless if the team isn’t doing testing. The regression tests are just another weapon in the testing arsenal. Make them work and work well so that the team can spend time doing some actual testing.