Motivations to abort failing tests
Looking at the problem from afar, I think most of us can agree that if we knew for sure which tests would pass and which would fail (and why!), we would stop running tests and rely only on this magical source of information. I would bet nobody likes running tests for the sake of running tests. We do it to have a sense of sanity/security/validation about the overall quality of our system.
Since most of us will most likely need to continue running tests, as we don't have a better way of getting the same result, I propose we think about tests and when we could/should skip running them.
Addressing the elephant in the room: all of the test frameworks/build tools offer some way of stopping as soon as the first test fails. This is great, as it saves a lot of time. If you are used to this feature and have used it in real life, you are also familiar with its side effect: it tends to omit a great deal of information, treating the rest of the tests like the cat in Schrödinger's box, as you don't know whether they would pass or fail. If you are unlucky, and the cat is not alive by the time you open the lid, then after fixing the one known failing test you can end up re-running the X passing tests you already executed last time, plus the one you have just fixed, only to see the next one fail. This can be a soul-crushing experience if it happens more than once. At some point you will also realize that running all of the tests in the first round and then fixing all the failures in one go would have been faster.
One of the motivations for aborting the tests that are predicted to fail (and only those) comes from exactly this situation: we need to know what is working, so that we can fix everything that is broken before we retry.
Our tests often share some core dependencies: for example, if the login is not working, we will not be able to do many of the things that are protected by authentication. It is fair to expect that executing the same login steps at the beginning of 30 test cases will bring similar results most of the time, unless you have some flaky tests. If we knew about this in advance, wouldn't it be great to not even try? Especially if it takes a minute to see a single test case fail.
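To make this a bit more tangible, here is a minimal sketch using nothing but plain JUnit 5 assumptions. It is not the Abort-Mission API; the class and field names (`LoginDependentTest`, `LOGIN_BELIEVED_HEALTHY`) are made up for illustration. The idea is that the first broken login still fails and gets reported, while the remaining login-dependent tests are marked as skipped instead of repeating the same minute-long login steps.

```java
import static org.junit.jupiter.api.Assumptions.assumeTrue;

import java.util.concurrent.atomic.AtomicBoolean;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

class LoginDependentTest {

    // Very naive, shared "health" record for the login dependency (illustration only).
    private static final AtomicBoolean LOGIN_BELIEVED_HEALTHY = new AtomicBoolean(true);

    @BeforeEach
    void performLogin() {
        // Skip the whole test (reported as "skipped", not "failed")
        // if an earlier test has already proven that login is broken.
        assumeTrue(LOGIN_BELIEVED_HEALTHY.get(), "Login is already known to be failing, aborting test.");
        try {
            login();
        } catch (RuntimeException e) {
            // Remember the failure so the remaining login-dependent tests can be aborted.
            LOGIN_BELIEVED_HEALTHY.set(false);
            throw e;
        }
    }

    @Test
    void shouldSeeDashboardAfterLogin() {
        // ... exercise a feature that requires an authenticated session ...
    }

    private void login() {
        // ... the expensive, minute-long login steps would go here ...
    }
}
```

With 30 such tests and a dead login, only the first one spends a minute failing; the other 29 are skipped almost instantly.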
Let's look at some similar things we are already using in the industry and try to understand how we could reuse the experience in this context.
The concept of having dependencies in testing is not new; TestNG was doing it 8-10+ years ago. Their approach is a bit different from our starting point, though, as we don't want to define dependencies between the tests. We would rather look at the dependencies the production application uses and inform the tests about them, so that the test runner can predict what the outcome will be just in time, right when the execution would start.
Also, our case simply operates with thresholds at the time we are starting a test, while TestNG dependencies have an impact on test order, can be affected by reruns, etc., which all complicate the picture a bit. Lucky for us, we don't need to find all the differences; we can focus on the similarities for now.
Caching is a great example of not doing the same thing twice: we save the outcome of the first run and reuse it for as long as we can accept it as valid. The obvious difference, probably striking everyone, is what we are caching. Traditional caching use cases tend to cache the result of the happy case, while we want to "cache errors": once we can assume that a feature is broken no matter what, we don't need to bother running the test.
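A tiny sketch of what "caching errors" could look like, again just to illustrate the idea and not how Abort-Mission implements it (the `FailureCache` class, its validity window, and the dependency keys are all hypothetical): once a dependency check has failed, the cached failure is rethrown instead of re-executing the check while the entry is still considered valid.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// "Caching errors": remember that a check has failed and keep returning that
// failure while the cached entry is still considered valid.
final class FailureCache {

    private record Entry(RuntimeException failure, Instant expiry) { }

    private final Map<String, Entry> knownFailures = new ConcurrentHashMap<>();
    private final Duration validity;

    FailureCache(Duration validity) {
        this.validity = validity;
    }

    <T> T callOrRethrow(String dependency, Supplier<T> check) {
        Entry cached = knownFailures.get(dependency);
        if (cached != null && Instant.now().isBefore(cached.expiry())) {
            // The dependency failed recently: don't waste time retrying, rethrow the cached failure.
            throw cached.failure();
        }
        try {
            return check.get();
        } catch (RuntimeException e) {
            // Unlike a traditional cache, we store the unhappy outcome.
            knownFailures.put(dependency, new Entry(e, Instant.now().plus(validity)));
            throw e;
        }
    }
}
```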
The TestNG approach of having only one failed test and a bunch of skipped ones when we don't think something will succeed is very tempting. It does not change the outcome of the test suite (it will still signal failure/skip), and it can save time. "Caching of failures" is a promising concept as well, but it is not easy to implement and would end up a very hacky solution by the time it was done.
We need something more sophisticated, keeping the best parts of these two ideas:
- Still reporting at least one failure
- Not spending time on known failing tests
- Predicting failure based on history, just like caching reuses the results of a previous run (see the sketch right after this list)
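Combining the three points could look something like the following sketch. This is hypothetical decision logic, not Abort-Mission's actual implementation; the `AbortDecision` class and its `burnIn`/`failureThreshold` parameters are assumptions made for illustration. The first few executions of a dependency always run (so at least one real failure gets reported), and further tests are aborted once the observed failure ratio for that dependency crosses the threshold.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Let a few executions through (burn-in) so at least one failure is reported,
// then abort further tests of a dependency once its failure ratio is too high.
final class AbortDecision {

    private static final class Stats {
        final AtomicInteger executions = new AtomicInteger();
        final AtomicInteger failures = new AtomicInteger();
    }

    private final Map<String, Stats> history = new ConcurrentHashMap<>();
    private final int burnIn;
    private final double failureThreshold;

    AbortDecision(int burnIn, double failureThreshold) {
        this.burnIn = burnIn;
        this.failureThreshold = failureThreshold;
    }

    boolean shouldAbort(String dependency) {
        Stats stats = history.computeIfAbsent(dependency, key -> new Stats());
        int executed = stats.executions.get();
        if (executed < burnIn) {
            // During burn-in we always run the test, so real failures are reported.
            return false;
        }
        double failureRatio = stats.failures.get() / (double) executed;
        return failureRatio >= failureThreshold;
    }

    void recordExecution(String dependency, boolean failed) {
        Stats stats = history.computeIfAbsent(dependency, key -> new Stats());
        stats.executions.incrementAndGet();
        if (failed) {
            stats.failures.incrementAndGet();
        }
    }
}
```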
Abort-Mission tries to reach all of the aforementioned goals:
- It augments tests with information about the dependencies the production code uses for the feature under test. Deciding what can still impact the outcome of the test at the given level is at the discretion of the developer, so you need to set up your dependencies accurately (e.g. unit tests have mocked dependencies, so a failure of the real thing cannot impact them).
- Allows failures to happen and get reported during burn-in
- Learns from past executions related to the same dependency
- Aborts test runs only if the related dependency is considered to be failing/dead
On top of these, it can bring a number of other benefits as well:
- Affordable: it won't noticeably slow down your build even when everything is healthy, and it has a moderate memory footprint
- Doesn't need you to change your production code one bit
- It is compatible with the most common test frameworks like JUnit and TestNG, allowing you to use most of their features on top of the benefits of Abort-Mission
- Having it integrated is similar to bringing a parachute when you get on a plane. You may never need it, but it is great to have one when you need to jump.
- It can generate nice, colorful reports after the test run.
If you have the time/manpower needed for the integration, it can either be a great addition if you break things often, or something you will probably not see saving time right away but which can be useful later. What do you think?