Where did we go wrong with testing?

Posted in iOS

For a long time, I’ve had this feeling that something is wrong with how we test our applications. To be specific, with how we write tests. As software developers, we usually don’t get many chances to stop and really think about tests. Even when things are a bit calmer during the sprint, we tend to spend that time fixing bugs and refactoring code, but not really working on tests.

As a result, we typically proceed with writing tests in the same way as we’ve always done. However, we soon realize that these tests aren’t catching any bugs, even though the coverage looks impressive. Yet, when we attempt to refactor the code, the process quickly becomes a pure nightmare.

Tests that were supposed to help us avoid bugs when we change the code end up making it hard to actually make those changes. After every refactoring, all related tests stop compiling and we have to rewrite them, set up new mocks, and understand all the implementation details one more time just to simulate the correct behavior.

I started analyzing tests from my personal and enterprise projects, recalling problems that I had encountered, reading many, many articles, and listening to some tech talks. All of this to try to understand what the right way to go is and how we can improve our tests.

Below you’ll find a collection of my thoughts and ideas from this journey.

Three Shamans

Browsing through all kinds of resources on the internet, I found out that there are three types of Shamans that are trying to persuade us to use some specific approach that works for them.

Illustration by Sergey SerSpiriT

Integration Tests Shaman

This Shaman is trying to convince us that mocks are demons, we should almost never mock, and test everything with real implementations. I believe it could work in some specific projects and in some specific technologies, but in general, it’s rather hard to achieve.

It seems to be the perfect solution when you think about it for the first time. And… I almost got caught by this idea when I started to idealize this approach in my head and think about how it could solve all the issues with unit testing. It’s hard to spot problems unless you go deeper and start thinking about how you are really going to test specific classes against specific behaviors without mocks.

Nevertheless, there are a few good things about integration tests and a few bad things, as always :).

👍 Pros:

  • They test the system as a whole, which is very close to what you get in production.
  • They are able to spot data races and unexpected side effects because they allow components to interact with each other without artificial mocking.
  • Hypothetically, they are easier to set up and read because you don’t need mocks. You can just set up the lowest layer like a fake database and fake API, and you are ready to test.
  • They don’t rely on implementation details, therefore they don’t make your life miserable when you refactor the code.
  • They are significantly less fragile to changes than unit tests.
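
To make the "fake only the lowest layer" idea concrete, here is a minimal sketch. All names (`KeyValueStore`, `InMemoryStore`, `SnippetRepository`, `SnippetService`) are hypothetical and invented for illustration. The service and repository are used as-is, and only the storage at the bottom is swapped for an in-memory fake.

```swift
// Lowest layer: the only thing replaced in the test.
protocol KeyValueStore {
    func read(_ key: String) -> String?
    func write(_ value: String, for key: String)
}

// Hypothetical in-memory fake standing in for a real database.
final class InMemoryStore: KeyValueStore {
    private var values: [String: String] = [:]
    func read(_ key: String) -> String? { values[key] }
    func write(_ value: String, for key: String) { values[key] = value }
}

// Production-style classes below are exercised for real, without mocks.
final class SnippetRepository {
    private let store: KeyValueStore
    init(store: KeyValueStore) { self.store = store }

    // Keys are normalized to lowercase, so lookups are case-insensitive.
    func save(title: String, body: String) { store.write(body, for: title.lowercased()) }
    func body(for title: String) -> String? { store.read(title.lowercased()) }
}

final class SnippetService {
    private let repository: SnippetRepository
    init(repository: SnippetRepository) { self.repository = repository }

    // Rejects empty titles and titles that already exist.
    func add(title: String, body: String) -> Bool {
        guard !title.isEmpty, repository.body(for: title) == nil else { return false }
        repository.save(title: title, body: body)
        return true
    }

    func expand(_ title: String) -> String { repository.body(for: title) ?? "" }
}

// The "test": build the real tree on top of the fake and check visible behavior.
let service = SnippetService(repository: SnippetRepository(store: InMemoryStore()))
let added = service.add(title: "Greeting", body: "Hello!")
let duplicate = service.add(title: "greeting", body: "Hi!")
let expanded = service.expand("GREETING")
```

Note that the duplicate check only works because the repository normalizes keys. A unit test of `SnippetService` with a mocked repository would never exercise that interaction.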

👎 Cons:

  • It is difficult to enforce the specific state that a test expects.
  • They are slower than unit tests because some asynchronous interactions and I/O operations aren’t mocked.
  • It’s hard to build all dependencies. You have to have access to concrete implementations. If you live in a modularized app world, it may require some extra work. Most likely you will need a DI container to inject all dependencies. Unlike in unit tests where you have a single level of dependencies, in integration tests, you have to build the whole tree of dependencies.
  • It’s hard to pass mocks when you need them. You have to make sure that all objects will start using this mock. And the mock must support more operations than what just our SUT requires.
  • Swift has its limitations and if you want to check if the dependency was called without mocking it, you can’t. Therefore, you either mock it, or assert some related values if possible, or don’t test it at all.
  • If you break something, looots of tests will fail, basically, every test which depends on this specific functionality. Imagine trying to spot what’s the problem if you get test failure only on CI and you see 50 failing tests. A single failure in your low-level class could possibly fail 70% of tests.

This approach could be promising if you manage to handle the cons efficiently, especially if you figure out a way to quickly set up the whole environment and efficiently verify expected behaviors.

However, I’m not sure how to do it well in Swift. Most Shamans who sell this approach never get to the point of real-world examples :). Mocking the whole API and databases from top to bottom sounds like a headache.

Anyway, I think I will give it a shot in the future. Actually, I’m already doing something like that by using snapshot tests, but I’m also interested in applying this approach to test classes and functions.

Example of the Integration Tests Shaman: TDD, where did it all go wrong.
(Actually, this Shaman knows a mix of integration and BDD wizardry)

Unit Tests Shaman

Those Shamans are the most popular on the internet. They often fight with each other to define what is a unit and what isolation means :). This wizardry is quite close to my heart because I spent most of my career following some version of it.

Although, as I mentioned at the beginning, at some point I started seeing lots of flaws and problems with this approach and I started questioning this Shaman.

Unit testing assumes that we mock everything and test our functions in a fully isolated space. Unit Tests Shamans assume that if tests of A pass and tests of B pass, then our system is correct and A & B will work together. Which is obviously not true in lots of cases :).
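
A tiny sketch of how "A passes, B passes, A & B fail" can happen. The types here (`PriceSource`, `StorePriceSource`, `PriceReader`, `StubPriceSource`) are hypothetical. The stub used in A's isolated test encodes an assumption about B that B does not actually satisfy.

```swift
protocol PriceSource {
    func rawPrice() -> String
}

// Component B: the real implementation formats the price with a comma.
struct StorePriceSource: PriceSource {
    func rawPrice() -> String { "12,50" }
}

// Component A: parses whatever the source returns.
struct PriceReader {
    let source: PriceSource
    func price() -> Double? { Double(source.rawPrice()) }
}

// Stub used in A's isolated test: assumes a dot as the decimal separator.
struct StubPriceSource: PriceSource {
    func rawPrice() -> String { "12.50" }
}

// A's unit test passes against the stub.
let isolatedResult = PriceReader(source: StubPriceSource()).price()

// B's unit test passes too: it returns exactly the string it promises.
let sourceOutput = StorePriceSource().rawPrice()

// Wired together, parsing fails because "12,50" is not a valid Double literal.
let integratedResult = PriceReader(source: StorePriceSource()).price()
```

Both isolated checks are green, yet the combined result is `nil`. Only a test that wires the real components together would notice.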

👍 Let’s start with things that I believe are really good in this approach:

  • It allows us to test exactly the part of the code that we want.
  • If a test fails, we know what the problem is and where it is.
  • It forces us to write better code. If something is hard to test, usually it means that our code requires refactoring. I think it is one of the most important things in unit testing.
  • It is fast! Like really fast, we mock everything, time, delays, and async responses. We are the god of the environment.

👎 So what is really bad about unit tests?

  • In modern programming, most projects rely HEAVILY on dependency injection. Everything is a dependency. Therefore, we have to mock TONS of dependencies. Even though we have things like Sourcery that help us generate mocks, we still have a lot of work to set them up properly and to simulate the expected behavior.
  • Configuring mocks quickly gets out of control, we end up with mock configurations in setUp functions, in tests, in helpers, basically everywhere.
  • 95% of the time unit tests fail not because of a bug, but because of changes in implementation details, new dependencies, refactoring, bugs in the test itself, or incorrect mock setup. Basically, often they don’t prevent us from introducing bugs, but from introducing changes :).
  • Given, When, Then, right? It quickly becomes a mess. It turns out that in “Given” we have to set up some crazy closures in our mock, then later we need to add some extra call in “When”, and “Then” becomes a bunch of asserts testing arbitrarily defined magic values.
  • They don’t detect bugs :(. Most of the time.

I have very mixed feelings about classic unit tests. I have never seen them written well, always sooner or later they become a mess. Of course, if you test some pure logic and pure functions, then tests look nice. But if you test view models, and some facades that aggregate lots of dependencies then they become a nightmare.

The issue that I see is that unit tests focus heavily on implementation details, which in a modularized world means problems. If you rely on specific paths and specific interactions with lots of dependencies, you condemn yourself to not being able to refactor your code without rewriting lots of tests.

Example of the Unit Tests Shaman: Integrated Tests Are A Scam.

No Tests Shaman

I get this point to some extent. It could work in small, short-term projects developed by one or two devs. When the project is changing rapidly at the beginning, not writing tests gives you much more time to improve the code base and test it manually. Features are fairly simple, and you can easily catch and fix bugs by just testing the app.

However, I think that if you rely on complex logic, the project is supposed to be developed for a long time, and features are going to grow, then you need at least some tests for core logic + snapshots to ensure that you don’t break the UI. Otherwise, every change will be scary.

I can tell from my experience that in the past most of my personal projects didn’t have tests, and I never got into trouble. I always had very few bugs in production, usually nothing serious, and my rating in the App Store was always around 4.7, so I would say that the quality was good. That’s why I can somehow relate to this Shaman.

However, those projects were relatively small, so I was able to go through all scenarios in the app quite easily and spot all the problems. Also, I gained a lot of experience with apps since I started developing them around 20 years ago, so I managed to evolve some sixth sense to know how to avoid some bugs and how to test apps on my own :).

I wouldn’t treat this Shaman as the way to go. I would say this is a solution for some specific cases: it may work for one person and not for another. That’s why we tend to develop standards that work for most people, not for individuals only.

Example of the No Tests Shaman: The ONLY REASON To Unit Test.

Why Don’t Unit Tests Find Bugs?

Illustration by Louis-Philippe Desjardins

In real life, we have plenty of concurrency and side effects which are often the main reason for bugs. It’s easier for us to understand linear flow than multidimensional flow with context switching in the middle of the function, asynchronous actions and notifications, plus on top of that user inputs at the same time. Only high-level tests are able to detect these kinds of issues.

Simple Tests Don’t Find Complex Bugs

Unit tests just verify that when you call a function you either get the expected output or the expected side effect. Most of the time your tests will fail not because your code is wrong but because your test is wrong or your mock is not configured correctly. So, most of the time you fix tests, not the code.

The key challenge while introducing changes is not to break other things by producing unexpected side effects after a series of calls and interactions between components. And that’s what unit tests don’t verify.

Unit tests are focused on verifying a single call in a “laboratory environment” with a clean state at the beginning. At this point, you have no idea how multiple side effects will interact with each other during the whole lifetime of your application.

Bugs may even be caused by how you register your class in a DI container. It is not possible to detect these issues on the “unit” level.
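
A minimal sketch of such a registration bug, using a hypothetical hand-rolled `Container` (not any real DI library). `LoginService` and `ProfileLoader` are each correct on their own, and the only difference between the broken and the fixed setup is how `Session` is registered.

```swift
final class Session {
    var token: String?
}

final class LoginService {
    private let session: Session
    init(session: Session) { self.session = session }
    func logIn() { session.token = "abc" }
}

final class ProfileLoader {
    private let session: Session
    init(session: Session) { self.session = session }
    func canLoad() -> Bool { session.token != nil }
}

// Hypothetical minimal container: keeps one factory for Session.
final class Container {
    private var sessionFactory: () -> Session = { Session() }

    func registerSession(_ factory: @escaping () -> Session) { sessionFactory = factory }
    func makeLoginService() -> LoginService { LoginService(session: sessionFactory()) }
    func makeProfileLoader() -> ProfileLoader { ProfileLoader(session: sessionFactory()) }
}

// Bug: a new Session is created on every resolve, so components never share state.
let broken = Container()
broken.registerSession { Session() }
broken.makeLoginService().logIn()
let brokenCanLoad = broken.makeProfileLoader().canLoad()

// Fix: register a single shared instance.
let sharedSession = Session()
let fixed = Container()
fixed.registerSession { sharedSession }
fixed.makeLoginService().logIn()
let fixedCanLoad = fixed.makeProfileLoader().canLoad()
```

Unit tests that inject a `Session` by hand into each class pass in both setups. The difference only shows up when objects are resolved through the container.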

Skewed Point Of View

If you don’t see a bug in a function, most likely you won’t be able to write a test that detects this bug. Tests that you write are based on your understanding of the implementation details.

If you write the function based on your understanding of possible cases, you most likely won’t cover other cases in your tests either.

That’s why almost always someone else tests your changes manually – to look at the problem from a different angle. That’s why we also have code reviews.

However, this is not the case with unit tests. They are skewed from the very beginning because the authors test their own changes.

We have a slightly better situation when the code is later changed by another person. At this point, at least the person who wrote the tests is someone else. But if the tests were focused on implementation details rather than behaviors, they would have to be rewritten anyway.

Testing Implementation Details Instead Of Behaviors

If we test implementation details, we can’t even get to the point when the test has an opportunity to detect a bug, because every change in the implementation requires the test to be rewritten from scratch. Especially, if you just did some refactoring.

Chasing After Code Coverage

Setting code coverage goals is a big mistake, in my opinion. This is the simplest way to start writing tests for the sake of writing tests, just to increase the coverage. It is much better to have a small number of tests that are well-curated and test what needs to be tested than thousands of tests that verify trivial things and make refactoring painful.

Asynchronous World Doesn’t Like Isolation

It’s hard to find bugs by just unit testing heavily concurrent code. During unit tests, most of your asynchronous operations will be mocked. On top of that, all interactions with dependencies will be mocked as well. This way you are not testing the concurrency in its real environment and you can’t expect to detect these kinds of issues.

What To Do? Are We Doomed?

Illustration by polo trc

Know Your Tools

There are places where unit tests are extremely effective and useful, but there are also places where unit tests are extremely useless.

The closer you are to the UI layer, user interactions, and asynchronous integration between dependencies, the less useful they are. The closer you are to algorithms, data operations, calculations, and pure business logic, the more useful they are.

For the highest layer like view models and UI itself, I absolutely love snapshot tests. They are able to test both the whole integration and what the user sees at the same time. In my case, I think they detected the most bugs and gave me the most comfort when changing things. Take a look at the snapshot from my application Snippety:

This one simple snapshot verifies around 20 features at once, including search, autocompletion, rendering a document with placeholders, database integration, the state of all settings, dark mode, localization, and many more. The cost of maintenance is almost zero: if I introduce new changes, I just record the screenshot again and that’s all.

Know When To Use Which Tool

  1. When your project is in a very early phase, you are just setting up the architecture, starting to understand the domain, and everything is constantly changing, then maybe try the No Tests Shaman.
  2. When your project starts to stabilize and you begin writing more business logic, calculations, validations, etc., then maybe the Unit Tests Shaman is the way to go. Also, start covering your screens with snapshot tests to ensure that they won’t be altered without notice.
  3. When your project matures, shift your focus toward integration tests to cover interactions between components and possible data races. You can try snapshot testing and/or automated UI testing.

Avoid Fragile Tests And Focus On Behaviors

When you start covering the code so furiously that your tests turn into a sophisticated checksum of your code, you create fragile tests. In other words, you test the specific implementation and the sequence of calls inside the method instead of the behavior itself.

If you need to mock lots of things just to test a simple function, it usually means one of two things: you are either trying to use the wrong type of test, or you need to refactor your code to extract the “meat” that you are trying to test.

Try testing behaviors, not implementation details unless you are implementing super core features on which many components are going to rely.

Testing implementation details works well with low-level things like algorithms, calculations, and transformations. Testing behaviors works well with high-level things that integrate many dependencies and don’t introduce much of their own complexity – like view models.

If you start testing behaviors correctly, your test won’t be broken every time you introduce some changes to the code.
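
The difference can be sketched with a small, hypothetical view model (`ListViewModel`, `ItemStorage`, and `SpyStorage` are made up for this example). The first check pins how the view model talks to its dependency. The second pins what the user would actually see.

```swift
protocol ItemStorage {
    func load() -> [String]
}

// Hand-written spy: returns canned items and counts calls.
final class SpyStorage: ItemStorage {
    private(set) var loadCallCount = 0
    var items: [String] = []

    func load() -> [String] {
        loadCallCount += 1
        return items
    }
}

final class ListViewModel {
    private let storage: ItemStorage
    private(set) var rows: [String] = []

    init(storage: ItemStorage) { self.storage = storage }

    // Behavior: after refresh, rows contain the stored items in sorted order.
    func refresh() {
        rows = storage.load().sorted()
    }
}

let spy = SpyStorage()
spy.items = ["pear", "apple"]
let viewModel = ListViewModel(storage: spy)
viewModel.refresh()

// Fragile: breaks as soon as refresh() adds caching, retries, or a second read.
let implementationCheck = spy.loadCallCount == 1

// Robust: holds for any implementation that produces the right rows.
let behaviorCheck = viewModel.rows == ["apple", "pear"]
```

Both checks pass today, but only the second one is likely to survive a refactoring of `refresh()`.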

Avoid Testing Trivial Code

Don’t test the language and basic data structures. Don’t test if append adds an item to an array. Don’t test if the function call calls the function. These kinds of tests have zero value and they exist only to slow down your development.

Stop Mocking & Injecting EVERYTHING

If you write tests, then for sure you are familiar with mocks, spies, fakes, and/or stubs. It seems like they are unavoidable in unit testing, but are they really?

Once you start mocking all your dependencies while testing the code, it will very quickly turn out that you are actually not testing your code but some handicapped alternative version from another universe.

I think it’s essential to find some balance between things that require dependency injection and mocking and things that don’t.

For example, if you have an email validator then most likely you don’t need to exchange its implementation and you can easily use the real implementation directly from the code. This way you don’t have to mock it in tests either.
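
As a rough sketch (the `EmailValidator` rules below are deliberately simplistic and the `SignUpForm` type is hypothetical), the validator is created inline rather than injected, and the test simply exercises the real thing:

```swift
// Simplified validator for illustration only; real email rules are more involved.
struct EmailValidator {
    func isValid(_ email: String) -> Bool {
        let parts = email.split(separator: "@", omittingEmptySubsequences: false)
        guard parts.count == 2, !parts[0].isEmpty else { return false }
        let domain = parts[1]
        return domain.contains(".") && domain.first != "." && domain.last != "."
    }
}

final class SignUpForm {
    // Real implementation used directly: no protocol, no injection, no mock.
    private let validator = EmailValidator()
    private(set) var error: String?

    func submit(email: String) -> Bool {
        guard validator.isValid(email) else {
            error = "Invalid email"
            return false
        }
        error = nil
        return true
    }
}

let form = SignUpForm()
let rejected = form.submit(email: "john@")
let errorAfterReject = form.error
let accepted = form.submit(email: "john@example.com")
let errorAfterAccept = form.error
```

The test has nothing to configure, and it checks the form together with the validation rules it will actually ship with.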

Implement Fluent Builders For Mocks

Stop spreading mock configurations all over tests. I know that Sourcery generates mocks for us, but at least try creating a nice fluent helper to build mocks in a more readable manner.

Seeing configurations of autogenerated fields and functions everywhere makes tests hard to understand.

An example of how setting up mocks could look:
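
A minimal sketch of the idea, not the exact setup from my projects: `SnippetAPI`, `SnippetAPIMock`, and the helper names are hypothetical. The mock is shaped loosely like a generated one, with raw `...ReturnValue` fields, and a small extension wraps those fields in chainable, readable calls.

```swift
struct FetchError: Error, Equatable {}

protocol SnippetAPI {
    func currentUserName() -> String?
    func fetchSnippets() -> Result<[String], FetchError>
}

// Stand-in for an autogenerated mock: raw stubbing fields and call counters.
final class SnippetAPIMock: SnippetAPI {
    var currentUserNameReturnValue: String?
    var fetchSnippetsReturnValue: Result<[String], FetchError> = .success([])
    private(set) var fetchSnippetsCallsCount = 0

    func currentUserName() -> String? { currentUserNameReturnValue }

    func fetchSnippets() -> Result<[String], FetchError> {
        fetchSnippetsCallsCount += 1
        return fetchSnippetsReturnValue
    }
}

// Hypothetical fluent helpers that hide the generated fields behind intent.
extension SnippetAPIMock {
    @discardableResult
    func loggedIn(as name: String) -> Self {
        currentUserNameReturnValue = name
        return self
    }

    @discardableResult
    func returningSnippets(_ snippets: [String]) -> Self {
        fetchSnippetsReturnValue = .success(snippets)
        return self
    }

    @discardableResult
    func failingToFetch() -> Self {
        fetchSnippetsReturnValue = .failure(FetchError())
        return self
    }
}

// The "Given" part of a test now reads like a sentence.
let api = SnippetAPIMock()
    .loggedIn(as: "Anna")
    .returningSnippets(["signature", "address"])

let offlineAPI = SnippetAPIMock().failingToFetch()
```

If the generated field names change, only the extension has to be updated, not every test that sets up the mock.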

[Bonus] Control Your State And Side Effects

The majority of bugs come from the application being in an unexpected state that the developer did not predict during the development. This state often results from various side effects produced by different dependencies.

Handling these problems is tricky because changes in state occur separately from each other and often components are not aware of how their state affects all other parts of the application. Therefore, even a simple change in one place can break many things in other places.

To reduce those problems, Redux-like architectures were introduced. Recently, they have become more and more popular on iOS as well, especially TCA (The Composable Architecture). They ensure that state transformations are triggered only by predefined actions and in a controlled manner using pure functions called reducers. Having pure functions makes testing much easier and much more predictable.
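
To show why pure reducers are easy to test, here is a bare-bones sketch in plain Swift. It does not use the TCA API, and `CounterState`, `CounterAction`, and `counterReducer` are hypothetical names. The reducer takes a state and an action and returns a new state, with no hidden dependencies.

```swift
struct CounterState: Equatable {
    var count = 0
    var isLimitReached = false
}

enum CounterAction {
    case increment, decrement, reset
}

// Pure function: same inputs always produce the same output, no side effects.
func counterReducer(_ state: CounterState, _ action: CounterAction) -> CounterState {
    var next = state
    switch action {
    case .increment: next.count = min(state.count + 1, 3) // capped at 3
    case .decrement: next.count = max(state.count - 1, 0) // never negative
    case .reset: next.count = 0
    }
    next.isLimitReached = next.count == 3
    return next
}

// A whole sequence of actions can be replayed and compared against one expected state.
let actions: [CounterAction] = [.increment, .increment, .increment, .increment, .decrement]
let finalState = actions.reduce(CounterState(), counterReducer)
let stateAtLimit = actions.dropLast().reduce(CounterState(), counterReducer)
```

There is nothing to mock here: a test feeds in actions and asserts on the resulting state, which also makes it natural to test series of interactions rather than single calls.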

It’s worth giving it a try and at least seeing how it works :). Here are some materials:

What’s Next

I would recommend getting familiar with BDD and some other good conclusions about tests. Here you will find a few interesting articles:

Boost Your Work

Psst! If you want to boost your productivity and see if my tests found all the bugs, check out this app.

Snippety is a tool that can make daily tasks more enjoyable by providing quick access to your snippets. Snippety works flawlessly with every text field! Just press ⌘⇧Space, find your snippet, and hit ↩︎. You can also define your own keywords and use snippets by just typing, without even opening the app!
