What are unit tests? Who is responsible for them? Why write them? Do we have enough of them? How do you design them? And what do you do with hard-to-test code? These are the questions I will try to answer below.

This article is particularly aimed at developers who have questions or want reassurance about best practices. However, it is also for anyone directly or indirectly involved in their company’s testing strategy or “quality approach”.

The test levels 

Let’s first put unit tests in context. According to the recommendations of the ISTQB (International Software Testing Qualifications Board), unit tests are the first level of dynamic testing in the software testing process. As a reminder, “dynamic” tests are carried out by executing the code, as opposed to so-called “static” tests for which the code is not executed (code analysis, linting, compliance with coding rules, code reviews, etc.).

 "white box" tests

They are “white box” tests because they are designed with access to the internal structure of the system, unlike the “black box” tests that I explained in a previous article.

At AT Internet, “white box” tests are implemented by developers, while “black box” tests are automated by testers and acceptance tests are conducted by the customer’s representatives (the product owners). This organisation can vary from company to company with one consistent theme: unit testing is an integral part of development! 

What is a unit test? 

There are several definitions of unit tests, with one common theme – they are “a test that validates a unit of code in isolation”. This can take different forms depending on the source and can be interpreted in many ways, which may make the true nature of these tests unclear. Some questions often come up:

  • What unit of code should be considered in a unit test? 
  • How do you isolate the code you want to test? 
  • Are we “allowed” to load several functions/methods/classes in a single test? 
  • Must a single test cover the entire business logic of the “unit of code” under consideration? 

Personally, I think that the definition on Wikipedia is a good start, but still needs some clarification: 

[the unit test is] “a procedure to verify the proper functioning of a specific part of a software program or portion of a program (referred to as a “unit” or “module”)”. 

I think it is important to clarify certain terms: 

“The proper functioning” 

This is most often the correct application of a business rule or functional behaviour that can be written in the form of: 

should do XXXX when XXXX

It can also include non-functional aspects such as system performance or resilience. We will then have a title more along these lines: 

should run in less than X ms

should retry X times if no response from DB
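To make this concrete, here is a minimal sketch (Jest/TypeScript; the `applyDiscount` function and its module are hypothetical) of how such titles translate directly into test code:

```typescript
import { applyDiscount } from "./discountService"; // hypothetical module under test

describe("discountService", () => {
  // "should do XXXX when XXXX": a functional behaviour
  it("should apply a 10% discount when the cart exceeds 100 euros", () => {
    expect(applyDiscount(200)).toBe(180);
  });

  // "should run in less than X ms": a non-functional aspect
  it("should run in less than 50 ms", () => {
    const start = Date.now();
    applyDiscount(200);
    expect(Date.now() - start).toBeLessThan(50);
  });
});
```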

“A specific part of a software program” 

This is the minimal code to be used to validate the targeted behaviour. Depending on the structure of the code and the language used, it can be: 

  • A method 
  • Several methods 
  • One class 
  • Several classes 
  • etc. 

The more “behaviour oriented” the code is, through the use of practices such as DDD (Domain-Driven Design), BDD (Behaviour-Driven Development) or TDD (Test-Driven Development), the easier it will be to isolate the code providing a given behaviour. We will come back to this point later in this article.

Do you observe the symptoms or identify the causes?

It is important to differentiate between the symptoms and the root causes of an anomaly. Consider the following analogy:

Your car won’t start this morning! (symptom) So you call a breakdown mechanic, who investigates to find the cause of the problem and a suitable solution.

[Image: the “your car won’t start this morning” example]

This is a valid process when it comes to studying a “malfunction” of an automobile. Let’s apply this to a defect in a piece of software:

[Image: the same process applied to a software bug]

A “bug” is detected by the user: a feature does not behave as expected (symptom). A developer analyses the problem, investigates the code to find the root cause of the problem, identifies the piece of code responsible and makes a correction (bugfix).

This process is very costly for the company and can undermine project planning by “requisitioning” a developer to investigate. It is also uncomfortable for the developer, who has to stop what they were doing to find a solution to the problem, often at the last minute. The developer’s main role is not to spend most of their time searching for the causes of a problem, unlike a mechanic, whose primary task is precisely to “find the malfunction”.

In our field, we are fortunate to be able to place the different “aspects” of our product under permanent control with unit tests. We can then benefit directly from information on the root causes of potential defects in our product and, with good tests, go from the simple observation of a symptom to the automatic identification of the cause of the problem.

Let’s go back to our broken-down car – these two messages displayed on the screen give different information: one describes a symptom (the engine does not start), the other targets the cause of the problem (injection failure):

[Image: two dashboard messages]
“engine failure (symptom)” => “injection problem (root cause)” 

Here’s another example, on a software product like Explorer, a data mining tool in AT Internet’s Analytics Suite.

[Image: the equivalent example in Explorer]

It is now clearer that, in case of error, unit tests must make it possible to directly target the faulty part of the code. This saves the developer precious investigation time and also puts them under less pressure (and therefore keeps them more motivated)!

Why invest in unit testing? 

Why do we need such detailed information on the internal behaviour of our products, and why invest in the implementation of these unit tests? I can see four main reasons for this investment:

  • Risk management 
  • Individual responsibility 
  • Development comfort 
  • Cost optimisation 

Managing risks better 

[GIF: “When I deploy in prod without having passed the UT” (lesjoiesducode.fr)]

Detecting bugs before production release is a daily challenge for software publishers. At AT Internet, we have chosen to invest as early as possible in the development cycle to avoid regressions in our products. Investing in the different levels of testing allows us to manage this risk as well as possible, starting with the unit tests.

[Figure: unit tests in the development cycle]

The earlier the “stamp” is applied in the process, the lower the risk of discovering a bug in production.

Extending individual responsibility 

[GIF: “When someone asks me if I have correctly written the UT” (lesjoiesducode.fr)]
  • How can you be proud of your code without having made sure it works properly? 
  • How can you feel responsible without providing your successor with a set of tests that will be used to secure future refactoring or maintenance? (and this successor can be you!) 

Pride and responsibility are two important characteristics in the developer’s job and very present in Agility. Writing unit tests serves both! We make sure that the product code works properly without entrusting someone else with this responsibility (tester, product owner, other team, other company (third party software testing)…) and at the same time, we ensure a level of quality over the entire life of the product. This longer-term vision of one’s activity is an important sign of responsibility and even professionalism!

For the rebellious, there are many “good” excuses for not writing unit tests available online. I’ll let you find out about them – there are really many that look very appealing, some of them very creative!

Improving team comfort levels

Having good unit tests means avoiding future emergencies: bugs that have to be fixed in a hurry and under pressure, dissatisfied customers, and loss of reputation for the company.

All these situations make life difficult for development teams and, if they occur too often, can have a direct impact on developers’ motivation. The best way to protect yourself is to implement enough (relevant) unit tests.

Another equally important aspect of comfort is to reduce the amount of time spent debugging: the analysis you need to do to trace back to the line of code in question in order to apply the appropriate correction. The worst situation is to have to launch the entire application in “debug mode” to follow its behaviour step by step and finally succeed in identifying the faulty code. A good unit test allows you to directly target portions of code and validate their respective behaviour(s).

There will always be defects in production – the goal is simply to limit their number and their impact. Any bug detected in production is an opportunity to add the missing unit test(s), so that the time spent analysing and correcting the problem does not have to be invested again.

This is an opportunity to remind you that unit tests provide quick feedback, within tens of seconds – so much so that they have been systematised in practices such as eXtreme Programming:

[Figure: eXtreme Programming feedback loops]

By saving analysis time and anticipating the appearance of numerous bugs in production through the use of unit tests, everyone can keep the focus on adding value to the product rather than compensating for a lack of quality afterwards.

Optimising the ROI of our testing process

[Figure: optimising the ROI of our test pyramid]

The higher we go up our test pyramid, the higher the costs. This effect is compounded because the cost increases apply to every aspect:

  • Setting up the tests
  • Maintaining the tests in place
  • Executing the tests (machine time, servers…)
  • The time it takes to obtain a test result
  • The analysis time when a bug is detected

These are all good reasons to invest in the deepest layers to optimise the ROI of our testing process. Each level complements the tests already run, with a lower test volume. Each test level has its importance and is essential to ensure the level of quality we want for our customers, but we have to consciously evaluate the investment at each level so as not to be overwhelmed by the induced costs, which can escalate quickly.

The graph below also illustrates how the cost of fixing a defect grows throughout the development cycle. This is another good reason to invest in unit testing!

[Figure: bug cost across the development cycle (Copyright 2006–2009 Scott W. Ambler)]

The “shift left” approach 

To complete the cost optimisation process and to further improve the return on investment of our testing strategy, we have adopted a “shift left” approach. This approach aims to shift test operations “to the left” to be carried out as early as possible in the development cycle. We therefore increase the effort to detect defects in the unit test phase, which enables us to limit the costs generated in the rest of the cycle: 

[Figure: the “shift left” approach]

It is possible to take this approach even further and try to detect more defects during the coding phase itself, by using static tests such as code analysis (with or without tools) or by extending pair-programming practices, which are very effective in terms of the quality of the code produced, not to mention their positive impact on team spirit and skill sharing.

Best (and not so good) practices 

It is therefore important to invest in the implementation of unit tests, but what can you do to ensure that they are relevant and effective? Being aware of best practices is a good place to start.

First, I think it is useful to remember that the test aims to detect if there is a difference between the expected behaviour and the implemented behaviour. It is therefore preferable to design these tests based on the specifications (whatever they are: prerequisites, technical specifications, stories…) rather than on the code itself, otherwise we risk only validating the implemented behaviours which may differ from the expected behaviours! 

The best way to avoid falling into this trap is to write the tests first, using any “test first” approach, for example in TDD (Test Driven Development). 

When designing and writing unit tests, it is a good idea to follow certain best practices and conventions (a short sketch follows the list):

  • Tests must be restricted (in their functional scope): do not attempt to validate 50 things in a single test! 
  • They must be quick to run: no 60-second waits in a unit test! 
  • They must be automated: no manual execution of unit tests (has anyone really ever thought of doing that? 😊). 
  • Tests are clearly named and organised: today’s frameworks make it possible to perfectly organise tests and encourage you to choose relevant names, e.g. "describe CONTEXT... it SHOULD DO THIS" 
  • Tests can be run on the developer’s workstation: it must be possible to have a status of the unit tests without having to implement a complex mechanism of deployment / server startup etc. 
  • 1 test = 1 use case. Ideally, only one assertion per unit test: this makes it possible to know right away what is wrong when a test is in error (in real life, we can afford several assertions to validate the boundaries of the same behaviour). 
  • The results of a unit test must be stable over time. A test that “flickers” (red on one run, green on the next) is unreliable and must be fixed or deleted! 
  • The tests must be isolated and independent: each test must be able to run on its own, before or after any other test. Its proper functioning must not depend on the context left by another test. 
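Here is a minimal sketch of several of these conventions in action (Jest/TypeScript; `cartTotal` is a hypothetical pure function): clear naming, one use case per test, no waiting, no shared state.

```typescript
import { cartTotal } from "./cart"; // hypothetical function under test

describe("cartTotal, when computing the price of a cart", () => {
  // Each test builds its own data, so the tests stay isolated and
  // independent: they can run alone, in any order, on any workstation.
  it("should sum the item prices", () => {
    expect(cartTotal([{ price: 2 }, { price: 3 }])).toBe(5);
  });

  it("should return 0 for an empty cart", () => {
    expect(cartTotal([])).toBe(0); // one use case, one assertion
  });
});
```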

Other practices, however, should be avoided (a sketch of two of them follows the list):

  • Validating implementation details in a unit test: verifying that a given method was called with a given parameter adds nothing if that call is not part of the expected behaviour of the code under test (an orchestrator being the classic exception). This coupling with the implementation makes the test useless when refactoring the internal structure of the code: it will only tell us that the implementation has changed, without telling us whether the expected overall behaviour has been preserved. 
  • Loading “too much” code for a unit test: this often comes down to the design of the code; needing to load too much code is often a sign of overly complex code or insufficiently decoupled structures. 
  • Depending on external services: a dependency on an external service leads to a test that fails intermittently, depending on the availability of that dependency. Test doubles (stubs, spies, fakes, mocks) are preferable to avoid this. 
  • Using try…catch: in a unit test we expect a specific error, so prefer (when the framework allows it) an assertion on the expected error being raised. 
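As a sketch of the last two points (Jest/TypeScript; `parseAge`, `greetUser` and `UserRepo` are hypothetical), here is what the preferred alternatives look like:

```typescript
import { parseAge } from "./parseAge";
import { greetUser, UserRepo } from "./greetUser";

// Prefer an assertion on the expected error over a manual try...catch:
it("should throw when the age is negative", () => {
  expect(() => parseAge(-1)).toThrow("age must be positive");
});

// Prefer a test double over a dependency on a real external service:
it("should greet the user returned by the repository", async () => {
  const fakeRepo: UserRepo = { findName: async () => "Ada" }; // fake: no network call
  await expect(greetUser(fakeRepo, 42)).resolves.toBe("Hello Ada");
});
```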

I could list other good and bad practices, but following these alone is already very effective. 

What about coverage? Where do you stop?

Depending on the context (legal obligations, company strategies, team guidelines…), you can set objectives for code coverage by unit tests (lines, branches, conditions). But watch out! It is easy to achieve very good coverage without ensuring business rules are really covered by tests.

I recently came across a file that seemed “very well tested”:

[Image: coverage report of a file that seemed “very well tested”]

This same project, after a simple refactoring (method extraction) and more extensive use of mocks in the tests, fell to this level of coverage:

[Image: the same file’s much lower coverage after refactoring]

The tests in place had not changed and no functionality had been added: a simple reorganisation of the code, and a limitation of the code loaded by the tests, showed that the business logic was not as well tested as we thought!
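To make the trap concrete, here is a minimal sketch (Jest/TypeScript; `computeFee` is a hypothetical function): the first test produces excellent line coverage while validating nothing at all.

```typescript
import { computeFee } from "./fees"; // hypothetical function under test

// This test executes computeFee from end to end, so the coverage report
// marks its lines as "covered"... yet it asserts nothing.
it("runs computeFee", () => {
  computeFee({ amount: 100, premium: true });
});

// Only a test like this one actually covers the business rule:
it("should waive the fee for premium customers", () => {
  expect(computeFee({ amount: 100, premium: true })).toBe(0);
});
```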

So how do we proceed? How far do we go?

I quite like the point of view of Martin Fowler, who advises (in his book “Refactoring”) stopping “as soon as you are confident that, if someone introduces a regression into the code you have just produced, one of the tests in place will fail”.

Beyond this mindset, the extent of the unit tests that we choose to implement may depend, in my opinion, on different parameters:

  • The risks incurred in the event of failure of the code concerned: a critical project for our customers recently saw the implementation of more than 500 unit tests for 699 lines of code (with a coverage of “only” 88.9% of the code). 
  • Long-term testing comfort management: a project for which we wanted to avoid costly maintenance operations at all costs (for reasons of internal team organisation) saw its branch coverage approach 100%. 
  • Personal appreciation/team judgement: during team code reviews, unit tests are also reviewed, and deciding whether there are enough of them is a call made by the whole team that may depend on the project, the context, the company… 

The main thing is to understand why the decision is made to stop (or continue to expand) testing. This decision must be conscious and informed to allow for a healthy life of the project over time. 

Design 

To avoid writing unnecessary tests or falling into an illusion of good coverage, it is important to design your unit tests well. For this I will distinguish two cases: designing tests on future code and designing tests on existing code (called “legacy”). 

Unit tests on new code

On new code, we are lucky enough to have access to the desired behaviours: these are the behaviours to be implemented. We can then more easily define the unit test cases to implement, based on the acceptance criteria of the sprint backlog stories. We will proceed in 3 steps:

1. List the desired behaviours for a given work element. 

  • Base them on the stories and their acceptance criteria. 
  • Define input data sets and expected results. 

This can be an opportunity to challenge the product owner to define certain behaviours more precisely, if necessary. 

2. Define the set of use cases for each behaviour or “business rule”. 

  • Passing cases 
  • Error cases 
  • Borderline cases 

To do this, we can use different techniques for designing use cases such as decision trees (to go through all the paths in test cases) or equivalence classes (to avoid testing the same thing several times). 
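As a sketch of this step (Jest/TypeScript; `shippingCost` is a hypothetical rule: “shipping costs 5 euros, free above 50 euros”), parameterised tests make the equivalence classes and boundaries explicit:

```typescript
import { shippingCost } from "./shipping"; // hypothetical rule under test

test.each([
  [10, 5],    // passing case: below the threshold, standard cost
  [49.99, 5], // borderline case: just below the threshold
  [50, 0],    // borderline case: exactly at the threshold
  [200, 0],   // equivalence class: any amount above 50 behaves the same
])("a %p euro cart should cost %p in shipping", (cart, expected) => {
  expect(shippingCost(cart)).toBe(expected);
});

// Error case, tested separately because it throws:
it("should throw for a negative cart amount", () => {
  expect(() => shippingCost(-1)).toThrow();
});
```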

3. Design the code to be able to test these behaviours (testable code). 

  • Ideally in a “test first” approach. 

“TDD is not a testing activity, it is a design activity” (M. Fowler). 

Unit testing on existing code

On existing code (or “legacy” code), it is often more difficult to access the behaviours initially expected, and it is perilous to add the missing unit tests without access to the original specification. It is difficult to get out of a vicious circle that nobody wants to enter:

[Figure: the vicious circle of untested legacy code]

Reverse engineering

The most common approach (but also the least comfortable and most time-consuming) is to engage in some brave reverse engineering to try to rediscover the behaviour of the code. This may shed some light on how the code currently works, but it gives us no information about the behaviour that was actually expected. User documentation can sometimes also help us – when it exists!

Taking advantage of enhancement requests

If business stakeholders are available (product owner, product manager, development team, users…), an enhancement request or a bug reported by a user can be an opportunity to review the business rules in place.

Using the system’s current behaviour as a baseline

We can exercise the code with different input data sets and simply record the resulting outputs. These outputs then serve as the “expected behaviour” for further tests, ensuring that the current behaviour (whether correct or not) is not altered by the changes we are about to make to the code.
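A minimal sketch of this approach, which Michael Feathers calls “characterisation testing” (Jest/TypeScript; `legacyPricing` is a hypothetical legacy function, and the expected values are assumed to have been recorded from a real run):

```typescript
import { legacyPricing } from "./legacyPricing"; // untested legacy code

// Outputs recorded by running the code once with these inputs: they pin
// down the CURRENT behaviour, whether it is correct or not.
const recorded: Array<[number, number]> = [
  [1, 9.99],
  [10, 89.9],
  [100, 799.0],
];

test.each(recorded)("legacyPricing(%p) still returns %p", (input, output) => {
  expect(legacyPricing(input)).toBe(output);
});
```

Snapshot assertions (Jest’s `toMatchSnapshot()`, for example) can automate the recording step.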

There is substantial literature on the subject, including Michael Feathers’ “Working Effectively with Legacy Code”.

Mutation testing to identify gaps

When unit tests are already present and you want to extend the coverage of the business by testing, you can use mutation testing. This type of test consists of modifying the behaviour of the code by introducing “mutants” (certain conditions are modified, certain operators replaced by others, the body of certain branches emptied, etc.) and then running the existing unit tests to see if they detect the modification.

If a unit test fails, the mutant has been “killed”. Conversely, if no test fails, it means that the behaviour that has been changed is not within the scope of the unit tests in place: the mutant has “survived”.
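As a sketch, here is what a mutant might look like on a trivial (hypothetical) rule, and the test that kills it:

```typescript
export function isAdult(age: number): boolean {
  return age >= 18;   // original code
  // return age > 18; // mutant: ">=" replaced by ">"
}

// This test kills the mutant: age = 18 distinguishes the two versions.
it("should consider an 18-year-old an adult", () => {
  expect(isAdult(18)).toBe(true);
});

// A suite that only checked isAdult(30) and isAdult(5) would let the
// mutant survive: both versions agree on those inputs.
```

Tools such as Stryker (JavaScript/TypeScript) or PIT (Java) generate and run these mutants automatically.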

This practice makes it possible, even with very good code coverage, to get a better view of whether the different behaviours in the code are covered or not. We can then prioritise the implementation of new unit tests on the code where mutants have survived. Be careful, however: running these mutation tests is costly in machine time and can take a very long time, which is why we most often target a few carefully chosen files.

Code Testability

The question of code testability comes up fairly regularly when we talk about unit tests. The most recurring difficulties are described below. The simplest solution to these various issues is not to let them take hold, through the use of best practices or adapted tools:

  • Methods that are too complex to test: tools such as SonarQube ensure that “acceptable” complexity thresholds are not exceeded. The use of “quality gates” even allows you to be alerted as soon as the code becomes too complex. 
  • Mocks that are too numerous and difficult to set up: the need for many mocks can be avoided by a different code structure. Good refactoring can make the code more easily testable and thus lead to tests that are simpler to design and implement. 
  • The need to load too much code to be able to test a behaviour: the TDD approach avoids having a behaviour spread too widely through the code, diluted across many methods/classes. Only what is needed to deliver a given behaviour is coded, no more and no less. Principles such as KISS (Keep It Simple, Stupid) or YAGNI (You Ain’t Gonna Need It) also promote a good match between code structure and product behaviour. See the sketch just after this list. 
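Here is a sketch of that last point (hypothetical names): extracting the business rule into a pure function removes the need to mock an entire I/O layer.

```typescript
// Before: the rule was buried in code that also fetched the order over
// HTTP, forcing tests to mock the whole client. After extraction, the
// rule is a pure function that can be unit-tested directly:
export function isEligibleForRefund(order: { paid: boolean; ageInDays: number }): boolean {
  return order.paid && order.ageInDays <= 30;
}

it("should refuse a refund after 30 days", () => {
  expect(isEligibleForRefund({ paid: true, ageInDays: 31 })).toBe(false);
});

it("should accept a refund on a paid order within 30 days", () => {
  expect(isEligibleForRefund({ paid: true, ageInDays: 10 })).toBe(true);
});
```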

Many refactoring techniques exist and are often integrated directly into the IDE (extract method, extract variable, rename function, etc.). I recommend Martin Fowler’s book “Refactoring”, a superb toolbox for modifying, improving and upgrading any software code.

Keep in mind that any code that seems difficult to test will also be difficult to maintain and evolve, and will be subject to many bugs throughout its life.

Conclusion

In conclusion, I would stress that unit tests, if they are well designed and written to validate system behaviour, are an efficient investment – even an essential one to ensure the comfort and profitability of projects over their entire lifecycle. They form the first line of defence against many potential regressions in production and make maintenance and refactoring actions safe.

At AT Internet, we have chosen to invest in this level of testing by providing information, training and support to the teams on the subject. We also encourage practices such as pair programming and code reviews (including reviews of test code), and we listen to the teams to evaluate the problems encountered and take the time to find solutions.

Photo credits: Alex King

Author

With more than 10 years of experience in software testing strategy and implementation in an Agile environment, Alexandre is responsible for industrialising development at AT Internet. His daily challenge: guide our dev teams through implementing tools and methods with the aim of guaranteeing regular and high-quality deliveries to our customers.
