Unit tests and data providers, the readable way

Thomas Dutrion
9 min read, Mar 31, 2022

Things in computer science are sometimes complex, and I consider myself a fervent proponent of self-descriptive code to limit complexity.

I won’t go back over why you should unit test at least some of your code, nor will I spend time teaching how to write unit tests in this article. I will assume that you are all comfortable with these concepts and their implementation. Examples are based on PHPUnit.

TL;DR

We can use generators, private methods and named datasets to make tests more readable and maintainable.

<?php

declare(strict_types=1);

require __DIR__.'/Person.php';

use PHPUnit\Framework\TestCase;

class Test extends TestCase
{
    /**
     * @dataProvider provider
     */
    public function testPlainEnglishSentence(string $name, int $brothers, int $sisters, string $expected): void
    {...}

    public function provider(): iterable
    {...}

    private function provideNoSiblings(): array
    {...}

    private function provideSingleBrother(): array
    {...}
}

This works best with methods folded by default (code folding in PhpStorm, for example).

Another option, preferred by some of my coworkers, is a single provider with named datasets:

public function provider(): iterable
{
    yield 'no siblings' => [
        'name' => 'John',
        'brothers' => 0,
        'sisters' => 0,
        'expected' => 'John has no siblings',
    ];
    yield 'single brother' => [
        'name' => 'John',
        'brothers' => 1,
        'sisters' => 0,
        'expected' => 'John has a single brother',
    ];
    yield 'single sister' => [
        'name' => 'John',
        'brothers' => 0,
        'sisters' => 1,
        'expected' => 'John has a single sister',
    ];
    yield '2 brothers' => [
        'name' => 'John',
        'brothers' => 2,
        'sisters' => 0,
        'expected' => 'John has 2 brothers',
    ];
    yield '3 sisters' => [
        'name' => 'John',
        'brothers' => 0,
        'sisters' => 3,
        'expected' => 'John has 3 sisters',
    ];
    yield 'one of each' => [
        'name' => 'John',
        'brothers' => 1,
        'sisters' => 1,
        'expected' => 'John has a brother and a sister',
    ];
    yield '4 brothers and a sister' => [
        'name' => 'John',
        'brothers' => 4,
        'sisters' => 1,
        'expected' => 'John has 4 brothers and a sister',
    ];
    yield '5 sisters and a brother' => [
        'name' => 'John',
        'brothers' => 1,
        'sisters' => 5,
        'expected' => 'John has 5 sisters and a brother',
    ];
    yield '6 brothers and 7 sisters' => [
        'name' => 'John',
        'brothers' => 6,
        'sisters' => 7,
        'expected' => 'John has 6 brothers and 7 sisters',
    ];
}

Say we have the following problem:

A Person can have zero, one or multiple siblings. Some of them identify as male (brothers), others as female (sisters). For the sake of this example, let’s use this binary view; I’ll let you put in the work to add a non-binary option.

<?php

class Person
{
    public function __construct(
        private readonly string $name,
        private readonly int $brothers,
        private readonly int $sisters,
    ) {}

    public function name(): string
    {
        return $this->name;
    }

    public function brothers(): int
    {
        return $this->brothers;
    }

    public function sisters(): int
    {
        return $this->sisters;
    }
}

In this article, our goal is to be able to output a sentence in plain English, saying one of the following:
* {name} has no siblings
* {name} has a single brother
* {name} has a single sister
* {name} has {brothers} brothers
* {name} has {sisters} sisters
* {name} has a brother and a sister
* {name} has {brothers} brothers and a sister
* {name} has {sisters} sisters and a brother
* {name} has {brothers} brothers and {sisters} sisters

Note: if you read carefully, sisters always come second, except when there are several sisters and a single brother, in which case the brother comes last.

The first thing we want to do is design a test for these cases.

composer require --dev phpunit/phpunit

Unless, like me, you’re working from a high-speed train, this shouldn’t take long. Then we can create our test (test.php):

<?php

use PHPUnit\Framework\TestCase;

class Test extends TestCase
{
    public function testPlainEnglishSentence(): void
    {
    }
}

Run the test first, just to see it fail like a good TDD practitioner 😁 (strictly speaking, PHPUnit flags it as risky, since it performs no assertions):

vendor/bin/phpunit test.php
PHPUnit 9.5.19 #StandWithUkraine

R                                                                   1 / 1 (100%)

Time: 00:00.013, Memory: 4.00 MB

There was 1 risky test:

1) Test::testPlainEnglishSentence
This test did not perform any assertions

/Users/tdutrion/Development/lesechos/test/test.php:7

OK, but incomplete, skipped, or risky tests!
Tests: 1, Assertions: 0, Risky: 1.

We can now test the first case (for further reading, look up AAA: Arrange, Act, Assert):

public function testPlainEnglishSentence(): void
{
    // Arrange
    $expected = 'John has no siblings';
    $name = 'John';
    $brothers = 0;
    $sisters = 0;
    $target = new Person($name, $brothers, $sisters);

    // Act
    $actual = $target->plainEnglishSiblingsDeclaration();

    // Assert
    $this->assertEquals($expected, $actual);
}

Do not forget to create the Person.php file containing the actual class, and to require it from the test file.

Obviously, we also need to create the method we are calling from the tests, because running the tests will now result in the following error:

Error: Call to undefined method Person::plainEnglishSiblingsDeclaration()

Let’s add a basic implementation, TDD style:

public function plainEnglishSiblingsDeclaration(): string
{
    return 'John has no siblings';
}

Run the tests again, and we should go green!

vendor/bin/phpunit test.php
PHPUnit 9.5.19 #StandWithUkraine

.                                                                   1 / 1 (100%)

Time: 00:00.012, Memory: 4.00 MB

OK (1 test, 1 assertion)

Next up: adding the other tests. We want to cover the 8 remaining cases.

There are several options:

  • 9 test methods for the 9 possible cases
  • One test method run 9 times, fed by a single data provider
  • One private method per case, aggregated through a provider method

While I do like the first option (to me the most descriptive, since one has to name every method properly and really describe what each case is supposed to be, rather than just having random variables), it is not to everyone’s taste.

The second option, with a single data provider, is pretty common, especially among developers coming from Java. Because solution 3 is, in my opinion, an improvement on top of this one, I will implement it first (as per the PHPUnit documentation).

/**
 * @dataProvider provider
 */
public function testPlainEnglishSentence(string $name, int $brothers, int $sisters, string $expected): void
{
    // Arrange
    $target = new Person($name, $brothers, $sisters);

    // Act
    $actual = $target->plainEnglishSiblingsDeclaration();

    // Assert
    $this->assertEquals($expected, $actual);
}

public function provider(): array
{
    return [
        ['John', 0, 0, 'John has no siblings'],
        ['John', 1, 0, 'John has a single brother'],
        ['John', 0, 1, 'John has a single sister'],
    ];
}
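To make the mechanics of a data provider concrete without PHPUnit, here is a rough plain-PHP approximation: the same test body is called once per dataset, with the dataset’s values passed as arguments, and each run is labelled from the dataset key. This is a hypothetical sketch, not PHPUnit’s real implementation; runWithProvider() and the free-function stand-in for Person::plainEnglishSiblingsDeclaration() are invented for illustration.

```php
<?php

declare(strict_types=1);

// Rough approximation of how a data provider drives a test method.
// This is NOT PHPUnit's implementation: it only shows that one test body
// is called once per dataset, with the dataset's values as arguments.

function provider(): array
{
    return [
        ['John', 0, 0, 'John has no siblings'],
        ['John', 1, 0, 'John has a single brother'],
        ['John', 0, 1, 'John has a single sister'],
    ];
}

// Hypothetical stand-in for Person::plainEnglishSiblingsDeclaration() at the
// TDD stage where only the "no siblings" case is implemented.
function plainEnglishSiblingsDeclaration(string $name, int $brothers, int $sisters): string
{
    return 'John has no siblings';
}

// The single test body, shared by every dataset. Returns whether it passed.
function testPlainEnglishSentence(string $name, int $brothers, int $sisters, string $expected): bool
{
    return plainEnglishSiblingsDeclaration($name, $brothers, $sisters) === $expected;
}

// Hypothetical mini-runner: one run per dataset, labelled from its key.
// Integer keys give "#0", "#1"...; string keys give a quoted dataset name.
function runWithProvider(callable $test, iterable $datasets): array
{
    $results = [];
    foreach ($datasets as $key => $arguments) {
        $label = is_int($key) ? "data set #{$key}" : "data set \"{$key}\"";
        $results[$label] = $test(...array_values($arguments));
    }

    return $results;
}

var_dump(runWithProvider('testPlainEnglishSentence', provider()));
```

As in the PHPUnit run that follows, only dataset #0 passes at this stage, because the implementation still handles a single case.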

Let’s give it a go: vendor/bin/phpunit test.php

PHPUnit 9.5.19 #StandWithUkraine

.FF                                                                 3 / 3 (100%)

Time: 00:00.015, Memory: 4.00 MB

There were 2 failures:

1) Test::testPlainEnglishSentence with data set #1 ('John', 1, 0, 'John has a single brother')
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'John has a single brother'
+'John has no siblings'

/Users/tdutrion/Development/lesechos/test/test.php:23

2) Test::testPlainEnglishSentence with data set #2 ('John', 0, 1, 'John has a single sister')
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'John has a single sister'
+'John has no siblings'

/Users/tdutrion/Development/lesechos/test/test.php:23

FAILURES!
Tests: 3, Assertions: 3, Failures: 2.

Oh no! It’s broken!

Fear not: remember, we haven’t edited the implementation yet, only the tests.

public function plainEnglishSiblingsDeclaration(): string
{
    if ($this->brothers === 0 && $this->sisters === 0) {
        return 'John has no siblings';
    } elseif ($this->brothers === 1 && $this->sisters === 0) {
        return 'John has a single brother';
    } elseif ($this->brothers === 0 && $this->sisters === 1) {
        return 'John has a single sister';
    } elseif ($this->brothers === 1 && $this->sisters === 1) {
        return 'John has a brother and a sister';
    } elseif ($this->brothers > 1 && $this->sisters === 0) {
        return "John has {$this->brothers} brothers";
    } elseif ($this->brothers === 0 && $this->sisters > 1) {
        return "John has {$this->sisters} sisters";
    } else {
        return "John has {$this->brothers} brothers and {$this->sisters} sisters";
    }
}

It definitely looks ugly! No worries: once we have proper, readable tests, we can go through a refactoring session with a bit of algorithmic thinking.

PHPUnit 9.5.19 #StandWithUkraine

...                                                                 3 / 3 (100%)

Time: 00:00.015, Memory: 4.00 MB

OK (3 tests, 3 assertions)

Our pre-existing tests are now green, but we haven’t tested every possibility yet, and the array is already not very readable (this use case is still manageable, but real-life examples may look far more complex).

return [
    ['John', 0, 0, 'John has no siblings'],
    ['John', 1, 0, 'John has a single brother'],
    ['John', 0, 1, 'John has a single sister'],
];

Adding the other cases:

return [
    ['John', 0, 0, 'John has no siblings'],
    ['John', 1, 0, 'John has a single brother'],
    ['John', 0, 1, 'John has a single sister'],
    ['John', 2, 0, 'John has 2 brothers'],
    ['John', 0, 3, 'John has 3 sisters'],
    ['John', 1, 1, 'John has a brother and a sister'],
    ['John', 4, 1, 'John has 4 brothers and a sister'],
    ['John', 1, 5, 'John has 5 sisters and a brother'],
    ['John', 6, 7, 'John has 6 brothers and 7 sisters'],
];

If you look closely, two elseif branches are missing from the previous implementation, so the tests go red again and you need to add them, but that’s outside the scope of this article.
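For completeness, here is one possible way to make all nine cases pass. It is a minimal sketch, not the author’s own refactoring: it uses the person’s name instead of a hard-coded "John" and replaces the elseif chain with early returns. Like the Person class above, it assumes PHP 8.1+ for readonly promoted properties (and PHP 8.0+ for match).

```php
<?php

declare(strict_types=1);

class Person
{
    public function __construct(
        private readonly string $name,
        private readonly int $brothers,
        private readonly int $sisters,
    ) {}

    public function plainEnglishSiblingsDeclaration(): string
    {
        $name = $this->name;
        $brothers = $this->brothers;
        $sisters = $this->sisters;

        // No sisters: either no siblings at all, or brothers only.
        if ($sisters === 0) {
            return match ($brothers) {
                0 => "{$name} has no siblings",
                1 => "{$name} has a single brother",
                default => "{$name} has {$brothers} brothers",
            };
        }

        // No brothers: sisters only.
        if ($brothers === 0) {
            return $sisters === 1
                ? "{$name} has a single sister"
                : "{$name} has {$sisters} sisters";
        }

        // Both kinds present: brothers come first, unless there is a single
        // brother and several sisters, in which case the brother goes last.
        if ($brothers === 1 && $sisters === 1) {
            return "{$name} has a brother and a sister";
        }
        if ($sisters === 1) {
            return "{$name} has {$brothers} brothers and a sister";
        }
        if ($brothers === 1) {
            return "{$name} has {$sisters} sisters and a brother";
        }

        return "{$name} has {$brothers} brothers and {$sisters} sisters";
    }
}
```

Thanks to the data provider, this kind of refactoring can be done with confidence: every one of the nine datasets is checked on each run.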

Since PHP 5.5, we can also use the yield keyword (generator syntax). This helps us break up the massive array we currently have, and gives us an opportunity to write comments where needed.

public function provider(): iterable
{
    yield ['John', 0, 0, 'John has no siblings'];

    // One can add a descriptive comment if needed to explain
    // the data used in the following example
    yield ['John', 1, 0, 'John has a single brother'];

    yield ['John', 0, 1, 'John has a single sister'];
    yield ['John', 2, 0, 'John has 2 brothers'];
    yield ['John', 0, 3, 'John has 3 sisters'];
    yield ['John', 1, 1, 'John has a brother and a sister'];
    yield ['John', 4, 1, 'John has 4 brothers and a sister'];
    yield ['John', 1, 5, 'John has 5 sisters and a brother'];
    yield ['John', 6, 7, 'John has 6 brothers and 7 sisters'];
}
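To check that the generator version is a drop-in replacement for the array one, here is a small plain-PHP comparison outside PHPUnit (arrayProvider() and generatorProvider() are hypothetical names). A bare yield receives an auto-incremented integer key, so materialising the generator with iterator_to_array() gives back exactly the array version. PHPUnit accepts both forms, since a data provider may return either an array or an Iterator.

```php
<?php

declare(strict_types=1);

// Array-based provider, as in the first version.
function arrayProvider(): array
{
    return [
        ['John', 0, 0, 'John has no siblings'],
        ['John', 1, 0, 'John has a single brother'],
    ];
}

// Generator-based provider: each dataset is yielded on its own,
// which leaves room for comments between cases.
function generatorProvider(): Generator
{
    yield ['John', 0, 0, 'John has no siblings'];

    // A bare yield gets an auto-incremented integer key (0, 1, ...),
    // so the datasets are still numbered #0, #1, ... as before.
    yield ['John', 1, 0, 'John has a single brother'];
}

// Materialise the generator (keys preserved) and compare with the array.
var_dump(iterator_to_array(generatorProvider()) === arrayProvider()); // bool(true)
```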

Because nothing can be more descriptive than the code itself, here are two more possibilities to label datasets explicitly:

public function provider(): iterable
{
    yield $this->provideNoSiblings();
    yield $this->provideSingleBrother();
    yield ['John', 0, 1, 'John has a single sister'];
    yield ['John', 2, 0, 'John has 2 brothers'];
    yield ['John', 0, 3, 'John has 3 sisters'];
    yield ['John', 1, 1, 'John has a brother and a sister'];
    yield ['John', 4, 1, 'John has 4 brothers and a sister'];
    yield ['John', 1, 5, 'John has 5 sisters and a brother'];
    yield ['John', 6, 7, 'John has 6 brothers and 7 sisters'];
}

private function provideNoSiblings(): array
{
    return [
        'name' => 'John',
        'brothers' => 0,
        'sisters' => 0,
        'expected' => 'John has no siblings',
    ];
}

private function provideSingleBrother(): array
{
    return [
        'name' => 'John',
        'brothers' => 1,
        'sisters' => 0,
        'expected' => 'John has a single brother',
    ];
}

Two things here:

  • First, we can use array keys to describe what the values actually are. We could also use variables and return an array of said variables.
  • Second, we can name private functions to describe the dataset as a whole.
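The "array of said variables" variant from the first point can be written with PHP’s compact(), which builds an associative array from variable names. Below is a hypothetical variation on provideSingleBrother(), written as free functions so it runs outside a test class (inside the class, the provider would call $this->provideSingleBrother()). It also shows that both ideas combine: the private method describes the values, and a yield key names the dataset. Either way, keeping the keys in the same order as, and matching, the test method’s parameters avoids surprises.

```php
<?php

declare(strict_types=1);

// Hypothetical variation on provideSingleBrother(): each value lives in a
// named variable, and compact() turns those names into array keys.
function provideSingleBrother(): array
{
    $name = 'John';
    $brothers = 1;
    $sisters = 0;
    $expected = 'John has a single brother';

    // Keys come out in the order listed, matching the test's parameters.
    return compact('name', 'brothers', 'sisters', 'expected');
}

// Combining both ideas: the method describes the values,
// the yield key names the dataset in the test output.
function provider(): Generator
{
    yield 'single brother' => provideSingleBrother();
}

var_dump(iterator_to_array(provider()));
```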

With a good IDE, you can fold methods by default when opening files, and then see the following:

<?php

declare(strict_types=1);

require __DIR__.'/Person.php';

use PHPUnit\Framework\TestCase;

class Test extends TestCase
{
    /**
     * @dataProvider provider
     */
    public function testPlainEnglishSentence(string $name, int $brothers, int $sisters, string $expected): void
    {...}

    public function provider(): iterable
    {...}

    private function provideNoSiblings(): array
    {...}

    private function provideSingleBrother(): array
    {...}
}

The class is then fairly simple but highly descriptive. Public methods may be used by external components (here, the PHPUnit CLI), while private ones are restricted to the current class; in practice only the provider calls them, as they also “provide” data.

Each function name describes the test case.

As stated in the TL;DR section, some of my coworkers do not like having many methods. We can then keep the associative array idea and name the datasets, which results in the following:

<?php

declare(strict_types=1);

require __DIR__.'/Person.php';

use PHPUnit\Framework\TestCase;

class Test extends TestCase
{
    /**
     * @dataProvider provider
     */
    public function testPlainEnglishSentence(string $name, int $brothers, int $sisters, string $expected): void
    {...}

    public function provider(): iterable
    {
        yield 'no siblings' => [
            'name' => 'John',
            'brothers' => 0,
            'sisters' => 0,
            'expected' => 'John has no siblings',
        ];
        yield 'single brother' => [
            'name' => 'John',
            'brothers' => 1,
            'sisters' => 0,
            'expected' => 'John has a single brother',
        ];
        yield 'single sister' => [
            'name' => 'John',
            'brothers' => 0,
            'sisters' => 1,
            'expected' => 'John has a single sister',
        ];
        yield '2 brothers' => [
            'name' => 'John',
            'brothers' => 2,
            'sisters' => 0,
            'expected' => 'John has 2 brothers',
        ];
        yield '3 sisters' => [
            'name' => 'John',
            'brothers' => 0,
            'sisters' => 3,
            'expected' => 'John has 3 sisters',
        ];
        yield 'one of each' => [
            'name' => 'John',
            'brothers' => 1,
            'sisters' => 1,
            'expected' => 'John has a brother and a sister',
        ];
        yield '4 brothers and a sister' => [
            'name' => 'John',
            'brothers' => 4,
            'sisters' => 1,
            'expected' => 'John has 4 brothers and a sister',
        ];
        yield '5 sisters and a brother' => [
            'name' => 'John',
            'brothers' => 1,
            'sisters' => 5,
            'expected' => 'John has 5 sisters and a brother',
        ];
        yield '6 brothers and 7 sisters' => [
            'name' => 'John',
            'brothers' => 6,
            'sisters' => 7,
            'expected' => 'John has 6 brothers and 7 sisters',
        ];
    }
}

Let me know your thoughts and practices regarding test readability. Plenty of teams, plenty of ideas! For example, have a look at this article on method naming in tests: https://mnapoli.fr/using-non-breakable-spaces-in-test-method-names/.


Freelance PHP Developer / Web architect, @scotlandphp organiser | Zend Certified Architect