Improve legacy app: limiting the scope!

Thomas Dutrion
Darkmira FR
Published in
7 min readNov 19, 2020

--

Code from a random file from phpmyadmin, line 367 to 373, that’s a lot for a single file!

Never have I ever worked in a file/function/method of more than a hundred lines… That game would probably go sideways with many developers! At least I know I wouldn’t last long, as I pride myself on working on many legacy projects.

Long chunk of code are universally recognized a problem for readability, because one has to read it in one go and keep track of the many variables. Usually it’s almost impossible to really track it all, and requires the utmost concentration, which means the developer can only work correctly for a few minutes, in a very quiet environment that is!

How bad is it doctor?

Let’s start with a little static analysis tool: PHPLOC. The aim here is to face the truth: our code is quite bad as is, but we will only be able to spare money to improve some bits, the rest of it will continue to live its life, so we need to target the worst part!

curl -o phploc.phar https://phar.phpunit.de/phploc.phar

For the following results, I have used it on a very small subset of a project I’m currently working on, here is the output:

phploc 7.0.1 by Sebastian Bergmann.

Directories 3
Files 10

Size
Lines of Code (LOC) 4776
Comment Lines of Code (CLOC) 1194 (25.00%)
Non-Comment Lines of Code (NCLOC) 3582 (75.00%)
Logical Lines of Code (LLOC) 1077 (22.55%)
Classes 798 (74.09%)
Average Class Length 133
Minimum Class Length 28
Maximum Class Length 218
Average Method Length 6
Minimum Method Length 0
Maximum Method Length 49
Average Methods Per Class 20
Minimum Methods Per Class 8
Maximum Methods Per Class 32
Functions 11 (1.02%)
Average Function Length 5
Not in classes or functions 268 (24.88%)

Cyclomatic Complexity
Average Complexity per LLOC 0.36
Average Complexity per Class 43.67
Minimum Class Complexity 6.00
Maximum Class Complexity 126.00
Average Complexity per Method 3.12
Minimum Method Complexity 1.00
Maximum Method Complexity 38.00

Dependencies
Global Accesses 21
Global Constants 19 (90.48%)
Global Variables 0 (0.00%)
Super-Global Variables 2 (9.52%)
Attribute Accesses 292
Non-Static 292 (100.00%)
Static 0 (0.00%)
Method Calls 196
Non-Static 196 (100.00%)
Static 0 (0.00%)

Structure
Namespaces 3
Interfaces 0
Traits 0
Classes 6
Abstract Classes 0 (0.00%)
Concrete Classes 6 (100.00%)
Final Classes 0 (0.00%)
Non-Final Classes 6 (100.00%)
Methods 121
Scope
Non-Static Methods 121 (100.00%)
Static Methods 0 (0.00%)
Visibility
Public Methods 121 (100.00%)
Protected Methods 0 (0.00%)
Private Methods 0 (0.00%)
Functions 2
Named Functions 2 (100.00%)
Anonymous Functions 0 (0.00%)
Constants 12
Global Constants 12 (100.00%)
Class Constants 0 (0.00%)
Public Constants 0 (0.00%)
Non-Public Constants 0 (0.00%)

Fear not, most of it does not matter for us in the context of this article, and this is some pretty messed up code too!

We can still see a few leads:

  • Not in classes or functions, 24.88%: we need to change that and make it part of smaller code parts.
  • Maximum Method Complexity, 38.00: this is way too much, early returns might be interesting…
  • Global Constants, 19: way too many, that should be about… 0!
  • Visibility: 121 public method, 0 private or protected. This can definitely be improved!

Another question: do we have tests, at least automated acceptance testing on the happy path? If not, proceed with caution when doing anything at all!

Apply SOLID and Object Calisthenics

This advice is for sure the best advice one can provide, mainly for new code bases though… Trying to blindly apply them on a legacy code base might be tricky, and could lead into a massive refactoring for which you probably have no time or budget.

From these principles, I would probably try to keep in mind single responsibility, early returns and no else/elseif, that’s what going to help you.

Consider the following code:

$variable = 'some value';
// ...
if ($variable == 'something') {
$result = 'a result';
} else {
$result = 'another result';
}
// ...

Obviously we are in a legacy code, so there’s a loose equality operator, and a if/else… But what really matters here is that we have $result set in the scope of a conditional statement.

If someone happened to add another case, and missed the $result assignation for some reason, it would result in a disaster at some point (one of the 3 possible paths).

$variable = 'some value';
// ...
if ($variable == 'something') {
//...
$result = 'a result';
} elseif ($variable == 'something else') {
// ...
// no $result assignation
} else {
// ...
$result = 'another result';
}
// ...
// What is $result?

Make it part of a script with a very high cyclomatic complexity and you’re done!

Now let’s consider what we know, what tool is available for us?

$variable = 'some value';
$handlers = [
'something' => function(): string { return 'a result'; },
'something else' => function() : string {
return 'the missing result';
},
'default' => function(): string { return 'another result'; }
];
$handlerToUse = $handlers[$variable] ?? $handlers['default'];
$result = $handlerToUse();

From that, you can improve the code by injecting the handlers array from somewhere, use an interface and handler classes instead of functions, and you’ve got some scoped subroutines rather than executing all of it in the same code.

You’re also implementing the Open/Closed principle as a side effect. Have a look at Improve your voters with the open closed principle by Vincent Monjaret if you want to see an implementation of this principle in a modern framework, and see where you want to go at the end of this improvement spree.

Too complex for you based on the first example?

$variable = 'some value';
// ...
if ($variable == 'something') {
$result = 'a result';
} else {
$result = 'another result';
}
// ...

Let’s use “inline” functions then! This way we can leverage an early return and be done with that.

$variable = 'some value';
// ...
$result = (function(string $value) {
if ($variable == 'something') {
return 'a result';
}
return 'another result';
})();
// ...

How do you fancy that? Any preferred way to use that yet? Leave a comment if you’ve got a personal preference here!

Array, iteration and scope

We’ve seen how to deal with long scripts by scoping parts of it using an anonymous function, or named functions and classes. Now let’s focus on one of the most common problem I usually encounter: array iteration!

Here’s how the code looks most of the time:

$array = [
0 => ['name' => 'example'],
1 => ['name' => 'another example'],
];
// ...
$counter = 1;
$values = [];
foreach ($array as $item) {
$name = $item['name'];
// if (...) {
// ...
// }
$values[$counter] = $name;
$counter = $counter + 1;
}
// what is the value of $counter here?
// what is the value of $name? Does $name exist?
// will we reuse $array? Should we remove it?

In the code above, if $array happened to be empty for some reason, $name would not be set at all. Also, the same would happen if a continue or a break was used before the assignation for each item…

$counter would always contain the last assigned value in the loop, which has absolutely no interest in the rest of the code… Should we unset it? Should we override it? How does that impact readability?

Let’s refactor it using array_* function. It will help us with scoping anything that’s inside the loop.

$array = [
0 => ['name' => 'example'],
1 => ['name' => 'another example'],
];
// ...
array_walk($array, function($item, $index) use (&$values) {
$name = $item['name'];
// if (...) {
// ...
// }
$values[++$index] = $name;
});
// what is the value of $counter here?
// what is the value of $name? Does $name exist?
// will we reuse $array? Should we remove it?

To be totally honest, this code is far from being good, I just wanted to demonstrate the use of array_walk. Still, the $name variable is now scoped, and is used only in the context of a specific $item.

In all fairness, the function to use in our case would be array_map:

$array = [
0 => ['name' => 'example'],
1 => ['name' => 'another example'],
];
// ...
$values = array_map(function($item){
$name = $item['name'];
// if (...) {
// ...
// }
return $name;
}, $array);
// what is the value of $counter here?
// what is the value of $name? Does $name exist?
// will we reuse $array? Should we remove it?

Now now… The value of $array is not the same! Shame on me! The keys are not incremented as in the previous solutions… So you need to be aware of that at least!

Conclusion

Scoping is not as difficult as it may seem when you know how to find refactoring opportunities. Don’t try to change everything at once, but using self called anonymous functions such as (function() {})() will help you isolate parts of the code, and then you’ll be able to move them into proper structures.

What are your best tips and tricks to help scoping variables in legacy code? Do you find any of these solutions useful? I’m waiting for your comments!

--

--

Thomas Dutrion
Darkmira FR

Freelance PHP Developer / Web architect, @scotlandphp organiser | Zend Certified Architect