The Illusion of Configuration isn’t Code

Building Great Code

One of the finance shops I worked at had a Post-Trade Management system built in Java. It had all kinds of interesting capabilities for aggregating, filtering, and plotting positions at different levels, and folks really seemed to like it. One of the key components of that system was a UI toolkit from a consulting shop that was driven entirely by XML configuration files.

I can remember adding a few lines of XML, getting a very complex dialog box in the app, and thinking - There is no way that's a 1:1 mapping of the config! And I was right. These were heavily built library modules, and adding a few lines of XML really pulled in entire subsystems and connections for those UI elements.

I've also worked at a Dot-Com Shop where the complete build of a machine was described in a single YAML file. This included all the user accounts, all the software packages, and all the configuration for those packages, all sitting in one YAML file.

I'm currently looking at a CI/CD pipeline that is completely specified by a YAML file. This includes shell commands, with options and variable expansion. It's understandable why the developers of each of these systems chose XML or YAML for their configuration files: there are loads of parsers that are solid and reliable, the files can hold simple data structures, and in those data structures you can add shell commands and then do variable expansion... so it makes sense.
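To make that concrete, here is a minimal Python sketch of what a pipeline file looks like once it has been parsed. The `pipeline` structure, its `variables` and `steps` keys, and the `expand_steps` helper are all made up for illustration and do not come from any particular CI/CD tool. The point is only that the "data" is a list of shell commands that turn into real commands after expansion.

```python
from string import Template

# Hypothetical parsed pipeline config: plain data on the surface,
# but every entry in "steps" is a shell command waiting to be run.
pipeline = {
    "variables": {"IMAGE": "app", "TAG": "1.4.2"},
    "steps": [
        "docker build -t $IMAGE:$TAG .",
        "docker push $IMAGE:$TAG",
    ],
}

def expand_steps(config):
    """Expand $VARIABLES in each step; a missing variable raises KeyError."""
    variables = config["variables"]
    return [Template(step).substitute(variables) for step in config["steps"]]

commands = expand_steps(pipeline)
```

Changing one value under `variables` changes every command that gets executed, which is exactly the kind of change you would want reviewed and tested if it were made in source code.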

What concerns me is that so many developers seem to feel that these configuration files are not nearly as important as the code backing them. That it's easy, safe, and simple to change the configuration and try it again... and maybe for some things it is... but chances are, if your configuration is really your code - then you need to treat it as such.

For example, it is very common to have multiple layers of configuration files. Maybe there's a company level, and overlaid on that is a project level, and overlaid on that is an environment level. These probably get parsed and then merged one on top of another, so that certain things can be set at one level and others at another, and the sum total of all the config data is what's used.
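A layered merge like this can be sketched in a few lines of Python. This is only one possible approach, assuming each layer has already been parsed into nested dictionaries and that later layers win on conflicts; the layer names and keys are invented for the example.

```python
from copy import deepcopy

def merge_layers(*layers):
    """Merge config layers left to right; later layers win on conflicts."""
    merged = {}
    for layer in layers:
        _merge_into(merged, layer)
    return merged

def _merge_into(base, overlay):
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            # Both sides are mappings: merge key by key.
            _merge_into(base[key], value)
        else:
            # Scalars, lists, and type mismatches: the overlay replaces the base.
            base[key] = deepcopy(value)

# Hypothetical layers, lowest first.
company = {"log_level": "info", "db": {"host": "db.internal", "port": 5432}}
project = {"db": {"name": "trades"}}
environment = {"log_level": "debug", "db": {"host": "localhost"}}

effective = merge_layers(company, project, environment)
```

For plain key/value data this is easy to reason about: `effective` is just a flattened dictionary that can be printed, diffed, and checked in a test.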

What could go wrong?

Well... imagine if one component of the config is a list of strings that have to be processed in a specific order - let's say top to bottom, as they appear in the file. But then another of the layered config files has another list - with the same name - maybe it's a typo, maybe it's not. How are these merged?

Does the later stacked file overwrite the lower level? That's one way, but then it's hard to make sure that those lower-level commands are being run/parsed properly. It could lead to duplication in the upper-level files, and that's not really the point of the stacking, is it?

What if you simply append the upper entries to the lower entries? That could be just as bad because the writer of the lower ones may be making assumptions about the state left after the processing of the upper file.
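The two strategies are easy to write down, which is part of the problem: both look reasonable, and they produce different programs. Here is a small Python sketch with made-up step names, just to show how the result diverges.

```python
def merge_replace(lower, upper):
    """The upper layer's list completely replaces the lower layer's list."""
    return list(upper)

def merge_append(lower, upper):
    """The upper layer's entries run after the lower layer's entries."""
    return list(lower) + list(upper)

# Hypothetical ordered steps defined under the same key in two layers.
lower_steps = ["install deps", "run migrations"]
upper_steps = ["run tests"]

replaced = merge_replace(lower_steps, upper_steps)
appended = merge_append(lower_steps, upper_steps)
```

With `merge_replace` the migrations silently disappear unless the upper file repeats them; with `merge_append` the tests run against whatever state the lower steps happened to leave behind. Neither file shows that on its own, because the behavior lives in the merge.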

In short - having configuration files store data structures is fine, and it's useful... but having them include what amounts to executable code means they need to be treated like code. And can you imagine writing a function that's layered from multiple files, and then executed at runtime? The difficulty in tracking down errors would more than offset any gains in reuse.

So if you want to have layered configuration files - Great! Just leave it to data that's easily flattened, and tested... but if you're going to have it include executable code - make it simpler - a single layer. You'll be glad you did. 🙂