Configuring Python

A legacy of configuration DIY

Given how long Python has been around, one might expect its standard library to have included a configuration parser for a substantial part of that time. That expectation would be mistaken. After all, most fully-fledged applications and development tools benefit from some kind of configuration, whether placed inside pyproject.toml, a dedicated project configuration file, or some other file located elsewhere in the user’s home directory.

Of course, one may well argue (with some good reason) that such a parser is unnecessary. There are workarounds which, strictly speaking, do work.

Write your own settings.py file

Developers can define settings inside Python scripts if they do not want to import a third-party package. Indeed, some very well-established Python libraries take this approach. Running the Django framework’s project initializer provides the user with a settings.py file, in which various settings are placed. Django also takes the somewhat unconventional path of combining static configuration (i.e. strings, integers, lists of literals, etc.) with actual code (e.g. ABSOLUTE_URL_OVERRIDES is a dictionary mapping strings to functions).
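As a rough illustration of the settings-as-a-script idea, here is a minimal sketch of a tool importing a user's settings.py by path. It is not how Django loads its settings; the names SETTINGS_SOURCE, load_settings, and the settings themselves are hypothetical, and the file is written to a temporary directory only to keep the example self-contained.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical contents of a user's settings.py: static values plus a callable.
SETTINGS_SOURCE = '''
DEBUG = True
ALLOWED_EXTENSIONS = [".txt", ".md"]

def filter_files(filename):
    return filename.endswith(tuple(ALLOWED_EXTENSIONS))
'''


def load_settings(path):
    """Import a Python file by path and return it as a module object."""
    spec = importlib.util.spec_from_file_location("user_settings", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # Note: this runs arbitrary user code.
    return module


with tempfile.TemporaryDirectory() as tmp:
    settings_path = Path(tmp) / "settings.py"
    settings_path.write_text(SETTINGS_SOURCE)
    settings = load_settings(settings_path)

print(settings.DEBUG, settings.filter_files("notes.txt"))
```

The convenience is obvious, and so is the cost: loading the configuration means executing whatever the user wrote.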

This is very much in line with the Suckless philosophy. Adherent projects like dwm (the Suckless group’s flagship dynamic window manager) store their configuration in a config.h header file. That is an acceptable solution for some developers, but it is hardly a universal one. Python, being a scripting language, doesn’t require a recompile if you choose to store settings in a script, but the Suckless philosophy is not intended to be universal in the first place. The dwm project in fact says this explicitly:

“Because dwm is customized through editing its source code, it’s pointless to make binary packages of it. This keeps its userbase small and elitist. No novices asking stupid questions.”

So one can, if they choose, expect their users to write a Python script, but it is obviously preferable for the average Joe to modify a file written in a universal configuration format.

The standard library graveyard

It is a well-known adage that the Python standard library is where good code goes to die. Once code ends up there, development on new features must inevitably take a back seat to bug fixes and stability. This can be to the detriment of those packages, as was the case for optparse and distutils. In the case of the former, there is almost never any need to use optparse over the more feature-complete argparse (which was itself subsequently absorbed into the standard library), whereas in the case of the latter, it took many years for its eventual deprecation and removal in favour of setuptools.

While we should all be glad that projects like Flask or Pydantic will never (and I mean never) enter the Python standard library, a configuration parser is practically the textbook definition of a package that does not need feature development. Configuration standards generally change once in a blue moon, before becoming set in stone. Some standards haven’t changed in decades.

So many configuration formats!

One might also argue that there are multiple configuration formats to choose from. To name a few: INI, JSON, XML, YAML, and TOML.

And while it would be perfect if there were support for all of them, it would be an unnecessary drain on the resources of the Python Software Foundation to write and maintain so many parsers, a job that is easier to delegate to third-party software developers.

The inclusion of XML and JSON on this list is somewhat dubious anyhow. Neither is a true configuration language so much as a language for data encoding, transmission, and decoding. This becomes especially obvious when JSON parsers reject trailing commas, and new parsers with less strict syntactical rules are invented to parse pseudo-JSON (better suited to configuration than raw JSON) instead. Furthermore, why anyone would want to use INI configuration nowadays is a wholly different question, better suited to psychiatrists than software engineers.

Choosing between YAML and TOML is a difficult, but not impossible, task. YAML is more commonly used, but ultimately more complex, and it can even be insecure: some loaders (PyYAML’s unsafe loading mode, for example) will construct arbitrary Python objects from tagged input, which allows the arbitrary execution of code.

So in the spirit of not allowing the perfect to be the enemy of the good, with the adoption of PEP 518 in 2016, Python began to consolidate its resources around TOML, and it has continued to accelerate in this direction ever since.

One language to rule them all

After the introduction of pyproject.toml, its usage took off, and it is not hard to see why. Writing a code linter, and you want the user to be able to configure the linter’s rules? You could use pyproject.toml to store the tool’s configuration. Yes, you can keep the TOML in a separate configuration file, but if the user wants their project root not to be swamped in configuration files, pyproject.toml comes to the rescue.

Some time later, with the release of Python 3.11 in 2022, the tomli library was absorbed into the Python standard library as tomllib, the final nail in the coffin for other configuration languages. TOML was the winner, and its rise has been meteoric. All manner of tools now parse TOML configuration files, whether Poetry, uv, Ruff, mypy, Pyright… the list goes on.

[Image: the TOML logo. TOML stands for Tom’s Obvious Minimal Language. Source: https://commons.wikimedia.org/wiki/File:TOML_Logo.svg]

But before we celebrate this success, we need to address a couple of… considerations. Firstly, how do we represent None in TOML? Secondly, what if we have genuinely good reason for our configuration to be somewhat more dynamic?

Representing None in TOML

TOML does not have an understanding of null values, as JSON, XML, and YAML all do. This is not entirely without good reason. Suppose someone writes a program in a C-like language which parses a TOML configuration file, and the parser indicates that the key hello is associated with a null value. Does that mean that the key hello is absent from the configuration? Or does that mean that the key hello has been set to a null value?

Because TOML does not have null values, there is no ambiguity. In this case, the key hello is absent from the configuration. But if TOML could have null values, it wouldn’t be obvious. The parser would have to provide a flag to indicate whether a null value means the key is absent or has been explicitly set to null. This makes configuration parsing unwieldy for static languages.

A scripting language like Python doesn’t struggle with this, but if Python is to use a universal configuration language, these are the concessions it is forced to make. So no native None values.

The truth is that 90% of the time, that’s okay. A setting left out of a TOML configuration file indicates implicit null-ness. But suppose someone wants their program to be configured with a nullable setting that defaults to a non-null value? The general answer is: too bad. Don’t do that. It’s probably bad design. The problem is, I have found myself in a situation where I needed to do this, specifically when I was trying to represent configuration for Pelican using TOML. Certain Pelican settings can be set to None, but are not defaulted to None.

One solution is to use a placeholder value for None. If a configuration key is mapped to a value of "None", you can convert it to a literal None after parsing the configuration. It is, however, a safe bet that with enough users, you may eventually find someone who wants to store a literal "None" string in their configuration. So maybe you should use a more distinctive placeholder, such as MyTOMLNone. This is ugly, but at least there is no foreseeable reason why anyone would ever want to use this string.

There is a better solution yet, however. What if you don’t just configure your settings, but the parser itself? Define a setting with the TOML key null_placeholder and allow the user to decide what a placeholder for None should be. For example:

# pyproject.toml
[tool.mytool]
x = "null" # mytool treats this as `None`.
y = "None" # mytool treats this as `"None"`.

[tool.mytool.parser]
null_placeholder = "null" # Configure the parser here.

Obviously, this isn’t quite as clean as having a native null or nil like other configuration languages, but it gets the job done.

Dynamic configuration

Suppose you write a program which needs configuration that is a little more complicated than can be represented by simple strings and numeric values. For example, suppose you want to allow the user to filter files with their own function that takes the filename as input. Obviously, Python scripts are the only way to do this. But at this point, is someone forced to throw their hands in the air and give up on TOML configuration?

The answer, thankfully, is no. A little creativity helps us resolve this issue. You will need your user to write Python scripts if they want to configure such settings, however. Consider that we want our filter function to be called filter_files, which defaults to None. You could expect a certain module to be in a certain location, containing a function with a certain name. But that could result in your user having too much unnecessary clutter in their project root directory, or unintuitive locations for their code. The user knows better where their code belongs.

Let’s suppose that I have defined in my project the following function:

# projectroot/src/mytool/hello/world.py


def filter_files_by_extension(filename: str) -> bool:
    return filename.endswith(".txt")

In my TOML configuration, I could write:

# pyproject.toml
[tool.mytool]
filter_files = "@hello_world:filter_files_by_extension"

This indicates that we want to use a callable with the name filter_files_by_extension, but we also have a prefix there: @hello_world:. How does @hello_world: tell the parser that the function is contained within projectroot/src/mytool/hello/world.py? Simple. Much as we did when we configured the parser’s placeholder for None, we can configure certain prefixes to be associated with certain modules.

# pyproject.toml
[tool.mytool.parser]
module_prefix = {"@hello_world:" = "mytool.hello.world"}

We can then modify our parser to find all strings beginning with one of the defined prefixes, and then substitute the string with the function imported from the associated module. This does require that the module is on the Python path, but that is a very reasonable expectation.

Of course, these solutions all require some manual post-processing in your Python code, but rest assured that using TOML does not limit your users' ability to configure your tool.
