A legacy of configuration DIY
Given how long Python has been around, one might mistakenly expect that, for at least a substantial period of time, it would have included a configuration parser in the standard library. After all, most fully-fledged applications and development tools benefit from some kind of configuration, whether placed inside pyproject.toml, a dedicated project configuration file, or some other file located elsewhere in the user’s home directory.
Of course, one may well argue (with some good reason) that a standard-library parser is unnecessary: there are workarounds which, strictly speaking, do work.
Write your own settings.py file
Developers who do not want to import a third-party package can define settings inside Python scripts. Indeed, some very well-established Python libraries take this approach. Running the Django framework’s project initializer provides the user with a settings.py file, in which various settings are placed. Django also takes the somewhat unconventional path of combining static configuration (i.e. strings, integers, lists of literals, etc.) with actual code (e.g. ABSOLUTE_URL_OVERRIDES is a dictionary mapping strings to functions).
This is very much in line with the Suckless philosophy. Adherent projects like dwm (the Suckless group’s flagship dynamic window manager) store their configuration in a config.h header file, so changing a setting means recompiling. While that is an acceptable solution for some developers, it is hardly a universal one. Python, being a scripting language, doesn’t require a recompile if you choose to store settings in a script. But the Suckless philosophy is not intended to be universal anyway, and they in fact say this explicitly:
Because dwm is customized through editing its source code, it’s pointless to make binary packages of it. This keeps its userbase small and elitist. No novices asking stupid questions.
So one can, if one chooses, expect users to write a Python script, but it is obviously preferable for the average Joe to modify a file written in a universal configuration format.
The standard library graveyard
It is a well-known adage that the Python standard library is where good code goes to die. Once code ends up there, development of new features must inevitably take a back seat to bug fixes and stability. This can be to the detriment of those packages, as was the case for optparse and distutils. In the case of the former, there is almost never any reason to use optparse over the more feature-complete argparse (which was itself subsequently absorbed into the standard library). In the case of the latter, it took many years before distutils was deprecated (PEP 632) and finally removed in Python 3.12, in favour of setuptools.
While we should all be glad that projects like Flask or Pydantic will never (and I mean never) enter the Python standard library, a configuration parser is practically the textbook definition of a package that does not need feature development. Configuration standards generally change once in a blue moon, before becoming set in stone. Some standards haven’t changed in decades.
So many configuration formats!
One might also argue that there are multiple configuration formats to choose from. To name a few:
- XML
- JSON
- INI
- YAML
- TOML
And while it would be perfect if there were support for all of them, it would be an unnecessary drain on the resources of the Python Software Foundation to write and maintain so many parsers, a job that is easier to delegate to third-party software developers.
The inclusion of XML and JSON on this list is somewhat dubious anyhow. Neither is a true configuration language so much as a format for data encoding, transmission, and decoding. This becomes especially obvious when JSON parsers reject trailing commas, and new parsers with less strict syntactical rules are invented to parse pseudo-JSON (better suited to configuration than raw JSON) instead. Furthermore, why anyone would want to use INI configuration nowadays is a wholly different question, better suited to psychiatrists than software engineers.
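Python’s own json module illustrates the trailing-comma point. It follows the JSON specification strictly, so a trailing comma that is harmless in a hand-edited configuration file becomes a hard parse error. parses_as_json is just an illustrative helper:

```python
import json


def parses_as_json(text: str) -> bool:
    """Return True if `text` is valid JSON according to the standard library."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(parses_as_json('{"debug": true}'))   # True
print(parses_as_json('{"debug": true,}'))  # False: trailing comma rejected
```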
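Choosing between YAML and TOML is a difficult but not impossible task. YAML is more commonly used, but it is ultimately more complex and even insecure: loaders that honour its full tag system (such as PyYAML’s unsafe loader) can construct arbitrary objects and thereby execute arbitrary code.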
So, in the spirit of not allowing the perfect to be the enemy of the good, Python began to consolidate its resources around TOML with the adoption of PEP 518 (which introduced pyproject.toml) in 2016, and it has continued to accelerate in this direction ever since.
One language to rule them all
After the introduction of pyproject.toml, its usage took off, and it is not hard to see why. Writing a code linter and want the user to be able to configure the linter’s rules? You could use pyproject.toml to store the tool’s configuration. Yes, you can keep the TOML in a separate configuration file, but if the user doesn’t want their project root swamped with configuration files, pyproject.toml comes to the rescue.
Some time later, with the release of Python 3.11 in 2022, the tomli library was absorbed into the Python standard library as tomllib. This was the final nail in the coffin for other configuration languages. TOML was the winner, and its rise has been meteoric. All manner of tools now parse TOML configuration files, whether Poetry, uv, Ruff, mypy, Pyright… the list goes on.
But before we celebrate this success, we need to address a couple of… considerations. Firstly, how do we represent None in TOML? Secondly, what if we have genuinely good reason for our configuration to be somewhat more dynamic?
Representing None in TOML
TOML has no notion of null values, unlike JSON, XML, and YAML, which all do. This is not entirely without good reason. Suppose someone writes a program in a C-like language which parses a TOML configuration file, and the parser indicates that the key hello is associated with a null value. Does that mean that the key hello is absent from the configuration? Or does it mean that the key hello has been explicitly set to null?
Because TOML does not have null values, there is no ambiguity: a missing value means the key hello is absent from the configuration. But if TOML had null values, it wouldn’t be obvious. The parser would have to provide a flag indicating whether a null value means the key is absent or was explicitly set to null. This makes configuration parsing unwieldy for statically typed languages.
A scripting language like Python doesn’t struggle with this, but if Python is to use a universal configuration language, these are the concessions it is forced to make. So: no native None values.
The truth is that 90% of the time, that’s okay. A setting left out of a TOML configuration file indicates implicit null-ness. But suppose someone wants their program to be configured with a nullable setting that defaults to a non-null value? The general answer is: too bad. Don’t do that. It’s probably bad design. The problem is, I have found myself in a situation where I needed to do exactly this, specifically when I was trying to represent configuration for Pelican using TOML. Certain Pelican settings can be set to None, but do not default to None.
One solution is to use a placeholder value for None. If a configuration key is mapped to the value "None", you can convert it to a literal None after parsing the configuration. It is, however, a safe bet that with enough users, you will eventually find someone who wants to store a literal "None" string in their configuration. So maybe you should use a more distinctive placeholder, such as MyTOMLNone. This is ugly, but at least there is no foreseeable reason why anyone would ever want to use this string.
There is a better solution yet, however. What if you don’t just configure your settings, but the parser itself? Define a setting with the TOML key null_placeholder and allow the user to decide what the placeholder for None should be. For example:
```toml
[tool.mytool]
x = "null"  # mytool treats this as `None`.
y = "None"  # mytool treats this as `"None"`.

[tool.mytool.parser]
null_placeholder = "null"  # Configure the parser here.
```
Obviously, this isn’t quite as clean as having a native null or nil like other configuration languages, but it gets the job done.
Dynamic configuration
Suppose you write a program which needs configuration more complicated than simple strings and numeric values can represent. For example, suppose you want to allow the user to filter files with their own function that takes the filename as input. Obviously, such a setting can only be expressed in Python code. But at this point, is one forced to throw one’s hands in the air and give up on TOML configuration?
The answer, thankfully, is no. A little creativity helps us resolve this issue.
You will, however, need your users to write Python scripts if they want to configure such settings. Suppose we want our filter function setting to be called filter_files, defaulting to None. You could expect a certain module to be in a certain location, containing a function with a certain name. But that could leave your user with unnecessary clutter in their project root directory, or their code in unintuitive locations. The user knows better where their code belongs.
Let’s suppose that I have defined in my project the following function:
```python
# projectroot/src/mytool/hello/world.py
def filter_files_by_extension(filename: str) -> bool:
    return filename.endswith(".txt")
```
In my TOML configuration, I could write:
```toml
[tool.mytool]
filter_files = "@hello_world:filter_files_by_extension"
```
This indicates that we want to use a callable named filter_files_by_extension, but there is also a prefix: @hello_world:. How does @hello_world: tell the parser that the function is contained within projectroot/src/mytool/hello/world.py? Simple. Much as we did when we configured the parser’s placeholder for None, we can associate certain prefixes with certain modules.
```toml
[tool.mytool.parser]
module_prefix = {"@hello_world:" = "mytool.hello.world"}
```
We can then modify our parser to find all strings beginning with one of the defined prefixes, and substitute each such string with the named function imported from the associated module. This does require that the module is importable (i.e. on the Python path), but that is a very reasonable expectation.
Of course, these solutions all require some manual post-processing in your Python code, but rest assured that using TOML does not limit your users' ability to configure your tool.