The Pitfalls of Writing Configuration Management Code
- November 21, 2022
When it comes to managing the configuration of your cloud resources, there is often a bevy of tools to choose from. Suites like Puppet, Ansible, Chef and a host of others give you the ability to configure your resources at scale. Drilling deeper, each of these tools usually offers several secondary ways to use it and get the job done.
This abundance of choice often leads to complicated, redundant and confusing configuration management code. When learning a configuration management tool for the first time, it is tempting to just read the online documentation and start coding. In fact, this is what most engineers do. The problem with this approach is that before you know it, you have a collection of haphazardly architected, snowflake configuration repos scattered throughout your organization with no common structure tying them together. This makes testing, deploying and reading the configuration management code far more difficult. In this blog post we will explore the common pitfalls of these tools and how to avoid them.
Too Many Ways to Skin the Cat
By their very nature, many of these tools provide multiple ways to solve the same problem. For example, two engineers who are new to a configuration management tool might approach the following problem in two different ways:
Problem: Install Apache, its dependencies, and manipulate some configuration files on an instance.
Solution 1: Reuse an existing Bash script and call it from the configuration management tool to handle the install.
Solution 2: Use the tool’s domain-specific language (DSL) to complete the installation from start to finish.
We see this all the time in our clients’ configuration code bases. While the difference between the two might seem innocuous at first, one of these solutions often scales much worse than the other. The first solution, for example, requires whoever next updates this code to understand two languages: the configuration management tool’s language (sometimes a DSL) and Bash. Bash is also usually harder to unit test. Furthermore, the configuration management tool’s built-in install operation is far more robust and more heavily tested than a home-grown Bash script, and it is designed to check the current state before acting, so re-running it is safe. The first solution gives up that inherent resiliency.
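To make the resiliency argument concrete, here is a minimal Python sketch (not any tool's real implementation) contrasting the pattern a quick Bash script often uses, blindly appending a line the way `echo "Listen 8080" >> ports.conf` would, with the "ensure desired state" pattern that built-in resources such as Ansible's `lineinfile` module follow. The function names and the `Listen 8080` directive are illustrative assumptions.

```python
from pathlib import Path
import tempfile


def append_line(path: Path, line: str) -> None:
    """Imperative approach: mimics `echo "$line" >> file` in a Bash script.

    It never checks existing state, so every run appends another copy.
    """
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")


def ensure_line(path: Path, line: str) -> bool:
    """Declarative approach: make sure `line` is present exactly once.

    Returns True if the file was changed and False if it was already correct,
    similar to the "changed" status a configuration management tool reports.
    """
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    if line in existing:
        return False  # desired state already reached; do nothing
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        naive = Path(tmp) / "ports_naive.conf"
        managed = Path(tmp) / "ports_managed.conf"

        # Simulate the same "deployment" running three times.
        for _ in range(3):
            append_line(naive, "Listen 8080")
            ensure_line(managed, "Listen 8080")

        print(naive.read_text().count("Listen 8080"))    # 3 (duplicated directives)
        print(managed.read_text().count("Listen 8080"))  # 1
```

Re-running a deployment is routine at scale, which is why the check-before-act behavior matters more than it first appears.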
Too Much Overlap
Often, particularly in cloud deployments, we see a lot of confusion around which tool should handle installing and configuring packages and other components. Common questions you might hear from a new DevOps engineer:
- Is this something that should be baked into the image?
- Should this go in the startup script?
- Is there already configuration code that touches what I’m doing that needs to be updated?
- How do I test this?
There are many valid answers to these questions, and that, again, is often what leads to problems at scale. You can end up with multiple tools doing the same job without a clear delineation of which tool covers which domain throughout the configuration lifecycle. Worse, you might get a different answer about where a new piece of code should go depending on whom you ask.
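One way to make that delineation explicit is to write it down as data and check it automatically. The sketch below is a hypothetical Python example: the layer names (`image_bake`, `startup_script`, `config_mgmt`) and the concerns assigned to them are illustrative assumptions, not a prescribed split. It flags any concern that more than one layer claims, which is exactly the overlap described above.

```python
from collections import defaultdict

# Hypothetical ownership map: which layer of the configuration lifecycle
# is responsible for each concern. Adjust the names to your own stack.
OWNERSHIP = {
    "image_bake": {"os_packages", "security_agent"},
    "startup_script": {"instance_metadata", "config_mgmt_bootstrap"},
    "config_mgmt": {"apache_install", "apache_config", "os_packages"},
}


def find_overlaps(ownership: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return every concern claimed by more than one layer, with its owners."""
    owners: dict[str, list[str]] = defaultdict(list)
    for layer, concerns in ownership.items():
        for concern in concerns:
            owners[concern].append(layer)
    return {
        concern: sorted(layers)
        for concern, layers in owners.items()
        if len(layers) > 1
    }


if __name__ == "__main__":
    for concern, layers in sorted(find_overlaps(OWNERSHIP).items()):
        print(f"{concern} is owned by multiple layers: {', '.join(layers)}")
```

Run in CI, a check like this could turn "it depends on whom you ask" into a failing build whenever two tools start managing the same thing.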
Lack of Testing
This is one of the most ubiquitous and troublesome pitfalls we see in our clients’ configuration management systems. Often, there is no isolated test harness for the configuration-specific code, so teams test their config code by deploying resources, installing the application and checking functionality. This can be time-consuming, expensive and ultimately unreliable. While skipping tests for the configuration management code may be enough to get by at first, at scale this approach spirals out of control: test deployments become more complicated and time-consuming as your cloud footprint grows. If you rely exclusively on functional tests for configuration management code, you will face long development cycles that hamper your organization’s ability to innovate and slow its time to market.
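Much configuration logic can be tested without deploying anything. As a minimal sketch for the Apache scenario above, assume the virtual-host file is rendered by a small Python function; `render_vhost` and its parameters are hypothetical, and `string.Template` stands in for whatever templating engine your tool uses (Jinja2 in Ansible, ERB or EPP in Puppet). Unit tests can then check the rendered output in milliseconds.

```python
import unittest
from string import Template

# Hypothetical template; in a real role this would live in a templates/ directory.
VHOST_TEMPLATE = Template(
    "<VirtualHost *:$port>\n"
    "    ServerName $server_name\n"
    "    DocumentRoot $doc_root\n"
    "</VirtualHost>\n"
)


def render_vhost(server_name: str, port: int = 80, doc_root: str = "/var/www/html") -> str:
    """Render an Apache virtual-host block, validating inputs before rendering."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    return VHOST_TEMPLATE.substitute(port=port, server_name=server_name, doc_root=doc_root)


class RenderVhostTest(unittest.TestCase):
    def test_defaults(self):
        out = render_vhost("example.com")
        self.assertIn("<VirtualHost *:80>", out)
        self.assertIn("ServerName example.com", out)

    def test_custom_port(self):
        self.assertIn("<VirtualHost *:8080>", render_vhost("example.com", port=8080))

    def test_rejects_bad_port(self):
        with self.assertRaises(ValueError):
            render_vhost("example.com", port=70000)


if __name__ == "__main__":
    # exit=False keeps the interpreter running after the tests finish.
    unittest.main(argv=["vhost-tests"], exit=False)
```

Tests like these do not replace functional tests, but they catch many mistakes before any instance is launched.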
Solution 1: An Opinionated Framework
Between the questions around how to get started, where to put the code and how to test it, there are lots of unknowns that need answering before someone can start writing configuration management code for the tool of choice. The best way to handle this is to create an opinionated framework for code creation. In practice, this means that anyone writing configuration management code has guardrails, templates, and living documentation to guide them toward the right choices.
As an example, we’ve used Cookiecutter to help clients create Ansible roles from templates. When a developer uses the framework to create a new role, the testing setup, directory structure and even some prewritten modules are generated automatically, so the developer does not start from scratch every time. You can even have the templates auto-generate READMEs for new code repositories. While this may seem daunting for a small team or a first-time user, it is imperative to build this type of foundation. It will not be perfect at first, but over time this kind of solution lets development teams iterate and improve far more effectively than a bring-your-own-code policy.
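Cookiecutter itself renders a template directory with Jinja2. The sketch below is a simplified, standard-library stand-in (not Cookiecutter's API) that shows the same idea: every new role gets an identical layout, a test scaffold and a generated README. The directories follow the conventional Ansible role layout plus a Molecule test scenario under `molecule/default/`; the exact files and their contents are illustrative assumptions.

```python
from pathlib import Path
import tempfile

# Hypothetical skeleton: relative path -> file contents, where "{role}" is
# replaced with the role name. A real framework (e.g. a Cookiecutter template)
# would keep these as template files instead of inline strings.
ROLE_SKELETON = {
    "tasks/main.yml": "---\n# Tasks for {role}\n",
    "handlers/main.yml": "---\n# Handlers for {role}\n",
    "defaults/main.yml": "---\n# Default variables for {role}\n",
    "meta/main.yml": "---\ngalaxy_info:\n  role_name: {role}\n",
    "molecule/default/molecule.yml": "---\n# Molecule test scenario for {role}\n",
    "molecule/default/converge.yml": "---\n- hosts: all\n  roles:\n    - {role}\n",
    "README.md": "# {role}\n\nDescribe what this role configures and how to test it.\n",
}


def scaffold_role(base_dir: Path, role: str) -> list[Path]:
    """Create a new role from the skeleton; refuse to overwrite an existing role."""
    role_dir = base_dir / role
    if role_dir.exists():
        raise FileExistsError(f"role already exists: {role_dir}")
    created = []
    for rel_path, content in ROLE_SKELETON.items():
        target = role_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content.format(role=role), encoding="utf-8")
        created.append(target)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold_role(Path(tmp), "apache"):
            print(path.relative_to(tmp))
```

Keeping the skeleton in one place means improvements to the template reach every new role instead of living in one engineer's repo.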
Solution 2: Documentation and Standards
Along with an opinionated framework, it is essential to have good documentation for your configuration management system. Well-written user guides, detailed READMEs and FAQs all help to alleviate confusion around where to start and how to execute. What you don’t want is someone who needs to make a configuration management change having to first go on a treasure hunt (relying on tribal knowledge) to find the last person who changed the module or deployed the code. Living documentation should reside alongside the code so that someone can quickly determine how best to implement a new feature, and so that the documentation can be updated based on end-user feedback.
I've worked on teams where we even recorded videos showing how to run unit and acceptance tests for Puppet modules, ensuring that future generations of DevOps engineers could easily pick up where we left off, iterate and improve without having to dig for answers.
Your team should hold configuration management code to high standards. Just like application code, these modules, playbooks and scripts should go through a rigorous yet efficient testing cycle that verifies idempotency where necessary and improves resiliency.
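An idempotency check fits naturally into that testing cycle: apply the configuration twice and require that the second run changes nothing. Molecule, for example, runs an idempotence step for Ansible roles. The sketch below models the check in plain Python; `ensure_packages`, `bump_release_marker`, the simulated package inventory and the `(new_state, changed)` convention are hypothetical stand-ins for real configuration steps.

```python
from typing import Callable

# A configuration step takes the current state and returns (new_state, changed).
Step = Callable[[frozenset], tuple[frozenset, bool]]


def ensure_packages(*wanted: str) -> Step:
    """Hypothetical declarative step: make sure every package in `wanted` is installed."""
    def step(installed: frozenset) -> tuple[frozenset, bool]:
        missing = set(wanted) - installed
        return installed | missing, bool(missing)
    return step


def bump_release_marker(installed: frozenset) -> tuple[frozenset, bool]:
    """Non-idempotent on purpose: adds a new marker entry on every run."""
    return installed | {f"marker-{len(installed)}"}, True


def assert_idempotent(step: Step, initial: frozenset) -> frozenset:
    """Apply `step` twice; the second application must report no change."""
    state, _ = step(initial)
    second_state, changed = step(state)
    if changed or second_state != state:
        raise AssertionError("step is not idempotent: second run changed state")
    return state


if __name__ == "__main__":
    final = assert_idempotent(ensure_packages("apache2", "openssl"), frozenset({"openssl"}))
    print(sorted(final))  # ['apache2', 'openssl']
    try:
        assert_idempotent(bump_release_marker, frozenset())
    except AssertionError as err:
        print(err)  # step is not idempotent: second run changed state
```

The useful signal is the second run: a step that keeps reporting changes will cause drift and noisy deploys even if its first run looks correct.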
Single Source of Truth
It is essential for an engineering team to establish a single source of truth for both the documentation and the release versions of its configuration management code. This centralization is key to preventing a fragmented, hard-to-test code base. Make sure you have a gold-standard template repository, wiki page or other central hub that is the launching point for all things configuration management. That way, when a release engineer, front-end developer or key stakeholder needs to know which version of the configuration is running in prod, or which features are currently being tested in dev, you can easily determine that yourself or, better yet, point them to the documentation.
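As a minimal sketch of what "easily determine" might look like, assume the central hub includes a machine-readable release manifest. The JSON shape, environment names, version strings and feature names below are all hypothetical. A small helper then answers "what is running in prod?" from that single file rather than from someone's memory.

```python
import json

# Hypothetical contents of a central release manifest kept in the
# gold-standard repository. In practice you would read this from that file.
MANIFEST_JSON = """
{
  "environments": {
    "prod": {"version": "2.4.1", "features_in_test": []},
    "dev":  {"version": "2.5.0-rc1", "features_in_test": ["apache-http2", "tls-1.3-only"]}
  }
}
"""


def load_manifest(raw: str) -> dict:
    """Parse the manifest: the one place version information comes from."""
    return json.loads(raw)


def deployed_version(manifest: dict, env: str) -> str:
    """Return the configuration version recorded for `env`."""
    try:
        return manifest["environments"][env]["version"]
    except KeyError:
        raise KeyError(f"environment not in manifest: {env}") from None


def features_in_test(manifest: dict, env: str) -> list[str]:
    """Return the features currently being tested in `env`."""
    return list(manifest["environments"][env]["features_in_test"])


if __name__ == "__main__":
    manifest = load_manifest(MANIFEST_JSON)
    print(deployed_version(manifest, "prod"))  # 2.4.1
    print(features_in_test(manifest, "dev"))   # ['apache-http2', 'tls-1.3-only']
```

Because deployments and documentation would both read from the same manifest, the answer a stakeholder gets cannot drift from what was actually released.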