Single-Minded Configuration
A stabilized approach to systems orchestration
December 19, 2020
Abstract
Configuration management is a term usually used to describe a declarative approach to systems administration. Declarative configuration allows you to write down the intended state of a system, and changes are communicated to the rest of the team through a series of commits. The benefits of these tools are immense, but validating changes is slow and prone to error. Sequencing operations is also possible, but at the expense of eye-watering complexity. In this session we explore the following topics:
- Developing a "stabilized approach"
- How systems configuration is comparable to building software
- Why service orchestration appears to be difficult
- What sort of solutions are too simple
- Steps to creating your own orchestration and configuration management framework
- Overview of rset(1) and pln(5)
- Advances in networking that facilitate push-based configuration
Preamble
I will never be a pilot, but I have a good deal of admiration for these men and women. One of the features of this profession that inspires me is the personal nature of working in tight formation with the rest of the crew. Another is their habit of verbally cross-checking their actions with procedures and the information they are receiving. These verbal and manual confirmations work just as well when operating solo as they do when working with a copilot.
Pilots also learn some outstanding methods of ordering priorities. The motto that most directly speaks to this is,
Aviate, Navigate, Communicate
What does it mean to "Aviate"? That means fly the airplane.
Advanced Qualification Program
Simulator training
Never bust minimums
Train for all scenarios known to be problematic
Dan Gryder is a flight instructor who has taken up the cause of studying and learning from accident reports in general aviation. Several hard personal experiences in his life put him on a mission to give pilots in single-engine aircraft everything they need to avoid a loss of control.
Part of his technique is to import the habits and tools that commercial airlines have. One of the interesting experiments he conducted was to ask airline pilots which skill was more important: a) stall recovery or b) energy management. Can you guess what they said?
The important skill is avoiding loss of control. Do you think the pilots who work for Southwest trade stories at the bar about adventures while stalling a 737? I hope not!
Here is what Dan had to say to those in general aviation:
Learn to define and honor the 1.3 buffer at all times. Define it, placard it, honor it. It is what the airlines do every day. Do you think they memorize all those speeds? No, they are clearly defined and placarded for them at all times.
What is the "tool" or the "placard" he is referring to?
On a small aircraft, the tool [in this case] is a bright piece of tape on the airspeed indicator. Now when the pilot is distracted or under pressure, one thing he does not have to remember is the minimum maneuvering speed. This maneuvering speed is calculated ahead of time to allow up to a 30 degree bank angle. When something unexpected happens, that piece of tape on the airspeed indicator will help him keep the machine flying--all the way to the end.
Dan continues,
The airline record is impressive...as they now train and check all possible scenarios (called maneuvers) known to be problematic over the course of time.
Procedures for Software Engineering
Build, Test, Integrate
| Build/test source code | < 10 seconds |
| Trial deployment | < 90 seconds |
| Push to all users | < 10 minutes |
Software engineers don't have standard operating procedures, but every well-managed project has a substantial list of rules and processes to follow. Don't believe me? Try submitting a patch to your favorite open-source project and find out how much correction you receive.
The strategy a software project employs should include everything needed to maintain a stabilized approach:
- It has tests, and they pass
- Each change is a coherent commit
- Changes are continuously integrated into a real environment
I'm assuming that you ship tests with the code, but maybe this doesn't make sense for what you're doing. There are many times when you put together specialized or very slow regression tests and then throw them away after they have served their purpose.
The point is that your approach to development is able to transition from one stable condition to another stable condition. To go back to the example of aviation: a stabilized approach lightens the workload and gives you situational awareness. How so? You've done all the figuring in advance.
Authoring Systems Configuration
| Test configuration on an existing host | < 10 seconds |
| Provision new infrastructure | < 90 seconds |
| Commit, propagate | < 10 minutes |
This slide shows systems configuration when framed from the perspective of software engineering.
I think there is a strong case to be made that productivity in software development and systems administration correlates directly with the time it takes to validate a change. Delay in feedback changes your approach to development.
If testing a change to a host takes more than, say, 10 seconds, it's not providing interactive feedback. If you can't iterate on a problem efficiently, you will inevitably compensate by manually testing fragments of code and configuration outside of your repository.
As with software development, there is a strategy to systems administration:
- Continuously apply changes to an existing host
- Verify changes apply correctly to newly provisioned hosts
- Use source control as a means of communicating state changes with the rest of your team
Declarative Configuration
{% for file in ["miniupnpd.conf", "dhcpd.conf", "mail/smtpd.conf"] %}
/etc/{{file}}:
file.managed:
- source: salt://home/{{file|replace('/', '_')}}
{% endfor %}
Templates, variables
Massive APIs
Modules and extensions
Product roadmap (features, bugs)
The longer I worked with large frameworks like Salt, the more I valued their capabilities, and the more I found the framework itself to be an obstacle to what I was trying to accomplish.
- Templates are great for building documents, terrible for general-purpose programming.
- Using primitive data structures to call functions is a taxing way to write programs. Typically you are using a DSL or writing YAML that maps to a framework's API.
- Adding your own behavior requires you to become a platform expert. Almost as if you were a third-party integrator.
Far too often we commit the change in order to test the change.
Declarative Orchestration?
# step 1
/usr/local/bin/mysql_install_db:
  cmd.run:
    - creates: /var/mysql
# step 2
Dependencies, sequencing
Progressive status?
Arid programming environment
Eventually configuration management frameworks grow large enough to be called an orchestration framework. What's not to like?
- Expressing an action is not too difficult, but chaining events based on the result of a previous step is a nightmare
- No mechanism for printing progressive status messages
- Trying to sequence events or make complex decisions without a real programming language is difficult
Orchestration is an advanced topic for configuration management, but all of you do this already. It's called scripting.
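To make the contrast concrete, here is a minimal shell sketch of the same MySQL step shown in the earlier Salt fragment (the paths come from that fragment; the status message and error handling are my own additions):
#!/bin/sh
# step 1: initialize the database only if /var/mysql does not already exist
if [ ! -d /var/mysql ]; then
    echo "initializing MySQL data directory"   # progressive status is just echo
    /usr/local/bin/mysql_install_db || exit 1
fi
# step 2: anything below runs only when step 1 succeeded or was already done
Sequencing, branching, and progress messages all come for free from the language.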
Oversimplification
ssh 10.5.5.1 < base-cfg.sh
ssh 10.5.5.1 < configure-wordpress.sh
What's missing?
Let's try our hand at configuring a system using some scripts. This is a solution that is too simple. The first reason it is too simple is that we didn't ship the resources the scripts will need:
- Configuration files
- Miscellaneous utilities
We also need a convention for associating configuration with each host, and hopefully a means of running only the part we are trying to test.
Configuration Fundamentals
| Add/upgrade packages | |
| Install files, directories, symlinks | } Map units of work into a profile for each host |
| Enable/start/restart services | |
There are a few operations that configuration management systems must be able to accomplish. We need a mechanism for adding packages, installing files, and controlling services. With only these three things, you can accomplish some valuable tasks, such as
- Rebuild or replace servers
- Make uniform changes across a group of machines
We also gain a versioned history of configuration changes that the rest of the team can follow. These are critical capabilities, and it is for good reason that configuration management has become mainstream.
With these fundamentals in mind, let's try again.
A Minimal Framework
alias rexec="ssh -T -S /tmp/control $1" rexec -fN -M # start control master rexec mkdir /tmp/staging # scratch space case $1 in 192.168.0.2) # copy files, run scripts tar cf - util wpconfig | rexec tar xf - -C /tmp/staging rexec < base-cfg.sh rexec < configure-wordpress.sh ;; esac rexec rm -r /tmp/staging # clean up rexec -O exit # end control master
As crude as this tiny framework is, it has some advantages:
- You can use any mix of scripting languages
- The control master allows you to break up configuration into logical chunks without paying a noticeable performance penalty
- Error and informational messages are piped back immediately
- No remote dependencies other than a basic UNIX environment
- Very flexible: ship utilities, data and configuration
What are we missing?
- You have to manually sprinkle in print statements to show what stage is running
- No obvious way to run part of a configuration
- We would benefit from a utility that knows how to install and update files, and print a diff of the changes
- The current working directory is not set; scripts have to find the staging location in /tmp
In short, this works! But notice this:
- You can test your configuration by running it
- It's fast enough to run every time you save changes to a file
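For example (a sketch: assume the script above is saved as deploy.sh and that a file watcher such as entr(1) is installed--both names are my assumption), you can rerun the configuration for a host on every save:
# rerun the deployment whenever a script or shipped file changes
ls *.sh util/* wpconfig/* | entr -c sh deploy.sh 192.168.0.2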
rset(1): Remote Staging Execution Tool
Stages configuration and utilities on the remote machine
Secure access to remote files: rinstall(1)
Ability to run everything, or use a pattern to match labels: pln(5)
rset(1) is a tool that provides conventions and a way to execute scripts with access to particular resources.
As we have already observed, the ability to execute scripts is not sufficient. You also need a collection of utilities or utility libraries. rset(1) creates a temporary directory populated with the tools and materials you will need for the task at hand.
It ships with a server for access to large files, and some built-in utilities that know how to install or modify files.
rset(1) uses its own container format. This is different, and I think it sets rset apart from other attempts at a minimalist configuration management system.
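As a rough sketch of the day-to-day invocation (the hostname is borrowed from the routes.pln example later in this talk):
# stage _rutils and the directories mapped to this host, then run its pln files
rset vm2.eradman.com
A label pattern can also be supplied to run only part of the configuration, which is where pln(5) comes in.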
pln(5): Progressive Label Notation
Blocks of configuration can be selected individually
Label names beginning with [0-9a-z] are excluded by default:
root_tasks:
→ crontab - <<-EOF
→ ~ 1 * * * /usr/local/bin/renewcert
→ EOF
Parameters apply to subsequent labels
interpreter=/bin/sh -x
Progressive Label Notation is a tab-indented file format that allows you to organize configuration.
- Labels provide a uniform way to run a subset of the configuration
- Each label is an independent script that is piped to the remote interpreter
- Not key-value; it is evaluated in the order you write it
- It does not interfere with most syntax highlighting; go ahead and use an editor modeline
The content of each label is indented with tabs. If you do not have a capable text editor, this might be a problem for you. Why tabs?
- Because shell allows you to strip off leading tabs in a heredoc (Ruby can do spaces as well)
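Here is the shell behavior in question: the <<- form of a heredoc strips leading tab characters, so a tab-indented label body arrives flush left (the indentation below is a literal tab):
#!/bin/sh
cat <<-EOF
	this line is indented with a tab
	EOF
# prints: this line is indented with a tab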
Configuration Mapping
routes.pln associates configurations with each hostname
vm2.eradman.com: vm2/
→ vm2.pln
→ wordpress.pln
172.16.0.5: alpine/
→ alpine_vm.pln
Directories listed after the hostname labels are copied to the staging directory
The "top-level" configuration file is called routes.pln by default.
Paths after the : are directories (configuration files, scripts,
libraries, anything) that you want staged on the remote host.
Dynamic inventory is a feature that you can handle yourself. rset
reads a file. Use any means you'd like to generate this file if need
be.
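As a sketch of what that could look like (hosts.txt and the common/ directory are hypothetical names), a routes.pln for a fleet of similar hosts can be generated with a few lines of shell:
# emit one "hostname: directory" line per host, followed by the pln file to run
while read -r host; do
    printf '%s: common/\n' "$host"
    printf '\tcommon.pln\n'
done < hosts.txt > routes.pln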
Directory Structure
├── _rutils        # utilities always staged
│   ├── rinstall
│   └── rsub
├── _sources       # files served over http
└── routes.pln     # configuration mapping
Extend core functionality on all hosts using _rutils
Built-in web server serves files under _sources
Put everything else in directories specified in routes.pln
rset(1) has three methods of providing access to files:
- Anything in _rutils will show up in the current working directory on the remote host
- Anything under _sources is accessible over a local port-forward to a built-in web server (access is not restricted, but large files are fine)
- Directories specified in routes.pln will be copied (push only, remote hosts cannot request this content)
Standard Utilities
rinstall(1)
./rinstall xa10/pf.conf /etc/pf.conf \
&& pfctl -f /etc/pf.conf
rsub(1)
./rsub /etc/firefox/unveil.main <<-CONF
/usr/local/heimdal/lib r
/usr/lib r
CONF
Some solutions are too simple. Landing on a remote host with a staging directory and /bin/cp is not enough. Two built-in utilities are automatically shipped:
rinstall:
- Knows how to install a file from the staging area
- Prints a diff, or a notice that a new file was created
- Optionally sets owner and mode
- Fetches large files on demand using HTTP
- Exits 0 only if the file was installed or changed! [Very important] (see the sketch after these lists)
rsub:
- Use a regex to modify a single line
- Or use stdin to create a block of managed text
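Putting the two together, a hypothetical pln label (file paths and service name invented for illustration) can use rinstall's exit status to decide whether a restart is needed:
mail_config:
→ ./rinstall mail/smtpd.conf /etc/mail/smtpd.conf && rcctl restart smtpd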
Client/Server Configuration
Development environment is not authoritative
Critical systems on the edge of a network
Observation requires a terminal on both sides
In aviation there's a funny term for making a final approach with the engines idling. This is sometimes called a "dead stick" landing, and bringing a functional aircraft down this way is not very safe. There are a couple of reasons jets land under power; the most significant is that it takes precious time to spool up the engines. A powered final approach gives the pilot the capability to adjust or abort.
In some environments every configuration task is like a forced landing because your personal test environment is slow to react and is not the same as the official deployment mechanism from top of tree.
This is what running configuration before you commit does; it gives you the ability to assess your current approach and to adjust course.
There are some problems that client-server systems seem to be well suited for. Still, I don't like agent-based configuration, for several reasons:
- You are putting the most critical system [the configuration host] on the edge of a network.
- A meaningful test environment is nearly impossible to configure.
- Debugging is usually difficult because log output is mainly on the remote host.
Jumphosts & Roaming Clients
# jumphost/hostname.wg0
wgport 111 wgkey JUMP_HOST_PRIVATE_KEY
wgpeer ROAMING_HOST1_PUBLIC_KEY wgaip 10.0.0.20/32
wgpeer ROAMING_HOST2_PUBLIC_KEY wgaip 10.0.0.21/32
inet 10.0.0.1/24
# thinkpad10/hostname.wg0
wgkey ROAMING_HOST1_PRIVATE_KEY
wgpeer JUMP_HOST_PUBLIC_KEY wgendpoint proxy.xyz.com 111 wgaip 0.0.0.0/0
inet 10.0.0.20/24
Even if you have mobile clients, you probably don't need a pull-based configuration scheme. I say this because we now have WireGuard to build links. WireGuard is revolutionary in its simplicity:
- Either side can initiate the connection
- The in-kernel wg(4) interface guarantees the identity of interfaces
The only thing you might need to add is a cron job that sends a ping in order to establish the tunnel.
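On the roaming host that can be a single crontab(5) entry; the address is the jumphost's tunnel IP from the example above, and the interval is an arbitrary choice:
# keep the WireGuard tunnel to the jumphost established
*/5 * * * * ping -c 1 -q 10.0.0.1 >/dev/null 2>&1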
rset(1) doesn't provide flags for controlling connection options.
It doesn't need to, because options such as ConnectTimeout,
ProxyJump and anything else you can imagine can be specified in
ssh_config(5).
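For example, a block like the following in ssh_config(5) lets rset reach the roaming clients' tunnel addresses through the jumphost (the Host pattern and timeout are illustrative):
# ~/.ssh/config
Host 10.0.0.*
    ConnectTimeout 5
    ProxyJump proxy.xyz.com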
Conclusion
Factor out common or complex operations into dedicated utilities
Stage configuration data, scripts, and utilities on the remote host
Map units of work into a profile for each host
I have heard it said that there is a paradox with respect to learning a subject. The first maxim is this: the more you know, the more you can see there is to learn. The second is that once you've mastered a topic you finally see how simple it was all along.
May I submit that configuration management simply means that we have a way of associating configuration with each host. Orchestration is really a flamboyant term for scripting with configuration data, scripts, and utilities already staged.