How to introduce the concept of “environments” to effortlessly manage multiple types of Ansible deployments at scale.
Ansible at scale
Ansible is often chosen for its simplicity and ease of use. Compared to some of the alternatives, Ansible is an “easy to learn, hard to master” Configuration Management tool. It doesn’t have to be hard to master, however. With a few simple tricks, this article demonstrates how you can structure your Ansible environments to allow for scalability and reusability. Before getting to the main course, small appetizers will be served to briefly recap some of the key concepts of Ansible: playbooks, inventories, variables and roles.
PS: I loathe “Hello World” examples so let’s do better, shall we?
A fully functional project installing Apache2 and neovim can be found on my GitHub. As I recommend reading through this blog post first for context, I’ve put the link to the repo at the end.
Recap: playbooks
My first “gotcha” when learning Ansible some years ago was how playbooks work. I’m sure it was for nearly everyone else as well. My first playbooks consisted mostly of tasks with names, and maybe the occasional role if I were so lucky that somebody had shared exactly what I needed on GitHub.
As I soon discovered, although too late (the scripts were already “in prod”, and why fix something that isn’t broken?), if you treat a playbook like a shell script, you’re doing it wrong.
Let’s look at a simple playbook which could in theory be deploying an Ubuntu server:
---
- name: Ubuntu Server, debug environment
  hosts: testing
  roles:
    - role: ubuntu/neovim
    - role: ubuntu/apache

- name: Ubuntu Server, production
  hosts: production
  roles:
    - role: ubuntu/apache
Instead of having an ubuntu-test.yml and an ubuntu-prod.yml playbook with all of the tasks needed to create these machines, we define one common ubuntu.yml playbook which favors reusability by extracting the hard work to individual roles.
This makes the playbook readable and very easy to work with if we, say, wanted to introduce another “kind” of Ubuntu server, e.g., a pre-prod server. The first step towards scalability is making sure our playbooks are easy to maintain and not cluttered with tasks that should have been separate roles.
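For illustration, a hypothetical pre-prod play could simply be appended to the same playbook. This is just a sketch: the preprod group name is an assumption and would also need a matching entry in the inventory.

- name: Ubuntu Server, pre-production
  hosts: preprod
  roles:
    - role: ubuntu/apache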
Recap: inventories
Inventories are where we keep information about our fleet of machines to provision. We can group these machines together under common group names so that when we run a playbook, we can target all of the machines or a subset of them by group name. This is convenient and lets us at first glance remind ourselves which machines are to be used in test, in production and so forth. Better yet, it is one of the fundamental building blocks allowing us to scale.
Let’s first consider this inventory file, named hosts:
[ubuntu:children]
production
testing

[production:children]
prod_a
prod_b

[prod_a]
web01.birb.local

[prod_b]
web02.birb.local

[testing]
test01.birb.local
In this example the two production machines have been split into an “a” and a “b” list, suggesting that the production environment supports A/B testing in some capacity, possibly with a load balancer in front. This just goes to show that even though we split up the machines like this, both web01.birb.local and web02.birb.local are included in the production group by inheritance. Furthermore, we have one machine in the testing group.
In the playbook above we reference the “production” and “testing” hosts. In each case, the corresponding machines in the hosts inventory above are the ones the roles will be applied to. If we instead were to target ubuntu rather than production or testing, all of the machines would be in scope by inheritance, as the quick check below illustrates.
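As a sanity check, an ad-hoc ping against the parent group reaches every machine through group inheritance; all three hosts respond:

ansible ubuntu -i hosts -m ansible.builtin.ping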
Recap: variables
Ironically, this is where the fun begins. Why ironically? Well, variables are typically among the first things taught in virtually every tool or language where they are supported, and yet here I am, excited as ever. Let’s talk variables!
In their simplest form, variables can be stored in a YAML file as a list of key-value pairs, as shown here:
---
html_root: "/var/www/html"
prod_port: 443
test_port: 8443

At first glance this seems to be all fine. But if we define the HTTPS port for production and for testing as two separate variables, most likely we will be forced to handle this as a “special case”, e.g., by looking up which inventory group the playbook is running in. Somehow. With duct tape. It’s ugly, and we can do better.
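To make the problem concrete, the duct tape might look something like this — a sketch of exactly the kind of special-casing we want to avoid (group_names is Ansible’s built-in list of the groups the current host belongs to):

- name: Pick the right port (the ugly way)
  ansible.builtin.set_fact:
    https_port: "{{ prod_port if 'production' in group_names else test_port }}"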
Ansible supports defining variables inside the hosts file, so that each environment can have its own version of a variable. This allows us to replace prod_port and test_port with a more generic https_port variable that differs between groups. We remove the old port variables from the vars.yml file (or whatever it is named), leaving us with just:
---
html_root: "/var/www/html"

and introduce an https_port variable in each of the production and testing groups in the hosts file:
[ubuntu:children]
production
testing

[production:children]
prod_a
prod_b

[production:vars]
https_port=443

[prod_a]
web01.birb.local

[prod_b]
web02.birb.local

[testing]
test01.birb.local

[testing:vars]
https_port=8443
We define the relevant value of the variable depending on the group by adding the following to the production group:

[production:vars]
https_port=443

and the testing group:

[testing:vars]
https_port=8443

In other words, whenever the playbook runs for the production group, https_port is set to 443, and whenever it runs for the testing group, the variable is set to 8443.
This is OK — better at least — but as we will see during the main event of the blog post, we can do much better than this as well.
For now, let’s move on with a recap of roles.
Recap: roles
Roles in Ansible are self-contained components that can be reused across multiple playbooks, as in our case. They have a predefined folder structure, which can be found in the official documentation, but for our needs the tasks folder is enough.
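For reference, a minimal layout for the two roles referenced in the playbook could look like this (only the tasks folder is shown; the other standard role folders are optional):

roles
└── ubuntu
    ├── apache
    │   └── tasks
    │       └── main.yml
    └── neovim
        └── tasks
            └── main.yml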
Let’s look at an example role that I created to install the Apache web server:
---
- name: Install apache2
  become: yes
  ansible.builtin.apt:
    name: apache2
    update_cache: yes
    state: latest

- name: Set listening port for HTTPS
  become: yes
  ansible.builtin.replace:
    path: /etc/apache2/ports.conf
    regexp: '.*Listen [8]?443'
    replace: '\tListen {{ https_port }}'

- name: Set virtualhost port
  become: yes
  ansible.builtin.lineinfile:
    dest: /etc/apache2/sites-available/000-default.conf
    regexp: '^<VirtualHost .*>'
    line: '<VirtualHost *:{{ https_port }}>'

- name: Restart apache2
  become: yes
  ansible.builtin.service:
    name: apache2
    state: restarted
This role has four tasks:

- Install apache2
- Configure /etc/apache2/ports.conf as needed
- Configure /etc/apache2/sites-available/000-default.conf as needed
- Restart the apache2 service for the changes to take effect
If you read the code carefully you will see that we reference our port variable {{ https_port }} in the tasks. Here comes the magic: by using a role like this, we can run the same steps in both our production and our testing environments, while the value of https_port is defined per environment, e.g., testing.
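To convince yourself of this, a quick ad-hoc check prints the value each group resolves; the production hosts report 443 while the testing host reports 8443:

ansible production -i hosts -m ansible.builtin.debug -a "var=https_port"
ansible testing -i hosts -m ansible.builtin.debug -a "var=https_port"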
It is now time to talk about project structure and how getting this right can help us use Ansible at scale with ease.
The beauty of project structures
Before we get ahead of ourselves there’s one last important thing to know about Ansible variables, which I promised to return to. Let’s go through that now.
Ansible variables — group_vars and host_vars
Let’s return to the variables discussion for a short while. Recall how we can define variables for an environment (or group, strictly speaking) inside the hosts file like so:

[production:vars]
https_port=443

We can do the same using a group_vars and host_vars folder structure. These are special folder names whose contents will be picked up by Ansible when running a playbook.
Let’s consider the following folder structure example:
...
│
├── group_vars
│   ├── all
│   │   └── vars
│   │       └── main.yml
│   ├── production
│   │   └── vars
│   │       └── main.yml
│   └── testing
│       └── vars
│           └── main.yml
└── hosts

Inside the folder containing our hosts file, we can create a folder for each inventory group. Shown above are the three groups we can reference in Ansible: the default all covering all machines in all groups, and our production and testing groups. Ansible will read the contents of each of these folders and load the variables it finds such that:
- Variables in all can be used in all groups
- Variables in production and testing override the same variables in all, in case of duplicates
This enables us to set a default value of a variable in the all group and then override it in each relevant “environment” (group). For example, we could define a variable holding the version of a tool that applies to all groups by default, but override it in a testing or staging environment (group) if needed.
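As a minimal sketch, assuming a made-up tool_version variable, the default and the override could look like this:

# group_vars/all/vars/main.yml
---
html_root: "/var/www/html"
tool_version: "1.4.2"

# group_vars/testing/vars/main.yml
---
tool_version: "1.5.0-rc1"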
On top of this we can use host_vars, which lets us override variables defined in either of the group_vars. A specific host could have a host_vars configuration explicitly setting the https_port to an entirely different port number for whatever reason. This would be defined in a per-host configuration in a structure such as the one shown here:
...
│
├── group_vars
│   ├── all
│   │   └── vars
│   │       └── main.yml
│   ├── production
│   │   └── vars
│   │       └── main.yml
│   └── testing
│       └── vars
│           └── main.yml
├── hosts
└── host_vars
    ├── test01.birb.local
    │   └── vars
    │       └── main.yml
    ...

In this example any variable defined in the YAML configuration at host_vars/test01.birb.local/vars/main.yml would override the same variable found in either group_vars/testing/... or group_vars/all/....
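For instance, such a per-host override file could be as small as this (the port number is made up for illustration):

# host_vars/test01.birb.local/vars/main.yml
---
https_port: 9443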
This may seem confusing because of the sheer number of folders and files, but let’s pretend that we needed to work on the testing server and had to update its configuration. Our file explorer would be nice and clean:
...
│
├── ubuntu
│   ├── group_vars
│   │   ├── all
│   │   ├── production
│   │   └── testing
│   │       └── vars
│   │           └── main.yml
│   ├── hosts
│   └── host_vars
│       ├── test01.birb.local
│       │   └── vars
│       │       └── main.yml
│       ├── web01.birb.local
│       └── web02.birb.local
...

The risk of affecting the production configuration is significantly limited, simply because we can collapse every folder in our working tree except the one we need to look at: testing.
Organizing for scale — the “Gotcha”
We’ve seen how we can structure our working tree such that the configuration of our different inventory groups is separated in a clean, reusable manner. We’ve also seen how this structure co-exists nicely with a hosts inventory file. Now, what if we needed a different type of host for our production environment, say a Windows-based host? Where would we re-create this structure to support Windows while maintaining the neat overview and structure? You may have guessed it already, and if not, the answer may sound too trivial to really be effective. But it really is! We create another folder.
If we put ubuntu with all its content inside an environments folder (any name will do, in this case it is just a namespace), we are able to create a new “environment” called windows and redo all of the above steps for Windows-based hosts: create a hosts file and group_vars and host_vars folders with the relevant configuration inside.
It would look something like this:
...
│
├── environments
│   ├── ubuntu
│   │   ├── group_vars
│   │   ├── hosts
│   │   └── host_vars
│   └── windows
│       ├── group_vars
│       ├── hosts
│       └── host_vars
├── playbooks
└── roles

When running a playbook we can simply specify which inventory (environment) we would like to use together with a specific playbook, e.g.:

ansible-playbook -i environments/ubuntu playbooks/ubuntu.yml

This will run the ubuntu.yml playbook against the ubuntu environment (note: Ansible will automatically detect the hosts file inside ubuntu).
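The Windows environment would be driven the same way, assuming a corresponding windows.yml playbook exists (a hypothetical name for this sketch):

ansible-playbook -i environments/windows playbooks/windows.yml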
That’s it! How simple is that?
Following this style allows us to maintain a clean and easy-to-use project structure, letting us scale our Ansible deployments to countless environments and even more machines, all with significantly reduced risk of operational incidents caused by working in a fragile and cluttered code base.
Demo
A fully functional demo of this structure with working Ansible code that lets you provision three Ubuntu Server 24.04 machines with Apache2 (and neovim for the test machine) can be found on my GitHub at: https://github.com/HeckerBirb/ansible-at-scale-demo
Following the instructions, you should be greeted the same way I was when visiting http://web01.birb.local/.
Acknowledgements
This blog post would not have been possible without the help of Jeff Geerling and his YouTube series and book Ansible for DevOps, both of which I highly recommend.
Thank you
Thank you for reading along. If you liked this article, please consider liking it or leaving a comment. Until next time!
— Birb