Efficient Data Sharing in Ansible

In certain scenarios, you may need to retrieve data from a cloud provider or perform complex calculations, both of which can be time-consuming. Let’s assume you use an API to launch a virtual machine and then want to reuse the resulting data in other roles. At the beginning, we don’t know the IP address of the machine. Once we know it, we want to configure only that one machine, for various reasons: to speed up execution and to avoid wasting time.

Example 1:

YAML
 
- name: Configure frontends
  hosts: frontends
  gather_facts: no
  roles:
  - Create users
  - Setup sshd
- name: Configure backends
  hosts: backends
  gather_facts: no
  roles:
  - Create users
  - Setup sshd


Note. A play is a set of tasks or roles targeted at a set of hosts; a playbook consists of one or more plays. In Example 1, there are two plays: Configure frontends and Configure backends.

Variables defined in the play Configure frontends cannot be shared with Configure backends, because variables are shared between plays only per host. The scope of a variable set with set_fact is limited to the host it was set on, so a later play sees it only if it targets the same host(s).

In other words, if you generate a password in the role Create users within the play Configure frontends, that password cannot be reused by a role in Configure backends (even though we want the passwords to be identical).
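To make the restriction concrete, here is a minimal sketch (host names, group names, and the password lookup are illustrative):

```yaml
# Play 1: runs on frontend hosts
- hosts: frontends
  gather_facts: no
  tasks:
    - set_fact:
        user_password: "{{ lookup('password', '/dev/null length=16') }}"

# Play 2: runs on backend hosts
- hosts: backends
  gather_facts: no
  tasks:
    # Fails: user_password was set on frontend hosts only.
    - debug:
        var: user_password
    # Works, but only by naming a concrete frontend host:
    - debug:
        var: hostvars['frontend1'].user_password
```

The fact is reachable from another play only through hostvars of the host it was set on, which is exactly the inconvenience the approaches below work around.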

I would like to demonstrate how this restriction can be worked around.

Approach 1

Let’s have a look at the simple playbook:

Example 2:

YAML
 
- name: Create server
  hosts: localhost
  gather_facts: no
  vars: 
    server_type: cpx21
  roles:
   - prepare_server
   - add_host_to_zabbix
   - lynis


What do we have in the playbook above?

Everything in the playbook above is needed to create a virtual machine. The playbook runs locally on the Ansible control machine, which calls the cloud provider's API to create a server and then configures it. Seems fine, but there is a problem. After prepare_server finishes, we have an IP address that we didn’t know beforehand and obtained from the provider. To run the next configuration roles (lynis), we have to execute Ansible tasks on the remote machine, and we must pass the machine's IP address to the add_host_to_zabbix role, because it is needed to configure the monitoring system.

But how can other roles be launched with the right parameters and on the right target? 

Someone might say, “reload your dynamic inventory, use a name for the server,” and then use delegate_to. That is possible if you pass the server name as an extra variable while the server is being created, like this:

Shell
 
$ ansible-playbook -e "server_name=myserver.cloud" prepare_server.yml


Or export an environment variable. However, this works only in this particular case, so it is not a universal solution, and it is not convenient. What if we don’t pass the server’s name but generate it?
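For illustration, once the name has been passed in and the dynamic inventory reloaded so that the name resolves to a host, a task can be delegated to the new machine (a sketch):

```yaml
- name: Run a task on the newly created server
  ansible.builtin.ping:
  delegate_to: "{{ server_name }}"
```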

And if you paid attention, this approach also slows the run down, because Ansible must gather data from the provider one more time, which is exactly what we wanted to avoid.

Ansible has the add_host module especially for this purpose. It adds a host to the in-memory inventory for the rest of the playbook run, and any extra variables you pass become host variables of the new host. In effect, it gives you temporary storage associated with the new host, something you don’t get by simply reloading your inventory.

A code snippet:

YAML
 
- name: Add host to group 'just_created'
  add_host:
    name: "{{ server_name }}"
    groups: just_created
    var: "{{ myvalue }}"


So, here we add a virtual machine to group just_created and set up some additional variables, which can be accessed later, like this:

YAML
 
- set_fact: 
    server_ip: "{{ hostvars[server_name].var }}"


Or:

YAML
 
- name: Show all the hosts matching the group just_created
  debug:
    msg: "host is {{ hostvars[item]['inventory_hostname'] }}, var is {{ hostvars[item]['var'] }}"
  with_items: "{{ groups['just_created'] }}"


Thus you can add and share any type of data (scalars, key-value pairs, or dictionaries).
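For instance, a whole dictionary can be attached to the temporary host; the key names below are illustrative:

```yaml
- name: Add host with structured data attached
  add_host:
    name: "{{ server_name }}"
    groups: just_created
    server_info:
      ip: "{{ server_ip }}"
      datacenter: fsn1
      tags:
        - web
        - monitoring

# Later, in another play:
- set_fact:
    dc: "{{ hostvars[server_name].server_info.datacenter }}"
```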

We can rewrite the playbook above as follows:

Example 3:

YAML
 
- name: Create server 
  hosts: localhost
  gather_facts: no
  vars: 
    server_type: cpx21
  roles:
   - prepare_server
- name: Install Lynis and Zabbix
  hosts: just_created
  gather_facts: no
  roles:
   - add_host_to_zabbix
   - lynis
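To connect Example 3 with the add_host snippet, prepare_server could look roughly like this. This is a hedged sketch: the hcloud module, image, and return-value names are my assumptions (based on the Hetzner server_type above); adapt them to your provider.

```yaml
# prepare_server/tasks/main.yml (sketch; module and return fields assumed)
- name: Create the server via the cloud provider API
  hetzner.hcloud.hcloud_server:
    name: "{{ server_name }}"
    server_type: "{{ server_type }}"
    image: debian-12
    state: present
  register: created

- name: Register the new machine in the in-memory inventory
  add_host:
    name: "{{ created.hcloud_server.ipv4_address }}"
    groups: just_created
```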


Approach 2

The other approach I would like to consider is the good old well-known one: temporary files.

Let’s say we have a playbook and want to share a generated OpenSSL pre-shared key:

Example 4:

YAML
 
- name: Create server 
  hosts: localhost
  gather_facts: no
  vars: 
    server_type: cpx21
  roles:
   - prepare_server
   - add_host_to_zabbix_server
- name: Install Zabbix
  hosts: just_created
  gather_facts: no
  roles:
   - install_zabbix_agent
   - lynis


The role add_host_to_zabbix_server has a task that generates a pre-shared key for establishing an encrypted connection between the Zabbix server and the Zabbix agent; therefore, this key must be propagated to both of these hosts.

In the role, we should save this value somewhere. To save the PSK, add_host_to_zabbix_server does the following:

YAML
 
- name: Create a temporary file
  tempfile:
    state: file
  register: tempfile

- name: Save the pre-shared key to the temp file
  copy:
    content: "{{ openssl_key }}"
    dest: "{{ tempfile.path }}"


Here we save openssl_key for further use.

Note. As in programming, any allocated resource must be released when it is no longer needed. The last action must be the deletion of this temporary file, especially since it contains sensitive data. I omit this step in the article; don’t forget it.

Once openssl_key is saved, we need a way to read it back. Here is a code snippet for use in the install_zabbix_agent role:

YAML
 
# tempfile was registered on localhost in the previous play,
# so we reach it through hostvars; the file lookup reads from the controller
- set_fact:
    psk: "{{ lookup('file', hostvars['localhost'].tempfile.path) }}"


set_fact is used to define a variable that will be used further in the same play (in the example, the lynis role still remains).

In the approach we have just considered, the data is stored persistently. That is, it is not lost between playbook runs unless you delete it:

YAML
 
# the file module does not expand wildcards, so we glob on the controller
- name: Remove the temporary files
  file:
    path: "{{ item }}"
    state: absent
  delegate_to: localhost
  with_fileglob: "/tmp/ansible.*"


Of course, you can use your own name for temporary files.

Approach 3

This is the most convenient method to exchange data between roles and even playbooks, and it allows storing data permanently. You have probably guessed it: external storage. It can be Redis, Consul, or another key-value store, or even a relational database. I prefer Redis. I keep passwords and API access tokens in it, and I also store intermediate data there, such as IP addresses or generated keys. It is the most versatile method for storing and sharing variables.

All the previous examples were trivial and didn’t require storing large amounts of data.

Just imagine that you need to keep data about all visited hosts between runs, or that you would like to run Ansible across different groups of hosts and move some data from one group of servers to another. This is the significant difference from approaches 1 and 2.

For this goal, Redis is most suitable.

There are two ways of working with Redis from Ansible that we will use here: the community.general.redis_data module for writing and deleting keys, and the redis lookup plugin for reading them.

So, let’s consider a playbook. The task is to add all servers under Zabbix monitoring.

Example 5:

YAML
 
- name: Install Zabbix agent
  hosts: all
  gather_facts: no
  roles:
   - install_zabbix_agent
- name: Configure Zabbix server
  hosts: all
  gather_facts: no
  roles:
   - add_host_to_zabbix_server


As you can see in Example 5, the first play runs on every server and generates a PSK on each one. To configure the Zabbix server, we must store all these PSKs together with the associated server names (in fact, we also need some additional information, such as TLSPSKIdentity; it is omitted here).

In install_zabbix_agent/tasks/main.yml:

YAML
 
- name: Generating tls psk
  shell: /usr/bin/openssl rand -hex 32
  register: tls_psk
- name: Store psk in Redis
  delegate_to: localhost 
  community.general.redis_data:
    login_host: 127.0.0.1
    key: "{{ server_ip }}"
    value: "{{ tls_psk.stdout }}"
    tls: false 
    state: present


We generate a pre-shared key and save it in Redis under a key equal to the server’s IP address.

In the role add_host_to_zabbix_server, we search for the pre-shared key:

YAML
 
- set_fact:
    tls_psk: "{{ lookup('redis', server_ip) }}"


And finally, we delete it:

YAML
 
- name: Deleting tls psk from Redis
  delegate_to: localhost
  community.general.redis_data:
    tls: false
    key: "{{ server_ip }}"
    state: absent


Pay attention: when working with Redis, you must specify a host to connect to. In the example above, Redis runs on the same host as Ansible, so I use delegate_to: localhost.

In your case, you must specify the correct host to connect to.

An advantage of using Redis is that you can exchange data with other applications as well: your application can look up the necessary information in Redis, which is impossible with the first two approaches. Additionally, Redis can be used as a cache for facts, which dramatically accelerates playbook runs.
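Fact caching in Redis is enabled in ansible.cfg; the values below are typical settings (adjust the host:port:db connection string to your Redis instance):

```ini
[defaults]
gathering = smart
fact_caching = redis
fact_caching_connection = localhost:6379:0
fact_caching_timeout = 86400
```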

Last but not least, here is my personal recommendation: consider the Ansible module blockinfile.

It can be used in a particular variation of the temporary-file approach, provided your program can parse the final file. For example, this module is quite useful for collecting data into a single file for WireGuard.
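A hedged sketch of that WireGuard use case (the interface name, variable names, and paths are my assumptions): each host appends its own peer section to a shared config assembled on the control node, and the per-host marker keeps the task idempotent.

```yaml
- name: Add this host's peer section to the shared WireGuard config
  blockinfile:
    path: /etc/wireguard/wg0.conf
    create: true
    marker: "# {mark} ANSIBLE MANAGED PEER {{ inventory_hostname }}"
    block: |
      [Peer]
      PublicKey = {{ wg_public_key }}
      AllowedIPs = {{ wg_ip }}/32
  delegate_to: localhost
```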

Conclusion

Sharing data can be a challenging task in certain scenarios, and many DevOps professionals struggle to find the most effective solutions. However, I have demonstrated several techniques that can be used to efficiently accomplish this task.