The best way to authorize each node's SSH key on all nodes in the cluster

I want to create a cluster infrastructure in which each node communicates with the others over SSH. I want to use Ansible to create an idempotent playbook/role that can be executed when the cluster is initialized or when new nodes are added to it. I could think of two scenarios to achieve this.

 First Scenario

  • Task 1 fetches the SSH key from a node (probably assigning it to a variable or writing it to a file).
  • Task 2, executed locally, loops over the other nodes and authorizes the first node with the fetched key.

This scenario supports the free strategy: tasks can be executed without waiting for all hosts. But it also requires every node to already have the relevant user and public key. If you create users within the same playbook, then (because of the free strategy) some users may not yet exist on other nodes in the cluster when task 2 starts running.

Although I am a big fan of the free strategy, I didn't implement this scenario for efficiency reasons: it makes n^2 connections for an n-node cluster.

 Second Scenario

  • Task 1 fetches the SSH key from all nodes in order, then writes each one to a file named according to ansible_hostname.
  • Task 2, executed locally, loops over the other nodes and authorizes all keys.

This scenario only supports the linear strategy. You can create users within the same playbook, because with the linear strategy all users will have been created before task 1 starts running.

I think this is an efficient scenario: it makes only 2n connections for an n-node cluster. I implemented it and have included the snippet I wrote.
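
For illustration, the two tasks of this scenario might look roughly like the sketch below. It is a reconstruction from the description above, not the exact snippet: the `cluster_node` group name is borrowed from the answer, while the `cluster_user` variable, the ed25519 key type, and the `keys/` staging directory are assumptions, and `authorized_key` is assumed to come from the `ansible.posix` collection.

```yaml
# Sketch only: user name, key type, and paths are assumptions.
- name: Exchange SSH public keys between cluster nodes
  hosts: cluster_node
  strategy: linear             # every host finishes a task before the next starts
  become: true
  vars:
    cluster_user: cluster      # hypothetical account name
  tasks:
    - name: Create the user and generate a key pair if none exists
      ansible.builtin.user:
        name: "{{ cluster_user }}"
        generate_ssh_key: true
        ssh_key_type: ed25519
        ssh_key_file: .ssh/id_ed25519

    - name: Fetch each node's public key to the control node
      ansible.builtin.fetch:
        src: "/home/{{ cluster_user }}/.ssh/id_ed25519.pub"
        dest: "{{ playbook_dir }}/keys/{{ ansible_hostname }}.pub"
        flat: true

    # Relies on gathered facts for hostvars[item]['ansible_hostname'].
    - name: Authorize every fetched key on every node
      ansible.posix.authorized_key:
        user: "{{ cluster_user }}"
        key: "{{ lookup('file', playbook_dir ~ '/keys/' ~ hostvars[item]['ansible_hostname'] ~ '.pub') }}"
        state: present
      loop: "{{ groups['cluster_node'] }}"
```

Because the last task loops over the whole group on every host, the module runs n times per node even though only 2n connections are opened.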

Maybe I can use lineinfile instead of fetch, but other than that I don't know if this is the right way. It takes a long time as the cluster grows (because of the linear strategy). Is there a more efficient way?

Answer

When Ansible loops through authorized_key, it will (roughly) perform the following tasks:

  1. Create a temporary authorized_key python script on the control node
  2. Copy the new authorized_key python script to the managed node
  3. Run the authorized_key python script on the managed node with the appropriate parameters

This scales as n^2 as the number of managed nodes increases; with 1000 boxes, this task is performed 1000 times per box.

I’m having trouble finding specific docs which properly explain exactly what’s going on under the hood, so I’d recommend running an example script to get a feel for it:
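
A minimal sketch of such a script, assuming an inventory group named `cluster_node`, could be as small as the playbook below. The `command` module stands in for `authorized_key` here so that the run has no side effects; that substitution is an assumption made for the demonstration.

```yaml
# example-loop.yml: sketch for observing per-item module execution.
- name: Observe how a looped task scales
  hosts: cluster_node
  gather_facts: false
  tasks:
    # One module invocation per loop item, on every host in the group.
    - name: Run one trivial command per cluster member
      ansible.builtin.command: "echo {{ item }}"
      loop: "{{ groups['cluster_node'] }}"
      changed_when: false
```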

This should be run with the triple verbose flag (-vvv) and with the output redirected to a log file (e.g. ansible-playbook example-loop.yml -i hosts.yml -vvv > example-loop-output.log). Searching through that log for command.py and sftp will help you get a feel for how your script scales as the list retrieved by "{{ groups['cluster_node'] }}" grows.

For small clusters, this inefficiency is perfectly acceptable. However, it may become problematic on large clusters.

Now, the authorized_key module is essentially just generating an authorized_keys file with a) the keys which already exist within authorized_keys and b) the public keys of each node on the cluster. Instead of repeatedly generating an authorized_keys file on each box individually, we can construct the authorized_keys file on the control node and deploy it to each box.

The authorized_keys file itself can be generated with assemble; this will take all of the gathered keys and concatenate them into a single file. However, if we just synchronize or copy this file over, we’ll wipe out any non-cluster keys added to authorized_keys. To avoid this, we can use blockinfile. blockinfile can manage the cluster keys added by Ansible. We’ll be able to add new keys while removing those which are outdated.
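
As a rough sketch of that approach (the `cluster` user, the ed25519 key path, the `keys/` staging directory, and the marker text are all assumptions), the playbook could be split into three plays: fetch, assemble on the control node, then deploy with blockinfile.

```yaml
# Sketch only: user name, key path, and file locations are assumptions.
- name: Collect public keys
  hosts: cluster_node
  become: true
  tasks:
    - name: Fetch each node's public key into a staging directory
      ansible.builtin.fetch:
        src: /home/cluster/.ssh/id_ed25519.pub
        dest: "{{ playbook_dir }}/keys/{{ inventory_hostname }}.pub"
        flat: true

- name: Build the combined key list on the control node
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Concatenate all fetched keys into one file
      ansible.builtin.assemble:
        src: "{{ playbook_dir }}/keys"
        dest: "{{ playbook_dir }}/cluster_keys"
        regexp: '\.pub$'        # only pick up the fetched key files
        mode: "0644"

- name: Deploy the cluster keys
  hosts: cluster_node
  become: true
  tasks:
    # Only the text between the markers is managed; other keys are untouched.
    - name: Manage the cluster keys as one marked block
      ansible.builtin.blockinfile:
        path: /home/cluster/.ssh/authorized_keys
        block: "{{ lookup('file', playbook_dir ~ '/cluster_keys') }}"
        marker: "# {mark} ANSIBLE MANAGED CLUSTER KEYS"
        create: true
        owner: cluster
        mode: "0600"
```

Each node now receives a single blockinfile call instead of one authorized_key call per cluster member. Key files left in the staging directory for nodes that have left the cluster would still be assembled, so they need to be cleaned up separately.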

As-is, this solution isn’t easily compatible with roles; roles are designed to only handle a single value for hosts (a host, group, set of groups, etc), and the above solution requires switching between a group and localhost.

We can remedy this with delegate_to, although it may be somewhat inefficient with large clusters, as each node in the cluster will try assembling authorized_keys. Depending on the overall structure of the ansible project (and the size of the team working on it), this may or may not be ideal; when skimming a large script with delegate_to, it can be easy to miss that something’s being performed locally.
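
A role-friendly variant might look like the task file below. This is a sketch under the same assumptions as before (hypothetical `cluster_keys` role name, `cluster` user, `keys/` staging directory). The `run_once` keyword is an addition not mentioned above: it limits the delegated assemble to a single execution per batch, but it only behaves that way under the linear strategy.

```yaml
# roles/cluster_keys/tasks/main.yml (hypothetical role name)
- name: Fetch this node's public key to the control node
  ansible.builtin.fetch:
    src: /home/cluster/.ssh/id_ed25519.pub
    dest: "{{ playbook_dir }}/keys/{{ inventory_hostname }}.pub"
    flat: true

# Runs on the control node, not on the managed host.
- name: Concatenate all fetched keys on the control node
  ansible.builtin.assemble:
    src: "{{ playbook_dir }}/keys"
    dest: "{{ playbook_dir }}/cluster_keys"
    regexp: '\.pub$'
    mode: "0644"
  delegate_to: localhost
  run_once: true               # assumption: linear strategy
  become: false

- name: Manage the cluster keys as one marked block
  ansible.builtin.blockinfile:
    path: /home/cluster/.ssh/authorized_keys
    block: "{{ lookup('file', playbook_dir ~ '/cluster_keys') }}"
    marker: "# {mark} ANSIBLE MANAGED CLUSTER KEYS"
    create: true
    owner: cluster
    mode: "0600"
```

Naming the delegated task explicitly ("on the control node") is one way to make the local step harder to miss when skimming.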
