Docker Swarm 1.12 Cluster Orchestration with SaltStack

Starting with v1.12.0 of Docker Engine, there is a new way to manage your container orchestration layer: swarm mode. For starters, it comes with its own service discovery baked in. Now that swarm mode has made cluster setup drastically simpler, let’s see how we can automate the spin-up of a cluster using SaltStack.

Getting Started

A couple of things before we get rolling. We need a way to programmatically target our minions. Any method is acceptable, but this guide uses node groups for targeting. If you take the same approach, set up the node groups on the master.

/etc/salt/master.d/nodegroups.conf:

nodegroups:
  swarmmanager: 'manager*'
  swarmworker: 'worker*'
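
The globs above are only one way to define the groups. As a rough sketch, assuming minion IDs such as manager1 and worker1 (the hostnames that appear in the docker node ls output later in this guide), the same groups could be written as explicit lists with the L@ matcher, and a combined group built with N@. The swarm group name is hypothetical; nothing else in this guide depends on it.

```yaml
# /etc/salt/master.d/nodegroups.conf (alternative sketch)
nodegroups:
  # Explicit lists of minion IDs instead of globs; the IDs are examples
  swarmmanager: 'L@manager1,manager2,manager3'
  swarmworker: 'L@worker1,worker2,worker3'
  # Hypothetical combined group that references the two groups above
  swarm: 'N@swarmmanager or N@swarmworker'
```

Explicit lists trade convenience for control: a new host only joins a group once it is added to the list, whereas a glob picks it up as soon as its minion ID matches.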

Also, this guide makes use of Pillar data, but we will not be walking through that as it’s pretty straightforward. Refer to Salt documentation for examples.

State files

We need to make six states for Docker Swarm:

Add Docker repository and install Docker Engine
Set Salt Mine data
Drain Swarm managers
Create a new Swarm cluster
Add managers to the Swarm cluster
Add workers to the Swarm cluster

Organizationally, the state file structure will look like:

docker/
  engine.sls
  mine.sls
  manager/
    first.sls
    join.sls
    drain.sls
  worker/
    join.sls

Let’s walk through each of these before we tackle orchestration.

1. Add Docker repository and install Docker Engine

No special sauce here. This contains the basics for getting Docker Engine installed onto our minions. This guide uses RHEL 7 for repository management, but the same principles can be applied to any other distro supported by Salt’s pkgrepo state.

docker/engine.sls:

docker:
  pkgrepo.managed:
    - humanname: {{ pillar['repositories']['repos']['docker']['humanname'] }}
    - baseurl: {{ pillar['repositories']['repos']['docker']['baseurl'] }}
    - gpgkey: {{ pillar['repositories']['repos']['docker']['gpgkey'] }}
    - gpgcheck: 1
  service.running:
    - enable: True
    - require:
      - pkg: docker-engine

docker-engine:
  pkg.installed:
    - version: {{ pillar['repositories']['pkgs']['docker-engine']['version'] }}
    - require:
      - pkgrepo: docker
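
The state above reads four Pillar keys under repositories. A Pillar file shaped roughly like the following would satisfy those lookups. This is only a sketch: the file path, the example.com URLs and the version string are placeholders chosen for illustration, so substitute the values published for your distribution.

```yaml
# Hypothetical Pillar file, e.g. /srv/pillar/repositories.sls
repositories:
  repos:
    docker:
      humanname: Docker Repository                 # placeholder label
      baseurl: https://yum.example.com/docker/el7/ # placeholder URL
      gpgkey: https://yum.example.com/docker/gpg   # placeholder URL
  pkgs:
    docker-engine:
      version: 1.12.0-1.el7.centos                 # placeholder version string
```

The file also has to be assigned to the Swarm minions in the Pillar top file before pillar['repositories'] resolves on them.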

2. Set Salt Mine data

I highly encourage you to read up on how the Salt Mine works. Suffice it to say that we are setting some mine data now that will be important when we get to orchestration.

docker/mine.sls:

/etc/salt/minion.d/swarm.conf:
  file.managed:
    - source: salt://docker/files/etc/salt/minion.d/swarm.conf
    - require:
      - pkg: docker-engine

/etc/salt/minion.d/swarm.conf:

mine_functions:
  manager_token:
    - mine_function: cmd.run
    - 'docker swarm join-token manager -q'
  manager_ip:
    - mine_function: network.ip_addrs
    - eth0
  worker_token:
    - mine_function: cmd.run
    - 'docker swarm join-token worker -q'

3. Drain Swarm managers

We’ll use this state in just a moment. Stay tuned!

docker/manager/drain.sls:

drain manager:
  cmd.run:
    - name: 'docker node update --availability drain {{ grains['id'] }}'

4. Create a new Swarm cluster

This state installs Docker Engine and sets the mine data before initializing a Docker Swarm cluster. After the cluster is created, the node will be drained such that containers cannot be scheduled on it.

docker/manager/first.sls:

include:
  - docker.engine
  - docker.mine
  - docker.manager.drain

init new swarm cluster:
  cmd.run:
    - name: 'docker swarm init --advertise-addr '
    - require:
      - pkg: docker-engine
    - require_in:
      - cmd: drain manager
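
The init command above ends with --advertise-addr and no value, and Docker expects an address after that flag. Here is a minimal sketch of one way to supply it, assuming the manager should advertise its first address on eth0 (the same interface the mine configuration uses for manager_ip). The interface choice and the advertise_ip variable name are assumptions, not something the state above specifies.

```yaml
{# Assumption: advertise the first IPv4 address bound to eth0 #}
{% set advertise_ip = salt['network.ip_addrs']('eth0')[0] %}

include:
  - docker.engine
  - docker.mine
  - docker.manager.drain

init new swarm cluster:
  cmd.run:
    - name: 'docker swarm init --advertise-addr {{ advertise_ip }}'
    - require:
      - pkg: docker-engine
    - require_in:
      - cmd: drain manager
```

If eth0 does not exist or has no address, the [0] lookup fails at render time, which at least surfaces the problem before Docker is invoked.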

5. Add managers to the Swarm cluster

Every joining manager node needs to retrieve a token from a current cluster manager in order to join. The joining manager retrieves this through Salt Mine as seen in the salt['mine.get'] call. Similar to the first manager node, the last thing joining manager nodes do is get drained.

docker/manager/join.sls:

{% set join_token = salt['mine.get']('*', 'manager_token').items()[0][1] %}
{% set join_ip = salt['mine.get']('*', 'manager_ip').items()[0][1][0] %}

include:
  - docker.engine
  - docker.mine
  - docker.manager.drain

join cluster:
  cmd.run:
    - name: 'docker swarm join --token {{ join_token }} {{ join_ip }}:2377'
    - require:
      - pkg: docker-engine
    - require_in:
      - cmd: drain manager
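
The .items()[0][1] indexing above relies on Python 2, where dict.items() returns a list. On a Salt installation running under Python 3, items() returns a view that cannot be indexed, so the lookup needs a different shape. Here is a sketch of one alternative that uses Jinja’s first filter and otherwise leaves the state as written above:

```yaml
{# mine.get returns a dict keyed by minion ID; take the first value #}
{% set join_token = salt['mine.get']('*', 'manager_token').values() | first %}
{# manager_ip values are lists of addresses: first list, then first address #}
{% set join_ip = salt['mine.get']('*', 'manager_ip').values() | first | first %}

include:
  - docker.engine
  - docker.mine
  - docker.manager.drain

join cluster:
  cmd.run:
    - name: 'docker swarm join --token {{ join_token }} {{ join_ip }}:2377'
    - require:
      - pkg: docker-engine
    - require_in:
      - cmd: drain manager
```

As in the original, which manager’s entry comes first is not something this lookup controls. That is fine here because any current manager can hand out a valid token.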

6. Add workers to the Swarm cluster

Finally, this state joins the cluster just like our manager nodes, but there are a few differences. Specifically, the workers:

get a different token
do not get mine data
are not drained

docker/worker/join.sls:

{% set join_token = salt['mine.get']('*', 'worker_token').items()[0][1] %}
{% set join_ip = salt['mine.get']('*', 'manager_ip').items()[0][1][0] %}

include:
  - docker.engine

join cluster:
  cmd.run:
    - name: 'docker swarm join --token {{ join_token }} {{ join_ip }}:2377'
    - require:
      - pkg: docker-engine
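
The join state above runs its docker command every time it is applied. If re-applying it should be harmless, one option is an unless guard. The sketch below assumes that docker info prints a line containing Swarm: active once the node belongs to a cluster; check that against the output of your Docker version before relying on it. The lookup lines are copied unchanged from the state above.

```yaml
{% set join_token = salt['mine.get']('*', 'worker_token').items()[0][1] %}
{% set join_ip = salt['mine.get']('*', 'manager_ip').items()[0][1][0] %}

include:
  - docker.engine

join cluster:
  cmd.run:
    - name: 'docker swarm join --token {{ join_token }} {{ join_ip }}:2377'
    # Assumption: skip the join when docker info already reports an active swarm
    - unless: "docker info 2>/dev/null | grep -q 'Swarm: active'"
    - require:
      - pkg: docker-engine
```

The same guard could be added to the manager init and join states.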

Orchestrator

Fantastic! Now we have the scaffolding in place for us to get some power out of an orchestrate runner. The simplest way to think about orchestration in Salt is to think of it like an abstraction layer on top of our Salt states. Where we might normally execute a single salt 'web*' state.apply to describe the entire state of a minion, orchestration allows us to chain multiple states and other functions together in order to achieve our desired end-state. Let’s see what that looks like in practice.

docker/bootstrap.sls:

{% for manager in salt['saltutil.runner']('cache.grains', tgt='swarmmanager', expr_form='nodegroup') %}

{% if loop.first %}
{% set manager_sls = 'docker.manager.first' %}
{% else %}
{% set manager_sls = 'docker.manager.join' %}
{% endif %}

bootstrap swarm manager {{ manager }}:
  salt.state:
    - sls: {{ manager_sls }}
    - tgt: {{ manager }}

update mine for {{ manager }}:
  salt.function:
    - name: mine.update
    - tgt: '*'
    - require:
      - salt: bootstrap swarm manager {{ manager }}

{% endfor %}

{% for worker in salt['saltutil.runner']('cache.grains', tgt='swarmworker', expr_form='nodegroup') %}

bootstrap swarm worker {{ worker }}:
  salt.state:
    - sls: docker.worker.join
    - tgt: {{ worker }}

{% endfor %}

Let’s walk through the previous snippet because there are a few things happening there. There are two for-loops: one for the manager nodes and one for the worker nodes. Each uses the cache.grains runner to iterate over the node groups swarmmanager and swarmworker, respectively. The runner itself isn’t very important – what’s important is that it is a runner that allows us to iterate over members of a target group.

A recent bug prevented runners from matching on node groups; it has been fixed and closed out as of this article’s publication. If you’re still getting the error Failed matching available minions with nodegroup pattern, updating to the latest release of 2016.3 should fix it.

The manager for-loop takes a special path on its first pass. On the first iteration, we run docker.manager.first. As we showed earlier, docker.manager.first has the duty of initializing a new Swarm cluster for us. Every subsequent iteration instead calls docker.manager.join which, as the name implies, joins the existing cluster.

You’ll also notice in the manager for-loop that each iteration runs a mine.update before progressing to the next iteration. This is the crucial step that tells Salt to fetch the latest data from all minions broadcasting their mine data. Because we include docker.mine in both the docker.manager.first and docker.manager.join states, each joining manager can query the Salt Mine to retrieve the IP address of an existing cluster manager and a manager token generated by one of the Swarm managers.

The same principle applies to the worker for-loop. In docker.worker.join, we query the Salt Mine for a manager’s IP address and a worker token generated by one of the Swarm managers.
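
The loops above pass expr_form='nodegroup' to the runner, which matches the 2016.3 release this guide targets. Later Salt releases (2017.7 onward, as far as we are aware) renamed that argument to tgt_type. Assuming such a release, the worker loop might be sketched as follows; everything else is unchanged from the orchestrator above.

```yaml
{# Assumption: a Salt release where expr_form has been renamed to tgt_type #}
{% for worker in salt['saltutil.runner']('cache.grains', tgt='swarmworker', tgt_type='nodegroup') %}

bootstrap swarm worker {{ worker }}:
  salt.state:
    - sls: docker.worker.join
    - tgt: {{ worker }}

{% endfor %}
```

The manager loop would take the same one-word change.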

To see how this works in practice, let’s run the orchestrator from our Salt master:

saltmaster ~$ salt-run state.orchestrate docker.bootstrap

And a few minutes later we have our cluster up and ready to accept services:

manager3 ~$ docker node ls
ID                           HOSTNAME               STATUS  AVAILABILITY  MANAGER STATUS
0x4u15u7v571z70qfdmiuh3sa    manager1               Ready   Drain         Reachable
1r652lsw5ek86zdhg1i4zjl5u    manager2               Ready   Drain         Reachable
cret7zwxpv77uiznb2krnv7sl *  manager3               Ready   Drain         Leader
dka8yshu1rs2pi24dy2762ib4    worker1                Ready   Active
esh8k1mpzt5csxkrt4uzk1z8y    worker2                Ready   Active
esh8k1mpzt5csxkrt4uzk1z8y    worker3                Ready   Active

Summary

There you have it, a turn-key approach to getting our Docker Swarm cluster up and running. What we did not discuss was what the top file contains, but it might look something like:

base:
  'swarmmanager':
    - match: nodegroup
    - docker.engine
    - docker.mine
    - docker.manager.drain

  'swarmworker':
    - match: nodegroup
    - docker.engine

We have to be careful about what we choose to place in the top file. For instance, it probably does not make sense to include docker.manager.first, docker.manager.join, or docker.worker.join in the top file, since those are one-time operations that fit better in orchestrators, but that’s up to you to decide. Your mileage may vary. For more information on orchestration, see Salt’s orchestrate runner docs. For more information on Docker Engine in swarm mode, see Swarm mode overview.

from miller import blake

Hello, I'm a Seattle-based site reliability engineering manager, I get paid to do what I love, I like Python, I'm in an abusive relationship with JavaScript, I'm a fan of good design, and I don't think things always have to be so stupid.
