Consul Upgrade Path

Hello,

I have a Consul cluster running as Docker containers across 5 virtual machines. It’s currently on version 1.2.3, which is quite dated, and I’d like to perform a step-wise upgrade to the latest (or near-latest) version. The cluster also serves as the storage backend for Vault (version 1.6.5).

In the past I’ve followed the general upgrade process guide to perform upgrades successfully. After doing the appropriate backups, I perform the upgrade on each node (leaving the leader node for last). The basic process per node is this:

  1. Enable maintenance mode
  2. Stop the old Consul container (which forces it to leave the cluster)
  3. Remove the old Consul container. Note that the data volume is persistent, so the state is saved even if the container is removed
  4. Start a new Consul container with the new Docker image version. This is done via an Ansible playbook, so I can run the container by changing a single variable: the Consul version
  5. Join the node back into the cluster (which usually happens automatically, since the new container uses the same persistent volume, which holds the previously stored state)
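
The per-node steps above can be sketched roughly as follows. This is only a planning sketch that prints commands instead of running them. The container name `consul`, the playbook name `consul.yml`, the `consul_version` variable, and the node names are hypothetical placeholders, and ACL token and TLS flags are omitted. The final `consul maint -disable` step is not part of the list above; it is included on the assumption that node maintenance mode persists across agent restarts.

```python
from typing import List


def upgrade_order(nodes: List[str], leader: str) -> List[str]:
    """Return the nodes with followers first and the current leader last."""
    if leader not in nodes:
        raise ValueError(f"leader {leader!r} is not in the node list")
    followers = [n for n in nodes if n != leader]
    return followers + [leader]


def node_steps(node: str, version: str, container: str = "consul") -> List[str]:
    """Build the commands for upgrading one node (they are printed, not executed)."""
    return [
        # 1. Enable maintenance mode on the local agent (run on the node)
        f"docker exec {container} consul maint -enable -reason 'upgrade to {version}'",
        # 2. Stop the old container
        f"docker stop {container}",
        # 3. Remove the old container; the data volume is kept
        f"docker rm {container}",
        # 4. Start the new container (run from the Ansible control host;
        #    playbook and variable names are hypothetical)
        f"ansible-playbook consul.yml --limit {node} -e consul_version={version}",
        # 5. Check that the node has rejoined the cluster
        f"docker exec {container} consul members",
        # Assumed extra step: lift maintenance mode again
        f"docker exec {container} consul maint -disable",
    ]


if __name__ == "__main__":
    nodes = ["consul-1", "consul-2", "consul-3", "consul-4", "consul-5"]
    for node in upgrade_order(nodes, leader="consul-3"):
        print(f"# {node}")
        for cmd in node_steps(node, "1.3.1"):
            print(cmd)
```

The current leader can be identified beforehand with `consul operator raft list-peers`.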

I’ve gone through the changelog and version-specific upgrade guide. I was able to upgrade my test cluster to 1.3.1 without any problems using the procedure I outlined above. However, I assume the next upgrade is going to be more involved because the ACL system was overhauled in 1.4.

FYI my current configuration looks like this (slightly truncated):

{
  "leave_on_terminate": true,
  "skip_leave_on_interrupt": true,
  "rejoin_after_leave": true,
  "retry_interval": "30s",
  "dns_config": {
    "service_ttl": {
      "*": "5m"
    },
    "node_ttl": "30s",
    "allow_stale": true,
    "max_stale": "30m"
  },
  "recursors": [
    "10.x.x.x"
  ],
  "acl_datacenter": "east-dc",
  "acl_default_policy": "deny",
  "acl_down_policy": "deny",
  "acl_master_token": "the-master-acl-token",
  "acl_enforce_version_8": false,
  "cert_file": "my.crt",
  "key_file": "my.key",
  "ports": {
    "http": 8500,
    "https": 8501
  },
  "performance": {
    "raft_multiplier": 3
  }
}
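
For reference, here is a minimal sketch of how the legacy `acl_*` keys in a config like this might map onto the layout introduced in 1.4. The mapping table reflects the deprecation notes in the 1.4 agent configuration docs as best recalled and should be verified against the version-specific upgrade guide. Setting `acl.enabled` is an assumption (the legacy system enabled ACLs implicitly through `acl_datacenter`), and `migrate_acl_keys` is a hypothetical helper, not a Consul tool. Keys without a known replacement, such as `acl_enforce_version_8`, are left untouched and reported.

```python
import copy
import json
from typing import Any, Dict, List, Tuple

# Legacy key -> path in the 1.4+ layout (recalled from the docs; verify before use).
LEGACY_TO_NEW = {
    "acl_datacenter": ("primary_datacenter",),
    "acl_default_policy": ("acl", "default_policy"),
    "acl_down_policy": ("acl", "down_policy"),
    "acl_master_token": ("acl", "tokens", "master"),
}


def migrate_acl_keys(config: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (new_config, unmapped_legacy_keys) without modifying the input."""
    # Copy everything that has no known replacement as-is.
    new_config = {
        k: copy.deepcopy(v) for k, v in config.items() if k not in LEGACY_TO_NEW
    }
    # Move each mapped legacy key to its nested location.
    for key, path in LEGACY_TO_NEW.items():
        if key not in config:
            continue
        target = new_config
        *parents, leaf = path
        for part in parents:
            target = target.setdefault(part, {})
        target[leaf] = config[key]
    # Assumption: ACLs were implicitly enabled by acl_datacenter.
    if "acl_datacenter" in config:
        new_config.setdefault("acl", {}).setdefault("enabled", True)
    unmapped = [
        k for k in config if k.startswith("acl_") and k not in LEGACY_TO_NEW
    ]
    return new_config, unmapped


if __name__ == "__main__":
    legacy = {
        "acl_datacenter": "east-dc",
        "acl_default_policy": "deny",
        "acl_down_policy": "deny",
        "acl_master_token": "the-master-acl-token",
        "acl_enforce_version_8": False,
    }
    new_config, unmapped = migrate_acl_keys(legacy)
    print(json.dumps(new_config, indent=2))
    print("left as-is, check the upgrade notes:", unmapped)
```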

So I have a couple of questions:

  1. Has anyone done the 1.3 → 1.4 upgrade? If so, do you have any general tips or potential pitfalls that I should be aware of?
  2. It appears that I’ll need to migrate the legacy tokens to the new ACL system after the upgrade. This seems rather straightforward from the documentation, but does anyone have experience with it? Does the consul acl translate-rules command actually do a good job of converting the rules?
  3. Since I’m using Consul as the backend for Vault, does this new ACL system have any impact on that?

Any other recommendations beyond just the 1.4 upgrade would be appreciated as well. I can provide more information if needed. Thank you very much.