Upgrade path for Consul from 1.8.6 to the latest 1.14.x

Hi guys,
I am looking to upgrade our Consul instances; the servers and agents are on version 1.8.6.
We use Consul only for service discovery (no service mesh, no ACLs, no encryption).
My upgrade path based on the documentation looks like this:

Now I have 2 questions:

  • Can I upgrade directly from the latest 1.10.x to the latest 1.14.x?
  • Is Consul agent v1.8.6 still compatible with Consul server 1.10.x and 1.14.x?

Thank you in advance for your help.
Lud

Hi @lud97x ,

I have asked HashiCorp support similar questions.

Unfortunately their usual response is to refuse to commit, and respond with some flavour of “you can test it yourself if you like but we still recommend what’s in the docs”.

Even when pressed, they won’t give me solid technical justifications for the specific intermediate versions they have selected. :frowning:

I personally chose to ignore the instruction to upgrade from 1.8.1 to the latest 1.8.x before moving onwards in an upgrade I worked on, because I was able to determine from changelogs and supporting documentation that it only applied to certain Enterprise licensing configurations. We moved straight from that to the latest 1.10.x at the time.

From 1.10.x to 1.14, I have no personal experience to share, although I see no reason why hopping straight to 1.14 couldn’t work, having reviewed changelogs - hence why I was trying, unsuccessfully, to get a yes or no from HashiCorp about whether there were any actual technical blockers.

Hi @maxb, thank you for your reply. I have done some testing and was able to complete my upgrade following this path:

  • servers: 1.8.6 to 1.10.12, then 1.10.12 to 1.14.2
  • agents: once the servers were on 1.14.2, upgraded the agents from 1.8.6 to 1.14.2
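As a minimal sketch of how a path like this could be sanity-checked before touching a cluster, the snippet below parses the version strings numerically (a plain string comparison would wrongly rank 1.10.12 below 1.8.6) and confirms that every hop is an upgrade and that the agents do not end up ahead of the servers. The function names `parse_version` and `check_hops` are made up for illustration, and the check says nothing about whether Consul itself supports a given hop.

```python
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.10.12' into (1, 10, 12) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def check_hops(path: List[str]) -> None:
    """Raise ValueError if any hop in the path is not a strict upgrade."""
    for current, target in zip(path, path[1:]):
        if parse_version(target) <= parse_version(current):
            raise ValueError(f"{current} -> {target} is not an upgrade")


# Versions taken from the post above.
server_path = ["1.8.6", "1.10.12", "1.14.2"]
agent_path = ["1.8.6", "1.14.2"]

check_hops(server_path)
check_hops(agent_path)

# The agents should finish on a version no newer than the servers.
agents_not_ahead = parse_version(agent_path[-1]) <= parse_version(server_path[-1])

print("server hops:", " -> ".join(server_path))
print("agent hops:", " -> ".join(agent_path))
print("agents not ahead of servers:", agents_not_ahead)
```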

We use Consul only for discovery, so Consul Connect and peering have been disabled on the server side.

No major issues encountered during the upgrade, except CPU saturation during the agent upgrade on one of our big clusters (2000+ members).
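One possible way to soften a load spike like that, which is not something I did or measured here, is to upgrade the agents in small batches with a pause in between. The sketch below only shows the batching logic: the host names, the batch size, the pause and the `upgrade_one` callback are all hypothetical placeholders, and the real upgrade step (package install, restart, health check) would have to be supplied by whatever tooling you use.

```python
import time
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def rolling_upgrade(
    agents: Sequence[str],
    size: int,
    pause_seconds: float,
    upgrade_one: Callable[[str], None],
) -> int:
    """Upgrade agents batch by batch, sleeping between batches."""
    done = 0
    for group in batches(agents, size):
        for host in group:
            upgrade_one(host)  # hypothetical: the real upgrade step goes here
            done += 1
        if done < len(agents):
            time.sleep(pause_seconds)  # give the servers time to settle
    return done


# Hypothetical host names; the callback just records the order.
upgraded: List[str] = []
hosts = [f"agent-{n}" for n in range(1, 6)]
total = rolling_upgrade(hosts, size=2, pause_seconds=0, upgrade_one=upgraded.append)
print("upgraded", total, "agents in order:", upgraded)
```

Suitable values for the batch size and pause depend on the cluster; the numbers in the example are arbitrary.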


Hi @lud97x and @maxb

Consul server version: 1.8.9
Consul agent version: 1.2.9

Can I upgrade the servers directly from 1.8.9 to 1.16.3,
and then upgrade the clients from 1.2.9 to 1.16.3 once the servers are on 1.16.3?

What path should I follow?

Hi @lud97x and @maxb,

During testing:
I was able to upgrade from 1.8.9 to 1.16.3 and didn’t face any issues.

Observations:

  1. Backward compatibility with 1.8.9 is not possible.
  2. A new Consul server on 1.8.9 is not able to join a cluster that has already been upgraded to 1.16.3.
    Error: [ERROR] agent.server.raft: failed to restore snapshot: error="failed to restore snapshot 4-16384-1702446548539: Unrecognized msg type 31"

Note: the server shows up in the consul members output, but data restoration appears to be failing.

Other errors during the upgrade:
2023-12-11T12:08:23.379Z [WARN] agent: using enable-script-checks without ACLs and without allow_write_http_from is DANGEROUS, use enable-local-script-checks instead, see Protecting Consul from RCE Risk in Specific Configurations
2023-12-11T12:08:23.385Z [WARN] agent.auto_config: using enable-script-checks without ACLs and without allow_write_http_from is DANGEROUS, use enable-local-script-checks instead, see Protecting Consul from RCE Risk in Specific Configurations

2023-12-11T12:06:25.494Z [ERROR] agent.server.cert-manager: failed to handle cache update event: error="leaf cert watch returned an error: rpc error making call: Connect must be enabled in order to use this endpoint"

2023-12-11T12:06:13.047Z [WARN] agent: error getting server health from server: server=consul-2 error="context deadline exceeded"
2023-12-11T12:06:17.916Z [WARN] agent.leaf-certs: handling error in Manager.Notify: error="rpc error making call: Connect must be enabled in order to use this endpoint" index=1
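To triage output like this, it can help to split the lines by level and component first. The following is a rough sketch that assumes every line follows the `timestamp [LEVEL] component: message` shape seen above; the regular expression and the `group_by_level` helper are my own and are not part of any Consul tooling. Lines that do not match the pattern are skipped.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed shape: "<timestamp> [LEVEL] <component>: <message>"
LINE = re.compile(
    r"^(?P<ts>\S+) \[(?P<level>[A-Z]+)\]\s+(?P<component>[\w.\-]+): (?P<message>.*)$"
)


def group_by_level(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each log level to a list of (component, message) pairs."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        match = LINE.match(line.strip())
        if match is None:
            continue  # not in the assumed format, ignore
        grouped[match["level"]].append((match["component"], match["message"]))
    return dict(grouped)


# Three of the lines quoted above.
sample = [
    '2023-12-11T12:06:25.494Z [ERROR] agent.server.cert-manager: failed to handle cache update event: error="leaf cert watch returned an error: rpc error making call: Connect must be enabled in order to use this endpoint"',
    '2023-12-11T12:06:13.047Z [WARN] agent: error getting server health from server: server=consul-2 error="context deadline exceeded"',
    '2023-12-11T12:06:17.916Z [WARN] agent.leaf-certs: handling error in Manager.Notify: error="rpc error making call: Connect must be enabled in order to use this endpoint" index=1',
]

grouped = group_by_level(sample)
for level in sorted(grouped):
    for component, message in grouped[level]:
        print(level, component, message[:60])
```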

Team,
Upgrade from 1.8.9 to 1.16.4:

We are getting the following errors:
On 1.8.9:
2024-03-19T20:33:42.915Z [ERROR] agent.server.rpc: unrecognized RPC byte: byte=8 conn=from=x.x.x.x:44244

On 1.16.4, regarding the WAN:
Whenever we restart Consul, the WAN sometimes fails to connect to other instances (transient).
Deleting the instance and adding a new one fixed the issue.

On the 1.16.4 Consul server:
2024-03-12T20:33:46.101Z [WARN] agent: [core][Channel #1 SubChannel #61] grpc: addrConn.createTransport failed to connect to {Addr: "x.x.x.x:8300", ServerName: "aaaaaaaa", }. Err: connection error: desc = "error reading server preface: EOF"

During the upgrade we got an ACL error even though ACLs are not enabled:
2024-03-19T17:36:18.967Z [ERROR] agent.server.raft: failed to restore snapshot: id=153483-900786585-1710868811563 last-index=911702585 last-term=153343 size-in-bytes=1800899 error="failed inserting acl token: missing value for index 'accessor'"

Can you please guide us?
Is it safe to ignore the errors above?