Consul HTTP 500 during upgrade from 1.12 to 1.13

I am trying to upgrade a cluster from 1.12.8 to 1.13.6. After upgrading the first of 5 servers, we started to see a small number of HTTP errors in the Consul logs.

A local process (using the Go package github.com/hashicorp/consul/api v1.13.1) queries Consul with this API call: GET /v1/health/service/mysql?cached=&passing=1&stale=&tag=sd%3Ajob%3Dspot. This results in an HTTP 500.

Consul logs:

2023-03-30_12:19:19.37318 2023-03-30T12:19:19.373Z [ERROR] agent.rpcclient.health: subscribe call failed: err="rpc error: code = InvalidArgument desc = Key is required" failure_count=13 key=mysql topic=ServiceHealth
2023-03-30_12:19:46.55046 2023-03-30T12:19:46.550Z [ERROR] agent.rpcclient.health: subscribe call failed: err="rpc error: code = InvalidArgument desc = Key is required" fai
.0.1:46000 latency=406.345µs

A few things:

  • I am able to reproduce this on the node by curl-ing the same URI.
  • This seems intermittent: curl-ing a short while later returns HTTP 200, but then a different service runs into the same issue.
  • Bypassing the cache (removing the cached query parameter) seems to always work.
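
For anyone who wants to compare the two variants side by side, here is a minimal Go sketch using only the standard library. It requests the same health endpoint with and without the cached parameter a few times and prints the HTTP status of each. The agent address http://127.0.0.1:8500 is an assumption (the default local HTTP port), the helper names healthURL and probe are made up for this example, and the round count and interval are arbitrary.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// healthURL builds the /v1/health/service/<name> URL used in the report.
// base is the agent's HTTP address (assumed, not taken from the logs).
// When cached is true, the empty "cached" query parameter is added.
func healthURL(base, service, tag string, cached bool) string {
	q := url.Values{}
	q.Set("passing", "1")
	q.Set("stale", "")
	q.Set("tag", tag)
	if cached {
		q.Set("cached", "")
	}
	// Encode sorts keys, so the result matches the query string in the report.
	return base + "/v1/health/service/" + url.PathEscape(service) + "?" + q.Encode()
}

// probe issues a GET and returns the HTTP status line, or the error text
// if the request could not be completed.
func probe(c *http.Client, u string) string {
	resp, err := c.Get(u)
	if err != nil {
		return "error: " + err.Error()
	}
	defer resp.Body.Close()
	return resp.Status
}

func main() {
	const base = "http://127.0.0.1:8500" // assumed local agent address
	client := &http.Client{Timeout: 2 * time.Second}

	withCache := healthURL(base, "mysql", "sd:job=spot", true)
	withoutCache := healthURL(base, "mysql", "sd:job=spot", false)

	// The failure is intermittent, so repeat the comparison a few times.
	for i := 1; i <= 5; i++ {
		fmt.Printf("round %d cached:   %s\n", i, probe(client, withCache))
		fmt.Printf("round %d uncached: %s\n", i, probe(client, withoutCache))
		time.Sleep(500 * time.Millisecond)
	}
}
```

If the cached request returns 500 while the uncached one returns 200 in the same round, that would match what I see with curl.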

Has anyone encountered this before? Are there any hints as to why this can happen?

The information you’ve gathered does hint rather strongly at a compatibility problem whilst performing the upgrade. I would recommend you report it as a GitHub issue.