Service-resolver failover to secondary and tertiary datacenters

Consul Failover Service-Resolver

Consul Version: v1.11.3

I currently have three datacenters, all managed by Consul. All Consul servers are healthy and aware of each other.

Goal: When I take one of my service pods offline and curl the service, I would like Consul to route the call to one of the other datacenters in my Consul configuration.

Experiment: I delete the pod, so it obviously cannot serve the request. When I then run curl, I receive “no healthy upstream”.

I followed the service-resolver documentation.

My service-resolver was applied in the primary datacenter and shows as synced to the other datacenters.
Here is the service-resolver:

kubectl get -n prd serviceresolvers.consul.hashicorp.com service-ci-prd -o yaml
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceResolver
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"consul.hashicorp.com/v1alpha1","kind":"ServiceResolver","metadata":{"annotations":{},"name":"service-ci-prd","namespace":"prd"},"spec":{"connectTimeout":"15s","defaultSubset":"default","failover":{"*":{"datacenters":["k-dc-g1","k-dc-g3","k-dco-g1"]}},"subsets":{"default":{"filter":"Service.Meta.environment == prd"}}}}
  creationTimestamp: "2022-04-06T18:41:41Z"
  finalizers:
  - finalizers.consul.hashicorp.com
  generation: 17
  name: service-ci-prd
  namespace: prd
  resourceVersion: "137557685"
  selfLink: /apis/consul.hashicorp.com/v1alpha1/namespaces/prd/serviceresolvers/service-ci-prd
  uid: 9b5521a8-eb25-46e2-90c3-81434a5fc15f
spec:
  connectTimeout: 15s
  defaultSubset: default
  failover:
    '*':
      datacenters:
      - k-dc-g1
      - k-dc-g3
      - k-dco-g1
  subsets:
    default:
      filter: Service.Meta.environment == prd
status:
  conditions:
  - lastTransitionTime: "2022-04-08T18:23:06Z"
    status: "True"
    type: Synced
  lastSyncedTime: "2022-04-08T18:23:06Z"
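
When debugging, it may help to spell out what this resolver is expected to do. The sketch below is a simplified Python model, not Consul's actual implementation. It resolves the default subset locally first. If no instance there is both passing its health checks and matching the subset filter, it tries k-dc-g1, k-dc-g3 and k-dco-g1 in that order. Several details are illustrative assumptions: the function names, the instance dictionary shape, the sample data, and the assumption that the subset filter is also applied in the failover datacenters.

```python
from typing import Optional

# Failover order copied from the resolver's spec.failover['*'].datacenters.
FAILOVER_DCS = ["k-dc-g1", "k-dc-g3", "k-dco-g1"]


def matches_default_subset(instance: dict) -> bool:
    """Model of the subset filter `Service.Meta.environment == prd`."""
    return instance.get("meta", {}).get("environment") == "prd"


def pick_datacenter(local_dc: str, failover_dcs: list[str],
                    instances_by_dc: dict[str, list[dict]]) -> Optional[str]:
    """Return the first datacenter with a usable instance, or None.

    Hypothetical model: the local datacenter is tried first, then the
    failover list in order, skipping the local datacenter if it is listed.
    """
    order = [local_dc] + [dc for dc in failover_dcs if dc != local_dc]
    for dc in order:
        usable = [i for i in instances_by_dc.get(dc, [])
                  if i.get("passing") and matches_default_subset(i)]
        if usable:
            return dc
    # No usable instance anywhere: the situation in which the sidecar
    # proxy answers "no healthy upstream".
    return None


if __name__ == "__main__":
    # Made-up example data, not taken from the cluster above.
    instances = {
        "k-dc-g1": [{"meta": {"environment": "prd"}, "passing": False}],  # pod deleted
        "k-dc-g3": [{"meta": {}, "passing": True}],  # passing, but no environment meta
        "k-dco-g1": [{"meta": {"environment": "prd"}, "passing": True}],
    }
    print(pick_datacenter("k-dc-g1", FAILOVER_DCS, instances))  # prints k-dco-g1
```

In this model, an instance that is healthy but lacks the environment=prd metadata in a failover datacenter is skipped just like an unhealthy one. That makes the metadata in every datacenter worth checking, not only in the primary.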

In the Consul UI, every datacenter shows the failover datacenters and the failover configuration applied. However, when the pod is killed and I curl the service, I receive “no healthy upstream”. Note that the resource also synced correctly, as shown above. For the subset filter, I went to the service in the UI, opened Instances, and used the environment = prd value shown under Meta. Does anyone know what else I need to check so that, when the single instance in the primary datacenter is down, the call is routed to one of the other datacenters? For all intents and purposes, Consul appears to be aware of the other datacenters and of the service instances in them.
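
One way to confirm that the other datacenters have instances failover could actually use is to query Consul's HTTP health endpoint (/v1/health/service) for each datacenter. The query combines the dc and passing parameters with the same metadata filter as the subset, with the value quoted. Below is a minimal Python sketch that makes several assumptions. It expects a Consul agent reachable at http://127.0.0.1:8500 and assumes the Consul service name is service-ci-prd. It also assumes ACLs are disabled; with ACLs enabled, an X-Consul-Token header would be needed.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: agent address and service name; adjust for your environment.
CONSUL_ADDR = "http://127.0.0.1:8500"
SERVICE = "service-ci-prd"
DATACENTERS = ["k-dc-g1", "k-dc-g3", "k-dco-g1"]


def health_url(service: str, dc: str, addr: str = CONSUL_ADDR) -> str:
    """Build a /v1/health/service query for passing, subset-matching instances."""
    query = urllib.parse.urlencode({
        "dc": dc,           # query the catalog of that datacenter
        "passing": "true",  # only instances whose checks are all passing
        "filter": 'Service.Meta.environment == "prd"',  # subset filter, value quoted
    })
    return f"{addr}/v1/health/service/{urllib.parse.quote(service)}?{query}"


def count_matching(service: str, dc: str) -> int:
    """Number of passing instances matching the filter that Consul reports in `dc`."""
    with urllib.request.urlopen(health_url(service, dc), timeout=5) as resp:
        return len(json.load(resp))


if __name__ == "__main__":
    for dc in DATACENTERS:
        try:
            print(f"{dc}: {count_matching(SERVICE, dc)} matching passing instance(s)")
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            print(f"{dc}: request failed ({err})")
```

A count of zero for a failover datacenter would mean that, as far as this query can tell, there is nothing there for failover to route to, whatever the resolver says.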


Hi, I think this might be a bug. Would you be willing to open a GitHub issue?

I have the same problem. Have you found a solution yet?

Are you able to reproduce this issue on a more recent Consul version?

The service failover behavior changed significantly in Consul 1.14 (hashicorp/consul#14178). If this is a bug, I’m wondering whether it is still present in the currently supported releases.