Service-resolver failover to secondary and tertiary datacenters

lkendrickd · April 8, 2022, 6:29pm

Consul Failover Service-Resolver

Consul Version: v1.11.3

Currently I have 3 datacenters, all governed by Consul. All Consul servers are healthy and aware of each other.

Goal: When I take one of my service pods offline and curl the service, I would like Consul to route the call to one of the other datacenters in my Consul configuration.

Experiment: When I delete the pod, so that it can no longer serve the request, and then run curl, I receive “no healthy upstream”.

I followed the service-resolver documentation.

My service-resolver was applied to the primary datacenter and reports as synced to the other datacenters.
Here is the service resolver:

kubectl get -n prd serviceresolvers.consul.hashicorp.com service-ci-prd -o yaml

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceResolver
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"consul.hashicorp.com/v1alpha1","kind":"ServiceResolver","metadata":{"annotations":{},"name":"service-ci-prd","namespace":"prd"},"spec":{"connectTimeout":"15s","defaultSubset":"default","failover":{"*":{"datacenters":["k-dc-g1","k-dc-g3","k-dco-g1"]}},"subsets":{"default":{"filter":"Service.Meta.environment == prd"}}}}
  creationTimestamp: "2022-04-06T18:41:41Z"
  finalizers:
  - finalizers.consul.hashicorp.com
  generation: 17
  name: service-ci-prd
  namespace: prd
  resourceVersion: "137557685"
  selfLink: /apis/consul.hashicorp.com/v1alpha1/namespaces/prd/serviceresolvers/service-ci-prd
  uid: 9b5521a8-eb25-46e2-90c3-81434a5fc15f
spec:
  connectTimeout: 15s
  defaultSubset: default
  failover:
    '*':
      datacenters:
      - k-dc-g1
      - k-dc-g3
      - k-dco-g1
  subsets:
    default:
      filter: Service.Meta.environment == prd
status:
  conditions:
  - lastTransitionTime: "2022-04-08T18:23:06Z"
    status: "True"
    type: Synced
  lastSyncedTime: "2022-04-08T18:23:06Z"

In the Consul UI, every datacenter shows the other DCs and the failover configuration as applied, and the resolver reports Synced, as shown above. However, when the pod is killed and I curl the service, I receive “no healthy upstream”. For the subset filter, I went to the service in the UI, then to Instances, and used the value listed under Meta (environment: prd). Does anyone know what else I need to check so that requests are routed to one of the other datacenters when the only instance in the primary is down? For all intents and purposes, Consul appears to be aware of the other DCs and of the service instances in them.
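One way to narrow this down is to ask each datacenter directly whether it has passing, mesh-capable instances that match the subset filter. The following is a minimal sketch, not a verified fix. It uses Consul's HTTP health endpoint for Connect-enabled services (`/v1/health/connect/<service>`) with the `dc`, `passing`, and `filter` query parameters. The agent address `http://127.0.0.1:8500`, the service name `service-ci-prd` (assumed to match the resolver name), and the helper names `health_url`, `passing_instances`, and `report` are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: a locally reachable Consul agent, and a service whose name
# matches the resolver name. Adjust both for your environment.
CONSUL_ADDR = "http://127.0.0.1:8500"
SERVICE = "service-ci-prd"

# Taken from the service-resolver shown above.
SUBSET_FILTER = "Service.Meta.environment == prd"
FAILOVER_DCS = ["k-dc-g1", "k-dc-g3", "k-dco-g1"]


def health_url(dc, service=SERVICE, flt=SUBSET_FILTER, addr=CONSUL_ADDR):
    """Build the health query URL for passing mesh instances in one datacenter."""
    query = urllib.parse.urlencode({"dc": dc, "passing": "true", "filter": flt})
    return f"{addr}/v1/health/connect/{service}?{query}"


def passing_instances(dc, token=None):
    """Return the service IDs of passing, filter-matching instances in dc."""
    req = urllib.request.Request(health_url(dc))
    if token:
        # Only needed when ACLs are enabled.
        req.add_header("X-Consul-Token", token)
    with urllib.request.urlopen(req, timeout=10) as resp:
        entries = json.load(resp)
    return [entry["Service"]["ID"] for entry in entries]


def report(dcs=FAILOVER_DCS):
    """Print the matching instances for each failover datacenter."""
    for dc in dcs:
        try:
            print(dc, passing_instances(dc))
        except OSError as err:
            print(dc, "query failed:", err)
```

Calling `report()` from a host that can reach the agent prints one line per datacenter. An empty list for a datacenter whose pods are running would suggest looking at the filter or at the instance metadata registered there; non-empty lists everywhere would point away from the filter and toward the cross-datacenter path.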

lkysow · May 10, 2022, 11:44pm

Hi, I think this might be a bug. Would you be willing to create a GitHub issue?

Mahmoudalziem · April 18, 2023, 6:04pm

I have the same problem. Have you found a solution yet?

blake · May 19, 2023, 11:50pm

Are you able to reproduce this issue on a more recent Consul version?

The service failover behavior was changed significantly in Consul 1.14 (hashicorp/consul#14178). If this is a bug, I’m wondering whether it is still present in the current supported releases.
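When retesting, it can help to confirm which side of the 1.14 change each datacenter is on. The sketch below is a small illustration only: it parses a version string such as the `v1.11.3` reported in the original post (or the first line printed by `consul version`) and compares it numerically against 1.14. The helper names `parse_version` and `predates_failover_change` are hypothetical.

```python
import re

# Release line in which the failover behavior changed, per the post above.
FAILOVER_CHANGE = (1, 14)


def parse_version(text):
    """Extract (major, minor, patch) from strings such as 'Consul v1.11.3'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def predates_failover_change(text):
    """True when the version's major.minor is numerically below 1.14."""
    return parse_version(text)[:2] < FAILOVER_CHANGE
```

Comparing integer tuples rather than strings matters here, because a plain string comparison would order "1.9" after "1.14".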
