How to reach a service subset from the service mesh

I’m working on a Postgres cluster (using Patroni). I have all 3 instances up and running and reachable through the service mesh, and each instance has a tag indicating its current role, either master or replica.

Now, I’d like to reach the current master (for write access) or one replica (e.g., for backups) through the service mesh.

I’ve created a service-resolver like this:

Kind = "service-resolver"
Name = "pg"
# DefaultSubset = "pg-master"
Subsets = {
  "pg-master" = {
    filter = "\"master\" in Service.Tags"
  }
  "pg-replica" = {
    filter = "\"replica\" in Service.Tags"
  }
}

But now I’m stuck: how can I specify one of those subsets in the upstreams section of another service?

    service {
      name = "backup"
      connect {
        sidecar_service {
          proxy {
            upstreams {
              # I want it to point at the replica subset of the pg service
              destination_name = "pg-replica"
              local_bind_port = 5432
            }
          }
        }
      }
    }

I couldn’t find anything in the documentation on how to use subsets defined in a service-resolver.

OK, I found this thread, which put me on track. So I now have three service-resolvers:

  • The main one, which splits my pg service into two subsets (master and replica):
Kind = "service-resolver"
Name = "pg"
DefaultSubset = "master"
Subsets = {
  "master" = {
    filter = "\"master\" in Service.Tags"
  }
  "replica" = {
    filter = "\"replica\" in Service.Tags"
  }
}
  • A virtual service for pg-master:
Kind = "service-resolver"
Name = "pg-master"
Redirect {
  Service = "pg"
  ServiceSubset = "master"
}
  • A virtual service for pg-replica:
Kind = "service-resolver"
Name = "pg-replica"
Redirect {
  Service = "pg"
  ServiceSubset = "replica"
}
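
For reference, config entries like these are applied with consul config write (e.g. consul config write pg.hcl for the first one, and the same for the two virtual ones) and can be checked afterwards with consul config read -kind service-resolver -name pg. The file names are just examples of how the entries might be saved.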

So everything should be in place. Yet, if I try to use “pg-master” or “pg-replica” in an upstream (as destination_name), I can’t get it to work (I get something like “read tcp 127.0.0.1:60284->127.0.0.1:5432: read: connection reset by peer”).

One thing that’s not clear in the doc: should the intention target the virtual service “pg-master” or the real service “pg”? (I’ve added both anyway for now, so the problem shouldn’t be on the intentions side.)

Could anyone help?

With Envoy’s log level set to debug, I can see:

[2022-10-05 09:40:58.607][15][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:370] [C41] Creating connection to cluster pg-master.default.dc1.internal.c3d20f32-f916-9621-b828-7111cdf716d3.consul
[2022-10-05 09:40:58.607][15][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1761] no healthy host for TCP connection pool
[2022-10-05 09:40:58.607][15][debug][connection] [source/common/network/connection_impl.cc:139] [C41] closing data_to_write=0 type=1
[2022-10-05 09:40:58.607][15][debug][connection] [source/common/network/connection_impl.cc:250] [C41] closing socket: 1
[2022-10-05 09:40:58.607][15][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:198] [C42] new tcp proxy session
[2022-10-05 09:40:58.607][15][debug][filter] [source/common/tcp_proxy/tcp_proxy.cc:370] [C42] Creating connection to cluster pg-master.default.dc1.internal.c3d20f32-f916-9621-b828-7111cdf716d3.consul
[2022-10-05 09:40:58.607][15][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1761] no healthy host for TCP connection pool
[2022-10-05 09:40:58.607][15][debug][connection] [source/common/network/connection_impl.cc:139] [C42] closing data_to_write=0 type=1
[2022-10-05 09:40:58.607][15][debug][connection] [source/common/network/connection_impl.cc:250] [C42] closing socket: 1

I find this strange because I’m not using the default “.consul” domain on my Consul agents, so I’m not sure why it’s trying to connect to pg-master.default.dc1.internal.c3d20f32-f916-9621-b828-7111cdf716d3.consul.

I finally got it working. Everything was OK on the service-resolver side; my issue was just a mistake on my end (the sidecar_service block was accidentally commented out).
One thing to keep in mind when using this to split a service: the intention must use the real service as its destination, not the virtual one corresponding to a subset (in my case, I must use pg as the destination in the intention, not pg-master or pg-replica).
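
For illustration, one way to express that intention is a service-intentions config entry shaped roughly like this (just a sketch; backup is the example service from the upstream above):

Kind = "service-intentions"
Name = "pg"                  # the real service, not pg-master / pg-replica
Sources = [
  {
    Name   = "backup"
    Action = "allow"
  }
]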

Just another side note: when using Service.Tags in a filter like this, it’s the tags of the sidecar service that matter, not those of the service itself. In my case, the service is pg and the sidecar is pg-sidecar-proxy, so I must ensure my tags (master and replica) are added to the corresponding pg-sidecar-proxy service, not the pg service (well, I push them on both, but for the mesh to route correctly, only pg-sidecar-proxy is needed).
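
To make that concrete, here is a simplified sketch of a Consul agent service registration with the tag in both places (the port and the static tag are assumptions for the example; in a Patroni setup the role tags are updated dynamically):

service {
  name = "pg"
  port = 5432
  tags = ["master"]          # tag on the pg service itself (informational here)

  connect {
    sidecar_service {
      tags = ["master"]      # the service-resolver filter matches these sidecar tags
    }
  }
}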

Hi @dbd,
I am trying to set up a Patroni cluster in Nomad and am struggling with the service resolver.

My main goal is to be able to use the replica subset as a sidecar upstream so services can reach any of the replicas for read-only operations.
I’m not interested in exposing Patroni (or Postgres) outside of the cluster, only to other services in the same cluster.

From what I see, Patroni does tag the various instances with the corresponding primary or replica tags, but the service is still inaccessible.

I tried manually configuring the resolvers, but the main router to the service seems broken in the Consul UI.

Any chance you could assist?

Thanks,
M.

That’s hard to say. What do you mean by “the main router to the service seems broken”?

Thanks for trying to assist @dbd.

After applying the service resolvers, the sidecar doesn’t seem to resolve either the subsets (primary or replica) or the main service.

Do you happen to have a working example where you can use (for example) the replica subset as a sidecar upstream?

Thanks again,
Mati