Connect proxy works fine but Connect envoy does not

Hi, I followed the guide at https://learn.hashicorp.com/consul/getting-started/connect
and found that “connect proxy” worked but “connect envoy” did not.

To be exact, the “connect envoy” with “socat” works but “connect envoy” with the dependent service does not work.

The Envoy debug log says:

[2020-07-02 13:36:59.195][21936][debug][filter] [external/envoy/source/common/tcp_proxy/tcp_proxy.cc:395] [C30] Creating connection to cluster socat.default.yz.internal.69e9281f-9351-099c-613a-48349ccde8c3.consul
[2020-07-02 13:36:59.195][21936][debug][upstream] [external/envoy/source/common/upstream/cluster_manager_impl.cc:1288] no healthy host for TCP connection pool

I also ran consul monitor at the debug log level, and nothing of interest showed up.

2020-07-02T13:53:52.023+0800 [DEBUG] agent.envoy: generating cluster for: cluster=socat.default.yz.internal.69e9281f-9351-099c-613a-48349ccde8c3.consul
2020-07-02T13:53:52.054+0800 [DEBUG] agent.envoy: generating endpoints for: cluster=socat.default.yz.internal.69e9281f-9351-099c-613a-48349ccde8c3.consul

I was using Consul 1.8.0 (ACL enabled, Consul as CA provider) and Envoy 1.14.2.
All services are expected to run on the host network of 2 machines.

{
  "service": {
    "name": "socat",
    "id": "socat",
    "port": 8181,
    "token": "8627106a-31be-3dff-e920-7c17dbd20e81",
    "connect": {
      "sidecar_service": {
      }
    }
  }
}
{
  "service": {
    "name": "socat-dep",
    "id": "socat-dep",
    "token": "8627106a-31be-3dff-e920-7c17dbd20e81",
    "connect": {
      "sidecar_service": {
        "proxy": {
          "upstreams": [
            {
              "destination_name": "socat",
              "local_bind_port": 9191
            }
          ]
        }
      }
    }
  }
}

And the policy for the token 8627106a-31be-3dff-e920-7c17dbd20e81 is:

service_prefix "socat" {
  policy = "write"
}

I also created an intention allowing socat-dep to connect to socat.
(I even tried an intention allowing * to *.)

Any idea?

I see the same behavior, but when deploying with Nomad.

Did you find anything useful to overcome this?

Sadly, no.
I gave up and moved on to trying Nomad, and it is still not working.

I’ve got this working now, in the 2-datacenters branch of my GitHub repository: https://github.com/pitakill/consul-training/tree/2-datacenters

My problem was the IP address that Envoy was listening on.

I don’t think we have exactly the same problem, but the repository has a working example of Consul Connect and a working example of a Consul mesh gateway.

It may be worth a look.

Thank you, I will take a look!

The ACL tokens used by your proxies might need more rules added. If you are using the sidecar_service shorthand, then the token used for registering the socat service needs:

service "socat" { policy = "write" }
service "socat-sidecar-proxy" { policy = "write" }
# nothing else since it doesn't appear to have any upstreams

The one for socat-dep needs:

service "socat-dep" { policy = "write" }
service "socat-dep-sidecar-proxy" { policy = "write" }
service "socat" { policy = "read" }
node "<any node that socat is running on>" { policy = "read" }

In slightly less restrictive environments, a wildcard read-all on services and nodes is usually fine:

service "socat-dep" { policy = "write" }
service "socat-dep-sidecar-proxy" { policy = "write" }
service_prefix "" { policy = "read" }
node_prefix "" { policy = "read" }

And if you’re ok with that policy, then you can skip that entirely and just use a ServiceIdentity when defining the token: https://www.consul.io/docs/acl/acl-system#acl-service-identities

I suspect you are missing the node component. The service_prefix rule should already cover both the writes and the reads, because both services share a token and their names start with the same prefix.

The node component allows the service discovery request made by the sidecar to actually resolve node information (IP address). Without it, the sidecar can’t get enough information to populate the endpoint list in Envoy, so Envoy has nothing to dial out to.
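
As a rough sketch of what that could look like for the shared token in the original post: the service_prefix rule is the one already posted, and the node_prefix rule is the assumed missing piece. The empty prefix (read on all nodes) is an assumption for simplicity; it can be narrowed to the specific nodes that socat runs on.

```hcl
# Existing rule from the original post: covers socat, socat-dep,
# and their -sidecar-proxy registrations, since all share the "socat" prefix.
service_prefix "socat" {
  policy = "write"
}

# Assumed addition: lets the sidecar resolve node information (IP addresses)
# for upstream instances. "" matches every node; restrict it if needed.
node_prefix "" {
  policy = "read"
}
```

With only the first rule, the symptom would plausibly match the “no healthy host” line in the Envoy log above, since the cluster is generated but its endpoint list stays empty.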

It totally works as you said!!

Can you clarify what your problem and solution was? I’m having this problem right now. I looked through your repo and couldn’t connect the dots from your comment here to any commit in your code that solved the problem.

I haven’t enabled ACL, so I don’t think my problem is related to ACL. Right? It’s not like ACL is required to be enabled, and if it’s not enabled, then the default is to allow everything?