Envoy Tracing with Datadog in AWS ECS Fargate

I’m not able to get trace spans in Datadog from Envoy when running in Fargate. I get traces from the app, but when it calls another app with a similar task definition, I don’t get the Envoy spans I expect to see.

With the closest configuration I have reached so far, the Envoy container logs this error:

failed to generate xDS resources for
 "type.googleapis.com/envoy.config.listener.v3.Listener": 
  Any JSON doesn't have '@type'

The task definition I’m using has 4 containers:
1 - the app
2 - consul
3 - envoy
4 - datadog

The Consul and Envoy containers use the same image:
nicholasjackson/consul-envoy:v1.11.2-v1.20.1
and they both use CONSUL_LOCAL_CONFIG to pass the configuration.

The consul agent container starts with:

"command": [
        "agent",
        "-bind",
        "{{ GetInterfaceIP \"eth1\" }}",
        "-data-dir",
        "/consul/data",
        "-retry-join",
        "DATACENTER"
      ]

The Envoy container starts with:

      "command": [
        "consul",
        "connect",
        "envoy",
        "-sidecar-for",
        "SERVICE",
        "-token",
        "CONSUL_ACL_TOKEN"
      ]

The CONSUL_LOCAL_CONFIG variable contains a JSON blob with the sidecar proxy config, including the Envoy tracing settings, like:

{
  "service": {
    "connect": {
      "sidecar_service": {
        "proxy": {
          "destination_service_name": "SERVICE",
          "local_service_port": 9090,
          "config": [
            {
              "envoy_public_listener_json": "{\"address\":{...}}",
              "envoy_tracing_json": "{\"http\":{...}}",
              "envoy_extra_static_clusters_json": "{\"name\":\"datadog_local\",...}"
            }
          ]
        }
      }
    }
  }
}
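
Because each escape-hatch value is a JSON document embedded as a string, the escaping is easy to get wrong by hand. Below is a minimal Python sketch of one way to build the blob. It assumes short placeholder fragments in place of the full objects, and the helper name build_local_config is made up for illustration:

```python
import json

# Placeholder fragments; the real objects are the ones expanded in this post.
public_listener = {"address": {"socket_address": {"address": "127.0.0.1", "port_value": 9090}}}
tracing = {"http": {"name": "envoy.tracers.datadog"}}
static_cluster = {"name": "datadog_local"}


def build_local_config(service_name, local_port, listener, tracing_cfg, cluster):
    """Assemble the service registration with each escape hatch as a JSON string."""
    proxy_config = {
        # Each value is a string that itself holds JSON, so encode it separately.
        "envoy_public_listener_json": json.dumps(listener),
        "envoy_tracing_json": json.dumps(tracing_cfg),
        "envoy_extra_static_clusters_json": json.dumps(cluster),
    }
    return {"service": {"connect": {"sidecar_service": {"proxy": {
        "destination_service_name": service_name,
        "local_service_port": local_port,
        "config": [proxy_config],
    }}}}}


# The outer json.dumps escapes the inner quotes, giving the \"...\" form shown above.
blob = json.dumps(build_local_config("SERVICE", 9090, public_listener, tracing, static_cluster))
print(blob)
```

The printed string is what would be assigned to CONSUL_LOCAL_CONFIG in the task definition.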

Those values expand as follows.
envoy_extra_static_clusters_json:

{
  "name": "datadog_local",
  "connect_timeout": "3.000s",
  "lb_policy": "ROUND_ROBIN",
  "dns_lookup_family": "V4_ONLY",
  "load_assignment": {
    "cluster_name": "datadog_local",
    "endpoints": [
      {
        "lb_endpoints": [
          {
            "endpoint": {
              "address": {
                "socket_address": {
                  "address": "127.0.0.1",
                  "port_value": 8126,
                  "protocol": "TCP"
                }
              }
            }
          }
        ]
      }
    ]
  }
}

envoy_public_listener_json:

{
  "address": {
    "socket_address": {
      "address": "127.0.0.1",
      "port_value": 9090
    }
  },
  "filter_chains": [
    {
      "filters": [
        {
          "name": "envoy.filters.network.http_connection_manager",
          "typed_config": {
            "@type": "type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager",
            "generate_request_id": "true",
            "request_id_extension": {
              "typed_config": {
                "@type": "type.googleapis.com/envoy.extensions.request_id.uuid.v3.UuidRequestIdConfig",
                "use_request_id_for_trace_sampling": "false"
              }
            },
            "codec_type": "auto",
            "stat_prefix": "public_listener",
            "route_config": {
              "name": "local_route",
              "virtual_hosts": [
                {
                  "name": "backend",
                  "domains": [
                    "*"
                  ],
                  "routes": [
                    {
                      "match": {
                        "prefix": "/"
                      },
                      "route": {
                        "cluster": "service1"
                      }
                    }
                  ]
                }
              ]
            },
            "http_filters": [
              {
                "name": "envoy.filters.http.health_check",
                "typed_config": {
                  "@type": "type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck",
                  "pass_through_mode": "false",
                  "headers": [
                    {
                      "exact_match": "/healthcheck",
                      "name": ":path"
                    }
                  ]
                }
              },
              {
                "name": "envoy.filters.http.router",
                "typed_config": {}
              }
            ],
            "use_remote_address": "true"
          }
        }
      ]
    }
  ]
}

envoy_tracing_json:

{
  "http": {
    "name": "envoy.tracers.datadog",
    "typed_config": {
      "@type": "type.googleapis.com/envoy.config.trace.v3.DatadogConfig",
      "collector_cluster": "datadog_local",
      "service_name": "envoy"
    }
  }
}

The Datadog config being referenced is from Datadog’s docs.

The Envoy log shows a warning about generating the xDS listener resource, which points to a misconfigured listener:

[warning][config] [./source/common/config/grpc_stream.h:196] DeltaAggregatedResources gRPC config stream closed since 46s ago: 14, failed to generate all xDS resources from the snapshot: failed to generate xDS resources for "type.googleapis.com/envoy.config.listener.v3.Listener": Any JSON doesn't have '@type'

Traces are generated for the app, but Envoy spans are not visible.

I’ve been able to get this to work in Docker containers on my local machine using nicholasjackson’s fake-service demo tracing repo, and I opened a bug on his fake-service repo referencing that. But I can’t seem to get this working in Fargate.

Any guidance on this Envoy config in Fargate would be appreciated.

Hi @mlindes,

It’s not clear from the log output which listener this error relates to. Can you look at the Envoy config dump to see whether it shows in more detail which listener it is having problems configuring?

$ curl --silent "localhost:19000/config_dump?resource=dynamic_listeners"
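
If it is easier to script, here is a rough Python sketch of the same check. It assumes the admin interface is reachable at localhost:19000 and reads the unfiltered /config_dump; the function names are made up for illustration, and the error_state field is only present when Envoy itself rejected a listener update:

```python
import json
import urllib.request

LISTENERS_DUMP = "type.googleapis.com/envoy.admin.v3.ListenersConfigDump"


def dynamic_listener_summary(config_dump):
    """Return (name, error details or None) for each dynamic listener in a full config dump."""
    summary = []
    for section in config_dump.get("configs", []):
        if section.get("@type") != LISTENERS_DUMP:
            continue
        for listener in section.get("dynamic_listeners", []):
            details = listener.get("error_state", {}).get("details")
            summary.append((listener.get("name"), details))
    return summary


def fetch_config_dump(admin="http://localhost:19000"):
    """Fetch the unfiltered config dump from the Envoy admin endpoint (assumed address)."""
    with urllib.request.urlopen(f"{admin}/config_dump") as response:
        return json.load(response)


# Offline example: a dump whose listeners section holds no dynamic listeners.
sample = {"configs": [{"@type": LISTENERS_DUMP}]}
print(dynamic_listener_summary(sample))  # → []
```

Against a live proxy you would call dynamic_listener_summary(fetch_config_dump()) instead of using the sample.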

Thanks.

Hi @blake,

The dynamic_listeners section is empty. Looks like that error prevents the listener from being created.

The rest of the config_dump (minus extensions) is here:

{
  "configs": [
    {
      "@type": "type.googleapis.com/envoy.admin.v3.BootstrapConfigDump",
      "bootstrap": {
        "node": {
          "id": "fake-web-3-sidecar-proxy",
          "cluster": "fake-web-3",
          "metadata": {
            "partition": "default",
            "namespace": "default"
          },
          "user_agent_name": "envoy",
          "user_agent_build_version": {
            "version": {
              "major_number": 1,
              "minor_number": 20,
              "patch": 1
            },
            "metadata": {
              "build.type": "RELEASE",
              "revision.sha": "ea23f47b27464794980c05ab290a3b73d801405e",
              "ssl.version": "BoringSSL",
              "revision.status": "Clean"
            }
          },
          "extensions": [
          ]
        },
        "static_resources": {
          "clusters": [
            {
              "name": "local_agent",
              "type": "STATIC",
              "connect_timeout": "1s",
              "http2_protocol_options": {},
              "load_assignment": {
                "cluster_name": "local_agent",
                "endpoints": [
                  {
                    "lb_endpoints": [
                      {
                        "endpoint": {
                          "address": {
                            "socket_address": {
                              "address": "127.0.0.1",
                              "port_value": 8502
                            }
                          }
                        }
                      }
                    ]
                  }
                ]
              }
            },
            {
              "name": "datadog_local",
              "connect_timeout": "3s",
              "dns_lookup_family": "V4_ONLY",
              "load_assignment": {
                "cluster_name": "datadog_local",
                "endpoints": [
                  {
                    "lb_endpoints": [
                      {
                        "endpoint": {
                          "address": {
                            "socket_address": {
                              "address": "127.0.0.1",
                              "port_value": 8126
                            }
                          }
                        }
                      }
                    ]
                  }
                ]
              }
            }
          ]
        },
        "dynamic_resources": {
          "lds_config": {
            "ads": {},
            "resource_api_version": "V3"
          },
          "cds_config": {
            "ads": {},
            "resource_api_version": "V3"
          },
          "ads_config": {
            "api_type": "DELTA_GRPC",
            "grpc_services": [
              {
                "envoy_grpc": {
                  "cluster_name": "local_agent"
                },
                "initial_metadata": [
                  {
                    "key": "x-consul-token",
                    "value": "<token>"
                  }
                ]
              }
            ],
            "transport_api_version": "V3"
          }
        },
        "tracing": {
          "http": {
            "name": "envoy.tracers.datadog",
            "typed_config": {
              "@type": "type.googleapis.com/envoy.config.trace.v3.DatadogConfig",
              "collector_cluster": "datadog_local",
              "service_name": "envoy"
            }
          }
        },
        "admin": {
          "access_log_path": "/dev/null",
          "address": {
            "socket_address": {
              "address": "127.0.0.1",
              "port_value": 19000
            }
          }
        },
        "stats_config": {
          "stats_tags": [
            {
              "tag_name": "consul.destination.custom_hash",
              "regex": "^cluster\\.(?:passthrough~)?((?:([^.]+)~)?(?:[^.]+\\.)?[^.]+\\.[^.]+\\.(?:[^.]+\\.)?[^.]+\\.[^.]+\\.[^.]+\\.consul\\.)"
            },
            {
              "tag_name": "consul.destination.service_subset",
              "regex": "^cluster\\.(?:passthrough~)?((?:[^.]+~)?(?:([^.]+)\\.)?[^.]+\\.[^.]+\\.(?:[^.]+\\.)?[^.]+\\.[^.]+\\.[^.]+\\.consul\\.)"
            },
            {
              "tag_name": "consul.destination.service",
              "regex": "^cluster\\.(?:passthrough~)?((?:[^.]+~)?(?:[^.]+\\.)?([^.]+)\\.[^.]+\\.(?:[^.]+\\.)?[^.]+\\.[^.]+\\.[^.]+\\.consul\\.)"
            },
            {
              "tag_name": "consul.destination.namespace",
              "regex": "^cluster\\.(?:passthrough~)?((?:[^.]+~)?(?:[^.]+\\.)?[^.]+\\.([^.]+)\\.(?:[^.]+\\.)?[^.]+\\.[^.]+\\.[^.]+\\.consul\\.)"
            },
            {
              "tag_name": "consul.destination.partition",
              "regex": "^cluster\\.(?:passthrough~)?((?:[^.]+~)?(?:[^.]+\\.)?[^.]+\\.[^.]+\\.(?:([^.]+)\\.)?[^.]+\\.[^.]+\\.[^.]+\\.consul\\.)"
            },
            {
              "tag_name": "consul.destination.datacenter",
              "regex": "^cluster\\.(?:passthrough~)?((?:[^.]+~)?(?:[^.]+\\.)?[^.]+\\.[^.]+\\.(?:[^.]+\\.)?([^.]+)\\.[^.]+\\.[^.]+\\.consul\\.)"
            },
            {
              "tag_name": "consul.destination.routing_type",
              "regex": "^cluster\\.(?:passthrough~)?((?:[^.]+~)?(?:[^.]+\\.)?[^.]+\\.[^.]+\\.(?:[^.]+\\.)?[^.]+\\.([^.]+)\\.[^.]+\\.consul\\.)"
            },
            {
              "tag_name": "consul.destination.trust_domain",
              "regex": "^cluster\\.(?:passthrough~)?((?:[^.]+~)?(?:[^.]+\\.)?[^.]+\\.[^.]+\\.(?:[^.]+\\.)?[^.]+\\.[^.]+\\.([^.]+)\\.consul\\.)"
            },
            {
              "tag_name": "consul.destination.target",
              "regex": "^cluster\\.(?:passthrough~)?(((?:[^.]+~)?(?:[^.]+\\.)?[^.]+\\.[^.]+\\.(?:[^.]+\\.)?[^.]+)\\.[^.]+\\.[^.]+\\.consul\\.)"
            },
            {
              "tag_name": "consul.destination.full_target",
              "regex": "^cluster\\.(?:passthrough~)?(((?:[^.]+~)?(?:[^.]+\\.)?[^.]+\\.[^.]+\\.(?:[^.]+\\.)?[^.]+\\.[^.]+\\.[^.]+)\\.consul\\.)"
            },
            {
              "tag_name": "consul.upstream.service",
              "regex": "^(?:tcp|http)\\.upstream\\.(([^.]+)(?:\\.[^.]+)?(?:\\.[^.]+)?\\.[^.]+\\.)"
            },
            {
              "tag_name": "consul.upstream.datacenter",
              "regex": "^(?:tcp|http)\\.upstream\\.([^.]+(?:\\.[^.]+)?(?:\\.[^.]+)?\\.([^.]+)\\.)"
            },
            {
              "tag_name": "consul.upstream.namespace",
              "regex": "^(?:tcp|http)\\.upstream\\.([^.]+(?:\\.([^.]+))?(?:\\.[^.]+)?\\.[^.]+\\.)"
            },
            {
              "tag_name": "consul.upstream.partition",
              "regex": "^(?:tcp|http)\\.upstream\\.([^.]+(?:\\.[^.]+)?(?:\\.([^.]+))?\\.[^.]+\\.)"
            },
            {
              "tag_name": "consul.custom_hash",
              "regex": "^cluster\\.((?:([^.]+)~)?(?:[^.]+\\.)?[^.]+\\.[^.]+\\.(?:[^.]+\\.)?[^.]+\\.[^.]+\\.[^.]+\\.consul\\.)"
            },
            {
              "tag_name": "consul.service_subset",
              "regex": "^cluster\\.((?:[^.]+~)?(?:([^.]+)\\.)?[^.]+\\.[^.]+\\.(?:[^.]+\\.)?[^.]+\\.[^.]+\\.[^.]+\\.consul\\.)"
            },
            {
              "tag_name": "consul.service",
              "regex": "^cluster\\.((?:[^.]+~)?(?:[^.]+\\.)?([^.]+)\\.[^.]+\\.(?:[^.]+\\.)?[^.]+\\.[^.]+\\.[^.]+\\.consul\\.)"
            },
            {
              "tag_name": "consul.namespace",
              "regex": "^cluster\\.((?:[^.]+~)?(?:[^.]+\\.)?[^.]+\\.([^.]+)\\.(?:[^.]+\\.)?[^.]+\\.[^.]+\\.[^.]+\\.consul\\.)"
            },
            {
              "tag_name": "consul.datacenter",
              "regex": "^cluster\\.((?:[^.]+~)?(?:[^.]+\\.)?[^.]+\\.[^.]+\\.(?:[^.]+\\.)?([^.]+)\\.[^.]+\\.[^.]+\\.consul\\.)"
            },
            {
              "tag_name": "consul.routing_type",
              "regex": "^cluster\\.((?:[^.]+~)?(?:[^.]+\\.)?[^.]+\\.[^.]+\\.(?:[^.]+\\.)?[^.]+\\.([^.]+)\\.[^.]+\\.consul\\.)"
            },
            {
              "tag_name": "consul.trust_domain",
              "regex": "^cluster\\.((?:[^.]+~)?(?:[^.]+\\.)?[^.]+\\.[^.]+\\.(?:[^.]+\\.)?[^.]+\\.[^.]+\\.([^.]+)\\.consul\\.)"
            },
            {
              "tag_name": "consul.target",
              "regex": "^cluster\\.(((?:[^.]+~)?(?:[^.]+\\.)?[^.]+\\.[^.]+\\.(?:[^.]+\\.)?[^.]+)\\.[^.]+\\.[^.]+\\.consul\\.)"
            },
            {
              "tag_name": "consul.full_target",
              "regex": "^cluster\\.(((?:[^.]+~)?(?:[^.]+\\.)?[^.]+\\.[^.]+\\.(?:[^.]+\\.)?[^.]+\\.[^.]+\\.[^.]+)\\.consul\\.)"
            },
            {
              "tag_name": "local_cluster",
              "fixed_value": "fake-web-3"
            },
            {
              "tag_name": "consul.source.service",
              "fixed_value": "fake-web-3"
            },
            {
              "tag_name": "consul.source.namespace",
              "fixed_value": "default"
            },
            {
              "tag_name": "consul.source.partition",
              "fixed_value": "default"
            },
            {
              "tag_name": "consul.source.datacenter",
              "fixed_value": "us-east-2"
            }
          ],
          "use_all_default_tags": true
        }
      },
      "last_updated": "2022-03-24T13:57:14.397Z"
    },
    {
      "@type": "type.googleapis.com/envoy.admin.v3.ClustersConfigDump",
      "static_clusters": [
        {
          "cluster": {
            "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
            "name": "datadog_local",
            "connect_timeout": "3s",
            "dns_lookup_family": "V4_ONLY",
            "load_assignment": {
              "cluster_name": "datadog_local",
              "endpoints": [
                {
                  "lb_endpoints": [
                    {
                      "endpoint": {
                        "address": {
                          "socket_address": {
                            "address": "127.0.0.1",
                            "port_value": 8126
                          }
                        }
                      }
                    }
                  ]
                }
              ]
            }
          },
          "last_updated": "2022-03-24T13:57:15.409Z"
        },
        {
          "cluster": {
            "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
            "name": "local_agent",
            "type": "STATIC",
            "connect_timeout": "1s",
            "http2_protocol_options": {},
            "load_assignment": {
              "cluster_name": "local_agent",
              "endpoints": [
                {
                  "lb_endpoints": [
                    {
                      "endpoint": {
                        "address": {
                          "socket_address": {
                            "address": "127.0.0.1",
                            "port_value": 8502
                          }
                        }
                      }
                    }
                  ]
                }
              ]
            }
          },
          "last_updated": "2022-03-24T13:57:15.009Z"
        }
      ]
    },
    {
      "@type": "type.googleapis.com/envoy.admin.v3.ListenersConfigDump"
    },
    {
      "@type": "type.googleapis.com/envoy.admin.v3.SecretsConfigDump"
    }
  ]
}

Hi @mlindes,

My apologies for not replying. I lost track of this thread.

The escape-hatch overrides section of Consul’s Envoy integration docs page was recently updated to include example configurations for the various escape hatches.

The example envoy_public_listener_json has been tested and is known to work. After comparing that to your configuration, I believe you just need to add the @type field to the top level of the listener object, set to the type named in the error (type.googleapis.com/envoy.config.listener.v3.Listener), in order to resolve the error.
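
As a rough illustration, the Python sketch below adds that field to a trimmed-down stand-in for your listener and then lists any nested typed_config objects that also lack @type. The helper names are made up for this example. The empty typed_config on the router filter gets flagged; whether Consul rejects that one as well is an assumption I have not verified, so treat it as something to check rather than a confirmed cause.

```python
import json

LISTENER_TYPE = "type.googleapis.com/envoy.config.listener.v3.Listener"


def missing_type_paths(node, path="listener"):
    """Return the paths of typed_config objects that lack an '@type' key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == "typed_config" and isinstance(value, dict) and "@type" not in value:
                found.append(child)
            found.extend(missing_type_paths(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(missing_type_paths(value, f"{path}[{index}]"))
    return found


def with_listener_type(listener):
    """Return a shallow copy of the listener with '@type' placed first for readability."""
    rest = {k: v for k, v in listener.items() if k != "@type"}
    return {"@type": LISTENER_TYPE, **rest}


# Trimmed-down stand-in for the listener posted earlier in the thread.
listener = {
    "address": {"socket_address": {"address": "127.0.0.1", "port_value": 9090}},
    "filter_chains": [{"filters": [{
        "name": "envoy.filters.network.http_connection_manager",
        "typed_config": {
            "@type": "type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager",
            "http_filters": [{"name": "envoy.filters.http.router", "typed_config": {}}],
        },
    }]}],
}

fixed = with_listener_type(listener)
print(json.dumps(fixed, indent=2))
print(missing_type_paths(fixed))
```

The first print shows the listener with @type at the top level; the second lists the path of the router filter's empty typed_config.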