Cannot disable TLS comms between worker and controller: "tls: first record does not look like a TLS handshake"

Hi there everyone,

I’m attempting to set up Boundary in an AKS cluster in Azure. The controller starts up fine, but when the worker starts and tries to communicate with the controller, I see this error:

tls: first record does not look like a TLS handshake

I’ve set tls_disable = true on all the listeners, so I’m not sure why TLS is still being attempted. Here are my configs:

Controller Config:

# --------------------------------------------------------
# Azure Creds will be passed via environment variables:
# --------------------------------------------------------
#
# AZURE_TENANT_ID:                  Azure Tenant ID
# AZURE_CLIENT_ID:                  Azure App ID 
# AZURE_CLIENT_SECRET:              Azure App Password
# AZUREKEYVAULT_WRAPPER_VAULT_NAME: Key Vault Name
# BOUNDARY_POSTGRES_URL:            Postgres connection string

disable_mlock = true
log_level     = "trace"

controller {
  name        = "env://HOSTNAME"
  description = "A controller for a demo!"
  database {
    url = "env://BOUNDARY_POSTGRES_URL"
  }
  public_cluster_addr = "#POD_IP#:9200"
}

# API config
listener "tcp" {
  purpose              = "api"
  tls_disable          = true
  cors_enabled         = true
  cors_allowed_origins = ["*"]
  address              = "#POD_IP#"
}

# Cluster config
listener "tcp" {
  purpose     = "cluster"
  tls_disable = true
  address     = "#POD_IP#"
}

# Root KMS configuration block: this is the root key for Boundary
# Using Azure Key Vault
kms "azurekeyvault" {
  purpose  = "root"
  key_name = "root"
}

# Worker authorization KMS
# Using Azure Key Vault
kms "azurekeyvault" {
  purpose  = "worker-auth"
  key_name = "worker"
}

# Recovery KMS block: configures the recovery key for Boundary
# Using Azure Key Vault
kms "azurekeyvault" {
  purpose  = "recovery"
  key_name = "recovery"
}

Here’s my worker config:

# --------------------------------------------------------
# Azure Creds will be passed via environment variables:
# --------------------------------------------------------
#
# AZURE_TENANT_ID:                  Azure Tenant ID
# AZURE_CLIENT_ID:                  Azure App ID 
# AZURE_CLIENT_SECRET:              Azure App Password
# AZUREKEYVAULT_WRAPPER_VAULT_NAME: Key Vault Name
# BOUNDARY_POSTGRES_URL:            Postgres connection string

listener "tcp" {
  address     = "#POD_IP#:9200"
  purpose     = "proxy"
  tls_disable = true
}

worker {
  # Name attr must be unique
  public_addr = "env://BOUNDARY_PUBLIC_ADDR"
  name        = "env://HOSTNAME"
  description = "A default worker created for demonstration"
  controllers = ["boundary-controller.acme.internal:9201"] # private dns
}

# Worker authorization KMS
# Using Azure Key Vault
kms "azurekeyvault" {
  purpose  = "worker-auth"
  key_name = "worker"
}

Finally, here’s the complete error in the worker:

{
    "id": "B90goaJhT4",
    "source": "https://hashicorp.com/boundary/boundary-worker-dbf9d7547-m7x2l",
    "specversion": "1.0",
    "type": "error",
    "data": {
        "error": "rpc error: code = Unavailable desc = last connection error: connection error: desc = \"transport: Error while dialing unable to write connection nonce: tls: first record does not look like a TLS handshake\"",
        "error_fields": {},
        "id": "e_3VG55uvUQo",
        "version": "v0.1",
        "op": "worker.(Worker).sendWorkerStatus",
        "info": {
            "msg": "error making status request to controller"
        }
    },
    "datacontentype": "application/cloudevents",
    "time": "2022-05-26T14:21:46.308688227Z"
}

Any help would be appreciated. Thank you as always.

Hi there everyone,

After reading this, I realized that disabling TLS wasn’t really an option.

To enable TLS, I did the following:

  • Generated self-signed certificates
  • Deployed them for use with the controller and worker
  • Ensured the self-signed certificates were included in the trusted CAs
  • Ensured the certificates matched the internal DNS names I’m using for the controller and worker
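The steps above can be sketched with Go’s standard library: create a self-signed certificate whose Subject Alternative Names list the DNS name clients will dial, add it to a trust pool, and verify it for that name. The controller name comes from the worker config in this thread; `boundary-worker.acme.internal` is a hypothetical name used only to show a mismatch. This illustrates the checks a Go TLS client performs, not how Boundary itself loads certificates.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newSelfSignedCert creates a self-signed certificate whose SANs list the
// given DNS names; the first name is also used as the Common Name.
func newSelfSignedCert(dnsNames ...string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: dnsNames[0]},
		DNSNames:              dnsNames,
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true,
		IsCA:                  true, // lets the certificate act as its own trust anchor
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// verifyFor checks cert against a trust pool that contains only cert itself
// (the "trusted CA" step), for the host name a client would dial.
func verifyFor(cert *x509.Certificate, host string) error {
	roots := x509.NewCertPool()
	roots.AddCert(cert)
	_, err := cert.Verify(x509.VerifyOptions{DNSName: host, Roots: roots})
	return err
}

func main() {
	cert, err := newSelfSignedCert("boundary-controller.acme.internal")
	if err != nil {
		panic(err)
	}
	fmt.Println(verifyFor(cert, "boundary-controller.acme.internal")) // expected: no error
	fmt.Println(verifyFor(cert, "boundary-worker.acme.internal"))     // expected: hostname mismatch
}
```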

Now the controller logs the following errors:

[INFO]  controller: http: TLS handshake error from 172.28.17.98:58686: EOF
2022-05-29T08:15:20.800Z [INFO]  controller: http: TLS handshake error from 172.28.16.69:56820: tls: client requested unsupported application protocols ([v1workerauth-00-Ct8Hr28UpyH3u+gM5

At the worker, I continue to receive this:

{
  "id": "6yIyHLQCgZ",
  "source": "https://hashicorp.com/boundary/boundary-worker-dbf9d7547-m7x2l",
  "specversion": "1.0",
  "type": "error",
  "data": {
    "error": "rpc error: code = Unavailable desc = last connection error: connection error: desc = \"transport: Error while dialing unable to write connection nonce: tls: first record does not look like a TLS handshake\"",
    "error_fields": {},
    "id": "e_gopa2oZCDT",
    "version": "v0.1",
    "op": "worker.(Worker).sendWorkerStatus",
    "info": {
      "msg": "error making status request to controller"
    }
  },
  "datacontentype": "application/cloudevents",
  "time": "2022-05-26T14:21:44.765060431Z"
}

Any thoughts/advice would be much appreciated. Thank you so much.

Hi there everyone,

Once I had generated the correct TLS certificates (i.e. with the correct domains, matching the host names for both internal and public DNS, and with the CN and Subject Alternative Names set properly), things started working - yaaay 🙂!
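One detail on the CN/SAN point: Go’s crypto/x509 matches host names only against the Subject Alternative Names and, by default, has ignored the Common Name for this check since Go 1.15 (the opt-out was removed in Go 1.17). A small sketch, using the internal and public controller names from this thread, that compares a CN-only certificate with one listing both names as SANs. Whether every Boundary TLS path relies on exactly this check is an assumption; the sketch only shows the standard-library behavior.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// makeCert builds a self-signed certificate with the given Common Name and
// Subject Alternative Names (sans may be nil).
func makeCert(cn string, sans []string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: cn},
		DNSNames:     sans,
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(24 * time.Hour),
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	// Internal and public controller names as they appear in this thread.
	hosts := []string{"boundary-controller.acme.internal", "boundary-controller.acme.corp"}

	cnOnly, err := makeCert(hosts[0], nil)
	if err != nil {
		panic(err)
	}
	withSANs, err := makeCert(hosts[0], hosts)
	if err != nil {
		panic(err)
	}

	for _, h := range hosts {
		// VerifyHostname returns nil only when h matches one of the DNS SANs.
		fmt.Printf("%s\n  CN only:   %v\n  with SANs: %v\n", h, cnOnly.VerifyHostname(h), withSANs.VerifyHostname(h))
	}
}
```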

My worker now finally connects to the controller:

{"id":"lOOjc0AoYH","source":"https://hashicorp.com/boundary/boundary-worker.nprod.corp.internal","specversion":"1.0","type":"system","data":{"version":"v0.1","op":"worker.(Worker).createClientConn","data":{"address":"boundary-controller.acme.corp:9201","msg":"connected to controller"}},"datacontentype":"application/cloudevents","time":"2022-05-31T13:26:40.001761474Z"}

Thanks everyone.