I am running Boundary v0.11.0 on k8s with 3 controllers and 3 workers. When I start the controllers and workers everything works fine, but after some time the workers become unresponsive.
Controller config:
disable_mlock = true

controller {
  name                = "boundary-controller"
  description         = "Boundary Controller"
  public_cluster_addr = "boundary-controller-hl.boundary.svc.cluster.local"
  database {
    url = "postgres url"
  }
}

listener "tcp" {
  purpose     = "api"
  address     = "0.0.0.0"
  tls_disable = true
  public_addr = "boundary-controller-hl.boundary.svc.cluster.local"
}

listener "tcp" {
  address     = "0.0.0.0"
  purpose     = "cluster"
  tls_disable = true
  public_addr = "boundary-controller-hl.boundary.svc.cluster.local"
}

kms "aead" {
  purpose     = "root"
  aead_type   = "aes-gcm"
  key         = "key"
  key_id      = "global_root"
  public_addr = "boundary-controller-hl.boundary.svc.cluster.local"
}

kms "aead" {
  purpose   = "worker-auth"
  aead_type = "aes-gcm"
  key       = "key"
  key_id    = "global_worker-auth"
}

kms "aead" {
  purpose   = "recovery"
  aead_type = "aes-gcm"
  key       = "key="
  key_id    = "global_recovery"
}
Worker config:
disable_mlock = true

listener "tcp" {
  purpose     = "proxy"
  address     = "0.0.0.0"
  tls_disable = true
}

worker {
  name              = "env://HOSTNAME"
  description       = "Boundar k8s worker"
  initial_upstreams = ["boundary-controller-hl.boundary.svc.cluster.local"]
  public_addr       = "public DNS"
  tags {
    region = ["k8s"]
  }
}

# Worker authorization KMS
# Use a production KMS such as AWS KMS for production installs
# This key is the same key used in the worker configuration
kms "aead" {
  purpose   = "worker-auth"
  aead_type = "aes-gcm"
  key       = "key"
  key_id    = "global_worker-auth"
}
A worker generally works for about a day and then stops working until I restart the pod.
I see these error logs before the worker stops working:
{"id":"3HyVunohBm","source":"https://hashicorp.com/boundary/boundary-worker-8qdn8/worker","specversion":"1.0","type":"error","data":{"error":"failed to read protobuf message: failed to get reader: failed to read frame header: EOF","error_fields":{},"id":"e_I9lbsVu4LP","version":"v0.1","op":"worker.(Worker).handleProxy","info":{"msg":"error reading handshake from client"}},"datacontentype":"application/cloudevents","time":"2022-11-07T15:37:06.905432729Z"}
{"id":"fl6DyubRiA","source":"https://hashicorp.com/boundary/boundary-worker-8qdn8/worker","specversion":"1.0","type":"error","data":{"error":"failed to close WebSocket: failed to write control frame opClose: WebSocket closed: failed to read frame header: EOF","error_fields":{},"id":"e_8WduHdtyWv","version":"v0.1","op":"worker.(Worker).handleProxy","info":{"msg":"error closing client connection"}},"datacontentype":"application/cloudevents","time":"2022-11-07T15:37:06.906030265Z"}
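To make these events easier to compare across pods, they can be reduced to a few fields. The following is only a rough sketch: the helper name `summarize_event` is made up for illustration, and the field names (`data.op`, `data.info.msg`, `data.error`, `time`) are simply the ones visible in the two log lines above, not a documented schema. It also undoes the curly quotes that forum software tends to introduce when logs are pasted.

```python
import json


def summarize_event(line: str) -> dict:
    """Reduce one worker event line to the fields useful for triage.

    Field names are taken from the log lines in this post; treat them
    as an assumption rather than a documented event format.
    """
    # Pasted logs often have curly quotes; turn them back into straight ones.
    normalized = line.replace("\u201c", '"').replace("\u201d", '"')
    event = json.loads(normalized)
    data = event.get("data", {})
    return {
        "time": event.get("time"),
        "type": event.get("type"),
        "op": data.get("op"),
        "msg": data.get("info", {}).get("msg"),
        "error": data.get("error"),
    }
```

Running it over each line of the worker's log output gives one small dictionary per event, which makes it easier to see whether the `handleProxy` errors cluster right before the worker stops responding.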
One more thing I noticed: while the worker is working, if I run
curl -v 0.0.0.0:9202
I immediately get
* Trying 0.0.0.0:9202...
* Connected to 0.0.0.0 (127.0.0.1) port 9202 (#0)
> GET / HTTP/1.1
> Host: 0.0.0.0:9202
> User-Agent: curl/7.79.1
> Accept: */*
>
* Empty reply from server
* Closing connection 0
curl: (52) Empty reply from server
but when it is not working, the server does not reply at all and curl gets stuck at
* Trying 127.0.0.1:9202...
* Connected to 127.0.0.1 (127.0.0.1) port 9202 (#0)
> GET / HTTP/1.1
> Host: 127.0.0.1:9202
> User-Agent: curl/7.79.1
> Accept: */*
>
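The difference between the two curl runs (connection closed immediately versus connection accepted but never answered) can also be checked with a timeout instead of watching curl by hand. This is a minimal sketch under my own assumptions: the function name `probe` and the result labels are hypothetical, and it only mirrors what curl does above (open a TCP connection, send a plain HTTP request, wait for a reply).

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify how a TCP listener reacts to a plain HTTP request.

    Returns one of:
      "empty_reply" - the peer closed the connection without sending data
      "replied"     - the peer sent some bytes back
      "hung"        - nothing happened within `timeout` seconds
                      (this also covers a connect that times out)
      "refused"     - nothing is listening on the port
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            request = f"GET / HTTP/1.1\r\nHost: {host}:{port}\r\n\r\n"
            sock.sendall(request.encode("ascii"))
            data = sock.recv(1024)
    except socket.timeout:
        return "hung"
    except ConnectionRefusedError:
        return "refused"
    except ConnectionError:
        # A reset or broken pipe means the peer dropped us without replying.
        return "empty_reply"
    return "replied" if data else "empty_reply"
```

Calling `probe("127.0.0.1", 9202)` inside the worker pod should then return "empty_reply" in the healthy state shown above and "hung" in the stuck state.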
What could be the issue?
Thanks in advance.