I have a Consul cluster running in Kubernetes with the following configuration manifest:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: consul
  namespace: kube-public
data:
  config.json: |
    {
      "log_level": "INFO",
      "bind_addr": "0.0.0.0",
      "client_addr": "0.0.0.0",
      "disable_host_node_id": true,
      "data_dir": "/consul/data",
      "datacenter": "dev",
      "domain": "cluster.local",
      "ports": {
        "https": 8443
      },
      "server": true,
      "bootstrap_expect": 3,
      "retry_interval": "30s",
      "telemetry": {
        "prometheus_retention_time": "5m"
      },
      "ui": true
    }
---
kind: StatefulSet
apiVersion: apps/v1
metadata:
  name: consul
  namespace: kube-public
spec:
  serviceName: consul
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: consul
  template:
    metadata:
      labels:
        app: consul
    spec:
      securityContext:
        fsGroup: 1000
      containers:
        - name: consul
          image: consul:1.9
          imagePullPolicy: Always
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: GOSSIP_ENCRYPTION_KEY
              valueFrom:
                secretKeyRef:
                  name: consul
                  key: gossip-encryption-key
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          args:
            - "agent"
            - "-advertise=$(POD_IP)"
            - "-retry-join=consul.$(NAMESPACE).svc.cluster.local"
            # - "-retry-join=consul-0.consul.$(NAMESPACE).svc.cluster.local"
            # - "-retry-join=consul-1.consul.$(NAMESPACE).svc.cluster.local"
            # - "-retry-join=consul-2.consul.$(NAMESPACE).svc.cluster.local"
            - "-config-file=/etc/consul/config/config.json"
            - "-encrypt=$(GOSSIP_ENCRYPTION_KEY)"
          volumeMounts:
            - name: data
              mountPath: /consul/data
            - name: config
              mountPath: /etc/consul/config
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - consul leave
          ports:
            - containerPort: 8500
              name: ui
            - containerPort: 8400
              name: alt
            - containerPort: 53
              name: udp
            - containerPort: 8443
              name: https
            - containerPort: 8080
              name: http
            - containerPort: 8301
              name: serflan
            - containerPort: 8302
              name: serfwan
            - containerPort: 8600
              name: consuldns
            - containerPort: 8300
              name: server
      volumes:
        - name: config
          configMap:
            name: consul
  volumeClaimTemplates:
    - metadata:
        name: data
        labels:
          app: consul
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: aws-gp2
        resources:
          requests:
            storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: consul
  namespace: kube-public
  labels:
    name: consul
spec:
  clusterIP: None
  ports:
    - name: http
      port: 8500
      targetPort: 8500
    - name: https
      port: 8443
      targetPort: 8443
    - name: rpc
      port: 8400
      targetPort: 8400
    - name: serflan-tcp
      protocol: "TCP"
      port: 8301
      targetPort: 8301
    - name: serflan-udp
      protocol: "UDP"
      port: 8301
      targetPort: 8301
    - name: serfwan-tcp
      protocol: "TCP"
      port: 8302
      targetPort: 8302
    - name: serfwan-udp
      protocol: "UDP"
      port: 8302
      targetPort: 8302
    - name: server
      port: 8300
      targetPort: 8300
    - name: consuldns
      port: 8600
      targetPort: 8600
  selector:
    app: consul
```
---
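For completeness, both the server StatefulSet and the sidecar below read the gossip key from a Secret named `consul`. I created it roughly along these lines (a sketch; `openssl rand -base64 32` here is just a stand-in producing the same 32-byte base64 key that `consul keygen` emits):

```shell
# Generate a 32-byte base64 gossip key (equivalent to `consul keygen`)
GOSSIP_KEY=$(openssl rand -base64 32)

# A secretKeyRef resolves in the pod's own namespace, so the Secret has to
# exist both where the servers run and where the sidecar runs
kubectl create secret generic consul --namespace kube-public \
  --from-literal=gossip-encryption-key="$GOSSIP_KEY"
kubectl create secret generic consul --namespace dev-backend \
  --from-literal=gossip-encryption-key="$GOSSIP_KEY"
```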
Vault HA is running a consul-agent sidecar with the following configuration manifest:
```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: vault-config
  namespace: dev-backend
  labels:
    app: vault
data:
  config.json: |
    {
      "listener": {
        "tcp": {
          "address": "0.0.0.0:8200",
          "tls_disable": "true"
        }
      },
      "storage": {
        "consul": {
          "address": "consul.kube-public.svc.cluster.local:8500",
          "path": "dev-vault/",
          "disable_registration": "true",
          "ha_enabled": "true"
        }
      },
      "max_lease_ttl": "720h",
      "default_lease_ttl": "336h",
      "ui": true
    }
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: vault
  namespace: dev-backend
  labels:
    app: vault
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vault
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: vault
    spec:
      containers:
        - name: vault
          image: vault:1.6.1
          imagePullPolicy: Always
          command: ["vault", "server", "-config", "/vault/config/config.json"]
          securityContext:
            capabilities:
              add:
                - IPC_LOCK
          env:
            - name: VAULT_ADDR
              value: 'http://127.0.0.1:8200'
          volumeMounts:
            - name: vault-config
              mountPath: /vault/config/config.json
              subPath: config.json
          ports:
            - name: vault
              containerPort: 8200
        - name: consul-agent
          image: consul:1.9
          imagePullPolicy: Always
          env:
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: GOSSIP_ENCRYPTION_KEY  # required to join the consul cluster
              valueFrom:
                secretKeyRef:
                  name: consul
                  key: gossip-encryption-key
          args:
            - "agent"
            - "-retry-join=consul.kube-public.svc.cluster.local"
            - "-encrypt=$(GOSSIP_ENCRYPTION_KEY)"
            - "-domain=cluster.local"
            - "-datacenter=dev"
            - "-disable-host-node-id"
            - "-node=vault-1"
      volumes:
        - name: vault-config
          configMap:
            name: vault-config
            items:
              - key: config.json
                path: config.json
---
apiVersion: v1
kind: Service
metadata:
  name: vault
  namespace: dev-backend
  labels:
    app: vault
spec:
  ports:
    - name: vault
      port: 8200
      targetPort: 8200
  selector:
    app: vault
```
---
When I performed an upgrade on Vault, the pod restarted. It then tried to rejoin the Consul cluster, and I noticed the following errors:
```
[ERROR] agent.client.memberlist.lan: memberlist: Conflicting address for vault-1. Mine: 10.2.17.93:8301 Theirs: 10.2.18.5:8301 Old state: 0
[ERROR] agent.client.serf.lan: serf: Node name conflicts with another node at 10.2.18.5:8301. Names must be unique! (Resolution enabled: true)
...
[ERROR] agent.client: RPC failed to server: method=Catalog.Register server=10.2.18.46:8300 error="rpc error making call: rpc error making call: failed inserting node: Error while renaming Node ID: "18afec1a-ae83-1dde-1271-a77ebd26dbd5": Node name vault-1 is reserved by node d1eeb692-ea90-9eb3-a7b5-34082a297dfc with name vault-1 (10.2.18.5)"
[WARN] agent: Syncing node info failed.: error="rpc error making call: rpc error making call: failed inserting node: Error while renaming Node ID: "18afec1a-ae83-1dde-1271-a77ebd26dbd5": Node name vault-1 is reserved by node d1eeb692-ea90-9eb3-a7b5-34082a297dfc with name vault-1 (10.2.18.5)"
[ERROR] agent.anti_entropy: failed to sync remote state: error="rpc error making call: rpc error making call: failed inserting node: Error while renaming Node ID: "18afec1a-ae83-1dde-1271-a77ebd26dbd5": Node name vault-1 is reserved by node d1eeb692-ea90-9eb3-a7b5-34082a297dfc with name vault-1 (10.2.18.5)"
```
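The logs suggest the replaced sidecar never gossiped a leave, so the old `vault-1` entry lingers at the previous pod IP. A manual workaround appears to be evicting the stale member by hand, but I would rather not do that on every upgrade. A sketch, assuming exec access to one of the server pods and default ports:

```shell
# From inside any consul container: list LAN members; a stale 'vault-1'
# at the old IP means the previous sidecar never left the gossip pool
kubectl exec -n kube-public consul-0 -- consul members

# Force-remove the stale member so the new agent can register
# (a one-off workaround, not a fix)
kubectl exec -n kube-public consul-0 -- consul force-leave vault-1
```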
This consul-agent (`vault-1`) also fails to show up under Nodes in the Consul UI.

I have already set `-disable-host-node-id` for the consul-agent. What else am I missing here?