After configuring self-signed TLS following this GitHub issue,
and installing NOT in standalone mode but rather with storage "raft" + HA as in this beginner tutorial,
I'm getting these errors in each of the installed pods:
Failed to initiate raft retry join, "failed to create tls config to communicate with leader node \"http://vault-0.vault-internal:8200\": tls: private key does not match public key"
==> Vault server configuration:
Api Address: https://10.101.0.160:8200
Cgo: disabled
Cluster Address: https://vault-1.vault-internal:8201
Go Version: go1.14.4
Listener 1: tcp (addr: "[::]:8200", cluster address: "[::]:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
Log Level: info
Mlock: supported: true, enabled: false
Recovery Mode: false
Storage: raft (HA available)
Version: Vault v1.5.0+ent
Failed to initiate raft retry join, "failed to create tls config to communicate with leader node \"http://vault-0.vault-internal:8200\": tls: private key does not match public key"
I really would double-check the certificates. This looks more like a certificate issue to me, but I'm not sure.
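A quick way to check whether the certificate and private key actually belong together is to compare their moduli (assuming an RSA key pair; the file names are the ones from your secret):
openssl x509 -noout -modulus -in vault.crt | openssl md5
openssl rsa -noout -modulus -in vault.key | openssl md5
If the two hashes differ, the "private key does not match public key" error is expected.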
I also saw that in your extraEnvironmentVars you may be referencing the wrong path. You reference the path vault/userconfig/tls-ca/vault.crt for your VAULT_CACERT environment variable, while for Raft storage you use the path vault/userconfig/vault-server-tls/vault.crt.
Edit: you also supply TLS configuration while your listener disables TLS. You should double-check this as well.
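If you want the listener to actually serve TLS from the mounted secret, the listener stanza would look roughly like this (just a sketch using the paths from your extraVolumes, not your exact config):
listener "tcp" {
  address         = "[::]:8200"
  cluster_address = "[::]:8201"
  tls_cert_file   = "/vault/userconfig/vault-server-tls/vault.crt"
  tls_key_file    = "/vault/userconfig/vault-server-tls/vault.key"
}
The retry_join leader_api_addr values would then also need to use https:// instead of http://.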
Error initializing listener of type tcp: error loading TLS cert: open : no such file or directory
I don't understand… I only created the secret in k8s and didn't upload any files,
so where does it take them from?
What is this path /vault/userconfig/vault-server-tls/?
If I do:
kubectl get secret vault-server-tls -n vault-foo -o yaml
I'm getting:
# Vault Helm Chart Value Overrides
global:
  enabled: true
  tlsDisable: false

injector:
  enabled: true
  # Use the Vault K8s Image https://github.com/hashicorp/vault-k8s/
  image:
    repository: "hashicorp/vault-k8s"
    tag: "latest"

  resources:
    requests:
      memory: 256Mi
      cpu: 250m
    limits:
      memory: 256Mi
      cpu: 250m

server:
  # Use the Enterprise Image
  image:
    repository: "hashicorp/vault-enterprise"
    tag: "1.5.0_ent"

  # These Resource Limits are in line with node requirements in the
  # Vault Reference Architecture for a Small Cluster
  resources:
    requests:
      memory: 8Gi
      cpu: 2000m
    limits:
      memory: 16Gi
      cpu: 2000m

  # For HA configuration and because we need to manually init the vault,
  # we need to define custom readiness/liveness Probe settings
  readinessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
  livenessProbe:
    enabled: true
    path: "/v1/sys/health?standbyok=true"
    initialDelaySeconds: 60

  # extraEnvironmentVars is a list of extra environment variables to set with the stateful set. These could be
  # used to include variables required for auto-unseal.
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-server-tls/vault.crt

  # extraVolumes is a list of extra volumes to mount. These will be exposed
  # to Vault in the path .
  #extraVolumes:
  #  - type: secret
  #    name: tls-server
  #  - type: secret
  #    name: tls-ca
  #  - type: secret
  #    name: kms-creds
  extraVolumes:
    - type: secret
      name: vault-server-tls

  # This configures the Vault Statefulset to create a PVC for audit logs.
  # See https://www.vaultproject.io/docs/audit/index.html to know more
  auditStorage:
    enabled: true

  standalone:
    enabled: false

  # Run Vault in "HA" mode.
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      setNodeId: true

      config: |
        ui = true

        listener "tcp" {
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          #tls_disable = 1
        }

        storage "raft" {
          path = "/vault/data"

          retry_join {
            leader_api_addr = "http://vault-0.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
          retry_join {
            leader_api_addr = "http://vault-1.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_cert_file = "/vault/userconfig/vault-server-tlsr/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
          retry_join {
            leader_api_addr = "http://vault-2.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
            leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
          }
        }

        service_registration "kubernetes" {}

# Vault UI
ui:
  enabled: true
  serviceType: "LoadBalancer"
  serviceNodePort: null
  externalPort: 8200

  # For Added Security, edit the below
  #loadBalancerSourceRanges:
  #  - < Your IP RANGE Ex. 10.0.0.0/16 >
  #  - < YOUR SINGLE IP Ex. 1.78.23.3/32 >
The secret needs to be created before rolling out the Vault Helm chart. It also needs to be created in the same Kubernetes namespace you are rolling the Helm chart out to.
In your case the secret is in the Kubernetes namespace vault-foo, so you need to ensure you also deploy Vault in that namespace.
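For reference: as far as I know the Helm chart mounts every secret listed under extraVolumes at /vault/userconfig/<secret name>, and the keys inside the secret become the file names. So a secret created roughly along these lines (the local file names are just placeholders) is what makes /vault/userconfig/vault-server-tls/vault.crt and vault.key appear inside the pod:
kubectl create secret generic vault-server-tls \
  --namespace vault-foo \
  --from-file=vault.crt=./vault.crt \
  --from-file=vault.key=./vault.key \
  --from-file=vault.ca=./vault.ca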
Check the correct mounting of the secret as a volume:
kubectl exec -i -t vault-0 -n vault-foo -- ls /vault/userconfig/vault-server-tls
The secret is created before running Helm.
Once created it is persistent in k8s.
With both kubectl commands I'm getting:
$ kubectl exec -it vault-0 -n vault-foo -- /bin/sh
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("vault")
$ kubectl exec -i -t vault-0 -n vault-foo -- ls /vault/userconfig/vault-server-tls
Unable to use a TTY - input is not a terminal or the right kind of file
error: unable to upgrade connection: container not found ("vault")
Looks like the containers haven't even started,
but I do see them:
$ kubectl get pods -n vault-foo
NAME READY STATUS RESTARTS AGE
vault-0 0/1 CrashLoopBackOff 5 3m36s
vault-1 0/1 CrashLoopBackOff 5 3m36s
vault-2 0/1 CrashLoopBackOff 5 3m36s
vault-agent-injector-d54bdc675-79ll7 1/1 Running 0 3m36s
Yes, these commands can only succeed if the containers are running. You need to be able to troubleshoot why your containers are not running. Only then will you be able to troubleshoot Vault.
It is easy to find out in Kubernetes why a container doesn't start. Most of the time the events are sufficient.
kubectl get events -n vault-foo
I think you need to ramp up your general Kubernetes knowledge a bit before you will be able to successfully troubleshoot an application like Vault on Kubernetes.
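If the events alone are not enough, kubectl describe usually shows the last termination reason and exit code of the crashing container, which is a generic next step for any CrashLoopBackOff:
kubectl describe pod vault-0 -n vault-foo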
Thanks!!! I learn as I go… this is the best way.
Do you see anything here? Is this as verbose as it can get?
LAST SEEN TYPE REASON OBJECT MESSAGE
2m44s Normal WaitForFirstConsumer persistentvolumeclaim/audit-vault-3 waiting for first consumer to be created before binding
2m44s Normal WaitForFirstConsumer persistentvolumeclaim/audit-vault-4 waiting for first consumer to be created before binding
2m44s Normal WaitForFirstConsumer persistentvolumeclaim/data-vault-3 waiting for first consumer to be created before binding
2m44s Normal WaitForFirstConsumer persistentvolumeclaim/data-vault-4 waiting for first consumer to be created before binding
50m Normal Scheduled pod/vault-0 Successfully assigned vault-foo/vault-0 to ip-10-101-2-224.ec2.internal
50m Normal SuccessfulAttachVolume pod/vault-0 AttachVolume.Attach succeeded for volume "pvc-ab03859c-4a23-41b1-b7ee-1b0638c5b322"
50m Normal SuccessfulAttachVolume pod/vault-0 AttachVolume.Attach succeeded for volume "pvc-edb15fcd-445f-40e8-8f2a-470280419a03"
50m Normal Pulling pod/vault-0 Pulling image "hashicorp/vault-enterprise:1.5.0_ent"
50m Normal Pulled pod/vault-0 Successfully pulled image "hashicorp/vault-enterprise:1.5.0_ent"
48m Normal Created pod/vault-0 Created container vault
49m Normal Started pod/vault-0 Started container vault
48m Normal Pulled pod/vault-0 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
16s Warning BackOff pod/vault-0 Back-off restarting failed container
50m Normal Scheduled pod/vault-1 Successfully assigned vault-foo/vault-1 to ip-10-101-0-51.ec2.internal
50m Normal SuccessfulAttachVolume pod/vault-1 AttachVolume.Attach succeeded for volume "pvc-acfc7e26-3616-4075-ab79-0c3f7b0f6470"
50m Normal SuccessfulAttachVolume pod/vault-1 AttachVolume.Attach succeeded for volume "pvc-19d03d48-1de2-41f8-aadf-02d0a9f4bfbd"
48m Normal Pulled pod/vault-1 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
48m Normal Created pod/vault-1 Created container vault
49m Normal Started pod/vault-1 Started container vault
12s Warning BackOff pod/vault-1 Back-off restarting failed container
50m Normal Scheduled pod/vault-2 Successfully assigned vault-foo/vault-2 to ip-10-101-0-96.ec2.internal
50m Normal SuccessfulAttachVolume pod/vault-2 AttachVolume.Attach succeeded for volume "pvc-fb91141d-ebd9-4767-b122-da8c98349cba"
50m Normal SuccessfulAttachVolume pod/vault-2 AttachVolume.Attach succeeded for volume "pvc-95effe76-6e01-49ad-9bec-14e091e1a334"
50m Normal Pulling pod/vault-2 Pulling image "hashicorp/vault-enterprise:1.5.0_ent"
50m Normal Pulled pod/vault-2 Successfully pulled image "hashicorp/vault-enterprise:1.5.0_ent"
48m Normal Created pod/vault-2 Created container vault
49m Normal Started pod/vault-2 Started container vault
48m Normal Pulled pod/vault-2 Container image "hashicorp/vault-enterprise:1.5.0_ent" already present on machine
19s Warning BackOff pod/vault-2 Back-off restarting failed container
52m Normal Killing pod/vault-agent-injector-6657477c46-lh8hc Stopping container sidecar-injector
52m Warning Unhealthy pod/vault-agent-injector-6657477c46-lh8hc Readiness probe failed: Get https://10.101.2.134:8080/health/ready: dial tcp 10.101.2.134:8080: connect: connection refused
50m Normal Scheduled pod/vault-agent-injector-d54bdc675-79ll7 Successfully assigned vault-foo/vault-agent-injector-d54bdc675-79ll7 to ip-10-101-2-224.ec2.internal
50m Normal Pulled pod/vault-agent-injector-d54bdc675-79ll7 Container image "hashicorp/vault-k8s:latest" already present on machine
50m Normal Created pod/vault-agent-injector-d54bdc675-79ll7 Created container sidecar-injector
50m Normal Started pod/vault-agent-injector-d54bdc675-79ll7 Started container sidecar-injector
50m Normal SuccessfulCreate replicaset/vault-agent-injector-d54bdc675 Created pod: vault-agent-injector-d54bdc675-79ll7
50m Normal ScalingReplicaSet deployment/vault-agent-injector Scaled up replica set vault-agent-injector-d54bdc675 to 1
59m Normal UpdatedLoadBalancer service/vault-ui Updated load balancer with new hosts
52m Normal DeletingLoadBalancer service/vault-ui Deleting load balancer
51m Warning PortNotAllocated service/vault-ui Port 31694 is not allocated; repairing
51m Warning ClusterIPNotAllocated service/vault-ui Cluster IP 172.20.125.243 is not allocated; repairing
51m Normal DeletedLoadBalancer service/vault-ui Deleted load balancer
50m Normal EnsuringLoadBalancer service/vault-ui Ensuring load balancer
50m Normal EnsuredLoadBalancer service/vault-ui Ensured load balancer
55m Normal NoPods poddisruptionbudget/vault No matching pods found
50m Normal NoPods poddisruptionbudget/vault No matching pods found
50m Normal SuccessfulCreate statefulset/vault create Pod vault-0 in StatefulSet vault successful
50m Normal SuccessfulCreate statefulset/vault create Pod vault-1 in StatefulSet vault successful
50m Normal SuccessfulCreate statefulset/vault create Pod vault-2 in StatefulSet vault successful
Yes, I think the container crash loop is not caused by Kubernetes, which means that the application (Vault) is probably terminating the container itself.
You can check this out by looking at the pod logs:
kubectl logs vault-0 -n vault-foo
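Since the pods are in CrashLoopBackOff, the logs of the previous (crashed) container instance are often the interesting ones; the --previous flag shows them:
kubectl logs vault-0 -n vault-foo --previous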
You may also want to take a look at Octant. Octant provides a UI for inspecting Kubernetes clusters, which makes it somewhat easier to inspect applications on Kubernetes.
Okay, so Vault crashes because it cannot configure its listener, because it cannot find the certificate. This can either be because the secret is not mounted as a volume to the pod, or for other reasons I cannot think of right now.
You can double-check whether your secret is mounted properly as a volume to the container.
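Since you cannot exec into the crashing container, you can instead ask the API server for the pod spec, for example (a generic kubectl query; adjust the pod name as needed):
kubectl get pod vault-0 -n vault-foo -o jsonpath='{.spec.containers[0].volumeMounts}'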
This is how the volumes for my vault pods look (dev environment, TLS enabled):
If the secret is mounted correctly, you need to investigate why Vault cannot find the files you specified. To do that, I would recommend removing the TLS configuration (tcp listener and raft) to ensure the container can start, and then inspecting the files on the filesystem.
Thank you for the idea. I now have some progress, but sadly new errors in the Vault log.
That means the container is up, but in each pod I'm getting these errors:
Vault is sealed"
2021-01-22T14:27:36.680Z [ERROR] core: failed to retry join raft cluster: retry=2s
2021-01-22T14:27:38.680Z [INFO] core: security barrier not initialized
2021-01-22T14:27:38.680Z [INFO] core: attempting to join possible raft leader node: leader_addr=http://vault-0.vault-internal:8200
2021-01-22T14:27:38.684Z [INFO] core: join attempt failed: error="error during raft bootstrap init call: Error making API request.
URL: PUT http://vault-0.vault-internal:8200/v1/sys/storage/raft/bootstrap/challenge
Code: 503. Errors:
* Vault is sealed"
2021-01-22T14:27:38.684Z [INFO] core: security barrier not initialized
2021-01-22T14:27:38.684Z [INFO] core: attempting to join possible raft leader node: leader_addr=http://vault-1.vault-internal:8200
2021-01-22T14:27:38.685Z [INFO] core: join attempt failed: error="error during raft bootstrap init call: Error making API request.
URL: PUT http://vault-1.vault-internal:8200/v1/sys/storage/raft/bootstrap/challenge
Code: 503. Errors:
* Vault is sealed"
2021-01-22T14:27:38.685Z [INFO] core: security barrier not initialized
2021-01-22T14:27:38.685Z [INFO] core: attempting to join possible raft leader node: leader_addr=http://vault-2.vault-internal:8200
2021-01-22T14:27:38.688Z [INFO] core: join attempt failed: error="error during raft bootstrap init call: Error making API request.
URL: PUT http://vault-2.vault-internal:8200/v1/sys/storage/raft/bootstrap/challenge
Remove all TLS from the configuration template, like this: