I am attempting to automate the deployment of Vault on Kubernetes in AWS using EKS. Terraform provisions the Kubernetes cluster, the storage, and a KMS key used for unsealing. Once the workers are deployed, we stand up various services using Helm.
The three Vault nodes are not booting completely because of a couple of errors. First, we get an access-denied error on the KMS key: an assumed role is not authorized to perform kms:DescribeKey on it. I have no idea where this assumed role came from; it does not exist on our systems and is very different from the service account defined in the Helm manifests. Here is the error message:
Error parsing Seal configuration: error fetching AWS KMS wrapping key information: AccessDeniedException: User: arn:aws:sts::##############:assumed-role/eks-imply.<my account>/i-0a5ec9371c6d6eee0 is not authorized to perform: kms:DescribeKey on resource: arn:aws:kms:us-west-2:############:key/16bad6d5-19fd-4dc3-a4dd-4cfc6b9f63eb because no identity-based policy allows the kms:DescribeKey action
status code: 400, request id: c5bb43ff-f527-4a84-a593-35f83d0d7d8
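To make the ARN in the error above easier to read, here is a minimal Python sketch that splits it into its role name and session name. The helper names are hypothetical and the ARN is copied as posted, redactions included. STS assumed-role ARNs have the form arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>, and when credentials come from an EC2 instance profile the session name is the instance ID. If that is what is happening here (an assumption, not something I have confirmed on this cluster), the pod may be using the worker node's instance role rather than a role tied to the Helm service account.

```python
import re

# ARN copied from the error message above (account ID redacted as posted).
ARN = "arn:aws:sts::##############:assumed-role/eks-imply.<my account>/i-0a5ec9371c6d6eee0"


def split_assumed_role_arn(arn):
    """Split an STS assumed-role ARN into (role_name, session_name)."""
    # Layout: arn:<partition>:sts::<account>:assumed-role/<role-name>/<session-name>
    _arn, _partition, service, _region, _account, resource = arn.split(":", 5)
    kind, role_name, session_name = resource.split("/", 2)
    if service != "sts" or kind != "assumed-role":
        raise ValueError("not an STS assumed-role ARN")
    return role_name, session_name


def looks_like_ec2_instance_id(session_name):
    """Return True if the session name has the shape of an EC2 instance ID."""
    # Instance IDs are "i-" followed by 8 or 17 lowercase hex characters.
    return re.fullmatch(r"i-(?:[0-9a-f]{8}|[0-9a-f]{17})", session_name) is not None


role, session = split_assumed_role_arn(ARN)
print("role name:   ", role)
print("session name:", session)
print("session looks like an EC2 instance ID:", looks_like_ec2_instance_id(session))
```

This only inspects the string; it does not call AWS or prove which credentials the pod actually received.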
We are also seeing errors when a node tries to rejoin the cluster after a single pod is deleted (even though none of the nodes ever comes up all the way). The node cannot find the TLS configuration it needs to communicate with the leader node. These files are not created since none of the nodes come up. We are also unable to exec into the pods for more detailed troubleshooting.
Failed to initiate raft retry join, "failed to create tls config to communicate with leader node (retry_join index: 0): failed to read CA file: open /vault/userconfig/tls-ca/ca.crt: no such file or directory"
2022-04-14T18:44:15.745Z [WARN] storage.raft.fsm: raft FSM db file has wider permissions than needed: needed=-rw------- existing=-rw-rw----
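The [WARN] message above compares two file mode strings. As a small sketch, assuming they are ordinary POSIX modes 0600 and 0660 (the helper name is hypothetical), Python's stat module can render both and show which bits differ. It is logged as a warning, so it is probably separate from the retry-join failure, though I have not confirmed that.

```python
import stat


def mode_string(mode_bits):
    """Render permission bits the way the log line does, e.g. '-rw-------'."""
    # Add the regular-file type bit so the string starts with "-".
    return stat.filemode(stat.S_IFREG | mode_bits)


needed = mode_string(0o600)    # owner read/write only
existing = mode_string(0o660)  # owner and group read/write
extra = 0o660 & ~0o600         # bits present beyond what is needed

print("needed:  ", needed)
print("existing:", existing)
print("extra bits:", oct(extra))
```

The extra bits are group read and group write, so the difference is confined to group access on the raft FSM db file.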
Any recommendations, suggestions or questions are welcome. Thanks in advance.