Permission denied: Missing service:write on vault

I’m not really sure where I went wrong. I’m attempting to stand up a Vault environment with HA according to the recommended two-Availability-Zone model, with Consul. This is my first time using HashiCorp products. Most of the troubleshooting I’ve seen online seems to assume these products are running in a containerized environment, but mine are currently running in AWS as traditional servers.

I have managed to install Consul and Vault on all the respective servers, and I’m at least getting somewhere, judging from my primary Vault server - I can at least get to the UI. But the backend is a mess and I don’t know where I went wrong.

The Vault service is running properly on my primary Vault server (which I’m using as a test before fixing the things I’ve figured out are wrong on the other servers). The Consul agent service fails with the error in the subject. Syslog is just spitting out the following two messages over and over, and the Vault is (obviously) sealed:

Nov 13 06:25:07 vault-alpha vault[2791]: 2020-11-13T06:25:07.230Z [WARN]  service_registration.consul: check unable to talk with Consul backend: error="Unexpected response code: 500 (Unknown check "vault:172.31.119.46:8200:vault-sealed-check")"

Nov 13 06:25:07 vault-alpha vault[2791]: 2020-11-13T06:25:07.779Z [WARN]  service_registration.consul: reconcile unable to talk with Consul backend: error="service registration failed: Unexpected response code: 403 (Permission denied)"

Can someone help me figure out how to right this ship? I would be very grateful.

Is there a reason you’re using Consul vs Integrated Storage? IS is much simpler.
First glance looks like Vault can’t talk to the Consul agent client…

For anyone to help troubleshooting, it’d be good to post:
Your config files
Output of vault status
Output of consul members
Output of consul operator raft list-peers

Also, the learn.hashicorp.com site has tutorials for non-containerized deployments. Unsure which you’re following…
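
The three command outputs requested above can be gathered in one pass. This is only a minimal sketch: the function name `collect_diagnostics` is made up for illustration, and it assumes `vault` and `consul` are on the PATH with any needed environment variables (such as `VAULT_ADDR`) already exported.

```shell
# Print the output of each diagnostic command under a labelled header.
# collect_diagnostics is a hypothetical helper name, not part of either CLI.
collect_diagnostics() {
  for cmd in "vault status" "consul members" "consul operator raft list-peers"; do
    printf '=== %s ===\n' "$cmd"
    # Word splitting of $cmd is intentional: each entry is a command plus arguments.
    # A failing command does not stop the loop; its error text is useful too.
    $cmd 2>&1 || printf '(exit status %s)\n' "$?"
  done
}
```

Running `collect_diagnostics > report.txt` would produce a single file to paste, after redacting tokens and keys from it and from the config files.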

Hi there, thanks for stepping in.
Using Consul because that was the way it was written in the guide. I’m using this model for my deployment.

vault.hcl on my Vault server:

listener "tcp" {
  address       = "127.0.0.1:8200"
  tls_cert_file = "/etc/vault.d/vault_appgate_self.pem"
  tls_key_file  = "/etc/vault.d/vault_appgate_self_key.pem"
}

listener "tcp" {
  address       = "172.31.119.46:8200"
  tls_cert_file = "/etc/vault.d/vault_appgate_self.pem"
  tls_key_file  = "/etc/vault.d/vault_appgate_self_key.pem"
}

# Advertise the non-loopback interface
api_addr = "https://172.31.119.46:8200"
cluster_addr = "https://172.31.119.46:8201"

storage "consul" {
  address = "172.31.119.46:8500"
  path    = "vault/"
  token   = "-redacted-"
}

consul.hcl on the Vault server (non-superfluous lines):

data_dir = "/opt/consul"
client_addr = "0.0.0.0"
ui = true

datacenter = "US-E1"
data_dir = "/opt/consul"
encrypt = "-redacted-"
ca_file = "/etc/consul.d/consul-agent-ca.pem"
cert_file = "/etc/consul.d/US-E1-client-consul-0.pem"
key_file = "/etc/consul.d/US-E1-client-consul-0-key.pem"
verify_incoming = true
verify_outgoing = true
verify_server_hostname = true

acl = {
  enabled = true
  default_policy = "allow"
  enable_token_persistence = true
}

retry_join = ["172.31.105.254"]

Output of vault status:

Error checking seal status: Get "https://127.0.0.1:8200/v1/sys/seal-status": x509: cannot validate certificate for 127.0.0.1 because it doesn't contain any IP SANs

Output of consul members:

Node                 Address              Status  Type    Build  Protocol  DC     Segment
consul-alpha-leader  172.31.105.254:8301  alive   server  1.8.4  2         us-e1  <all>
consul-bravo         172.31.119.198:8301  alive   server  1.8.4  2         us-e1  <all>
consul-charlie       172.31.119.98:8301   alive   server  1.8.4  2         us-e1  <all>
consul-delta         172.31.101.101:8301  alive   server  1.8.4  2         us-e1  <all>
consul-echo          172.31.107.165:8301  alive   server  1.8.4  2         us-e1  <all>
vault-alpha          172.31.119.46:8301   alive   client  1.8.5  2         us-e1  <default>
vault-bravo          172.31.126.217:8301  alive   client  1.8.5  2         us-e1  <default>
vault-charlie        172.31.105.114:8301  alive   client  1.8.5  2         us-e1  <default>

Output of consul operator raft list-peers:

Node                 ID                                    Address              State     Voter  RaftProtocol
consul-bravo         c5a0fa39-0972-e1df-0cc7-88c43ab7f6f5  172.31.119.198:8300  follower  true   3
consul-charlie       1f5ca595-6ec7-d720-1128-5b77994630db  172.31.119.98:8300   follower  true   3
consul-delta         e6df3951-d0fa-87f9-8501-2446b3813ebf  172.31.101.101:8300  follower  true   3
consul-echo          6e21bd35-ddd8-f159-bb4e-b83a25f9e6ec  172.31.107.165:8300  follower  true   3
consul-alpha-leader  bc69ceb5-997d-db1e-10a8-633b6a14c499  172.31.105.254:8300  leader    true   3

For reference, I’ve been using this guide (which references this consul guide as a prerequisite)

That vault status command is handy, and that gives me something new to chase down. I did a self-signed cert based on the DNS name I intend to use for this service, but right now it’s just routing via IP addresses. I’ll try to fix that and see what happens.
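
For a temporary self-signed certificate that also validates when clients connect by IP, the addresses have to appear as IP SANs. A rough sketch with `openssl` follows. It assumes OpenSSL 1.1.1 or newer (for `-addext`); the function names, the output file names, and any DNS name passed in are placeholders, and the two IPs are simply the listener addresses from the vault.hcl above.

```shell
# Generate a self-signed cert whose SANs cover a DNS name plus both listener IPs.
# $1 = DNS name (placeholder), $2 = output directory. Requires OpenSSL >= 1.1.1.
gen_vault_cert() {
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=$1" \
    -addext "subjectAltName=DNS:$1,IP:127.0.0.1,IP:172.31.119.46" \
    -keyout "$2/vault-key.pem" -out "$2/vault-cert.pem"
}

# Show the SAN entries of an existing certificate to confirm what it covers.
show_sans() {
  openssl x509 -in "$1" -noout -text | grep -A1 'Subject Alternative Name'
}
```

For example, `gen_vault_cert vault.example.internal /tmp` followed by `show_sans /tmp/vault-cert.pem` should list the DNS name and both IP addresses.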

Update:
vault status now shows the following message after I (temporarily) set an IP SAN cert in place (seems to have cleared the original error, but this may still be related):

Error checking seal status: Get "https://127.0.0.1:8200/v1/sys/seal-status": dial tcp 127.0.0.1:8200: connect: connection refused

From your first post I would have said it’s the wrong acl policy attached to the token. But your default_policy is to allow everything. So that shouldn’t be the blocking factor.
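
If the token in the storage stanza did turn out to lack `service:write` on `vault`, a dedicated policy could be created along these lines. This is a sketch only: the rule set is modelled on the policy shape shown in Vault's Consul storage documentation and should be checked against the versions in use, the policy name `vault-storage` and both function names are hypothetical, and `create_vault_token` assumes a management token is already exported in `CONSUL_HTTP_TOKEN`.

```shell
# Write a Consul ACL policy file covering Vault's KV path and service registration.
# $1 = path of the rules file to create.
write_vault_policy() {
  cat > "$1" <<'EOF'
key_prefix "vault/" {
  policy = "write"
}
service "vault" {
  policy = "write"
}
agent_prefix "" {
  policy = "read"
}
session_prefix "" {
  policy = "write"
}
EOF
}

# Register the policy and create a token bound to it.
# "vault-storage" is a made-up policy name; $1 = rules file written above.
create_vault_token() {
  consul acl policy create -name "vault-storage" -rules "@$1" &&
  consul acl token create -description "Vault storage" -policy-name "vault-storage"
}
```

The secret ID printed by the token command would then replace the `token` value in vault.hcl.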

You are using IPs instead of hostnames… Maybe this is wrong:

verify_server_hostname = true

But I don’t really know if your error message could point to this.

Your api_addr points to the non-loopback address. Did you try running the status command against this address?

VAULT_ADDR="https://172.31.119.46:8200" vault status

Thanks everyone for helping out.
The ‘connection refused’ error was because I needed to chmod my cert key file properly. Once I did that, I got another error stating

x509: certificate signed by unknown authority

I played around with this for a while, and gave up trying to use a self-signed cert. I was planning on using a DNS cert anyway (but hadn’t during the setup phase), and once I installed it, I got rid of that error.
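
For anyone hitting the same two errors, the fixes roughly look like the sketch below. The thread does not say which file mode was used, so `640` is an assumption, as are the function name, the hostname, and the CA path; `VAULT_ADDR` and `VAULT_CACERT` are the Vault CLI's own environment variables.

```shell
# Restrict a TLS private key to owner read/write and group read.
# Mode 640 is an assumption; $1 = path to the key file.
secure_key_file() {
  chmod 640 "$1"
}

# Point the Vault CLI at the server and at the CA that signed its certificate.
# Both values are hypothetical placeholders.
export VAULT_ADDR="https://vault.example.internal:8200"
export VAULT_CACERT="/etc/vault.d/ca.pem"
```

The key file also has to be owned by (or group-readable for) whichever account the Vault service runs as, otherwise the listener cannot load it.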

Once I finally got the vault.hcl properly reconfigured for the DNS name and redid all my export declarations, it looks like it’s working. Both services are currently running without errors. I can get to the UI, which shows the vault as sealed, but I assume it’s because I haven’t done the initial setup yet. I think we’re good on this one for now. Thanks again!
