Hi
I am quite new to Vault and have spent several hours going through the docs, but I can’t get this working as expected.
I would like to run Vault as a workload in Nomad with a Consul storage backend (I already use Nomad and Consul). So I created two identical job files, one for vault-a and one for vault-b (see below).
Nomad registers the job's service vault-a (and vault-b) in Consul, and traefik is used for HTTPS offloading and proxying. That works as expected: vault-a and vault-b are reachable at https://vault-a.apps.example.com and https://vault-b.apps.example.com respectively.
variable "datacenters" {
  type = list(string)
  default = ["dc1"]
}

variable "namespace" {
  type = string
  default = "default"
}

variable "host_network" {
  type = string
  default = ""
}

job "vault-a" {
  datacenters = var.datacenters
  namespace = var.namespace
  type = "service"

  group "vault" {
    count = 1

    network {
      mode = "host"
      port "tcp" {
        host_network = var.host_network
        static = 25081
      }
      port "cluster" {
        host_network = var.host_network
        static = 25082
      }
    }

    task "vault" {
      template {
        change_mode = "restart"
        destination = "local/config.hcl"
        data = <<EOH
ui = true
cluster_name = "my-cluster"
storage "consul" {
  address = "172.17.0.1:8500"
  path = "vault/"
}
service_registration "consul" {
  address = "172.17.0.1:8500"
}
listener "tcp" {
  address = "[::]:{{ env "NOMAD_PORT_tcp" }}"
  cluster_address = "[::]:{{ env "NOMAD_PORT_cluster" }}"
  tls_disable = 1
}
api_addr = "https://vault-a.apps.example.com:443"
EOH
      }

      driver = "docker"

      config {
        image = "vault:1.9.2"
        # cap_add = ["IPC_LOCK"]
        privileged = true
        volumes = [
          "local/config.hcl:/etc/vault/config.hcl",
        ]
        args = [
          "server",
          "-config", "/etc/vault",
        ]
        ports = [
          "tcp",
          "cluster",
        ]
      }

      service {
        name = "vault-a"
        tags = [
          "traefik.enable=true",
          "traefik.http.routers.vault-a.rule=HostRegexp(`vault-a.{domain:.*}`)",
          "traefik.http.routers.vault-a.middlewares=vault-a-https",
          "traefik.http.middlewares.vault-a-https.redirectscheme.scheme=https",
          "traefik.http.routers.vault-a-https.rule=HostRegexp(`vault-a.{domain:.*}`)",
          "traefik.http.routers.vault-a-https.tls=true",
        ]
        port = "tcp"
      }
    }
  }
}
The issue I see is on the standby Vault node: “This is a standby Vault node but can’t communicate with the active node via request forwarding. Sign in at the active node to use the Vault UI.” I don’t see what I am doing wrong. I even switched to identical static ports in both jobs. I guess the question is: how does Vault find the other node? Because we already use Consul, my guess would be via Consul, and I see this in Consul:
Logs:
==> Vault server configuration:
Api Address: https://vault-a.apps.example.com:443
Cgo: disabled
Cluster Address: https://vault-a.apps.example.com:444
Go Version: go1.17.5
Listener 1: tcp (addr: "[::]:25081", cluster address: "192.168.2.151:25082", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
Log Level: info
Mlock: supported: true, enabled: true
Recovery Mode: false
Storage: consul (HA available)
Version: Vault v1.9.2
Version Sha: f4c6d873e2767c0d6853b5d9ffc77b0d297bfbdf
==> Vault server started! Log data will stream in below:
I noticed that the cluster address looks different, with port 444?
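My current guess, and it is only a guess: I never set cluster_addr, so Vault seems to derive it from api_addr by adding one to the port, which would explain the 444. I do not think anything listens on 444 behind traefik, so the standby might simply be unable to reach the active node for request forwarding. Below is a sketch of what I would try in the template, next to api_addr. It assumes the two nodes can reach each other directly on the Nomad-assigned cluster port, and it relies on NOMAD_ADDR_cluster, the ip:port pair Nomad exposes for the port labelled "cluster".

```hcl
# Assumption: advertise this node's own cluster port instead of the derived :444.
# Nomad renders NOMAD_ADDR_cluster as "<ip>:<port>" for the "cluster" port label.
api_addr     = "https://vault-a.apps.example.com:443"
cluster_addr = "https://{{ env "NOMAD_ADDR_cluster" }}"
```

I have not verified that this is the right fix, so please correct me if the cluster address is supposed to be discovered some other way.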
Could you give me a hint?
Thanks in advance