Vault returns HTTP 403 for metrics API

I’m running Vault 1.13.1 in HA mode (3 replicas) with integrated Raft storage in a Kubernetes cluster. Vault is installed via vault-helm as a subchart with some extensions (mainly snapshot/backup-related resources) and is managed by a GitOps controller (Argo CD).

Prometheus Operator (installed as part of kube-prometheus-stack) is present in the cluster. Vault is configured to allow unauthenticated access to its metrics API. However, Prometheus shows the target vault-active as ‘down’:

server returned HTTP status 403 Forbidden

The relevant portions of the Vault Chart values look like this:

global:
  tlsDisable: true
  serverTelemetry:
    prometheusOperator: true

server:
  ha:
    enabled: true
    replicas: 3
    config: |
      disable_mlock = true

      ui = true

      listener "tcp" {
        tls_disable = true
        address = "[::]:8200"
        cluster_address = "[::]:8201"
        telemetry {
          unauthenticated_metrics_access = true
        }
      }

      telemetry {
        prometheus_retention_time = "24h"
        disable_hostname = true
      }

serverTelemetry:
  serviceMonitor:
    enabled: true

Does anyone have a clue why Vault reports HTTP 403 when accessing /v1/sys/metrics?
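
For reference, the error is reproducible with a plain curl, so it isn’t specific to Prometheus (assuming the Helm release is called vault and lives in the vault namespace, so the active service is vault-active):

# Port-forward the active service and hit the metrics endpoint without a token
kubectl -n vault port-forward svc/vault-active 8200:8200 &
curl -i "http://127.0.0.1:8200/v1/sys/metrics?format=prometheus"
# HTTP/1.1 403 Forbidden
# {"errors":["permission denied"]}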

Any hints appreciated, thanks in advance. :slight_smile:

Well, if Vault were actually configured with unauthenticated_metrics_access = true, it wouldn’t be responding to /v1/sys/metrics with a 403.

So, it’s time to start challenging assumptions:

You can use the sys/config/state/sanitized endpoint to have Vault report a portion of its actually running configuration. Do that, and verify unauthenticated_metrics_access really is turned on.
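
For example, with the CLI or raw HTTP (this requires a token with enough privileges to read the endpoint):

# Dump the sanitized running configuration as JSON
vault read -format=json sys/config/state/sanitized

# Same thing over raw HTTP
curl -s -H "X-Vault-Token: $VAULT_TOKEN" "$VAULT_ADDR/v1/sys/config/state/sanitized"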

Don’t assume that Prometheus is necessarily hitting the right URL - make test requests using curl, and confirm that way whether unauthenticated metrics access is working.
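
Something along these lines, where vault.example.internal:8200 stands in for wherever your Vault is reachable:

# Should return metrics without any token if unauthenticated access is on
curl -i "http://vault.example.internal:8200/v1/sys/metrics?format=prometheus"

# Cross-check with a valid token - this works regardless of the listener setting
curl -i -H "X-Vault-Token: $VAULT_TOKEN" \
  "http://vault.example.internal:8200/v1/sys/metrics?format=prometheus"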

Excellent, thank you for your thoughts. It turns out unauthenticated_metrics_access is actually not set at all, although it is specified in the listener configuration above.

/sys/config/state/sanitized shows the following:

{
    ...
    "data": {
        "api_addr": "",
        "cache_size": 0,
        ...
        "disable_cache": false,
        "disable_clustering": false,
        "disable_indexing": false,
        "disable_mlock": true,
        "disable_performance_standby": false,
        "disable_printable_check": false,
        "disable_sealwrap": false,
        "disable_sentinel_trace": false,
        "enable_response_header_hostname": false,
        "enable_response_header_raft_node_id": false,
        "enable_ui": true,
        ...
        "listeners": [
            {
                "config": {
                    "address": "[::]:8200",
                    "cluster_address": "[::]:8201",
                    "tls_disable": 1
                },
                "type": "tcp"
            }
        ],
        "log_format": "json",
        "log_level": "info",
        ...
        "seals": [
            ...
        ],
        "service_registration": {
            "type": "kubernetes"
        },
        "storage": {
            "cluster_addr": "",
            "disable_clustering": false,
            "redirect_addr": "",
            "type": "raft"
        }
    },
    ...
    "auth": null
}

unauthenticated_metrics_access seems to have magically vanished from the config - but why?

The ConfigMap generated by the Helm chart reads as follows:

  extraconfig-from-values.hcl: |-
    disable_mlock = true
    ui = true

    listener "tcp" {
      tls_disable = 1
      address = "[::]:8200"
      cluster_address = "[::]:8201"
      # Enable unauthenticated metrics access (necessary for Prometheus Operator)
      #telemetry {
      #  unauthenticated_metrics_access = "true"
      #}
    }

    storage "raft" {
      path = "/vault/data"
    }

    service_registration "kubernetes" {}

The relevant portion of the config is commented out in the rendered ConfigMap… but why?

You’ve only shared redacted portions of your Helm values so far. My guess is that they’re not set up to feed Vault the config you think you have, but I can’t tell what exactly is wrong from the limited portions in your earlier post.

OK, I’ve solved it. I had a closer look at the Helm chart: with Raft storage enabled, the server config has to live at server.ha.raft.config, not server.ha.config. Since I had only set the latter, the chart fell back to its default Raft config, which is exactly the ConfigMap shown above, with the telemetry stanza commented out. After moving the config to the right level, Prometheus reports the target as up.
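
For anyone hitting the same thing, this is roughly what the relevant part of my values looks like after the fix (release-specific details omitted):

server:
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      config: |
        disable_mlock = true
        ui = true

        listener "tcp" {
          tls_disable = true
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          telemetry {
            unauthenticated_metrics_access = true
          }
        }

        telemetry {
          prometheus_retention_time = "24h"
          disable_hostname = true
        }

        storage "raft" {
          path = "/vault/data"
        }

        service_registration "kubernetes" {}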

Anyway, thank you so much for the invaluable hint about /sys/config/state/sanitized, which pointed me in the right direction. :slight_smile: