Nomad Exec Error: failed to exec into task: unexpected EOF

Hi Team,

After upgrading from Nomad v1.0.4 to v1.1.2, we hit an issue running nomad alloc exec with a non-admin ACL token.

This is the error we get:

▶ nomad alloc exec -e none $ALLOCATION_ID /bin/bash                                                      
failed to exec into task: unexpected EOF

▶ nomad -v 
Nomad v1.1.2 (60638a086ef9630e2a9ba1e237e8426192a44244)

This happens when we use a non-admin ACL token.

The policy of the non-admin token is as follows:

namespace "default" {
  policy = "read"
  capabilities = ["submit-job","dispatch-job","read-logs","alloc-exec"]
}

When we checked the logs with Nomad v1.1.2, we got this (the entries are in reverse chronological order):


IP:53703 [09/Jul/2021:12:15:24.416] lb_01~ lb_01/<NOSRV> -1/-1/-1/-1/0 0 0 - - PR-- 1/1/0/0/0 0/0 "<BADREQ>"
IP:53703 [09/Jul/2021:12:15:24.416] lb_01~ lb_01/<NOSRV> -1/-1/-1/-1/0 0 0 - - PR-- 1/1/0/0/0 0/0 "<BADREQ>"
IP:53701 [09/Jul/2021:12:15:24.352] lb_01~ nomad_console_backend/nc-server-1 0/0/1/2/3 403 212 - - ---- 1/1/0/0/0 0/0 {Go-http-client/2.0} "GET https://nomad.example.com/v1/node/b86454c4-aa4c-79c3-92d7-3dce287ca758 HTTP/2.0"
IP:53700 [09/Jul/2021:12:15:24.266] lb_01~ nomad_console_backend/nc-server-3 0/0/1/4/5 200 3412 - - ---- 1/1/0/0/0 0/0 {Go-http-client/2.0} "GET https://nomad.example.com/v1/allocation/4ec06a2f-c2bc-da9b-44c3-7dc6b36b04ae?namespace=default HTTP/2.0"
IP:53699 [09/Jul/2021:12:15:24.200] lb_01~ nomad_console_backend/nc-server-2 0/0/1/4/5 200 1094 - - ---- 1/1/0/0/0 0/0 {Go-http-client/2.0} "GET https://nomad.example.com/v1/allocations?prefix=4ec06a2f-c2bc-da9b-44c3-7dc6b36b04ae HTTP/2.0"

Whereas with Nomad v1.0.4 we got these logs:

IP:53229 [09/Jul/2021:11:40:23.790] lb_01~ nomad_console_backend/nc-server-1 0/0/1/1/469692 101 434 - - ---- 1/1/0/0/0 0/0 {Go-http-client/1.1} "GET /v1/client/allocation/4ec06a2f-c2bc-da9b-44c3-7dc6b36b04ae/exec?command=%5B%22%2Fbin%2Fbash%22%5D&task=postgres_11&tty=true HTTP/1.1"
IP:53228 [09/Jul/2021:11:40:23.544] lb_01~ nomad_console_backend/nc-server-3 0/0/1/2/3 403 212 - - ---- 1/1/0/0/0 0/0 {Go-http-client/1.1} "GET /v1/node/b86454c4-aa4c-79c3-92d7-3dce287ca758 HTTP/1.1"
IP:53227 [09/Jul/2021:11:40:23.295] lb_01~ nomad_console_backend/nc-server-2 0/0/1/5/6 200 3415 - - ---- 1/1/0/0/0 0/0 {Go-http-client/1.1} "GET /v1/allocation/4ec06a2f-c2bc-da9b-44c3-7dc6b36b04ae?namespace=default HTTP/1.1"
IP:53226 [09/Jul/2021:11:40:23.070] lb_01~ nomad_console_backend/nc-server-1 0/0/1/4/5 200 1094 - - ---- 1/1/0/0/0 0/0 {Go-http-client/1.1} "GET /v1/allocations?prefix=4ec06a2f-c2bc-da9b-44c3-7dc6b36b04ae HTTP/1.1"

I see two differences:

  • {Go-http-client/1.1} has been updated to {Go-http-client/2.0}
  • Maybe the earlier version ignored the 403 on GET /v1/node/*, whereas the latest version treats it as an error and returns "failed to exec into task: unexpected EOF"
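
One way to tell these two differences apart is to request the node endpoint once per protocol with the same token. This is only a minimal sketch: it assumes curl is built with HTTP/2 support and that NOMAD_ADDR and NOMAD_TOKEN are set in the shell, and probe is a hypothetical helper name, not part of Nomad.

```shell
# Hypothetical helper (not part of Nomad): print the status code and the
# HTTP version that curl actually negotiated for one Nomad API path.
# $1 = curl protocol flag (--http1.1 or --http2), $2 = API path
probe() {
  curl -s -o /dev/null -w '%{http_code} HTTP/%{http_version}\n' "$1" \
    -H "X-Nomad-Token: ${NOMAD_TOKEN}" "${NOMAD_ADDR}$2"
}

# Usage (node ID taken from the logs above):
#   probe --http1.1 /v1/node/b86454c4-aa4c-79c3-92d7-3dce287ca758
#   probe --http2   /v1/node/b86454c4-aa4c-79c3-92d7-3dce287ca758
```

If both calls report 403, the node lookup is denied regardless of protocol, which would suggest looking at what the CLI does after the 403 rather than at the lookup itself.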

We use HAProxy as the load balancer for our Nomad domain.

What could be the issue here? Do we need to adjust our ACL policy?

I also checked: it is not the server upgrade that causes the issue. It is the client side, in this case the terminal from which we run exec; once that CLI is upgraded to v1.1.2, it gives the above error.

@surajthakur, do you receive that error if you run the exec command in a way that bypasses your load balancer? I did notice that the newer go-http-client is using HTTP/2 instead of HTTP/1.1 for the requests made from the CLI tool, which might now be hitting an LB configuration issue.

Specifically, I am curious whether your configuration rewrites the Origin header to allow the CORS handshake to succeed properly. The “Configure NGINX Reverse Proxy for Nomad’s Web UI” Learn tutorial shows an example using NGINX configuration, and the “Enable Websocket Connections” section covers the additional headers that have to be passed, in case you are not passing them already.

If your configuration is forwarding the Upgrade and (a rewritten) Origin header, then perhaps you might share a sanitized version here that we could use as a reproducer.

Hopefully we can get you unjammed!

Charlie

Thanks @angrycub for your response.

If I make the exec request bypassing the load balancer, using http://nc-server-1:4646, the exec request succeeds. With https it goes through the load balancer.
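
The two paths compared here can be sketched as follows. This assumes the CLI's general -address option; exec_via is a hypothetical wrapper name, and the hostnames are the ones used in this thread.

```shell
# Hypothetical wrapper: run the same exec against a chosen Nomad API address.
# $1 = Nomad API address, $2 = allocation ID
exec_via() {
  nomad alloc exec -address="$1" -e none "$2" /bin/bash
}

# Usage:
#   exec_via http://nc-server-1:4646   "$ALLOCATION_ID"   # straight to a server
#   exec_via https://nomad.example.com "$ALLOCATION_ID"   # through the load balancer
```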

If the issue is with allowing WebSockets in the load balancer, why does exec work with the admin ACL token (probably because /v1/node/* returns 200 in that case), whereas the non-admin token gets a 403 and this error?

We use HAProxy as the load balancer; I will have to check its configuration for WebSockets.

Meanwhile, this is my raw HAProxy configuration; maybe you can pinpoint something:

global
        log stdout  format raw  local0  info
        # Default SSL material locations
        ca-base /etc/ssl/certs
        crt-base /etc/ssl/private
        tune.ssl.default-dh-param 2048
        maxconn 50000
        ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
        ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
        ssl-default-bind-options prefer-client-ciphers no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
        ssl-default-server-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
        ssl-default-server-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
        ssl-default-server-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets

frontend lb_01
        bind *:80
        bind *:443 ssl crt /etc/ssl/private/combined.pem alpn h2,http/1.1
        redirect scheme https code 301 if !{ ssl_fc }
        option forwardfor

        acl nomad hdr(host) -i nomad.example.com

        http-response set-header X-Frame-Options SAMEORIGIN
        http-response set-header X-XSS-Protection 1;mode=block
        http-response set-header Strict-Transport-Security max-age=31536000;includeSubDomains;preload
        http-response set-header Referrer-Policy no-referrer-when-downgrade

        http-request capture req.hdr(User-Agent) len 200

        use_backend nomad_console_backend if nomad
        

backend nomad_console_backend
        balance roundrobin
        option forwardfor{{ range nodes }}{{if .Node | regexMatch "server*"}}
        server {{.Node}} {{.Address}}:4646 check{{end}}{{end}}

Interestingly, when I try via the browser, exec works fine even with the non-admin ACL token.

Any ideas on this, anyone?

Hi @surajthakur :wave:

I was able to reproduce your issue with this job file (the certificate is self-signed, so no worries about “leaking” it):

job "haproxy" {
  datacenters = ["dc1"]

  group "haproxy" {
    count = 1

    network {
      port "http" {
        static = 8080
      }

      port "haproxy_ui" {
        static = 1936
      }
    }

    task "haproxy" {
      driver = "docker"

      config {
        image = "haproxy:2.4"
        ports = ["http", "haproxy_ui"]
        volumes = [
          "local/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg",
        ]
      }

      template {
        data = <<EOF
global
   log stdout format raw local0

defaults
   mode http
   timeout connect 5s
   timeout client 1m
   timeout server 1m
   log global

frontend http_front
   bind *:8080 ssl crt /local/ssl/cert.pem alpn h2,http/1.1
   option httplog
   default_backend http_back

backend http_back
    balance roundrobin
    server local host.docker.internal:4646
EOF

        destination = "local/haproxy.cfg"
      }

      template {
        data        = <<EOF
-----BEGIN CERTIFICATE-----
MIIE1DCCArwCCQCcTGYnzC0kXzANBgkqhkiG9w0BAQsFADAsMQswCQYDVQQGEwJD
QTELMAkGA1UECAwCT04xEDAOBgNVBAcMB1Rvcm9udG8wHhcNMjEwNzIzMjEzNTU5
WhcNMjIwNzIzMjEzNTU5WjAsMQswCQYDVQQGEwJDQTELMAkGA1UECAwCT04xEDAO
BgNVBAcMB1Rvcm9udG8wggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIKAoICAQDU
hR6YbHWisBRpkzN1CGM3lNb/Uv43gVEkRm8aCaBlRM3uktz1hzjX0+gAechnAmV+
k6CJTNzaUc7vrcgSbreloHn6W0Akzz2XBXXxufQzSFxqri1/zTvcyjqGYrR2Z1RQ
y5JIAmekmnwcNmlDuhgwer/QbZvSEGn6qtq/ujR870AkgJAoZ1ae5JuonuLUtSF5
1gzb0dg2qvDSdlOeSQg6uRb2Qg7M667ZxPLGUZdnuAZCuBgLOMNXP7fWgXXnn+GD
1y80bbwQigJhgY1OFi/Vm53Pkqid3ed6oaUMBzgp1apUnqYsPhh0g0eVvpplzlX0
QcdnlDapreY+WgRIPXmsbVESEOcl5Qp4+C291XusTP5UasPfsyz8FxIKrd73bbOR
oHumziARdpwRDyfA6dFAY3xqzr7IPjeS42aLNS8Xz/RhK+5smsSHiNlI//mw2nFa
6RrUl/Oa2KipHNtzHGp/DlYX1pjbj3a0/BqXpvsNUxvBihxJcd6htXi1wRCQ7pGI
e4KE7Ztp6xmnDijm5mMAuA5NpEKH2U5h8PB7jHOA0tXX8XSex+R5q5p/byo3L1QG
LToiD9ZMY6cuRJbFAGb4xMg55krGImGvRcnvwS4p6xpDO/gudEw68MdC4XgLACDg
Ms5pMbeD4oTopmoTC6f9jxizSbALVWkL0mY6CN5WuQIDAQABMA0GCSqGSIb3DQEB
CwUAA4ICAQAjJnx6mwMtYkcE+iInnSLw0NiDt+OzoR+WGpaJ3hMaw+E8G4Y/i/F+
QzPSlgFRgrXjQwCTg2qEGU7BsYHNK6ZNitP4xoKrrRpD6G5C61SLIdYwYuw3uRHC
2jK2wZnIsS5DWpqWegZC3rDuKQFRVaw21LU0fYzl8stmvbmefyiKByCccRiDnz+8
W/bwkmEFp/CbuMVuOKBQoTlK8+HwU0Tjt0Ac0/UyUFTtAm52172U4Z3LUihZCAJq
tOkI/hnxHM9tKKh+tMYuJhqSnGxTOl5eKDJJ5KkZCM89wG9wupisteeLbz1H7jhC
bCKknGlpFY3QA2hF6LCM12meecF3g8XbUZbbohsgQj1bo9ggoAxkXAKa0k01JeN/
51QtbgL6RO+GLA/nDcFHOeUi9p301aGkc5B8LWtGgpG4GjZcr2f7iRPLlMB8gtIY
ae5EzJjmWTrGIsT+/pDAAe8eZcCSSTZO/T3dMaj6gfrZDP0fcxAEhxfYHSoFgZof
24oT1PNSKXt/ZJc5aG91LY+98vlv14rZVywGoDuXc9aeVUXbHkNyPNRTO3BE/sxD
2kYQF2jf5gYeyeQ5vmgx7/6lslNmszUw9ize6HkDncQeGD2rwARsVGK9GDklAhvY
2v7+AlnygQtkJimtyaa1ARwGU8O1cWpMWnNZHwknVQwxhzMZdUcanw==
-----END CERTIFICATE-----
-----BEGIN RSA PRIVATE KEY-----
MIIJKQIBAAKCAgEA1IUemGx1orAUaZMzdQhjN5TW/1L+N4FRJEZvGgmgZUTN7pLc
9Yc419PoAHnIZwJlfpOgiUzc2lHO763IEm63paB5+ltAJM89lwV18bn0M0hcaq4t
f8073Mo6hmK0dmdUUMuSSAJnpJp8HDZpQ7oYMHq/0G2b0hBp+qrav7o0fO9AJICQ
KGdWnuSbqJ7i1LUhedYM29HYNqrw0nZTnkkIOrkW9kIOzOuu2cTyxlGXZ7gGQrgY
CzjDVz+31oF155/hg9cvNG28EIoCYYGNThYv1Zudz5Kond3neqGlDAc4KdWqVJ6m
LD4YdINHlb6aZc5V9EHHZ5Q2qa3mPloESD15rG1REhDnJeUKePgtvdV7rEz+VGrD
37Ms/BcSCq3e922zkaB7ps4gEXacEQ8nwOnRQGN8as6+yD43kuNmizUvF8/0YSvu
bJrEh4jZSP/5sNpxWuka1JfzmtioqRzbcxxqfw5WF9aY2492tPwal6b7DVMbwYoc
SXHeobV4tcEQkO6RiHuChO2baesZpw4o5uZjALgOTaRCh9lOYfDwe4xzgNLV1/F0
nsfkeauaf28qNy9UBi06Ig/WTGOnLkSWxQBm+MTIOeZKxiJhr0XJ78EuKesaQzv4
LnRMOvDHQuF4CwAg4DLOaTG3g+KE6KZqEwun/Y8Ys0mwC1VpC9JmOgjeVrkCAwEA
AQKCAgEAsFS3mw662D6y4RpS4rMP56kmbnkFFzbEBY4vVvJP1Fava1kN3ubQojtf
zy08u0OAxPJmjCVrRfYE9ldBnxGgbNtm+fRGl4QgfTL5tpRs6zQKAjX86IJ4PezJ
fIFfbLK1gcg22mqsZiYL/jijRJ+evHLMvnqDhFs8I9EpaVVtgY/dr6vAcNW1SGc0
REd4u7aCTR3uU2GcvVc+M9Ib9URxgI/cXn1W3G5dCLFiImzGbDeDck5fHMh/Q+BJ
f8Cw/Htq2UJtF1pJZYutCAw/G2BLVjglS3pLT5k0HEsMr5s53XQ8PLPZ+vGWxu26
MqQQZZI7PUxq0CVo6YaxeGCmFu5zCbvLRMDz4q7U5uhbd9zrBHviUcbPf20OSfXc
x9rjiZqc2sEZq5jcBGSNkDaAQXZ+sVBqFvSB0Ygm4DL9q2N37EBQ3Fmh2q7qf2cd
E1erPyvS9ICJZ3C7OJS9TBG11qF32A45wLqgX6UJ7dXzQnlAg8yL8097Cititnxw
iHdGuq5VmQIseH3fmGzzcl7MyuZLRnht0hUaFbNG3/RNw8PyX4n9MLuZsJYpSZ0x
cqsGsAhCoUUrOqsdKxDGxijqSOKIAfpEsm5C4neZv6O75o2h5VpuSk1RJ9nAv54W
AtfZ9xRWA0XC17kxXB3SQ+7ONGQZ9jB8tc7r1gJ16Cdffv4qBNECggEBAPNoFpJZ
8YGdHfXfo4tUscxxENK8+iDeArPqe5Qsj13bOzANdnj2FGmifPzIT926Ei0PJLpn
Ty2/nONOh9311cLXPhpZtQDF26f6GVbGW8L4akxE/MCfVbHOJYXpgX6ieN+z9RUX
j2C8Q5gayZAY9t9+HzoKksxyrmKC95PGi4Km+z9tJo1kAFcvPaOj/Fbc3MtZ66qO
FNbO0U1iZknxKTUteSUHtFmTCFRqNLdnJ0UZj0j4cAzY2BRTJQaHAePwhuHaLjta
u+PVKpD2IAJoIzleYHyHnfneyMFyONZRn//MXH3P9Yjfme4Ji1h3YiQMxW3JPftS
31G1rpooAq6pVI0CggEBAN+D8IAbXE7KMPdrMQznRQrYtGAC6ayW1JIv+SkwTCJT
df8LRLXnOGGeMx9LMXWL0afDJMOEm2r2HvDO1jhFy+C2nX6dQq/zlAGxXyjtUoaL
b4LTJBM4OU6URMz6UBQPSOWoUnazgXp5qvuLkzhAqysXl5dGLMiV2LDfW33K+FLu
IzIWHmiKINpLVMyVW4j5eZOLqXNhCnI6P/ktrMZOLnLRPGdfqYYxPfARLZ3BBfx9
5vanE1Esqf53cvMjbYl7v40JSDOl6T/QdhYNRQ2wguCDVgWc9ECrJBY2dQyoUWOY
XvLxvjL6CjBfp7Ph2Zy8QMbADhZ1qtVv0AxQxd7l/d0CggEBAJ1AjfSXLzOxsf17
Mkl9ujB+i4Pamy5IwC5EOvqLn0PfsulkiTm0oZNtappVP2PcJon90piqzbicplsk
DRsVC7kJHhIgCpQpP7PSHDS3iej4XJRRrYk0Z0SsDgnpxcNua/D6bkfmJLc2aeUG
yVnTBwt0i/APjK+RF6CFRDWwe8k43/EmL1YBWUb6OjRSaWAk6HBn850IleYBT69S
9wqmRx0X98A4rgeAukzvIIesmO5HiQ53ksx5+3+GB9Gjv1Dnv/yB3IR0JhcXTJzC
pgNoC/mwQ12wsBsSF1kC1j2AFoJIISkXBWYcL3JdsCzDVCA+L/6xmN2ZuLUtT1RA
rRryQrECggEAWg/IzWW46QpxdpBbgE6DtF4jN/iUfXV9C7aG2ADc/IvSpMS+l+kl
/7eF89sRf8Kp5MYtvxZkpVGsn+1Hxf7hqpcKmOT25Pzpq1Dz/gK6WPpAIV/ATno+
JRp7KnjF4X9TKS6Mo8Wqq1Xw/lB8LpNoJQHplAuzqdMvL/2f2Oz66DeKOPlOoWLe
3/awoYqhCm0zfq8sxQ/Z7LLp6hZsYq9H6f3DMAgMv8SBp4TUc5c+OUHl2Ybysqej
i6RHzg59aYNSaJrP2/fDJ3Jw0mvgYia4ZYymEbbveEs9TDH/Me10dgQEZjHgKJw1
lM5GPaYIUC5Oj4b9ZjFdd4kJNJ0rTagwhQKCAQBWioIh2ezmxq7Zxq09rJU5Lln4
oHTqtMscolP1H04KOwtiQhYIt9mImY4jJbfsOBl+SRvFIZ307edkiumiEIcZ0ccX
rnIfc3xyVxknbW304P4LcpICQHnMq2+4cAkKhCZSXeU9qMf9KRXU5z/4ay99+g5P
O+HP3nqOxNlV4w9xUGTx2HJS+twaRuOVmqIJNWC8sUIR1OKOZjtko/r7gNJCKxNS
LQbMQF53NP+5uH9yPhmQ/5xYR7SxMzelCEzIuwkklvxcaE+VtqEFla7Pu6kRvovh
+WU/7A+5XpPeOtnwknJIMuGV1ADUKhFFvTH5zNWaXhCFuZ6LBJzehsROfjsy
-----END RSA PRIVATE KEY-----
EOF
        destination = "local/ssl/cert.pem"
      }

      resources {
        cpu    = 200
        memory = 256
      }
    }
  }
}

Nomad 1.1.x uses an updated version of Go, which seems to default to HTTP/2 if available. This is causing some issue with exec, though I couldn’t find the exact reason why.

I was able to get around this by disabling HTTP/2 in the HAProxy config (remove the h2 option from the alpn list in the frontend bind).
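
As a rough sketch of that change, assuming the bind line advertises ALPN exactly as "alpn h2,http/1.1" (as in the config above); disable_h2 is a hypothetical helper name:

```shell
# Hypothetical helper: rewrite the ALPN list so the frontend offers HTTP/1.1 only.
# Reads an HAProxy config on stdin and writes the modified config to stdout.
disable_h2() {
  sed 's|alpn h2,http/1.1|alpn http/1.1|'
}

# Usage: disable_h2 < haproxy.cfg > haproxy.cfg.new
```

Lines that do not contain that exact ALPN list pass through unchanged, so the output can be diffed against the original before reloading HAProxy.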

See if this works for you.

Thanks, @lgfa29, for your response. I can try that, but I still do not understand why it works with the admin ACL token and not with the non-admin ACL token. That’s the main concern.

I suspect it is this change: "Handle `nomad exec` termination events in order" (hashicorp/nomad pull request #10657, by notnoop), but I am not really sure.
If it requires the node ID, which a non-admin ACL token cannot fetch, that might be causing the issue.

@angrycub @lgfa29, I got it working by allowing node read permission in the non-admin ACL policy.

The ACL policy now looks like this, and it works fine. I didn’t modify any settings in the load balancer.

namespace "default" {
  policy = "read"
  capabilities = ["submit-job","dispatch-job","read-logs","alloc-exec"]
}
node {
  policy = "read"
}
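
For completeness, a sketch of how such a policy could be applied and attached to a token. The policy name, file name, and token name are hypothetical, and the commands assume a management token is already set in NOMAD_TOKEN.

```shell
# Hypothetical wrapper: upload the policy file, then create a client token bound to it.
# $1 = policy name, $2 = path to the HCL policy file shown above
apply_exec_policy() {
  nomad acl policy apply -description "alloc exec plus node read" "$1" "$2" &&
    nomad acl token create -name="$1-token" -type=client -policy="$1"
}

# Usage: apply_exec_policy exec-users exec-users.policy.hcl
```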

I could have done this earlier, but I am keen to understand what change in Nomad 1.1.2 could have led to this break.