Can't deploy into fresh EKS cluster

I’ve deployed a cluster using eksctl create cluster, but Waypoint never comes online. Here’s the output:

waypoint install --platform=kubernetes -vvv -accept-tos
2023-03-23T10:12:00.285+1300 [INFO]  waypoint: waypoint version: full_string="v0.11.0 (e92d6fbe0)" version=v0.11.0 prerelease="" metadata="" revision=e92d6fbe0
2023-03-23T10:12:00.285+1300 [TRACE] waypoint: starting interrupt listener for context cancellation
2023-03-23T10:12:00.285+1300 [TRACE] waypoint: interrupt listener goroutine started
2023-03-23T10:12:00.286+1300 [DEBUG] waypoint: home configuration directory: path=/Users/ivanvanderbyl/Library/Preferences/waypoint
⠸ Installing Waypoint Helm chart...2023-03-23T10:12:28.248+1300 [DEBUG] waypoint.install.helm_action: creating 1 resource(s)
⠼ Installing Waypoint Helm chart...2023-03-23T10:12:30.084+1300 [DEBUG] waypoint.install.helm_action: creating 11 resource(s)
⠏ Installing Waypoint Helm chart...2023-03-23T10:12:32.674+1300 [DEBUG] waypoint.install.helm_action: beginning wait for 11 resources with timeout of 5m0s
⠇ Installing Waypoint Helm chart...2023-03-23T10:12:34.305+1300 [DEBUG] waypoint.install.helm_action: Service does not have load balancer ingress IP address: default/waypoint-ui
⠸ Installing Waypoint Helm chart...2023-03-23T10:12:36.879+1300 [DEBUG] waypoint.install.helm_action: Service does not have load balancer ingress IP address: default/waypoint-ui
⠋ Installing Waypoint Helm chart...2023-03-23T10:12:39.787+1300 [DEBUG] waypoint.install.helm_action: StatefulSet is not ready: default/waypoint-server. 0 out of 1 expected pods are ready
... 5 mins later
⠦ Installing Waypoint Helm chart...2023-03-23T10:50:31.611+1300 [DEBUG] waypoint.install.helm_action: StatefulSet is not ready: default/waypoint-server. 0 out of 1 expected pods are ready
❌ Installing Waypoint Helm chart...
! Error installing server into kubernetes: Get
  "https://601806A2B558053263442A247BC08056.gr7.us-west-2.eks.amazonaws.com/apis/apps/v1/namespaces/default/statefulsets/waypoint-server":
  context deadline exceeded
2023-03-23T10:50:34.697+1300 [TRACE] waypoint: stopping signal listeners and cancelling the context

Any ideas?

While the installer is stuck on "Installing Waypoint Helm chart...", open a new terminal session and get the output of:

kubectl describe sts -n default waypoint-server
kubectl get sts -n default waypoint-server -o yaml
kubectl logs -n default waypoint-server-0

(the last one may return an error saying the Pod does not yet exist)
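To avoid racing the installer, you could poll until the Pod exists before running the commands above. This is a minimal sketch with a hypothetical helper name (`wait_for_pod`); it assumes `kubectl` already points at the EKS cluster, and the 5-second interval and 60 attempts are arbitrary defaults.

```shell
# Hypothetical helper: poll every 5 seconds until a Pod exists.
# Args: pod name, namespace (default "default"), max attempts (default 60).
wait_for_pod() {
  local pod="$1" ns="${2:-default}" tries="${3:-60}" i=0
  while [ "$i" -lt "$tries" ]; do
    # `kubectl get pod` exits non-zero while the Pod does not exist yet.
    if kubectl get pod "$pod" -n "$ns" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 5
  done
  echo "Pod $pod not found in namespace $ns after $tries attempts" >&2
  return 1
}
```

Usage: `wait_for_pod waypoint-server-0 && kubectl describe pod -n default waypoint-server-0`. Note that the Pod existing does not mean its container has started, so `kubectl logs` can still come back empty.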

Hi @macmiranda — here’s the output:

kubectl describe sts -n default waypoint-server
Name:               waypoint-server
Namespace:          default
CreationTimestamp:  Thu, 23 Mar 2023 13:23:33 +1300
Selector:           app.kubernetes.io/instance=waypoint,app.kubernetes.io/name=waypoint,component=server
Labels:             app.kubernetes.io/instance=waypoint
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=waypoint
Annotations:        meta.helm.sh/release-name: waypoint
                    meta.helm.sh/release-namespace: default
Replicas:           1 desired | 1 total
Update Strategy:    RollingUpdate
  Partition:        0
Pods Status:        0 Running / 1 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/instance=waypoint
                    app.kubernetes.io/name=waypoint
                    component=server
                    helm.sh/chart=waypoint-0.1.18
  Service Account:  waypoint
  Containers:
   waypoint:
    Image:       docker.io/hashicorp/waypoint:latest
    Ports:       9701/TCP, 9702/TCP, 9703/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Command:
      waypoint
    Args:
      server
      run
      -accept-tos
      -db=/data/data.db
      -listen-grpc=0.0.0.0:9701
      -listen-http=0.0.0.0:9702
      -listen-http-insecure=0.0.0.0:9703
      -vv
    Limits:
      cpu:     0
      memory:  0
    Requests:
      cpu:     0
      memory:  0
    Liveness:  tcp-socket :grpc delay=5s timeout=5s period=3s #success=1 #failure=2
    Environment:
      HOME:  /home/waypoint
    Mounts:
      /data from data-default (rw)
      /home/waypoint from home (rw)
  Volumes:
   home:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
Volume Claims:
  Name:          data-default
  StorageClass:
  Labels:        <none>
  Annotations:   <none>
  Capacity:      10Gi
  Access Modes:  [ReadWriteOnce]
Events:
  Type    Reason            Age   From                    Message
  ----    ------            ----  ----                    -------
  Normal  SuccessfulCreate  99s   statefulset-controller  create Claim data-default-waypoint-server-0 Pod waypoint-server-0 in StatefulSet waypoint-server success
  Normal  SuccessfulCreate  99s   statefulset-controller  create Pod waypoint-server-0 in StatefulSet waypoint-server successful
kubectl get sts -n default waypoint-server -o yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  annotations:
    meta.helm.sh/release-name: waypoint
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2023-03-23T00:23:33Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: waypoint
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: waypoint
  name: waypoint-server
  namespace: default
  resourceVersion: "3456"
  uid: 4d3c1413-d6d8-4f44-a651-b8f71c34ae8b
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: waypoint
      app.kubernetes.io/name: waypoint
      component: server
  serviceName: waypoint-server
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: waypoint
        app.kubernetes.io/name: waypoint
        component: server
        helm.sh/chart: waypoint-0.1.18
    spec:
      containers:
      - args:
        - server
        - run
        - -accept-tos
        - -db=/data/data.db
        - -listen-grpc=0.0.0.0:9701
        - -listen-http=0.0.0.0:9702
        - -listen-http-insecure=0.0.0.0:9703
        - -vv
        command:
        - waypoint
        env:
        - name: HOME
          value: /home/waypoint
        image: docker.io/hashicorp/waypoint:latest
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 5
          periodSeconds: 3
          successThreshold: 1
          tcpSocket:
            port: grpc
          timeoutSeconds: 5
        name: waypoint
        ports:
        - containerPort: 9701
          name: grpc
          protocol: TCP
        - containerPort: 9702
          name: https
          protocol: TCP
        - containerPort: 9703
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: "0"
            memory: "0"
          requests:
            cpu: "0"
            memory: "0"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /data
          name: data-default
        - mountPath: /home/waypoint
          name: home
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1000
        runAsGroup: 1000
        runAsNonRoot: true
        runAsUser: 100
      serviceAccount: waypoint
      serviceAccountName: waypoint
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: home
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: data-default
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      volumeMode: Filesystem
    status:
      phase: Pending
status:
  availableReplicas: 0
  collisionCount: 0
  currentReplicas: 1
  currentRevision: waypoint-server-7bcf554ff4
  observedGeneration: 1
  replicas: 1
  updateRevision: waypoint-server-7bcf554ff4
  updatedReplicas: 1

Nothing returned for kubectl logs -n default waypoint-server-0
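Empty logs are consistent with the container never having started. When a Pod is stuck before or during scheduling, the scheduler often records the reason in the Pod's PodScheduled condition. A small sketch to print it (the helper name `pod_schedule_reason` is hypothetical):

```shell
# Hypothetical helper: print the message on a Pod's PodScheduled condition.
# Args: pod name, namespace (default "default"). Prints nothing if no message is set.
pod_schedule_reason() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.status.conditions[?(@.type=="PodScheduled")].message}'
}
```

Usage: `pod_schedule_reason waypoint-server-0`.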

Here’s the output from the other pods that did start:

❯ kubectl logs -n default waypoint-bootstrap-latest-gyjvzr-bdr4r
2023-03-23T00:25:44.742Z [INFO]  waypoint: waypoint version: full_string="v0.11.0 (e92d6fbe0+CHANGES)" version=v0.11.0 prerelease="" metadata="" revision="e92d6fbe0+CHANGES"
2023-03-23T00:25:44.742Z [TRACE] waypoint: starting interrupt listener for context cancellation
2023-03-23T00:25:44.742Z [DEBUG] waypoint: home configuration directory: path=/home/waypoint/.config/waypoint
Checking for service readiness every 5 seconds...
2023-03-23T00:25:44.744Z [TRACE] waypoint: interrupt listener goroutine started
2023-03-23T00:25:44.765Z [INFO]  waypoint: service ready: advertise_addr=aba0ffa2372494584b122e35b43f0b9a-1919838717.us-west-2.elb.amazonaws.com:9701
2023-03-23T00:25:44.765Z [INFO]  waypoint: initializing server connection
2023-03-23T00:25:44.765Z [TRACE] waypoint: no API client provided, initializing connection if possible
2023-03-23T00:25:44.765Z [INFO]  waypoint.server: attempting to source credentials and connect
2023-03-23T00:25:44.765Z [DEBUG] waypoint.serverclient: connection information: address=waypoint-server:9701 tls=true tls_skip_verify=true send_auth=false has_token=false
2023-03-23T00:27:44.766Z [TRACE] waypoint: stopping signal listeners and cancelling the context
! Error reconnecting with token: error connecting to server: context deadline exceeded
❯ kubectl logs -n default waypoint-bootstrap-latest-gyjvzr-xhg24
2023-03-23T00:23:39.866Z [INFO]  waypoint: waypoint version: full_string="v0.11.0 (e92d6fbe0+CHANGES)" version=v0.11.0 prerelease="" metadata="" revision="e92d6fbe0+CHANGES"
2023-03-23T00:23:39.867Z [TRACE] waypoint: starting interrupt listener for context cancellation
2023-03-23T00:23:39.867Z [DEBUG] waypoint: home configuration directory: path=/home/waypoint/.config/waypoint
2023-03-23T00:23:39.867Z [TRACE] waypoint: interrupt listener goroutine started
Checking for service readiness every 5 seconds...
2023-03-23T00:23:39.880Z [INFO]  waypoint: service ready: advertise_addr=aba0ffa2372494584b122e35b43f0b9a-1919838717.us-west-2.elb.amazonaws.com:9701
2023-03-23T00:23:39.880Z [INFO]  waypoint: initializing server connection
2023-03-23T00:23:39.880Z [TRACE] waypoint: no API client provided, initializing connection if possible
2023-03-23T00:23:39.880Z [INFO]  waypoint.server: attempting to source credentials and connect
2023-03-23T00:23:39.880Z [DEBUG] waypoint.serverclient: connection information: address=waypoint-server:9701 tls=true tls_skip_verify=true send_auth=false has_token=false
! Error reconnecting with token: error connecting to server: context deadline exceeded
2023-03-23T00:25:39.880Z [TRACE] waypoint: stopping signal listeners and cancelling the context

Hi @IvanV,

It’s definitely the waypoint-server-0 Pod that’s not coming up.

Try:

kubectl describe pod -n default waypoint-server-0
kubectl get -n default events
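On a busy namespace the full event list can be noisy. A sketch that narrows it to the two objects that matter here, the server Pod and the PVC created for it (both names come from the StatefulSet events posted above; the helper name `waypoint_events` is hypothetical):

```shell
# Hypothetical helper: show events for the server Pod and its PVC, oldest first.
# Arg: namespace (default "default").
waypoint_events() {
  local ns="${1:-default}" obj
  for obj in waypoint-server-0 data-default-waypoint-server-0; do
    echo "== events for $obj =="
    kubectl get events -n "$ns" \
      --field-selector "involvedObject.name=$obj" \
      --sort-by=.lastTimestamp
  done
}
```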

Look for anything that may explain why the Pod remains in the Pending state. Your previous output gives a good indication that it is related to the PVC, whose status is still Pending:

  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: data-default
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      volumeMode: Filesystem
    status:
      phase: Pending
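To check the claim directly, you can print its phase and StorageClass. The StorageClass field in the template above is blank, which means the cluster's default StorageClass (if one is marked as default) is used, so `kubectl get storageclass` is worth running too. A sketch with a hypothetical helper name:

```shell
# Hypothetical helper: print a PVC's phase and StorageClass name on one line.
# Args: PVC name, namespace (default "default").
pvc_status() {
  kubectl get pvc "$1" -n "${2:-default}" \
    -o jsonpath='{.status.phase}{" "}{.spec.storageClassName}{"\n"}'
}
```

Usage: `pvc_status data-default-waypoint-server-0`. A claim that stays Pending indefinitely usually means nothing is provisioning a volume for it.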

Is the EBS CSI driver installed on your EKS cluster (and configured properly)?
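One quick check, sketched below with a hypothetical helper name: the EBS CSI driver registers a CSIDriver object named `ebs.csi.aws.com`. Finding it is necessary but not sufficient, because the controller can still lack the IAM permissions it needs to create volumes.

```shell
# Hypothetical helper: succeed only if the ebs.csi.aws.com CSIDriver is registered.
check_ebs_csi() {
  if kubectl get csidriver ebs.csi.aws.com >/dev/null 2>&1; then
    echo "EBS CSI driver registered (ebs.csi.aws.com)"
  else
    echo "CSIDriver ebs.csi.aws.com not found: the EBS CSI driver is likely not installed" >&2
    return 1
  fi
}
```

If the driver was installed as an EKS add-on, `aws eks describe-addon --cluster-name <your-cluster> --addon-name aws-ebs-csi-driver --query addon.status --output text` reports its status.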

Aha, that was it! For anyone else who comes across this: you need to enable the EBS CSI driver add-on, and then create the IAM permissions it needs to provision resources.

Follow these two guides to make it happen:

  1. Creating the Amazon EBS CSI driver IAM role for service accounts - Amazon EKS
  2. Managing the Amazon EBS CSI driver as an Amazon EKS add-on - Amazon EKS
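The two guides can be condensed into roughly the following eksctl steps. This is a sketch, not a replacement for the guides. It assumes the AWS CLI and eksctl are authenticated against the right account, and that the cluster is in a standard commercial region (the `arn:aws` partition). The helper name `enable_ebs_csi` is hypothetical, and the role name `AmazonEKS_EBS_CSI_DriverRole` is just an example name you can change.

```shell
# Hypothetical helper: create an IAM role for the EBS CSI controller's
# service account, then install the aws-ebs-csi-driver add-on using that role.
# Arg: EKS cluster name.
enable_ebs_csi() {
  local cluster="$1" role_name="AmazonEKS_EBS_CSI_DriverRole" account_id
  [ -n "$cluster" ] || { echo "usage: enable_ebs_csi <cluster-name>" >&2; return 2; }
  account_id=$(aws sts get-caller-identity --query Account --output text) || return 1

  # IAM roles for service accounts require an OIDC provider for the cluster.
  eksctl utils associate-iam-oidc-provider --cluster "$cluster" --approve || return 1

  # Create only the IAM role (not the Kubernetes service account); the add-on creates the SA.
  eksctl create iamserviceaccount \
    --name ebs-csi-controller-sa \
    --namespace kube-system \
    --cluster "$cluster" \
    --role-name "$role_name" \
    --role-only \
    --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
    --approve || return 1

  # Install the managed add-on and bind it to the role created above.
  eksctl create addon \
    --name aws-ebs-csi-driver \
    --cluster "$cluster" \
    --service-account-role-arn "arn:aws:iam::${account_id}:role/${role_name}" \
    --force
}
```

Once the driver's controller is running, the Pending data-default-waypoint-server-0 claim should get a volume, and the waypoint-server-0 Pod should then be able to start.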

Thanks!
