How do you run Waypoint on an EKS Fargate Cluster?

I’ve successfully installed Waypoint on AWS EKS on a regular cluster and deployed a minimal project. However, I was not able to get the same result on an EKS Fargate cluster using the same steps.

Is using Waypoint with EKS Fargate possible?


Context:

$GOPATH/bin/waypoint install -platform=kubernetes -accept-tos 2>&1 | tee install.txt

-> Inspecting Kubernetes cluster...
-> Gathering information about the Kubernetes cluster...
-> Initializing service account for on-demand runners...
-> Initializing role bindings for on-demand runner...
-> Service account for on-demand runner initialized!
-> 
-> Creating Kubernetes resources...
-> Waiting for Kubernetes StatefulSet to be ready...
-> Kubernetes StatefulSet reporting ready
-> Waiting for Kubernetes service to become ready..

…and the CLI would time out after 10 minutes.

I also saw a pod stuck in Pending:

k get pods -A

NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE
default       waypoint-server-0         0/1     Pending   0          13m
kube-system   coredns-c8589f55d-ffdvk   1/1     Running   0          17m
kube-system   coredns-c8589f55d-q9scr   1/1     Running   0          17m

Thanks in advance!

Hello - if you view the logs of the pending container with kubectl logs pod/waypoint-server-0, does that provide any information as to what’s happening? If you could also describe the container with kubectl describe pod/waypoint-server-0 that may provide some useful information as to why it’s getting stuck in pending.

Please let us know!
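
A pod that is still Pending has not started a container, so the logs are often empty and the scheduling events tend to be more useful. The sketch below wraps two standard kubectl queries in a small helper. The function name why_pending is hypothetical, and it assumes kubectl is already pointed at the cluster.

```shell
# Hypothetical helper: show why a pod is stuck in Pending.
# Usage: why_pending <pod-name> [namespace]
why_pending() {
  pod="$1"
  ns="${2:-default}"
  # Current phase of the pod (Pending, Running, ...).
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{.status.phase}'
  echo
  # Events recorded against the pod; scheduler failures show up here.
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$pod" \
    --sort-by=.metadata.creationTimestamp
}
```

For the pod in this thread that would be: why_pending waypoint-server-0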

Thanks for the tips @catsby !

kubectl logs pod/waypoint-server-0 prints nothing

kubectl describe pod/waypoint-server-0 prints

Name:                 waypoint-server-0
Namespace:            default
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 <none>
Labels:               app=waypoint-server
                      controller-revision-hash=waypoint-server-56dfb5567c
                      eks.amazonaws.com/fargate-profile=CdkFargetClusterfargateprofile-e34593590693476a84e06bdeb231edc4
                      statefulset.kubernetes.io/pod-name=waypoint-server-0
Annotations:          kubernetes.io/psp: eks.privileged
Status:               Pending
IP:                   
IPs:                  <none>
Controlled By:        StatefulSet/waypoint-server
Containers:
  server:
    Image:       hashicorp/waypoint:latest
    Ports:       9701/TCP, 9702/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      waypoint
    Args:
      server
      run
      -accept-tos
      -vv
      -db=/data/data.db
      -listen-grpc=0.0.0.0:9701
      -listen-http=0.0.0.0:9702
    Requests:
      cpu:     100m
      memory:  256Mi
    Liveness:  http-get https://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      HOME:  /data
    Mounts:
      /data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dl8rk (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-waypoint-server-0
    ReadOnly:   false
  kube-api-access-dl8rk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  4m7s  fargate-scheduler  Pod not supported on Fargate: volumes not supported: data not supported because: PVC data-waypoint-server-0 not bound

Dug into this a bit more:

kubectl describe pvc/data-waypoint-server-0

Name:          data-waypoint-server-0
Namespace:     default
StorageClass:  gp2
Status:        Pending
Volume:        
Labels:        app=waypoint-server
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
VolumeMode:    Filesystem
Used By:       waypoint-server-0
Events:
  Type    Reason                Age                  From                         Message
  ----    ------                ----                 ----                         -------
  Normal  WaitForFirstConsumer  31m                  persistentvolume-controller  waiting for first consumer to be created before binding
  Normal  WaitForPodScheduled   70s (x121 over 31m)  persistentvolume-controller  waiting for pod waypoint-server-0 to be scheduled

Just chiming in here as I have had issues with PVCs and Fargate in the past.

I think to use PVCs on Fargate EKS you need to create an EFS volume and pre-provision the PersistentVolumes and PVCs ahead of pod launch.
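
As a rough sketch of what pre-provisioning could look like, the helper below prints a statically provisioned PersistentVolume backed by EFS and a claim bound to it. The function name, the 10Gi size, and the efs-sc class name are assumptions for illustration, and the EFS file system (with mount targets reachable from the cluster) has to exist already. The claim name matters: the describe output above shows the StatefulSet pod expects data-waypoint-server-0.

```shell
# Hypothetical helper: print an EFS-backed PersistentVolume and a matching claim.
# Usage: render_efs_claim <pv-name> <pvc-name> <efs-filesystem-id> [namespace]
render_efs_claim() {
  pv="$1"; pvc="$2"; fs_id="$3"; ns="${4:-default}"
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${pv}
spec:
  capacity:
    storage: 10Gi   # required by Kubernetes; EFS is elastic and does not enforce it
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: ${fs_id}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${pvc}
  namespace: ${ns}
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  volumeName: ${pv}
  resources:
    requests:
      storage: 10Gi
EOF
}
```

Usage, with a placeholder file system ID: render_efs_claim efs-pv-server-0 data-waypoint-server-0 fs-0123456789abcdef0 | kubectl apply -f -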

At present @thiskevinwang I think it should be possible to do this manually when you create the EKS Fargate cluster. I will spin up an example using Terraform and share it a little later.

@catsby ping me when you start work; I can walk you through the storage requirements for EKS Fargate. It might be interesting to look at how the EFS creation could be automated by Waypoint, or at least how we document the process.

Running stateful workloads with Amazon EKS on AWS Fargate using Amazon EFS | Containers

I have not got this fully working yet but the Terraform below is a starting point on how to get Waypoint running on EKS Fargate.

nicholasjackson/waypoint-eks-fargate-example: Example showing how to run Waypoint on EKS Fargate (github.com)

Fargate on EKS differs from Fargate on ECS in a number of ways. The two that matter most for Waypoint are the Kubernetes PVCs that Waypoint uses for data storage and the configuration of Kubernetes services for load balancing.

Fargate uses EFS for storage, so to enable PVCs you need to provision EFS storage and configure the storage driver in addition to creating the cluster. Once this is done, the Waypoint server starts correctly because it can attach the required PVC.
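
On the Kubernetes side, configuring the storage driver usually includes a StorageClass that points at the EFS CSI driver. The sketch below prints a minimal one. The helper name is hypothetical, and efs-sc is simply the class name that shows up in the PV listing later in this thread. It does not create the EFS file system itself.

```shell
# Hypothetical helper: print a StorageClass that uses the EFS CSI driver.
# Usage: render_efs_storage_class [class-name]
render_efs_storage_class() {
  name="${1:-efs-sc}"
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ${name}
provisioner: efs.csi.aws.com
EOF
}
```

Usage: render_efs_storage_class | kubectl apply -f -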

The second point is the Kubernetes service for Fargate pods. I am not 100% certain on this, but I think the following annotations need to be added to route external traffic, since it needs to hit the pod IP for Fargate workloads:

 'service.beta.kubernetes.io/aws-load-balancer-nlb-target-type': ip
 'service.beta.kubernetes.io/aws-load-balancer-type': external

Network load balancing on Amazon EKS - Amazon EKS
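
To show where those annotations go, here is a minimal sketch of a LoadBalancer Service carrying them. It assumes the AWS Load Balancer Controller is installed in the cluster. The helper name and the Service name waypoint-nlb are hypothetical. The selector and ports are taken from the describe output earlier in the thread, and the AWS documentation writes the target type value in lowercase (ip).

```shell
# Hypothetical helper: print a Service that requests an NLB with IP targets.
# Usage: render_waypoint_nlb_service [service-name] [namespace]
render_waypoint_nlb_service() {
  name="${1:-waypoint-nlb}"   # hypothetical Service name
  ns="${2:-default}"
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: ${name}
  namespace: ${ns}
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
spec:
  type: LoadBalancer
  selector:
    app: waypoint-server   # pod label from the describe output
  ports:
    - name: grpc
      port: 9701
      targetPort: 9701
    - name: https
      port: 9702
      targetPort: 9702
EOF
}
```

Usage: render_waypoint_nlb_service | kubectl apply -f -

IP targets are the relevant choice here because Fargate pods do not run on EC2 instances that a load balancer could register as instance targets.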

The Helm chart for Waypoint allows finer configuration of the Services for the Waypoint API and UI. The linked repo does have this config in there but I have not got things fully working yet.

Hopefully will have an update for you in the morning once I do some more testing.

@nic I think the repo you shared is private

Sorry, I have made that public now.

I have also got to the bottom of a CoreDNS issue, will push an update in a short while.

Awesome, thank you @nic :raised_hands:. This is both helpful and extremely educational for me.

I can try out the repo soon and report back if I hit any snags.

I tried to deploy the repo with:

The Helm step looks to have failed, but all the infrastructure was deployed successfully.

Here’s my cluster overview:

λ  k get all -A
NAMESPACE     NAME                           READY   STATUS             RESTARTS   AGE
default       pod/waypoint-runner-0          0/1     Init:0/1           0          34m
default       pod/waypoint-server-0          0/1     CrashLoopBackOff   15         34m
kube-system   pod/aws-node-lcnbh             1/1     Running            0          73m
kube-system   pod/coredns-66cb55d4f4-4xqsv   1/1     Running            0          86m
kube-system   pod/coredns-66cb55d4f4-zgqsz   1/1     Running            0          86m
kube-system   pod/kube-proxy-2x2q4           1/1     Running            0          80m

NAMESPACE     NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                    AGE
default       service/kubernetes        ClusterIP      10.100.0.1       <none>        443/TCP                                                    87m
default       service/waypoint-server   ClusterIP      None             <none>        9702/TCP,9701/TCP                                          34m
default       service/waypoint-ui       LoadBalancer   10.100.251.185   <pending>     80:32432/TCP,443:32016/TCP,9701:32124/TCP,9702:30743/TCP   34m
kube-system   service/kube-dns          ClusterIP      10.100.0.10      <none>        53/UDP,53/TCP                                              87m

NAMESPACE     NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   daemonset.apps/aws-node     1         1         1       1            1           <none>          87m
kube-system   daemonset.apps/kube-proxy   1         1         1       1            1           <none>          87m

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns   2/2     2            2           87m

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-66cb55d4f4   2         2         2       86m

NAMESPACE   NAME                               READY   AGE
default     statefulset.apps/waypoint-runner   0/1     34m
default     statefulset.apps/waypoint-server   0/1     34m

Persistent Volumes

λ  k get pv -A
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                    STORAGECLASS   REASON   AGE
efs-pv-runner-0   10Gi       RWX            Delete           Bound    default/data-default-waypoint-runner-0   efs-sc                  77m
efs-pv-server-0   10Gi       RWX            Delete           Bound    default/data-default-waypoint-server-0   efs-sc                  77m

Persistent Volume Claims

λ  k get pvc -A
NAMESPACE   NAME                             STATUS    VOLUME            CAPACITY   ACCESS MODES   STORAGECLASS   AGE
default     data-default-waypoint-runner-0   Bound     efs-pv-runner-0   10Gi       RWX            efs-sc         77m
default     data-waypoint-server-0           Pending                                               gp2            58m

Hey @thiskevinwang apologies for the late reply, I did not get a notification from Discuss.

I have done quite a bit of testing around this and it is possible to get Waypoint running solely on Fargate. The steps that you need to take are:

  • Provision EFS
  • Create an IAM role for the AWS Load Balancer Controller
  • Create an IAM role for Fargate pods that allows access to ECR
  • Create an IAM role for the Waypoint runner that has permission to push to ECR
  • Deploy the AWS Load Balancer Controller
  • Configure the Waypoint runner service accounts to use the IAM policy
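
For the last step, one common way to connect a Kubernetes service account to IAM on EKS is IAM roles for service accounts, where the service account carries an eks.amazonaws.com/role-arn annotation. The sketch below prints such a service account. The helper name, the service account name, and the role ARN are placeholders, and it assumes the cluster has an IAM OIDC provider and that the role's trust policy allows this service account.

```shell
# Hypothetical helper: print a ServiceAccount annotated with an IAM role.
# Usage: render_irsa_service_account <name> <role-arn> [namespace]
render_irsa_service_account() {
  sa="$1"; role_arn="$2"; ns="${3:-default}"
  cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${sa}
  namespace: ${ns}
  annotations:
    eks.amazonaws.com/role-arn: ${role_arn}
EOF
}
```

Usage, with placeholder values: render_irsa_service_account waypoint-runner arn:aws:iam::111122223333:role/waypoint-runner-ecr | kubectl apply -f -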

With all of this in place, it is possible to get the server up and running, but due to the slow startup time for Fargate pods Waypoint can time out.

My latest version of the repo takes a slightly different approach. The Waypoint server, runner, and other system pods are deployed to a single-node managed node group in EKS. I then create a Fargate profile, waypoint-apps, that is linked to the waypoint-apps namespace for running Waypoint applications. So this is a hybrid approach, with the server and runners on traditional nodes and apps on Fargate.
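
The repo does this with Terraform. Purely as an illustration of the same idea with a different tool, the sketch below creates a namespace-scoped Fargate profile with the AWS CLI. The helper name is hypothetical, and the cluster name, pod execution role ARN, and subnet IDs are values you would supply yourself.

```shell
# Hypothetical helper: create a Fargate profile matching every pod in one namespace.
# Usage: create_apps_fargate_profile <cluster> <pod-execution-role-arn> <subnet-id>...
create_apps_fargate_profile() {
  cluster="$1"; role_arn="$2"
  shift 2
  # Remaining arguments are the subnets Fargate may place pods in.
  aws eks create-fargate-profile \
    --cluster-name "$cluster" \
    --fargate-profile-name waypoint-apps \
    --pod-execution-role-arn "$role_arn" \
    --selectors namespace=waypoint-apps \
    --subnets "$@"
}
```

Only pods created in the waypoint-apps namespace match this profile, so the server and runner pods in the default namespace stay on the managed node group. The namespace itself can be created with kubectl create namespace waypoint-apps.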

In my humble opinion, this provides the best user experience: the on-demand runners execute on the existing node, so startup time for jobs and build times are greatly reduced.

We will take a look at what we can do to stop the timeout issues, but due to Fargate startup time a build and deployment can take 10 minutes using Fargate pods. This is reduced to about 1 minute on the managed node group.

I have updated the repo below:

nicholasjackson/waypoint-eks-fargate-example: Example showing how to run Waypoint on EKS Fargate (github.com)

Kind regards,

Nic
