I’ve successfully installed Waypoint on a regular AWS EKS cluster and deployed a minimal project. However, I was not able to get the same result on an EKS Fargate cluster using the same steps.
Is using Waypoint with EKS Fargate possible?
Context:
$GOPATH/bin/waypoint install -platform=kubernetes -accept-tos 2>&1 | tee install.txt
-> Inspecting Kubernetes cluster...
-> Gathering information about the Kubernetes cluster...
-> Initializing service account for on-demand runners...
-> Initializing role bindings for on-demand runner...
-> Service account for on-demand runner initialized!
->
-> Creating Kubernetes resources...
-> Waiting for Kubernetes StatefulSet to be ready...
-> Kubernetes StatefulSet reporting ready
-> Waiting for Kubernetes service to become ready..
…and the CLI would time out after 10 minutes.
I also saw a pod stuck in Pending:
k get pods -A
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE
default       waypoint-server-0         0/1     Pending   0          13m
kube-system   coredns-c8589f55d-ffdvk   1/1     Running   0          17m
kube-system   coredns-c8589f55d-q9scr   1/1     Running   0          17m
Hello - if you view the logs of the pending container with kubectl logs pod/waypoint-server-0, does that provide any information as to what’s happening? If you could also describe the container with kubectl describe pod/waypoint-server-0 that may provide some useful information as to why it’s getting stuck in pending.
Name:                 waypoint-server-0
Namespace:            default
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 <none>
Labels:               app=waypoint-server
                      controller-revision-hash=waypoint-server-56dfb5567c
                      eks.amazonaws.com/fargate-profile=CdkFargetClusterfargateprofile-e34593590693476a84e06bdeb231edc4
                      statefulset.kubernetes.io/pod-name=waypoint-server-0
Annotations:          kubernetes.io/psp: eks.privileged
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        StatefulSet/waypoint-server
Containers:
  server:
    Image:       hashicorp/waypoint:latest
    Ports:       9701/TCP, 9702/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      waypoint
    Args:
      server
      run
      -accept-tos
      -vv
      -db=/data/data.db
      -listen-grpc=0.0.0.0:9701
      -listen-http=0.0.0.0:9702
    Requests:
      cpu:      100m
      memory:   256Mi
    Liveness:   http-get https://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      HOME:  /data
    Mounts:
      /data from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dl8rk (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-waypoint-server-0
    ReadOnly:   false
  kube-api-access-dl8rk:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  4m7s  fargate-scheduler  Pod not supported on Fargate: volumes not supported: data not supported because: PVC data-waypoint-server-0 not bound
Dug into this a bit more:
kubectl describe pvc/data-waypoint-server-0
Name:          data-waypoint-server-0
Namespace:     default
StorageClass:  gp2
Status:        Pending
Volume:
Labels:        app=waypoint-server
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       waypoint-server-0
Events:
  Type    Reason                Age                  From                         Message
  ----    ------                ----                 ----                         -------
  Normal  WaitForFirstConsumer  31m                  persistentvolume-controller  waiting for first consumer to be created before binding
  Normal  WaitForPodScheduled   70s (x121 over 31m)  persistentvolume-controller  waiting for pod waypoint-server-0 to be scheduled
Just chiming in here as I have had issues with PVCs and Fargate in the past.
I think that to use PVCs on EKS Fargate you need to create an EFS volume and pre-provision the PersistentVolumes and PVCs ahead of pod launch.
@thiskevinwang, at present I think it should be possible to do this manually when you create the EKS Fargate cluster. I will spin up an example using Terraform and share it a little later.
@catsby ping me when you start work, and I can walk you through the storage requirements for EKS Fargate. It might be interesting to look at how the EFS creation could be automated by Waypoint, or at least at how we document the process.
EKS Fargate differs from ECS Fargate in a number of ways. The two differences that matter most for Waypoint are the Kubernetes PVCs that Waypoint uses for data storage, and the configuration of Kubernetes services for load balancing.
Fargate uses EFS for storage, so to enable PVCs you need to provision EFS storage and configure the storage driver in addition to creating the cluster. Once this has been done, the Waypoint server will start correctly because it can attach the required PVC.
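For illustration, here is a minimal sketch of what pre-provisioning the server's volume might look like with the EFS CSI driver. It is not taken from the linked repo. The storage class name efs-sc, the volume name efs-pv-server-0, and the claim name data-waypoint-server-0 come from the output in this thread; the file system ID is a placeholder, and the 10Gi size and access mode are assumptions. The claim name has to match the one the StatefulSet expects (data-waypoint-server-0 in the describe output above), and the claim should exist before the StatefulSet is created so that it is reused instead of a new gp2 claim being generated.

```yaml
# Sketch only: static EFS provisioning for the Waypoint server data volume.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv-server-0
spec:
  capacity:
    storage: 10Gi              # required by the API; EFS itself is elastic
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-0123456789abcdef0   # placeholder: your EFS file system ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-waypoint-server-0   # must match the claim name the StatefulSet uses
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 10Gi
```

The EFS file system also needs mount targets and security group rules that the Fargate pods can reach; those are outside the scope of this sketch.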
The second point is the Kubernetes service for Fargate pods. I am not 100% certain on this, but I think annotations need to be added to route external traffic, since that traffic needs to hit the pod IP for Fargate workloads:
'service.beta.kubernetes.io/aws-load-balancer-nlb-target-type': IP
'service.beta.kubernetes.io/aws-load-balancer-type': external
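As a rough sketch of where those annotations would go, here is a trimmed Service manifest. It assumes the AWS Load Balancer Controller is already deployed in the cluster. The service name, selector label, and ports 9701/9702 are taken from the output in this thread; the port names are illustrative, and ports 80 and 443 are left out for brevity. The controller's documentation writes the target type value in lowercase (ip), so the sketch does the same.

```yaml
# Sketch only: LoadBalancer Service that targets pod IPs directly,
# which is what Fargate pods need since there is no node port to hit.
apiVersion: v1
kind: Service
metadata:
  name: waypoint-ui
  namespace: default
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
spec:
  type: LoadBalancer
  selector:
    app: waypoint-server
  ports:
    - name: grpc          # illustrative port name
      port: 9701
      targetPort: 9701
    - name: http          # illustrative port name
      port: 9702
      targetPort: 9702
```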
The Helm chart for Waypoint allows finer configuration of the Services for the Waypoint API and UI. The linked repo does have this config in there but I have not got things fully working yet.
Hopefully I will have an update for you in the morning once I do some more testing.
λ k get all -A
NAMESPACE     NAME                           READY   STATUS             RESTARTS   AGE
default       pod/waypoint-runner-0          0/1     Init:0/1           0          34m
default       pod/waypoint-server-0          0/1     CrashLoopBackOff   15         34m
kube-system   pod/aws-node-lcnbh             1/1     Running            0          73m
kube-system   pod/coredns-66cb55d4f4-4xqsv   1/1     Running            0          86m
kube-system   pod/coredns-66cb55d4f4-zgqsz   1/1     Running            0          86m
kube-system   pod/kube-proxy-2x2q4           1/1     Running            0          80m

NAMESPACE     NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                    AGE
default       service/kubernetes        ClusterIP      10.100.0.1       <none>        443/TCP                                                    87m
default       service/waypoint-server   ClusterIP      None             <none>        9702/TCP,9701/TCP                                          34m
default       service/waypoint-ui       LoadBalancer   10.100.251.185   <pending>     80:32432/TCP,443:32016/TCP,9701:32124/TCP,9702:30743/TCP   34m
kube-system   service/kube-dns          ClusterIP      10.100.0.10      <none>        53/UDP,53/TCP                                              87m

NAMESPACE     NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   daemonset.apps/aws-node     1         1         1       1            1           <none>          87m
kube-system   daemonset.apps/kube-proxy   1         1         1       1            1           <none>          87m

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns   2/2     2            2           87m

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-66cb55d4f4   2         2         2       86m

NAMESPACE   NAME                               READY   AGE
default     statefulset.apps/waypoint-runner   0/1     34m
default     statefulset.apps/waypoint-server   0/1     34m
Persistent Volumes
λ k get pv -A
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                    STORAGECLASS   REASON   AGE
efs-pv-runner-0   10Gi       RWX            Delete           Bound    default/data-default-waypoint-runner-0   efs-sc                  77m
efs-pv-server-0   10Gi       RWX            Delete           Bound    default/data-default-waypoint-server-0   efs-sc                  77m
Persistent Volume Claims
λ k get pvc -A
NAMESPACE   NAME                             STATUS    VOLUME            CAPACITY   ACCESS MODES   STORAGECLASS   AGE
default     data-default-waypoint-runner-0   Bound     efs-pv-runner-0   10Gi       RWX            efs-sc         77m
default     data-waypoint-server-0           Pending                                               gp2            58m
Hey @thiskevinwang apologies for the late reply, I did not get a notification from Discuss.
I have done quite a bit of testing around this and it is possible to get Waypoint running solely on Fargate. The steps that you need to take are:
Provision EFS
Create an IAM role for the AWS Load Balancer Controller
Create IAM permissions for Fargate pods that allow access to ECR
Create IAM permissions for the Waypoint runner that allow pushing to ECR
Deploy the AWS Load Balancer Controller
Configure the Waypoint runner service accounts to use the IAM policy
With all of this in place, it is possible to get the server up and running, but because Fargate pods are slow to start, Waypoint can time out.
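For the last step in that list, one common way to tie a Kubernetes service account to IAM permissions on EKS is IAM Roles for Service Accounts (IRSA), which uses an annotation on the ServiceAccount. The sketch below assumes that mechanism; the thread does not say which one was actually used. The service account name, the account ID, and the role name are all hypothetical placeholders, and the cluster needs an IAM OIDC provider configured for the annotation to have any effect.

```yaml
# Sketch only: ServiceAccount annotated for IRSA so runner pods can push to ECR.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: waypoint-runner        # hypothetical: use the name your install creates
  namespace: default
  annotations:
    # Placeholder ARN: a role whose policy allows pushing images to ECR.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/waypoint-runner-ecr-push
```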
My latest version of the repo takes a slightly different approach. The Waypoint server, runner, and other system pods are deployed to a single-node managed node group in EKS. I then create a Fargate profile, waypoint-apps, that is linked to the waypoint-apps namespace for running Waypoint applications. So this is a hybrid approach, with the server and runners on traditional nodes and apps on Fargate.
In my humble opinion, this provides the best user experience: the on-demand runners execute on the existing node, so startup and build times for jobs are greatly reduced.
We will take a look at what we can do to stop the timeout issues, but because of Fargate startup time a build and deployment can take 10 minutes using Fargate pods. This is reduced to about 1 minute on the managed node group.
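The hybrid layout described above can be sketched as an eksctl cluster config. This is an assumption-laden illustration, not the repo's approach (which uses Terraform). The cluster name, region, node group name, and instance type are placeholders; only the waypoint-apps profile and namespace names come from this thread.

```yaml
# Sketch only: one managed node for Waypoint server/runners, Fargate for apps.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: waypoint-hybrid        # placeholder cluster name
  region: us-east-1            # placeholder region
managedNodeGroups:
  - name: waypoint-system      # placeholder: hosts server, runners, system pods
    instanceType: t3.medium    # placeholder instance type
    desiredCapacity: 1
fargateProfiles:
  - name: waypoint-apps
    selectors:
      # Only pods in this namespace are scheduled onto Fargate.
      - namespace: waypoint-apps
```

With a layout like this, anything deployed outside the waypoint-apps namespace lands on the managed node, so the server's PVC can stay on the default EBS-backed storage class.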