Hi,
so I wanted to get my hands dirty with CSI volumes and came across a fairly typical roadblock at our company: the internal corporate proxy. In a nutshell, all traffic from my AWS account is routed via the on-prem network, where border control is enforced, and then we're off into the WWW.
I was able to successfully deploy the AWS EBS CSI controller and node containers following the official guide. The IAM policy is in place, and all the config stuff is taken care of…
- Nomad version is 1.0.0
- Both containers are running on the same machine
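To make the proxy situation concrete: as far as I know, the AWS SDK for Go picks up the standard `HTTP_PROXY`/`HTTPS_PROXY`/`NO_PROXY` environment variables via Go's default HTTP client, so this is a sketch of how I'd wire them into the controller plugin job (job name, image tag, plugin id, and proxy address below are placeholders, not my actual values):

```hcl
# Sketch of the controller plugin job with proxy settings.
job "plugin-aws-ebs-controller" {
  datacenters = ["dc1"]

  group "controller" {
    task "plugin" {
      driver = "docker"

      config {
        image = "amazon/aws-ebs-csi-driver:v0.8.0"
        args = [
          "controller",
          "--endpoint=unix://csi/csi.sock",
          "--logtostderr",
          "--v=5",
        ]
      }

      # The AWS SDK for Go should honor these variables; 169.254.169.254 is
      # excluded so instance metadata requests aren't sent to the proxy.
      env {
        HTTPS_PROXY = "http://proxy.corp.example:3128"
        HTTP_PROXY  = "http://proxy.corp.example:3128"
        NO_PROXY    = "169.254.169.254"
      }

      csi_plugin {
        id        = "aws-ebs0"
        type      = "controller"
        mount_dir = "/csi"
      }
    }
  }
}
```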
Logs from the node container:

```
I1210 12:55:34.132139 1 driver.go:68] Driver: ebs.csi.aws.com Version: v0.8.0
W1210 12:55:37.364275 1 metadata.go:136] Failed to parse the outpost arn:
I1210 12:55:37.364753 1 mount_linux.go:153] Detected OS without systemd
I1210 12:55:37.365614 1 driver.go:138] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I1210 12:55:40.161131 1 node.go:367] NodeGetInfo: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized: XXX_sizecache:0}
I1210 12:55:40.162960 1 node.go:351] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized: XXX_sizecache:0}
I1210 12:56:10.164083 1 node.go:351] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized: XXX_sizecache:0}
I1210 12:56:40.165448 1 node.go:351] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized: XXX_sizecache:0}
I1210 12:57:10.166583 1 node.go:351] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized: XXX_sizecache:0}
I1210 12:57:40.167928 1 node.go:351] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized: XXX_sizecache:0}
I1210 12:58:10.168914 1 node.go:351] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized: XXX_sizecache:0}
I1210 12:58:40.170301 1 node.go:351] NodeGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized: XXX_sizecache:0}
```
Logs from the controller container:

```
I1210 12:54:58.651043 1 driver.go:68] Driver: ebs.csi.aws.com Version: v0.8.0
W1210 12:55:01.896980 1 metadata.go:136] Failed to parse the outpost arn:
I1210 12:55:01.897598 1 driver.go:138] Listening for connections on address: &net.UnixAddr{Name:"/csi/csi.sock", Net:"unix"}
I1210 12:55:04.688972 1 controller.go:334] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized: XXX_sizecache:0}
I1210 12:55:34.690141 1 controller.go:334] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized: XXX_sizecache:0}
I1210 12:56:04.691342 1 controller.go:334] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized: XXX_sizecache:0}
I1210 12:56:34.693708 1 controller.go:334] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized: XXX_sizecache:0}
```
So the next step was to register the volume, and here's where things get hairy. The command times out, with the controller logging varying errors:
```
driver.go:115] GRPC error: rpc error: code = Internal desc = Could not get volume with ID "vol-06ec063b1287cb0cd": RequestCanceled: request context canceled
caused by: context canceled
```

or

```
driver.go:115] GRPC error: rpc error: code = Internal desc = Could not get volume with ID "vol-06ec063b1287cb0cd": RequestCanceled: request context canceled
caused by: context deadline exceeded
```

or

```
driver.go:115] GRPC error: rpc error: code = Internal desc = Could not get volume with ID "vol-06ec063b1287cb0cd": RequestError: send request failed
caused by: Post "https://ec2.eu-central-1.amazonaws.com/": dial tcp 54.239.55.102:443: i/o timeout
```
The last error in particular caught my attention.
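For reference, the registration I'm attempting looks roughly like this (the `id`, `name`, and `plugin_id` below are placeholders; the `external_id` is the volume from the logs above):

```hcl
# volume.hcl — Nomad 1.0 volume specification (sketch)
type            = "csi"
id              = "ebs-test"               # placeholder
name            = "ebs-test"               # placeholder
external_id     = "vol-06ec063b1287cb0cd"
plugin_id       = "aws-ebs0"               # placeholder, must match the csi_plugin id
access_mode     = "single-node-writer"
attachment_mode = "file-system"
```

registered with `nomad volume register volume.hcl`.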
I guess I have a couple of questions at this point:
- Is the startup log output from both containers okay/normal?
- In general, which resources does Nomad (server/client) need access to in an AWS context? (The metadata endpoint is one of them.)
- Who makes the request that is being blocked, i.e. what is the request flow when I register a volume?
As a general follow-up: would it make sense to add a section to the docs outlining the external dependencies when using CSI volumes on AWS/Azure/GCP? For example, making sure the Nomad clients can reach the metadata endpoint to get the AZ information, or that gcp.io is accessible because the Envoys aren't pulled from Docker Hub.
I'm just thinking out loud here; I don't even know if this is something you can come up with, given that every setup out there is different…
Cheers