I’m failing to deploy a container that uses a volume backed by the ceph-csi plugin:
failed to setup alloc: pre-run hook "csi_hook" failed: node plugin returned an internal error, check the plugin allocation logs for more information: rpc error: code = Internal desc = failed to establish the connection: failed to get connection: connecting failed: rados: ret=-110, Connection timed out
I see this in the Nomad client logs:
Jan 28 00:22:01 ind-test-nomad-worker11 nomad[208094]: 2023-01-28T00:22:01.022-0500 [WARN] client.ceph-csi: finished client unary call: grpc.code=Internal duration=50m0.016328065s grpc.service=csi.v1.Node grpc.method=NodeStageVolume
Jan 28 00:22:01 ind-test-nomad-worker11 nomad[208094]: 2023-01-28T00:22:01.022-0500 [ERROR] client.alloc_runner: prerun failed: alloc_id=b6c05535-82b8-f5d7-a65f-0960daf0b087 error="pre-run hook \"csi_hook\" failed: node plugin returned an internal error, check the plugin allocation logs for more information: rpc error: code = Internal desc = failed to establish the connection: failed to get connection: connecting failed: rados: ret=-110, Connection timed out"
I see this in the ceph-csi node plugin container logs:
I0128 05:30:42.057159 7 utils.go:195] ID: 5090 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-dc31d910-9ecc-11ed-b9a0-9238a2ead1d6 GRPC call: /csi.v1.Node/NodeStageVolume
I0128 05:30:42.057365 7 utils.go:206] ID: 5090 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-dc31d910-9ecc-11ed-b9a0-9238a2ead1d6 GRPC request: {"secrets":"***stripped***","staging_target_path":"/local/csi/staging/prometheus-us-ind-test/rw-file-system-single-node-writer","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"*********************","imageFeatures":"layering","imageName":"csi-vol-dc31d910-9ecc-11ed-b9a0-9238a2ead1d6","journalPool":"ind-nonprod2","pool":"ind-nonprod2"},"volume_id":"0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-dc31d910-9ecc-11ed-b9a0-9238a2ead1d6"}
I0128 05:30:42.057659 7 rbd_util.go:1279] ID: 5090 Req-ID: 0001-0024-1e35f6bc-1257-45b6-aa9d-16f9ecd30652-0000000000000024-dc31d910-9ecc-11ed-b9a0-9238a2ead1d6 setting disableInUseChecks: false image features: [layering] mounter: rbd
FWIW, I am able to manually map the image from the same host via:
rbd device map ind-nonprod2/test_image --id ind-nonprod2 --keyfile ceph_secret.txt
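For reference, a ceph-csi RBD volume registration for Nomad generally looks something like the sketch below. All names, IDs, and keys are placeholders, not my real values; the point is that the clusterID in context has to match an entry in the plugin's csi-config map, which is where the node plugin gets the list of monitors it tries to dial (a mismatch or an unreachable monitor list is one way to end up with rados: ret=-110).

```hcl
# Minimal sketch of a ceph-csi volume registration (placeholder values).
id        = "prometheus-us-ind-test"
name      = "prometheus-us-ind-test"
type      = "csi"
plugin_id = "ceph-csi"

# Must match the volume_id the plugin reports (encodes clusterID + image).
external_id = "<volume handle from ceph-csi>"

capability {
  access_mode     = "single-node-writer"
  attachment_mode = "file-system"
}

# Ceph user credentials the node plugin uses to stage the volume.
secrets {
  userID  = "ind-nonprod2"
  userKey = "<ceph user key>"
}

# Passed to the plugin as volume_context; clusterID selects the monitor
# list from the plugin's csi-config.
context {
  clusterID = "<ceph cluster fsid>"
  pool      = "ind-nonprod2"
}
```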
The thing that’s really leaving me scratching my head is that the allocation isn’t even starting: there is no task for the job. Presumably this is because the volume is not registered, even though the Nomad UI shows it as registered.
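For completeness, the task group claims the volume roughly like this (again a sketch with placeholder names; the volume_mount only takes effect once the pre-run csi_hook has staged the volume, which is the step failing above):

```hcl
group "prometheus" {
  # Claims the registered CSI volume by its Nomad volume ID.
  volume "data" {
    type            = "csi"
    source          = "prometheus-us-ind-test"
    access_mode     = "single-node-writer"
    attachment_mode = "file-system"
  }

  task "prometheus" {
    driver = "docker"

    # Mounts the staged volume into the task filesystem.
    volume_mount {
      volume      = "data"
      destination = "/data"
    }
  }
}
```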
I’m sure I’m missing something here, but I’m not sure where to start.