Hi,
Is it possible to connect two pods by their IP addresses?
I am trying to deploy a 5G network core on k8s and connect everything to a service mesh. The biggest challenge is that each 5G Network Function needs multiple interfaces to work correctly (at least in the implementation I am using). This can be achieved by using Multus CNI to add the extra interfaces to pods. For example, the Session Management Function (SMF) communicates with the User Plane Function (UPF) via the N4 interface:
# part of the smf-deployment.yaml file
annotations:
  k8s.v1.cni.cncf.io/networks: |
    [
      {
        "name": "br-mgmt",
        "interface": "mgmt",
        "ips": [ "10.0.254.40/24" ]
      }, {
        "name": "br-cp",
        "interface": "cp",
        "ips": [ "10.0.11.40/24" ]
      }, {
        "name": "br-n4",
        "interface": "n4",
        "ips": [ "10.0.14.40/24" ]
      }, {
        "name": "br-db",
        "interface": "db",
        "ips": [ "10.0.253.40/24" ]
      }
    ]
Here is my Consul Helm values file, helm-consul-values.yaml:
connectInject:
  cni:
    enabled: true
    logLevel: info
    multus: true
    cniBinDir: "/opt/cni/bin"
    cniNetDir: "/etc/cni/net.d"
  enabled: true
global:
  acls:
    manageSystemACLs: true
  datacenter: dc1
  enabled: true
  name: consul
  tls:
    enableAutoEncrypt: true
    enabled: true
    httpsOnly: false
    verify: false
  gossipEncryption:
    autoGenerate: true
server:
  replicas: 1
  securityContext:
    fsGroup: 0
    runAsGroup: 0
    runAsNonRoot: false
    runAsUser: 0
  updatePartition: 0
ui:
  enabled: true
  metrics:
    baseURL: http://prometheus.default.svc.cluster.local:8082
    enabled: true
    provider: prometheus
  service:
    type: NodePort
I added the connect-inject and transparent-proxy annotations to the deployments. Following the advice in the sample values.yaml, I also included the consul-cni network in the Multus annotation:
annotations:
  'consul.hashicorp.com/connect-inject': 'true'
  'consul.hashicorp.com/transparent-proxy': 'true'
  k8s.v1.cni.cncf.io/networks: |
    [
      {
        "name": "br-mgmt",
        "interface": "mgmt",
        "ips": [ "10.0.254.40/24" ]
      }, {
        "name": "br-cp",
        "interface": "cp",
        "ips": [ "10.0.11.40/24" ]
      }, {
        "name": "br-n4",
        "interface": "n4",
        "ips": [ "10.0.14.40/24" ]
      }, {
        "name": "br-db",
        "interface": "db",
        "ips": [ "10.0.253.40/24" ]
      }, {
        "name": "consul-cni",
        "namespace": "default"
      }
    ]
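As a sanity check on the annotation above, here is a minimal Python sketch that parses the JSON value of k8s.v1.cni.cncf.io/networks and lists each attachment. The function name summarize_networks is hypothetical, and the sketch assumes the annotation is in the JSON-list form shown here. It only confirms that the value is valid JSON and that a consul-cni entry is present; it does not check how Multus or Consul interpret it.

```python
import json


def summarize_networks(annotation_value):
    """Parse a JSON-list networks annotation into (name, interface, ips) tuples.

    Hypothetical helper: raises ValueError if the value is not valid JSON
    (json.JSONDecodeError is a subclass of ValueError).
    """
    entries = json.loads(annotation_value)
    return [(e["name"], e.get("interface"), e.get("ips", [])) for e in entries]


def has_attachment(annotation_value, name):
    """Return True if an attachment with the given name is listed."""
    return any(n == name for n, _, _ in summarize_networks(annotation_value))


# Annotation value shortened to two entries from the post.
EXAMPLE = """
[
  { "name": "br-db", "interface": "db", "ips": [ "10.0.253.40/24" ] },
  { "name": "consul-cni", "namespace": "default" }
]
"""

for name, interface, ips in summarize_networks(EXAMPLE):
    print(name, interface, ips)
```

The value can be pasted from the deployment manifest, or read from a running pod's metadata, before passing it to summarize_networks.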
I turned on the transparentProxy.dialedDirectly feature in the ProxyDefaults:
apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
  namespace: default
spec:
  transparentProxy:
    dialedDirectly: true
In the Consul UI, all 5G services are visible as healthy, and I added a generic intention (* --allow--> *). However, the pods are unable to fully communicate with each other. Ping works, but applications do not:
root@smf-68dc78f887-qxshb:/opt/phoenix# ping -I db 10.0.253.5
PING 10.0.253.5 (10.0.253.5) from 10.0.253.40 db: 56(84) bytes of data.
64 bytes from 10.0.253.5: icmp_seq=1 ttl=64 time=0.050 ms
64 bytes from 10.0.253.5: icmp_seq=2 ttl=64 time=0.023 ms
64 bytes from 10.0.253.5: icmp_seq=3 ttl=64 time=0.027 ms
64 bytes from 10.0.253.5: icmp_seq=4 ttl=64 time=0.023 ms
^C
--- 10.0.253.5 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3056ms
rtt min/avg/max/mdev = 0.023/0.030/0.050/0.011 ms
root@smf-68dc78f887-qxshb:/opt/phoenix# mysql -h 10.0.253.5 --bind-address 10.0.253.40 -u root -p
Enter password:
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 2
root@smf-68dc78f887-qxshb:/opt/phoenix#
#sql sees the incoming traffic:
root@sql-767646dd-wfgrr:/# tcpdump -i db -nvvvX
tcpdump: listening on db, link-type EN10MB (Ethernet), capture size 262144 bytes
12:15:15.472117 IP (tos 0x0, ttl 64, id 21704, offset 0, flags [DF], proto TCP (6), length 60)
10.0.253.40.51132 > 10.0.253.5.3306: Flags [S], cksum 0x0e5d (incorrect -> 0x3b24), seq 1147482603, win 64240, options [mss 1460,sackOK,TS val 2052275826 ecr 0,nop,wscale 7], length 0
0x0000: 4500 003c 54c8 4000 4006 d7c5 0a00 fd28 E..<T.@.@......(
0x0010: 0a00 fd05 c7bc 0cea 4465 31eb 0000 0000 ........De1.....
0x0020: a002 faf0 0e5d 0000 0204 05b4 0402 080a .....]..........
0x0030: 7a53 3e72 0000 0000 0103 0307 zS>r........
12:15:15.472137 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
10.0.253.5.3306 > 10.0.253.40.51132: Flags [R.], cksum 0x56aa (correct), seq 0, ack 1147482604, win 0, length 0
0x0000: 4500 0028 0000 4000 4006 2ca2 0a00 fd05 E..(..@.@.,.....
0x0010: 0a00 fd28 0cea c7bc 0000 0000 4465 31ec ...(........De1.
0x0020: 5014 0000 56aa 0000 P...V...
#SMF process logs:
133631555143232/(15) 11:56:36 ::registerNFInstance_callback():101> Registration procedure failed due to internal error
133631978767936/(15) 11:56:46 ::nrf_client_registration_timer_exec():170> Sending registration request, attempt number 10
133631565628992/(15) 11:56:46 ::loop_over_handles():537> request failed: Failure when receiving data from the peer
133631544657472/(15) 11:56:46 ::http_client_handle_response():267> request to http://10.0.11.35:8080/nnrf-nfm/v1/nf-instances/da176f01-5cb0-429a-afef-7445b61984f8 failed - libcurl returned OS error code 32 : Broken pipe
133631544657472/(15) 11:56:46 ::registerNFInstance_callback():101> Registration procedure failed due to internal error
133631978767936/(15) 11:56:56 ::nrf_client_registration_timer_exec():167> Maximum number of retries reached- the nrf_server is either down or its address is unreachable. Stopping attempts to register with nrf_server, to start trying again use the command nrf_client.register
What I find interesting is that only TCP seems to be blocked. iperf from SMF to UPF over TCP:
root@smf-68dc78f887-qxshb:/opt/phoenix# iperf -c 10.0.14.45 -B 10.0.14.40 -e -i 1 -t 10 -p 10001
------------------------------------------------------------
Client connecting to 10.0.14.45, TCP port 10001 with pid 305 (1 flows)
Write buffer size: 131072 Byte
TOS set to 0x0 (Nagle on)
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
tcp write failed: Connection reset by peer
shutdown failed: Transport endpoint is not connected
[ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT(var) NetPwr
[ 1] 0.0000-0.0001 sec 128 KBytes 0.000 bits/sec 1/0 0 28K/25(11) us 0
WARN: this test may have been CPU bound (1) (or may not be detecting the underlying network devices)
[ 1] local 10.0.14.40%n4 port 50869 connected with 10.0.14.45 port 10001 (MSS=1448) (sock=3) (irtt/icwnd=25/14) (ct=0.05 ms) on 2024-12-07 12:20:23 (UTC)
root@smf-68dc78f887-qxshb:/opt/phoenix#
UDP traffic works as expected:
root@smf-68dc78f887-qxshb:/opt/phoenix# iperf -c 10.0.14.45 -B 10.0.14.40 -e -i 1 -t 10 -p 10001 -u
------------------------------------------------------------
Client connecting to 10.0.14.45, UDP port 10001 with pid 308 (1 flows)
TOS set to 0x0 (Nagle on)
Sending 1470 byte datagrams, IPG target: 11215.21 us (kalman adjust)
UDP buffer size: 208 KByte (default)
------------------------------------------------------------
[ 1] local 10.0.14.40%n4 port 36450 connected with 10.0.14.45 port 10001 (sock=3) on 2024-12-07 12:21:46 (UTC)
[ ID] Interval Transfer Bandwidth Write/Err PPS
[ 1] 0.0000-1.0000 sec 131 KBytes 1.07 Mbits/sec 0/0 91 pps
[ 1] 1.0000-2.0000 sec 128 KBytes 1.05 Mbits/sec 0/0 89 pps
[ 1] 2.0000-3.0000 sec 128 KBytes 1.05 Mbits/sec 0/0 89 pps
^C[ 1] 0.0000-3.7735 sec 485 KBytes 1.05 Mbits/sec 0/0 90 pps
[ 1] Sent 339 datagrams
[ 1] Server Report:
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Latency avg/min/max/stdev PPS NetPwr
[ 1] 0.0000-3.7734 sec 485 KBytes 1.05 Mbits/sec 0.000 ms 0/338 (0%) 0.007/0.002/0.073/0.006 ms 89 pps 18874
root@smf-68dc78f887-qxshb:/opt/phoenix#
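The TCP behaviour above can also be narrowed down without iperf or the mysql client. Below is a minimal Python sketch that opens one TCP connection from a chosen source address and reports whether the connect itself fails or the connection is dropped afterwards. The function name probe_tcp and its result strings are hypothetical, and it assumes Python 3 is available in the pod.

```python
import errno
import socket


def probe_tcp(dst_ip, dst_port, src_ip=None, timeout=3.0):
    """Open one TCP connection and classify what happens.

    Hypothetical helper. Possible results:
      "refused"          - connect was rejected (RST in reply to the SYN)
      "connect-timeout"  - no reply to the SYN within the timeout
      "error:<ERRNO>"    - any other OS error during connect
      "connected-data"   - connected and the peer sent at least one byte
      "connected-closed" - connected, then the peer closed without sending
      "connected-reset"  - connected, then the connection was reset
      "connected-silent" - connected, but nothing arrived within the timeout
    """
    # Binding to src_ip plays the same role as mysql --bind-address or iperf -B.
    source = (src_ip, 0) if src_ip else None
    try:
        sock = socket.create_connection(
            (dst_ip, dst_port), timeout=timeout, source_address=source
        )
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "connect-timeout"
    except OSError as exc:
        return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))

    with sock:
        try:
            data = sock.recv(1)  # wait for a server greeting, if any
        except socket.timeout:
            return "connected-silent"
        except ConnectionResetError:
            return "connected-reset"
        return "connected-data" if data else "connected-closed"


# Example call with the addresses from the MySQL attempt above:
#   print(probe_tcp("10.0.253.5", 3306, src_ip="10.0.253.40"))
```

Running the same probe once with and once without src_ip may help show whether the outcome depends on which interface the connection leaves from.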
Consul version: 1.20.1