Connect pods directly by their IPs

Hi,

Is it possible to connect two pods by their IP addresses?

I am trying to deploy a 5G network core on k8s and connect everything to a service mesh. The biggest challenge is that each 5G Network Function needs multiple network interfaces to work correctly (at least in the implementation I am using). This can be achieved with Multus CNI, which adds the extra interfaces to the pods. For example, the Session Management Function (SMF) communicates with the User Plane Function (UPF) via the N4 interface:

# part of the smf-deployment.yaml file
      annotations:
        k8s.v1.cni.cncf.io/networks: |
          [
            {
              "name": "br-mgmt",
              "interface": "mgmt",
              "ips": [ "10.0.254.40/24" ]  
            }, {
              "name": "br-cp",
              "interface": "cp",
              "ips": [ "10.0.11.40/24" ]  
            }, {
              "name": "br-n4",
              "interface": "n4",
              "ips": [ "10.0.14.40/24" ]  
            }, {
              "name": "br-db",
              "interface": "db",
              "ips": [ "10.0.253.40/24" ]  
            }
          ]
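
Each name in that list refers to a NetworkAttachmentDefinition on the cluster. For context, a minimal sketch of the br-n4 attachment (assuming the bridge CNI plugin with static IPAM, since the IPs come from the pod annotation) looks roughly like this:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: br-n4
  namespace: default
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "bridge",
      "bridge": "br-n4",
      "ipam": { "type": "static" }
    }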

Here is my helm-consul-values.yaml:

connectInject:
  cni:
    enabled: true
    logLevel: info
    multus: true
    cniBinDir: "/opt/cni/bin"
    cniNetDir: "/etc/cni/net.d"
  enabled: true

global:
  acls:
    manageSystemACLs: true
  datacenter: dc1
  enabled: true
  name: consul
  tls:
    enableAutoEncrypt: true
    enabled: true
    httpsOnly: false
    verify: false
  gossipEncryption:
    autoGenerate: true

server:
  replicas: 1
  securityContext:
    fsGroup: 0
    runAsGroup: 0
    runAsNonRoot: false
    runAsUser: 0
  updatePartition: 0

ui:
  enabled: true
  metrics:
    baseURL: http://prometheus.default.svc.cluster.local:8082
    enabled: true
    provider: prometheus
  service:
    type: NodePort

I added the connect-inject and transparent-proxy annotations to the deployments. Also, following the advice in the sample values.yaml, I included the consul-cni network as the last entry in the networks annotation:

      annotations:
        'consul.hashicorp.com/connect-inject': 'true'
        'consul.hashicorp.com/transparent-proxy': 'true'
        k8s.v1.cni.cncf.io/networks: |
          [
            {
              "name": "br-mgmt",
              "interface": "mgmt",
              "ips": [ "10.0.254.40/24" ]  
            }, {
              "name": "br-cp",
              "interface": "cp",
              "ips": [ "10.0.11.40/24" ]  
            }, {
              "name": "br-n4",
              "interface": "n4",
              "ips": [ "10.0.14.40/24" ]  
            }, {
              "name": "br-db",
              "interface": "db",
              "ips": [ "10.0.253.40/24" ]  
            }, { 
              "name":"consul-cni",
              "namespace": "default"
            }
          ]

I turned on the transparentProxy.dialedDirectly feature in the ProxyDefaults:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ProxyDefaults
metadata:
  name: global
  namespace: default
spec:
  transparentProxy:
    dialedDirectly: true
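
As far as I can tell, the same setting can also be scoped to a single service through a ServiceDefaults resource instead of globally; a sketch for the sql service (name taken from my deployment) would be:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: sql
  namespace: default
spec:
  protocol: tcp
  transparentProxy:
    dialedDirectly: true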

In the Consul UI, all 5G services show up as healthy, and I added a generic allow-all intention (* → *), sketched below. However, the pods are unable to fully communicate with each other: ping works, but the applications do not.
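
A minimal version of that intention expressed as a ServiceIntentions resource (wildcard source and destination, which is how I set it up):

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: allow-all
  namespace: default
spec:
  destination:
    name: '*'
  sources:
    - name: '*'
      action: allow

The ping and the failed MySQL connection from the SMF pod: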

root@smf-68dc78f887-qxshb:/opt/phoenix# ping -I db 10.0.253.5
PING 10.0.253.5 (10.0.253.5) from 10.0.253.40 db: 56(84) bytes of data.
64 bytes from 10.0.253.5: icmp_seq=1 ttl=64 time=0.050 ms
64 bytes from 10.0.253.5: icmp_seq=2 ttl=64 time=0.023 ms
64 bytes from 10.0.253.5: icmp_seq=3 ttl=64 time=0.027 ms
64 bytes from 10.0.253.5: icmp_seq=4 ttl=64 time=0.023 ms
^C
--- 10.0.253.5 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3056ms
rtt min/avg/max/mdev = 0.023/0.030/0.050/0.011 ms
root@smf-68dc78f887-qxshb:/opt/phoenix# mysql -h 10.0.253.5 --bind-address 10.0.253.40 -u root -p
Enter password: 
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 2
root@smf-68dc78f887-qxshb:/opt/phoenix# 

# The sql pod sees the incoming SYN but immediately answers with a RST:
root@sql-767646dd-wfgrr:/# tcpdump -i db -nvvvX
tcpdump: listening on db, link-type EN10MB (Ethernet), capture size 262144 bytes
12:15:15.472117 IP (tos 0x0, ttl 64, id 21704, offset 0, flags [DF], proto TCP (6), length 60)
    10.0.253.40.51132 > 10.0.253.5.3306: Flags [S], cksum 0x0e5d (incorrect -> 0x3b24), seq 1147482603, win 64240, options [mss 1460,sackOK,TS val 2052275826 ecr 0,nop,wscale 7], length 0
	0x0000:  4500 003c 54c8 4000 4006 d7c5 0a00 fd28  E..<T.@.@......(
	0x0010:  0a00 fd05 c7bc 0cea 4465 31eb 0000 0000  ........De1.....
	0x0020:  a002 faf0 0e5d 0000 0204 05b4 0402 080a  .....]..........
	0x0030:  7a53 3e72 0000 0000 0103 0307            zS>r........
12:15:15.472137 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    10.0.253.5.3306 > 10.0.253.40.51132: Flags [R.], cksum 0x56aa (correct), seq 0, ack 1147482604, win 0, length 0
	0x0000:  4500 0028 0000 4000 4006 2ca2 0a00 fd05  E..(..@.@.,.....
	0x0010:  0a00 fd28 0cea c7bc 0000 0000 4465 31ec  ...(........De1.
	0x0020:  5014 0000 56aa 0000                      P...V...


# SMF process logs:
133631555143232/(15) 11:56:36  ::registerNFInstance_callback():101> Registration procedure failed due to internal error
133631978767936/(15) 11:56:46  ::nrf_client_registration_timer_exec():170> Sending registration request, attempt number 10
133631565628992/(15) 11:56:46  ::loop_over_handles():537> request failed: Failure when receiving data from the peer
133631544657472/(15) 11:56:46  ::http_client_handle_response():267> request to http://10.0.11.35:8080/nnrf-nfm/v1/nf-instances/da176f01-5cb0-429a-afef-7445b61984f8 failed - libcurl returned OS error code 32 : Broken pipe
133631544657472/(15) 11:56:46  ::registerNFInstance_callback():101> Registration procedure failed due to internal error
133631978767936/(15) 11:56:56  ::nrf_client_registration_timer_exec():167> Maximum number of retries reached- the nrf_server is either down or its address is unreachable. Stopping attempts to register with nrf_server, to start trying again use the command nrf_client.register

What I find interesting is that only TCP seems to be blocked. iperf from SMF → UPF:

root@smf-68dc78f887-qxshb:/opt/phoenix# iperf -c 10.0.14.45 -B 10.0.14.40 -e -i 1 -t 10 -p 10001
------------------------------------------------------------
Client connecting to 10.0.14.45, TCP port 10001 with pid 305 (1 flows)
Write buffer size: 131072 Byte
TOS set to 0x0 (Nagle on)
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
tcp write failed: Connection reset by peer
shutdown failed: Transport endpoint is not connected
[ ID] Interval            Transfer    Bandwidth       Write/Err  Rtry     Cwnd/RTT(var)        NetPwr
[  1] 0.0000-0.0001 sec   128 KBytes  0.000 bits/sec  1/0          0       28K/25(11) us  0
WARN: this test may have been CPU bound (1) (or may not be detecting the underlying network devices)
[  1] local 10.0.14.40%n4 port 50869 connected with 10.0.14.45 port 10001 (MSS=1448) (sock=3) (irtt/icwnd=25/14) (ct=0.05 ms) on 2024-12-07 12:20:23 (UTC)
root@smf-68dc78f887-qxshb:/opt/phoenix# 

UDP traffic works as expected:

root@smf-68dc78f887-qxshb:/opt/phoenix# iperf -c 10.0.14.45 -B 10.0.14.40 -e -i 1 -t 10 -p 10001 -u
------------------------------------------------------------
Client connecting to 10.0.14.45, UDP port 10001 with pid 308 (1 flows)
TOS set to 0x0 (Nagle on)
Sending 1470 byte datagrams, IPG target: 11215.21 us (kalman adjust)
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  1] local 10.0.14.40%n4 port 36450 connected with 10.0.14.45 port 10001 (sock=3) on 2024-12-07 12:21:46 (UTC)
[ ID] Interval            Transfer     Bandwidth      Write/Err  PPS
[  1] 0.0000-1.0000 sec   131 KBytes  1.07 Mbits/sec  0/0       91 pps
[  1] 1.0000-2.0000 sec   128 KBytes  1.05 Mbits/sec  0/0       89 pps
[  1] 2.0000-3.0000 sec   128 KBytes  1.05 Mbits/sec  0/0       89 pps
^C[  1] 0.0000-3.7735 sec   485 KBytes  1.05 Mbits/sec  0/0       90 pps
[  1] Sent 339 datagrams
[  1] Server Report:
[ ID] Interval            Transfer     Bandwidth        Jitter   Lost/Total  Latency avg/min/max/stdev PPS NetPwr
[  1] 0.0000-3.7734 sec   485 KBytes  1.05 Mbits/sec   0.000 ms 0/338 (0%) 0.007/0.002/0.073/0.006 ms 89 pps 18874
root@smf-68dc78f887-qxshb:/opt/phoenix# 
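
As far as I understand, the transparent proxy's traffic-redirection rules only apply to TCP, which would explain why ICMP and UDP pass through untouched while TCP connections get reset by the sidecar. Would excluding the Multus subnets and application ports from the redirect be the right approach? Something like this is what I have in mind (annotation names from the consul-k8s transparent-proxy docs; the CIDRs and ports are just my own values, and I have not yet verified that this helps):

      annotations:
        'consul.hashicorp.com/transparent-proxy-exclude-outbound-cidrs': '10.0.11.0/24,10.0.14.0/24,10.0.253.0/24'
        'consul.hashicorp.com/transparent-proxy-exclude-inbound-ports': '3306,8080,10001'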

Consul version: 1.20.1
