I’ve made some progress. Specifically, I can get a cluster up and running if I use host networking and static ports, which suggests my environment itself is working.
But if I try to use dynamic ports, it doesn’t work.
If I start just one instance (count = 1), that instance gets up and running.
If I then increase it to count = 3, the new allocs always eventually fail.
This happens both with and without Consul Connect sidecars.
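For reference, the host-networking variant that does work amounts to something like the sketch below. It is a reconstruction under assumptions rather than the exact block from that job: it reuses the port labels from the job spec further down and pins each one to the default MariaDB/Galera port with static.

```hcl
# Sketch only: host networking with static ports (the variant that works).
# Port labels are borrowed from the bridge-mode job spec; the values are the
# default MariaDB/Galera ports, assumed rather than copied from that job.
network {
  mode = "host"
  port "lanegalera3306" {
    static = 3306 # MariaDB client connections
  }
  port "lanegalera4567" {
    static = 4567 # Galera group communication
  }
  port "lanegalera4568" {
    static = 4568 # incremental state transfer (IST)
  }
  port "lanegalera4444" {
    static = 4444 # state snapshot transfer (SST)
  }
}
```

With static ports, the port each node listens on is the same one its peers dial. With dynamic ports in bridge mode, the two differ: in the log below the node listens on 4567 inside the container while peers are addressed on host ports such as 30394.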
My latest attempt is without sidecars. Here is the output of one of the failed allocs:
2022-09-29 22:24:52 0 [Note] WSREP: Loading provider /opt/bitnami/mariadb/lib/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2022-09-29 22:24:52 0 [Note] WSREP: wsrep_load(): loading provider library '/opt/bitnami/mariadb/lib/libgalera_smm.so'
2022-09-29 22:24:52 0 [Note] WSREP: wsrep_load(): Galera 4.12(r6311685) by Codership Oy <info@codership.com> loaded successfully.
2022-09-29 22:24:52 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2022-09-29 22:24:52 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2022-09-29 22:24:52 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: 0a8b0646-4045-11ed-9519-72d137146895
Seqno: -1 - -1
Offset: -1
Synced: 1
2022-09-29 22:24:52 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 0a8b0646-4045-11ed-9519-72d137146895, offset: -1
2022-09-29 22:24:52 0 [Note] WSREP: GCache::RingBuffer initial scan... 0.0% ( 0/134217752 bytes) complete.
2022-09-29 22:24:52 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
2022-09-29 22:24:52 0 [Note] WSREP: Recovering GCache ring buffer: Recovery failed, need to do full reset.
2022-09-29 22:24:52 0 [Note] WSREP: Passing config to GCS: base_dir = /bitnami/mariadb/data/; base_host = 172.26.64.105; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /bitnami/mariadb/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0;
2022-09-29 22:24:52 0 [Note] WSREP: Start replication
2022-09-29 22:24:52 0 [Note] WSREP: Connecting with bootstrap option: 0
2022-09-29 22:24:52 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2022-09-29 22:24:52 0 [Note] WSREP: protonet asio version 0
2022-09-29 22:24:52 0 [Note] WSREP: Using CRC-32C for message checksums.
2022-09-29 22:24:52 0 [Note] WSREP: backend: asio
2022-09-29 22:24:52 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2022-09-29 22:24:52 0 [Note] WSREP: access file(/bitnami/mariadb/data//gvwstate.dat) failed(No such file or directory)
2022-09-29 22:24:52 0 [Note] WSREP: restore pc from disk failed
2022-09-29 22:24:52 0 [Note] WSREP: GMCast version 0
2022-09-29 22:24:52 0 [Note] WSREP: (89bad37c-9dbb, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2022-09-29 22:24:52 0 [Note] WSREP: (89bad37c-9dbb, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2022-09-29 22:24:52 0 [Note] WSREP: EVS version 1
2022-09-29 22:24:52 0 [Note] WSREP: gcomm: connecting to group 'lanegaleracluster', peer 'ipv4ofnomad1:22471,ipv4ofnomad2:30394,ipv4ofnomad3:31200'
2022-09-29 22:24:52 0 [Note] WSREP: (89bad37c-9dbb, 'tcp://0.0.0.0:4567') connection established to 85998a95-9c63 tcp://ipv4ofnomad2:30394
2022-09-29 22:24:52 0 [Note] WSREP: (89bad37c-9dbb, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2022-09-29 22:24:53 0 [Note] WSREP: EVS version upgrade 0 -> 1
2022-09-29 22:24:53 0 [Note] WSREP: declaring 85998a95-9c63 at tcp://ipv4ofnomad2:30394 stable
2022-09-29 22:24:53 0 [Note] WSREP: PC protocol upgrade 0 -> 1
2022-09-29 22:24:53 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2022-09-29 22:24:53 0 [Note] WSREP: view(view_id(NON_PRIM,85998a95-9c63,2) memb {
85998a95-9c63,0
89bad37c-9dbb,0
} joined {
} left {
} partitioned {
})
2022-09-29 22:24:55 0 [Note] WSREP: (89bad37c-9dbb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192507812 cwnd: 1 last_queued_since: 192808054432148 last_delivered_since: 192808054432148 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:24:56 0 [Note] WSREP: (89bad37c-9dbb, 'tcp://0.0.0.0:4567') turning message relay requesting off
2022-09-29 22:24:59 0 [Note] WSREP: (89bad37c-9dbb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192511812 cwnd: 1 last_queued_since: 192812054845220 last_delivered_since: 192812054845220 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:25:04 0 [Note] WSREP: (89bad37c-9dbb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192516812 cwnd: 1 last_queued_since: 192817055380163 last_delivered_since: 192817055380163 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:25:09 0 [Note] WSREP: (89bad37c-9dbb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192521812 cwnd: 1 last_queued_since: 192822055844655 last_delivered_since: 192822055844655 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:25:14 0 [Note] WSREP: (89bad37c-9dbb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192526816 cwnd: 1 last_queued_since: 192827056304242 last_delivered_since: 192827056304242 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:25:19 0 [Note] WSREP: (89bad37c-9dbb, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://ipv4ofnomad2:30394
2022-09-29 22:25:19 0 [Note] WSREP: (89bad37c-9dbb, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192531816 cwnd: 1 last_queued_since: 192832056871362 last_delivered_since: 192832056871362 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:25:20 0 [Note] WSREP: (89bad37c-9dbb, 'tcp://0.0.0.0:4567') reconnecting to 85998a95-9c63 (tcp://ipv4ofnomad2:30394), attempt 0
2022-09-29 22:25:20 0 [Note] WSREP: (89bad37c-9dbb, 'tcp://0.0.0.0:4567') connection established to 99efac6d-8fd6 tcp://ipv4ofnomad2:30394
2022-09-29 22:25:20 0 [Note] WSREP: remote endpoint tcp://ipv4ofnomad2:30394 changed identity 85998a95-4045-11ed-9c63-c2ab3352eb5a -> 99efac6d-4045-11ed-8fd6-67cb170f717c
2022-09-29 22:25:23 0 [Note] WSREP: evs::proto(89bad37c-9dbb, GATHER, view_id(REG,85998a95-9c63,2)) suspecting node: 85998a95-9c63
2022-09-29 22:25:23 0 [Note] WSREP: evs::proto(89bad37c-9dbb, GATHER, view_id(REG,85998a95-9c63,2)) suspected node without join message, declaring inactive
2022-09-29 22:25:23 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at /bitnami/blacksmith-sandox/libgalera-26.4.12/gcomm/src/pc.cpp:connect():160
2022-09-29 22:25:23 0 [ERROR] WSREP: /bitnami/blacksmith-sandox/libgalera-26.4.12/gcs/src/gcs_core.cpp:gcs_core_open():219: Failed to open backend connection: -110 (Connection timed out)
2022-09-29 22:25:23 0 [ERROR] WSREP: /bitnami/blacksmith-sandox/libgalera-26.4.12/gcs/src/gcs.cpp:gcs_open():1663: Failed to open channel 'lanegaleracluster' at 'gcomm://ipv4ofnomad1:22471,ipv4ofnomad2:30394,ipv4ofnomad3:31200': -110 (Connection timed out)
2022-09-29 22:25:23 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2022-09-29 22:25:23 0 [ERROR] WSREP: wsrep::connect(gcomm://ipv4ofnomad1:22471,ipv4ofnomad2:30394,ipv4ofnomad3:31200) failed: 7
2022-09-29 22:25:23 0 [ERROR] Aborting
2022-09-29 22:25:23 0 [Note] WSREP: Loading provider /opt/bitnami/mariadb/lib/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2022-09-29 22:25:23 0 [Note] WSREP: wsrep_load(): loading provider library '/opt/bitnami/mariadb/lib/libgalera_smm.so'
2022-09-29 22:25:23 0 [Note] WSREP: wsrep_load(): Galera 4.12(r6311685) by Codership Oy <info@codership.com> loaded successfully.
2022-09-29 22:25:23 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2022-09-29 22:25:23 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2022-09-29 22:25:23 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: 0a8b0646-4045-11ed-9519-72d137146895
Seqno: -1 - -1
Offset: -1
Synced: 1
2022-09-29 22:25:23 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 0a8b0646-4045-11ed-9519-72d137146895, offset: -1
2022-09-29 22:25:23 0 [Note] WSREP: GCache::RingBuffer initial scan... 0.0% ( 0/134217752 bytes) complete.
2022-09-29 22:25:23 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
2022-09-29 22:25:23 0 [Note] WSREP: Recovering GCache ring buffer: Recovery failed, need to do full reset.
2022-09-29 22:25:23 0 [Note] WSREP: Passing config to GCS: base_dir = /bitnami/mariadb/data/; base_host = 172.26.64.105; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /bitnami/mariadb/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0;
2022-09-29 22:25:23 0 [Note] WSREP: Start replication
2022-09-29 22:25:23 0 [Note] WSREP: Connecting with bootstrap option: 0
2022-09-29 22:25:23 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2022-09-29 22:25:23 0 [Note] WSREP: protonet asio version 0
2022-09-29 22:25:23 0 [Note] WSREP: Using CRC-32C for message checksums.
2022-09-29 22:25:23 0 [Note] WSREP: backend: asio
2022-09-29 22:25:23 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2022-09-29 22:25:23 0 [Note] WSREP: access file(/bitnami/mariadb/data//gvwstate.dat) failed(No such file or directory)
2022-09-29 22:25:23 0 [Note] WSREP: restore pc from disk failed
2022-09-29 22:25:23 0 [Note] WSREP: GMCast version 0
2022-09-29 22:25:23 0 [Note] WSREP: (9c6ad368-825d, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2022-09-29 22:25:23 0 [Note] WSREP: (9c6ad368-825d, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2022-09-29 22:25:23 0 [Note] WSREP: EVS version 1
2022-09-29 22:25:23 0 [Note] WSREP: gcomm: connecting to group 'lanegaleracluster', peer 'ipv4ofnomad1:22471,ipv4ofnomad2:30394,ipv4ofnomad3:31200'
2022-09-29 22:25:23 0 [Note] WSREP: (9c6ad368-825d, 'tcp://0.0.0.0:4567') connection established to 99efac6d-8fd6 tcp://ipv4ofnomad2:30394
2022-09-29 22:25:23 0 [Note] WSREP: (9c6ad368-825d, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2022-09-29 22:25:26 0 [Note] WSREP: (9c6ad368-825d, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192539164 cwnd: 1 last_queued_since: 192839406704529 last_delivered_since: 192839406704529 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:25:27 0 [Note] WSREP: (9c6ad368-825d, 'tcp://0.0.0.0:4567') turning message relay requesting off
2022-09-29 22:25:30 0 [Note] WSREP: EVS version upgrade 0 -> 1
2022-09-29 22:25:30 0 [Note] WSREP: declaring 99efac6d-8fd6 at tcp://ipv4ofnomad2:30394 stable
2022-09-29 22:25:30 0 [Note] WSREP: PC protocol upgrade 0 -> 1
2022-09-29 22:25:30 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2022-09-29 22:25:30 0 [Note] WSREP: view(view_id(NON_PRIM,99efac6d-8fd6,1) memb {
99efac6d-8fd6,0
9c6ad368-825d,0
} joined {
} left {
} partitioned {
})
2022-09-29 22:25:31 0 [Note] WSREP: (9c6ad368-825d, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192544168 cwnd: 1 last_queued_since: 192844409418272 last_delivered_since: 192844409418272 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:25:36 0 [Note] WSREP: (9c6ad368-825d, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192549168 cwnd: 1 last_queued_since: 192849409908666 last_delivered_since: 192849409908666 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:25:42 0 [Note] WSREP: (9c6ad368-825d, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 4000000 lost: 1 last_data_recv: 192554668 cwnd: 1 last_queued_since: 192854910441101 last_delivered_since: 192854910441101 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:25:47 0 [Note] WSREP: (9c6ad368-825d, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192559668 cwnd: 1 last_queued_since: 192859910957208 last_delivered_since: 192859910957208 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:25:50 0 [Note] WSREP: (9c6ad368-825d, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://ipv4ofnomad2:30394
2022-09-29 22:25:51 0 [Note] WSREP: (9c6ad368-825d, 'tcp://0.0.0.0:4567') reconnecting to 99efac6d-8fd6 (tcp://ipv4ofnomad2:30394), attempt 0
2022-09-29 22:25:51 0 [Note] WSREP: (9c6ad368-825d, 'tcp://0.0.0.0:4567') connection established to acb058ac-858e tcp://ipv4ofnomad2:30394
2022-09-29 22:25:51 0 [Note] WSREP: remote endpoint tcp://ipv4ofnomad2:30394 changed identity 99efac6d-4045-11ed-8fd6-67cb170f717c -> acb058ac-4045-11ed-858e-dfec29bbfe8d
2022-09-29 22:25:52 0 [Note] WSREP: (9c6ad368-825d, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192564668 cwnd: 1 last_queued_since: 192864911429779 last_delivered_since: 192864911429779 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:25:54 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at /bitnami/blacksmith-sandox/libgalera-26.4.12/gcomm/src/pc.cpp:connect():160
2022-09-29 22:25:54 0 [ERROR] WSREP: /bitnami/blacksmith-sandox/libgalera-26.4.12/gcs/src/gcs_core.cpp:gcs_core_open():219: Failed to open backend connection: -110 (Connection timed out)
2022-09-29 22:25:54 0 [ERROR] WSREP: /bitnami/blacksmith-sandox/libgalera-26.4.12/gcs/src/gcs.cpp:gcs_open():1663: Failed to open channel 'lanegaleracluster' at 'gcomm://ipv4ofnomad1:22471,ipv4ofnomad2:30394,ipv4ofnomad3:31200': -110 (Connection timed out)
2022-09-29 22:25:54 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2022-09-29 22:25:54 0 [ERROR] WSREP: wsrep::connect(gcomm://ipv4ofnomad1:22471,ipv4ofnomad2:30394,ipv4ofnomad3:31200) failed: 7
2022-09-29 22:25:54 0 [ERROR] Aborting
2022-09-29 22:25:55 0 [Note] WSREP: Loading provider /opt/bitnami/mariadb/lib/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2022-09-29 22:25:55 0 [Note] WSREP: wsrep_load(): loading provider library '/opt/bitnami/mariadb/lib/libgalera_smm.so'
2022-09-29 22:25:55 0 [Note] WSREP: wsrep_load(): Galera 4.12(r6311685) by Codership Oy <info@codership.com> loaded successfully.
2022-09-29 22:25:55 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2022-09-29 22:25:55 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2022-09-29 22:25:55 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: 0a8b0646-4045-11ed-9519-72d137146895
Seqno: -1 - -1
Offset: -1
Synced: 1
2022-09-29 22:25:55 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 0a8b0646-4045-11ed-9519-72d137146895, offset: -1
2022-09-29 22:25:55 0 [Note] WSREP: GCache::RingBuffer initial scan... 0.0% ( 0/134217752 bytes) complete.
2022-09-29 22:25:55 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
2022-09-29 22:25:55 0 [Note] WSREP: Recovering GCache ring buffer: Recovery failed, need to do full reset.
2022-09-29 22:25:55 0 [Note] WSREP: Passing config to GCS: base_dir = /bitnami/mariadb/data/; base_host = 172.26.64.105; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /bitnami/mariadb/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0;
2022-09-29 22:25:55 0 [Note] WSREP: Start replication
2022-09-29 22:25:55 0 [Note] WSREP: Connecting with bootstrap option: 0
2022-09-29 22:25:55 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2022-09-29 22:25:55 0 [Note] WSREP: protonet asio version 0
2022-09-29 22:25:55 0 [Note] WSREP: Using CRC-32C for message checksums.
2022-09-29 22:25:55 0 [Note] WSREP: backend: asio
2022-09-29 22:25:55 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2022-09-29 22:25:55 0 [Note] WSREP: access file(/bitnami/mariadb/data//gvwstate.dat) failed(No such file or directory)
2022-09-29 22:25:55 0 [Note] WSREP: restore pc from disk failed
2022-09-29 22:25:55 0 [Note] WSREP: GMCast version 0
2022-09-29 22:25:55 0 [Note] WSREP: (af28a9a4-8146, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2022-09-29 22:25:55 0 [Note] WSREP: (af28a9a4-8146, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2022-09-29 22:25:55 0 [Note] WSREP: EVS version 1
2022-09-29 22:25:55 0 [Note] WSREP: gcomm: connecting to group 'lanegaleracluster', peer 'ipv4ofnomad1:22471,ipv4ofnomad2:30394,ipv4ofnomad3:31200'
2022-09-29 22:25:55 0 [Note] WSREP: (af28a9a4-8146, 'tcp://0.0.0.0:4567') connection established to acb058ac-858e tcp://ipv4ofnomad2:30394
2022-09-29 22:25:55 0 [Note] WSREP: (af28a9a4-8146, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2022-09-29 22:25:58 0 [Note] WSREP: (af28a9a4-8146, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192570608 cwnd: 1 last_queued_since: 192870849919964 last_delivered_since: 192870849919964 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:25:58 0 [Note] WSREP: (af28a9a4-8146, 'tcp://0.0.0.0:4567') turning message relay requesting off
2022-09-29 22:26:01 0 [Note] WSREP: EVS version upgrade 0 -> 1
2022-09-29 22:26:01 0 [Note] WSREP: declaring acb058ac-858e at tcp://ipv4ofnomad2:30394 stable
2022-09-29 22:26:01 0 [Note] WSREP: PC protocol upgrade 0 -> 1
2022-09-29 22:26:01 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2022-09-29 22:26:01 0 [Note] WSREP: view(view_id(NON_PRIM,acb058ac-858e,1) memb {
acb058ac-858e,0
af28a9a4-8146,0
} joined {
} left {
} partitioned {
})
2022-09-29 22:26:03 0 [Note] WSREP: (af28a9a4-8146, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192575608 cwnd: 1 last_queued_since: 192875850385005 last_delivered_since: 192875850385005 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:26:08 0 [Note] WSREP: (af28a9a4-8146, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192580608 cwnd: 1 last_queued_since: 192880850846383 last_delivered_since: 192880850846383 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:26:13 0 [Note] WSREP: (af28a9a4-8146, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192585608 cwnd: 1 last_queued_since: 192885851304425 last_delivered_since: 192885851304425 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:26:18 0 [Note] WSREP: (af28a9a4-8146, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192590608 cwnd: 1 last_queued_since: 192890851786958 last_delivered_since: 192890851786958 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:26:22 0 [Note] WSREP: (af28a9a4-8146, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://ipv4ofnomad2:30394
2022-09-29 22:26:23 0 [Note] WSREP: (af28a9a4-8146, 'tcp://0.0.0.0:4567') connection to peer 00000000-0000 with addr tcp://ipv4ofnomad3:31200 timed out, no messages seen in PT3S, socket stats: rtt: 0 rttvar: 250000 rto: 2000000 lost: 1 last_data_recv: 192595612 cwnd: 1 last_queued_since: 192895852255270 last_delivered_since: 192895852255270 send_queue_length: 0 send_queue_bytes: 0
2022-09-29 22:26:23 0 [Note] WSREP: (af28a9a4-8146, 'tcp://0.0.0.0:4567') reconnecting to acb058ac-858e (tcp://ipv4ofnomad2:30394), attempt 0
2022-09-29 22:26:25 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at /bitnami/blacksmith-sandox/libgalera-26.4.12/gcomm/src/pc.cpp:connect():160
2022-09-29 22:26:25 0 [ERROR] WSREP: /bitnami/blacksmith-sandox/libgalera-26.4.12/gcs/src/gcs_core.cpp:gcs_core_open():219: Failed to open backend connection: -110 (Connection timed out)
2022-09-29 22:26:25 0 [ERROR] WSREP: /bitnami/blacksmith-sandox/libgalera-26.4.12/gcs/src/gcs.cpp:gcs_open():1663: Failed to open channel 'lanegaleracluster' at 'gcomm://ipv4ofnomad1:22471,ipv4ofnomad2:30394,ipv4ofnomad3:31200': -110 (Connection timed out)
2022-09-29 22:26:25 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2022-09-29 22:26:25 0 [ERROR] WSREP: wsrep::connect(gcomm://ipv4ofnomad1:22471,ipv4ofnomad2:30394,ipv4ofnomad3:31200) failed: 7
2022-09-29 22:26:25 0 [ERROR] Aborting
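From what I can tell from the logs, the node connects to the peer at ipv4ofnomad2:30394 and forms a non-primary view with it, but every attempt to reach ipv4ofnomad3:31200 times out. The peer at ipv4ofnomad2:30394 then changes identity (which looks like that peer restarting), the connection is lost, and the join fails with "failed to reach primary view".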
Here is my current job spec:
job "galera" {
  meta {
    run_uuid = "${uuidv4()}"
  }
  datacenters = ["caravan"]
  type = "service"
  group "galera" {
    count = 3
    spread {
      attribute = "${node.unique.name}"
      weight = 100
    }
    volume "galeratrunk" {
      type = "csi"
      attachment_mode = "file-system"
      access_mode = "single-node-writer"
      read_only = false
      source = "galeratrunk"
      per_alloc = true
    }
    network {
      mode = "bridge"
      port "lanegalera3306" {
        to = 3306
      }
      port "lanegalera4567" {
        to = 4567
      }
      port "lanegalera4568" {
        to = 4568
      }
      port "lanegalera4444" {
        to = 4444
      }
    }
    service {
      name = "lanegaleraservice3306"
      port = "lanegalera3306"
      address_mode = "auto"
      tags = [
        "lanecc-project-lanegalera-service-3306"
      ]
      task = "cluster"
      provider = "consul"
      check {
        command = "/opt/bitnami/scripts/mariadb-galera/healthcheck.sh"
        type = "script"
        interval = "30s"
        timeout = "5s"
      }
    }
    service {
      name = "lanegaleraservice4567"
      port = "lanegalera4567"
      address_mode = "auto"
      tags = [
        "lanecc-project-lanegalera-service-4567"
      ]
      task = "cluster"
      provider = "consul"
      check {
        command = "/opt/bitnami/scripts/mariadb-galera/healthcheck.sh"
        type = "script"
        interval = "30s"
        timeout = "5s"
      }
    }
    service {
      name = "lanegaleraservice4568"
      port = "lanegalera4568"
      address_mode = "auto"
      tags = [
        "lanecc-project-lanegalera-service-4568"
      ]
      task = "cluster"
      provider = "consul"
      check {
        command = "/opt/bitnami/scripts/mariadb-galera/healthcheck.sh"
        type = "script"
        interval = "30s"
        timeout = "5s"
      }
    }
    service {
      name = "lanegaleraservice4444"
      port = "lanegalera4444"
      address_mode = "auto"
      tags = [
        "lanecc-project-lanegalera-service-4444"
      ]
      task = "cluster"
      provider = "consul"
      check {
        command = "/opt/bitnami/scripts/mariadb-galera/healthcheck.sh"
        type = "script"
        interval = "30s"
        timeout = "5s"
      }
    }
    task "cluster" {
      driver = "docker"
      config {
        image = "git.lanecc.edu:8443/web-services/lanegalera:user0"
        #entrypoint = ["/bin/sleep", "3600"]
        ports = [
          "lanegalera3306",
          "lanegalera4567",
          "lanegalera4568",
          "lanegalera4444"
        ]
        #cap_add = ["net_raw", "net_broadcast"]
        privileged = true
      }
      volume_mount {
        volume = "galeratrunk"
        destination = "/bitnami/mariadb"
        read_only = false
      }
      template {
        data = <<EOM
{{- if not (eq (env "NOMAD_ALLOC_INDEX") "0") }}
MARIADB_GALERA_CLUSTER_ADDRESS = "gcomm://{{ $first := true }}{{- range service "lanegaleraservice4567|any" }}{{ if not $first }},{{ else }}{{ $first = false }}{{ end }}{{ .Address }}:{{ .Port }}{{- end }}"
DAVID = "Not index 0"
{{ else }}
MARIADB_GALERA_CLUSTER_ADDRESS = "gcomm://"
DAVID = "Index is 0"
{{- end }}
MARIADB_GALERA_CLUSTER_BOOTSTRAP = "{{ if not (eq (env "NOMAD_ALLOC_INDEX") "0") }}no{{ else }}yes{{ end }}"
EOM
        env = true
        change_mode = "noop"
        destination = ".env"
      }
      env {
        MARIADB_ROOT_PASSWORD = "password"
        MARIADB_DATABASE = "lanegalera"
        MARIADB_USER = "lanegalera"
        MARIADB_PASSWORD = "password"
        MARIADB_GALERA_MARIABACKUP_PASSWORD = "password"
        MARIADB_REPLICATION_PASSWORD = "password"
        MARIADB_GALERA_CLUSTER_NAME = "lanegaleracluster"
        # MARIADB_GALERA_CLUSTER_BOOTSTRAP = "yes"
        # MARIADB_GALERA_CLUSTER_BOOTSTRAP = "no"
        # MARIADB_GALERA_CLUSTER_ADDRESS = "gcomm://"
        # MARIADB_GALERA_CLUSTER_ADDRESS = "gcomm://${NOMAD_UPSTREAM_ADDR_mariadb4567}"
        MARIADB_EXTRA_FLAGS = "--wsrep_provider_options=ist.recv_addr=${NOMAD_IP_lanegalera4567}:${NOMAD_HOST_PORT_lanegalera4568};ist.recv_bind=0.0.0.0:${NOMAD_HOST_PORT_lanegalera4568} --wsrep_node_incoming_address=${NOMAD_IP_lanegalera4567} --wsrep_sst_receive_address=${NOMAD_IP_lanegalera4567}"
        # MARIADB_GALERA_FORCE_SAFETOBOOTSTRAP = "no"
      }
      resources {
        cpu = 1000
        memory = 4096
      }
    }
    update {
      max_parallel = 1
      min_healthy_time = "180s"
      health_check = "task_states"
    }
  }
}
Any ideas on what is going on?