Environment:
3 control nodes and 3 worker nodes.
Control nodes run the Nomad and Consul servers.
Worker nodes run the Nomad and Consul agents.
The Consul agents have Consul Connect enabled; this has been tested and is working.
Traefik is deployed, and routing through it has been tested.
On each worker node, the following REST query returns the expected response:
curl http://127.0.0.1:8500/v1/catalog/service/whoami
[{"ID":"183815ff-5d36-435e-c629-f7d3ac10c2c1","Node":"nomadwrk01","Address":"192.168.4.184","Datacenter":"dc1","TaggedAddresses":{"lan":"192.168.4.184","lan_ipv4":"192.168.4.184","wan":"192.168.4.184","wan_ipv4":"192.168.4.184"},"NodeMeta":{"consul-network-segment":""},"ServiceKind":"","ServiceID":"_nomad-task-8d5f4b49-5756-e9e9-a1ab-fcfe1e133328-server-whoami-http","ServiceName":"whoami","ServiceTags":["traefik.enable=true","traefik.http.routers.whoami.rule=Host(web.cloudartifacts.io)","traefik.http.routers.whoami.entrypoints=http"],"ServiceAddress":"192.168.4.184","ServiceTaggedAddresses":{"lan_ipv4":{"Address":"
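For scripted checks, the same catalog query can be made from Python. This is a minimal stdlib-only sketch, assuming the Consul agent is reachable at http://127.0.0.1:8500 as in the curl command; `summarize_catalog` and `fetch_catalog` are hypothetical helper names that keep only a few fields of the response.

```python
import json
import urllib.request

# Assumption: same agent address as the curl command in the post.
CONSUL_ADDR = "http://127.0.0.1:8500"


def summarize_catalog(entries):
    """Reduce a /v1/catalog/service/<name> response to (node, address, service ID) tuples."""
    summary = []
    for entry in entries:
        # An empty ServiceAddress means the node's Address should be used instead.
        address = entry.get("ServiceAddress") or entry.get("Address")
        summary.append((entry.get("Node"), address, entry.get("ServiceID")))
    return summary


def fetch_catalog(service, consul_addr=CONSUL_ADDR, timeout=5.0):
    """Fetch the catalog entries for one service name from the Consul HTTP API."""
    url = f"{consul_addr}/v1/catalog/service/{service}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

On a worker node, `summarize_catalog(fetch_catalog("whoami"))` should list the same node and address that the curl output shows.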
I deployed the following job (sensitive values redacted):
job "postgres-cluster" {
  datacenters = ["dc1"]

  group "patroni-group" {
    network {
      mode = "bridge"
    }

    count = 3

    constraint {
      distinct_hosts = true
    }

    service {
      name     = "patroni"
      port     = "5432"
      provider = "consul"

      connect {
        sidecar_service {}
      }
    }

    task "patroni" {
      driver = "docker"

      template {
        data = <<EOF
scope: cluster01
namespace: service/patroni/testing
name: postgresql-${NOMAD_ALLOC_INDEX}

restapi:
  listen: ${NOMAD_IP_db}:5432
  connect_address: ${NOMAD_IP_db}:5432

consul:
  url: http://127.0.0.1:8500
  verify: false
  register_service: false
  service_check_interval: 30s

bootstrap:
  method: initdb
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        max_replication_slots: 5
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 0.0.0.0/0 trust
    - host all all 0.0.0.0/0 trust
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: "${NOMAD_IP_db}:5432"
  connect_address: "${NOMAD_IP_db}:5432"
  data_dir: /var/lib/postgresql/data
  pgpass: /tmp/pgpass0
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: "redacted"
    rewind:
      username: rewind_user
      password: rewind_password

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
EOF
        destination = "local/postgres.yml"
      }

      config {
        image = "redacted" # Image expects the postgres.yml file

        auth {
          username = "redacted"
          password = "redacted"
        }

        command = "patroni"
        args    = ["local/postgres.yml"]
        ports   = ["db", "api"]
      }
    }
  }
}
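Separately from the Consul error, the template as written sets both `restapi.listen` and `postgresql.listen` to `${NOMAD_IP_db}:5432`, so once rendered, Patroni's REST API and PostgreSQL would both try to bind the same address and port. A minimal stdlib-only sketch of a sanity check for this, assuming the rendered YAML has already been loaded into a dict (loading it, e.g. with a YAML library, is not shown); `find_listen_collisions` is a hypothetical helper and only checks the two listener keys used in this config.

```python
from collections import defaultdict


def find_listen_collisions(config):
    """Return {address: [config paths]} for any address claimed by more than one listener."""
    # Assumption: these are the only socket-binding keys in this Patroni config.
    listeners = {
        "restapi.listen": config.get("restapi", {}).get("listen"),
        "postgresql.listen": config.get("postgresql", {}).get("listen"),
    }
    claimed = defaultdict(list)
    for path, address in listeners.items():
        if address:
            claimed[address].append(path)
    # Keep only addresses that two or more listeners want to bind.
    return {addr: paths for addr, paths in claimed.items() if len(paths) > 1}
```

Patroni's REST API conventionally listens on port 8008, which is one way to keep the two listeners apart.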
Each node fails to start with the following error:
2023-05-16 02:26:40,365 WARNING: Retry got exception: HTTPConnectionPool(host='127.0.0.1', port=8500): Max retries exceeded with url: /v1/session/create (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f359de4ee20>: Failed to establish a new connection: [Errno 111] Connection refused'))
2023-05-16 02:26:40,366 ERROR: refresh_session
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/urllib3/connection.py", line 200, in _new_conn
sock = connection.create_connection(
File "/usr/local/lib/python3.9/dist-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/usr/local/lib/python3.9/dist-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
The above exception was the direct cause of the following exception:
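The failing request in the log targets `/v1/session/create` on 127.0.0.1:8500, which in the Consul HTTP API is a PUT endpoint. To take Patroni out of the picture, the same kind of request can be issued by hand. This is a minimal stdlib-only sketch; the session name and TTL are arbitrary test values (assumptions, not what Patroni sends), and `create_consul_session` is a hypothetical helper name.

```python
import json
import urllib.request


def create_consul_session(consul_addr, name="patroni-probe", ttl="30s", timeout=5.0):
    """Create a Consul session via PUT /v1/session/create and return its ID.

    If the TCP connection itself fails, urllib raises urllib.error.URLError whose
    .reason is the underlying OSError, e.g. ConnectionRefusedError for Errno 111.
    """
    body = json.dumps({"Name": name, "TTL": ttl}).encode("utf-8")
    req = urllib.request.Request(
        f"{consul_addr}/v1/session/create",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Consul answers with a JSON object such as {"ID": "<uuid>"}.
        return json.load(resp)["ID"]
```

A `ConnectionRefusedError` reason means the request never reached an HTTP server, so the problem is at the TCP level rather than in Consul's session handling. A test session expires after its TTL and can also be removed with `PUT /v1/session/destroy/<id>`.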
Does anyone have an idea how to fix this issue?
As a next step, I plan to rebuild the image with some testing tools inside it so that I can:
1.) Run curl http://127.0.0.1:8500/v1/catalog/service/whoami from inside the container and check the response.
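If the rebuilt image still lacks curl, the same check can be done in Python; the traceback shows a Python 3.9 interpreter in the image, so one is presumably available. This is a minimal stdlib-only sketch; `probe_tcp` is a hypothetical helper that only classifies the TCP connection attempt and sends no HTTP request.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP connection attempt as "open", "refused" or "timeout".

    "refused" corresponds to the [Errno 111] in the Patroni log: nothing is
    listening on host:port as seen from the network namespace this runs in.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
```

Running `probe_tcp("127.0.0.1", 8500)` on the worker host and again inside the Patroni task (for example via `nomad alloc exec`) would show whether both places see the same listener on 127.0.0.1:8500.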
Thanks,