Hi. I'm trying to understand why my two client nodes keep deregistering all jobs. I expect all jobs to remain registered in Consul. Thanks for any help!
What happens:
- task 8085765b is registered in Consul by nomad-stage-02
- a few seconds pass
- task 8085765b is deregistered by nomad-stage-03
Cluster:
- 1 server
- 2 clients
# nomad server members
Name                Address     Port  Status  Leader  Protocol  Build   Datacenter  Region
nomad-stage-01.lon  10.0.15.50  4648  alive   true    2         0.10.2  dc1         lon
# nomad node status
ID        DC   Name            Class   Drain  Eligibility  Status
0eb5e83c  dc1  nomad-stage-02  <none>  false  eligible     ready
2e0df0d5  dc1  nomad-stage-03  <none>  false  eligible     ready
Job status
# nomad job status api
ID = api
Name = api
Submit Date = 2019-12-14T22:36:22Z
Type = service
Priority = 50
Datacenters = dc1
Status = running
Periodic = false
Parameterized = false
Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         2        0       0         0
Latest Deployment
ID = 899143f4
Status = successful
Description = Deployment completed successfully
Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
api         2        2       2        0          2019-12-14T22:46:34Z
Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
8085765b  0eb5e83c  api         0        run      running  8m29s ago  8m17s ago
8e256817  2e0df0d5  api         0        run      running  8m29s ago  8m18s ago
Consul server HTTP requests
I see requests that deregister the tasks:
# tshark -i 2 -T fields -e ip.src -e http.request.uri -Y "http.request.uri contains \"/deregister/\""
10.0.15.2 /v1/agent/service/deregister/_nomad-task-8085765b-03bd-b8ce-ff85-3dea58b3a1fd-server-api-http
10.0.15.144 /v1/agent/service/deregister/_nomad-task-8e256817-9c4e-a486-9faa-fee36e56fe88-server-api-http
10.0.15.144 /v1/agent/service/deregister/_nomad-task-054371fd-8635-84ab-bab6-b9d6f634577b-redis-cache-redis-db
10.0.15.144 /v1/agent/check/deregister/_nomad-check-ace23129793db27d02688b0fe2e809600bb12a18
10.0.15.2 /v1/agent/service/deregister/_nomad-task-8085765b-03bd-b8ce-ff85-3dea58b3a1fd-server-api-http
10.0.15.144 /v1/agent/service/deregister/_nomad-task-054371fd-8635-84ab-bab6-b9d6f634577b-redis-cache-redis-db
10.0.15.144 /v1/agent/service/deregister/_nomad-task-8e256817-9c4e-a486-9faa-fee36e56fe88-server-api-http
10.0.15.144 /v1/agent/check/deregister/_nomad-check-ace23129793db27d02688b0fe2e809600bb12a18
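A capture like the one above is easier to read when the deregister requests are grouped by source address. The following is a minimal Python sketch, not part of the setup described here. It assumes each line has the form "<source ip> <request uri>" as produced by the tshark command above, and it assumes the service ID starts with `_nomad-task-` followed by the allocation UUID, which is inferred from the capture (the first eight hex digits match the allocation IDs in the job status). The function name `deregistrations_by_source` is made up for illustration.

```python
import re
from collections import defaultdict

# Sample lines copied from the tshark output: "<source ip> <request uri>".
CAPTURE = """\
10.0.15.2 /v1/agent/service/deregister/_nomad-task-8085765b-03bd-b8ce-ff85-3dea58b3a1fd-server-api-http
10.0.15.144 /v1/agent/service/deregister/_nomad-task-8e256817-9c4e-a486-9faa-fee36e56fe88-server-api-http
10.0.15.144 /v1/agent/service/deregister/_nomad-task-054371fd-8635-84ab-bab6-b9d6f634577b-redis-cache-redis-db
10.0.15.144 /v1/agent/check/deregister/_nomad-check-ace23129793db27d02688b0fe2e809600bb12a18
"""

# Assumption: the short allocation ID is the first 8 hex digits after "_nomad-task-".
TASK_RE = re.compile(r"/deregister/_nomad-task-([0-9a-f]{8})-")


def deregistrations_by_source(capture):
    """Map each source IP to the sorted short allocation IDs it deregistered."""
    result = defaultdict(set)
    for line in capture.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        src, uri = parts
        match = TASK_RE.search(uri)
        if match:  # check deregistrations carry no allocation ID and are skipped
            result[src].add(match.group(1))
    return {src: sorted(ids) for src, ids in result.items()}


if __name__ == "__main__":
    for src, ids in deregistrations_by_source(CAPTURE).items():
        print(src, ", ".join(ids))
```

For the four sample lines this groups 8085765b under 10.0.15.2, and 054371fd and 8e256817 under 10.0.15.144. Comparing each address against the node that runs the allocation shows whether a client is removing services it does not own.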
Consul log
consul[137]:2019/12/14 22:47:56 [DEBUG] agent: Service "_nomad-client-y3ummobs4wezbb2uf2t5a6i24eaiecat" in sync
consul[137]:2019/12/14 22:47:56 [DEBUG] agent: Service "_nomad-server-naeump42jytincxz3m2tyan3lsoyrlpm" in sync
consul[137]:2019/12/14 22:47:56 [DEBUG] agent: Service "_nomad-client-sal345rz4h4ypmuqbrpfw6fi74yrbnr6" in sync
consul[137]:2019/12/14 22:47:56 [DEBUG] agent: Service "_nomad-server-v3oww4banlushdezlxku5g2ho5f24cir" in sync
consul[137]:2019/12/14 22:47:56 [DEBUG] agent: Service "_nomad-server-3au7gyp32cqshfolntomr4edckoxupij" in sync
consul[137]:2019/12/14 22:47:56 [DEBUG] agent: Check "_nomad-check-ae73c17743eda6d8176d4a3e6e984cf94027a392" in sync
consul[137]:2019/12/14 22:47:56 [DEBUG] agent: Check "_nomad-check-be4eefd46339cd5ce496d026621570bd4a49a9eb" in sync
consul[137]:2019/12/14 22:47:56 [DEBUG] agent: Check "_nomad-check-a709d1a775b05fb7bbc6dc6896f215c5dc63fd26" in sync
consul[137]:2019/12/14 22:47:56 [DEBUG] agent: Check "_nomad-check-c73d87a15cab159cd2d6cbc18de9e25dc238907c" in sync
consul[137]:2019/12/14 22:47:56 [DEBUG] agent: Check "_nomad-check-704a3311d007b397c42b3697997dea4cf848c64d" in sync
consul[137]:2019/12/14 22:47:56 [DEBUG] agent: Node info in sync
consul[137]:2019/12/14 22:47:56 [DEBUG] http: Request PUT /v1/agent/service/deregister/_nomad-task-8085765b-03bd-b8ce-ff85-3dea58b3a1fd-server-api-http (6.371574ms) from=10.0.15.2:48068
Nomad logs
nomad[14317]: 2019-12-14T22:49:56.055Z [DEBUG] consul.sync: sync complete: registered_services=1 deregistered_services=2 registered_checks=0 deregistered_checks=1
nomad[14317]: consul.sync: sync complete: registered_services=1 deregistered_services=2 registered_checks=0 deregistered_checks=1
nomad[14317]: 2019-12-14T22:50:00.881Z [DEBUG] http: request complete: method=GET path=/v1/agent/health?type=client duration=614.186µs
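To track how often a client deregisters services, the counters in the `consul.sync: sync complete` lines above can be pulled out of the log. This is a rough Python sketch under the assumption that the counters always appear as `key=number` pairs, as they do in the lines shown; the helper name `sync_counters` is hypothetical.

```python
import re

# Sample line copied from the Nomad log.
LINE = (
    "nomad[14317]: 2019-12-14T22:49:56.055Z [DEBUG] consul.sync: sync complete: "
    "registered_services=1 deregistered_services=2 "
    "registered_checks=0 deregistered_checks=1"
)


def sync_counters(line):
    """Return the key=number counters of a sync-complete line, or None for other lines."""
    if "consul.sync: sync complete" not in line:
        return None
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


if __name__ == "__main__":
    print(sync_counters(LINE))
```

A non-zero `deregistered_services` on every sync, as in the sample line, matches the repeated deregister requests seen in the capture.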
server.hcl
server {
  enabled = true
  bootstrap_expect = 1
  rejoin_after_leave = false
  enabled_schedulers = ["service","batch","system"]
  num_schedulers = 1
  node_gc_threshold = "24h"
  eval_gc_threshold = "1h"
  job_gc_threshold = "4h"
  encrypt = ""
}
client.hcl
client {
  enabled = true
  node_class = ""
  no_host_uuid = false
  max_kill_timeout = "30s"
  network_speed = 0
  cpu_total_compute = 0
  gc_interval = "1m"
  gc_disk_usage_threshold = 80
  gc_inode_usage_threshold = 70
  gc_parallel_destroys = 2
  reserved {
    cpu = 0
    memory = 0
    disk = 0
  }
  options = {
    "docker.auth.config" = "/root/.docker/config.json"
    "docker.cleanup.image" = "0"
    "driver.raw_exec.enable" = "1"
  }
}
base.hcl
name = "{HOSTNAME}"
region = "lon"
datacenter = "dc1"
enable_debug = true
bind_addr = "{BIND_ADDR}"
advertise {
  http = "{BIND_ADDR}:4646"
  rpc = "{BIND_ADDR}:4647"
  serf = "{BIND_ADDR}:4648"
}
ports {
  http = 4646
  rpc = 4647
  serf = 4648
}
consul {
  # The address to the Consul agent.
  address = "{CONSUL_ADDR}:8500"
  token = "myToken"
  # The service name to register the server and client with Consul.
  server_service_name = "nomad-servers"
  client_service_name = "nomad-clients"
  tags = {}
  # Enables automatically registering the services.
  auto_advertise = true
  # Enabling the server and client to bootstrap using Consul.
  server_auto_join = true
  client_auto_join = true
}
data_dir = "/var/nomad"
log_level = "DEBUG"