Newbie Question: Consul High Availability with Java Spring Microservices

We are running a microservice environment based on Java Spring microservices, with Netflix Eureka as the service registry.

Our plan is to migrate from Netflix Eureka to Consul as the service registry.

The infrastructure consists of 2 VMs for the Spring applications and 1 VM for the database. Each of these 3 servers runs a Consul agent in server mode, and the two application servers each run a bunch of microservices. Basically every microservice runs twice: one instance on the first application server and one on the second.

There is Spring support for Consul: every microservice registers itself with a Consul agent, all agents run in a 3-node cluster, ACLs are set up, encryption is set up, and even TLS is set up. That is all working perfectly as far as I have tested it.

The problem is that when I restart the Consul agent (server) where the Spring microservices are registered, all of those microservices become unavailable (the service instances are removed from the Consul cluster). As far as I understand, this is by design: only the agent where a service is registered is responsible for health-checking it, and each service instance is registered with exactly 1 agent. The information about the available instances is of course available on all Consul nodes.

One possible solution to get it to work the way I want would be to start a separate Consul client agent for every microservice instance and connect each microservice to its own client agent. I really don't like this option.

The question is: is there a way to keep the service instances intact when restarting the Consul agent the services are attached to? Are my thoughts correct? What would a real-world setup look like where I can restart or upgrade a Consul agent without making all attached services unavailable?


We have the exact same issue/question. Since we are running all services with at least one replica, and spreading these on multiple machines, one approach could be to run one agent per machine and force the service registration to be pointed to the agent running locally. This method allows for one agent to be taken down in the case of rolling upgrades, machine maintenance or outages. It does not, however, handle services that need to run with one instance only.
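
To make the "register with the local agent" idea concrete, here is a minimal sketch in plain Java (11+) of what such a registration call could look like against Consul's agent HTTP API (`PUT /v1/agent/service/register`). Several things here are assumptions for illustration only: the agent listens on plain HTTP at `127.0.0.1:8500`, the service name, instance ID and port are made up, and the health check points at Spring Boot Actuator's `/actuator/health`. With ACLs and TLS enabled you would additionally need a token header and an `https` address, which are left out here. With Spring Cloud Consul you would normally not write this yourself; pointing `spring.cloud.consul.host` at the local agent should have the same effect.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class LocalAgentRegistration {

    // Assumption: the agent on the same machine listens on the default HTTP port.
    static final String LOCAL_AGENT = "http://127.0.0.1:8500";

    // Builds the JSON body for the agent's service registration endpoint.
    // Name, ID and port are hypothetical values supplied by the caller.
    static String registrationBody(String serviceName, String instanceId, int port) {
        return "{"
            + "\"Name\":\"" + serviceName + "\","
            + "\"ID\":\"" + instanceId + "\","
            + "\"Port\":" + port + ","
            + "\"Check\":{"
            // Assumption: the service exposes the Spring Boot Actuator health endpoint.
            + "\"HTTP\":\"http://127.0.0.1:" + port + "/actuator/health\","
            + "\"Interval\":\"10s\""
            + "}}";
    }

    // Builds (but does not send) the PUT request aimed at the local agent only.
    static HttpRequest registrationRequest(String serviceName, String instanceId, int port) {
        return HttpRequest.newBuilder()
            .uri(URI.create(LOCAL_AGENT + "/v1/agent/service/register"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(
                registrationBody(serviceName, instanceId, port)))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = registrationRequest("orders", "orders-app1", 8080);
        // Print what would be sent; sending it needs an HttpClient and a running agent.
        System.out.println(request.method() + " " + request.uri());
        System.out.println(registrationBody("orders", "orders-app1", 8080));
    }
}
```

Since each instance only ever talks to the agent on its own machine, taking one machine's agent down affects just the instances on that machine, and the replica on the other machine stays registered.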

I'm also interested in what the current best practice would be for this scenario.