Connecting to Consul Cluster - Directly to Remote Server Agent or via Local Client Agent?

Hi all,

I’m new to Consul. I want to access it from a JVM-hosted application, and I have briefly studied one of the three Java libraries listed at https://www.consul.io/api/libraries-and-sdks.html.

There is a gap in my understanding: how to connect to a Consul cluster.

What are pros and cons of connecting directly to a remote server agent versus via a local client agent, please?

So far I have not found a clear recommendation, even though I suppose it is a common dilemma every newcomer faces (also, “Your topic is similar to…” returned no match).

Best regards
Cc.

[…]
In some places, client agents may cache data from the servers to make it available locally for performance and reliability. Examples include Connect certificates and intentions which allow the client agent to make local decisions about inbound connection requests without a round trip to the servers. Some API endpoints also support optional result caching. This helps reliability because the local agent can continue to respond to some queries like service-discovery or Connect authorization from cache even if the connection to the servers is disrupted or the servers are temporarily unavailable.
[…]

That’s the only thing that came to my mind (but the docs explain it with better words :wink:)

Hi Wolfsrudel,

thank you for trying to help. I went over the authoritative documentation a second time, and it is still unclear. I just guess that the consul tool REST-calls the agent client and the agent client REST-calls the agent server. My problem could be reformulated: how to discover agent servers, please? (Given that Consul is supposed to address service discovery, it’s a classic chicken-and-egg problem.)

Having failed to find an answer in the Consul documentation, I went over non-authoritative documentation and the libraries’ documentation, and I found one requirement that makes sense to me:

A Consul Agent client must be available to all Spring Cloud Consul applications. By default, the Agent client is expected to be at localhost:8500 . …

https://cloud.spring.io/spring-cloud-consul/reference/html/

Spring is known for being opinionated, so I hope that they had good reasons for their pick. Although I do not like to act on hope, there is nothing better at the moment.

If the library REST-calls the agent client and the agent client REST-calls the agent server, it would be a performance hit compared to directly calling a remote agent server, but again, let’s hope the library smartly avoids that after the discovery phase.

Best regards
Cc.

Hi @pekuz!

Welcome to the forums, and thank you for your first post!
Let’s walk through your questions a bit.

What are pros and cons of connecting directly to a remote server agent versus via a local client agent, please?

In general, requests to Consul go to the local client agent, which forwards them to the appropriate servers. This intentional design distributes the request/read load, handles routing of the request during leader elections or other changes in the cluster, and avoids a single point of failure in the request path. It also allows the client agent to cache responses for better performance and lower network traffic, as @Wolfsrudel pointed out in his snippet.

From the Architecture Doc that @Wolfsrudel linked:

It is expected that there be between three to five servers. This strikes a balance between availability in the case of failure and performance …

I also recommend you take a look at the Gossip Protocol documentation, as this describes the libraries used to distribute communications and handle failure scenarios.

Hopefully these help cover the pros of using this set up.

As a general rule, there shouldn’t be any reason to connect directly to the server, except for restoring from a snapshot as an operator.

While you can design your application to read in any server IP at startup, that doesn’t mean that the server is even available, let alone able to be the leader. This is where Consul agents work well, because you can have your application connect to any agent (client OR server) for initial startup. Because of this, we recommend launching your app with a locally available agent.

Just for some clarification:
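
As a rough illustration of "connect to a locally available agent" from a JVM application, here is a minimal sketch using only the JDK's built-in HTTP client (Java 11+). It assumes the agent listens on the default HTTP address `localhost:8500` and reads one key through the `/v1/kv/` endpoint with the `raw` flag. The class name `LocalAgentKv` and the key `config/app/name` are made up for the example; this is not how any particular Consul library does it.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: the application only ever talks to its local agent,
// never to a server address. The agent finds the servers on its own.
final class LocalAgentKv {
    // Assumption: the local agent uses the default HTTP address.
    static final String DEFAULT_AGENT = "http://localhost:8500";

    // Builds the KV read URL. "?raw" asks for the bare value instead of
    // the JSON envelope. Assumes the key contains only URL-safe characters.
    static URI kvUri(String agentBase, String key) {
        return URI.create(agentBase + "/v1/kv/" + key + "?raw");
    }

    // Reads one key via the local agent; returns null if the key is missing.
    static String read(String key) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest.newBuilder(kvUri(DEFAULT_AGENT, key))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() == 404) {
            return null; // Consul answers 404 when the key does not exist
        }
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }
}
```

Calling `LocalAgentKv.read("config/app/name")` needs a running agent on the same host. The point of the sketch is that the application carries no server addresses at all, so a leader change or a lost server is not something the application has to react to.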

I just guess that the consul tool REST-calls the agent client and the agent client REST-calls the agent server.

The request path is as follows: the request goes to the client agent, which forwards it to a server agent, which in turn forwards it to the leader.

So, for your second question:

how to discover agent servers, please?

To reiterate, we discourage connecting your application directly to the server agents. As part of our bootstrapping guide, we describe how you can connect to any Consul node to facilitate connections.

Clients are much easier as they can join against any existing node. All nodes participate in a gossip protocol to perform basic discovery, so once joined to any member of the cluster, new clients will automatically find the servers and register themselves.

Finally, I want to address your message about performance.

If the library REST-calls the agent client and the agent client REST-calls the agent server, it would be a performance hit compared to directly calling a remote agent server, but again, let’s hope the library smartly avoids that after the discovery phase.

While it is true that there are additional “hops” for your application to communicate with the server, consider that your application is part of a larger ecosystem. The additional hops give you the benefits of data caching, depending on your consistency mode. If your application can handle stale data, then you can gain a lot more benefits from querying your local client rather than connecting to the server.
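
To make the consistency mode concrete: on the HTTP API it is selected per request with the `stale` or `consistent` query parameter, and the default mode applies when neither is present. Below is a small sketch that builds a service-discovery URL against the local agent for each mode. The class `ConsistencyModes`, the enum, and the service name `web` are hypothetical names chosen for the example.

```java
import java.net.URI;

// Hypothetical helper showing how a consistency mode maps onto the URL.
final class ConsistencyModes {
    enum Mode { DEFAULT, CONSISTENT, STALE }

    // Builds a health query for the passing instances of a service.
    // STALE lets any server answer, CONSISTENT has the leader confirm its
    // leadership first, DEFAULT adds no parameter.
    static URI healthUri(String agentBase, String service, Mode mode) {
        String query = "?passing";
        if (mode == Mode.STALE) {
            query += "&stale";
        } else if (mode == Mode.CONSISTENT) {
            query += "&consistent";
        }
        return URI.create(agentBase + "/v1/health/service/" + service + query);
    }
}
```

For example, `ConsistencyModes.healthUri("http://localhost:8500", "web", ConsistencyModes.Mode.STALE)` produces a URL an application could poll when slightly out-of-date results are acceptable.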

Thanks for all your questions! I know there is a lot to unpack here, but I hope this helps!

Best,
Jono

Awesome explanation. :+1:

Hi Jono,

thank you, your answer is very helpful.

There is one thing I do not fully understand:

Why do you mention “let alone able to be the leader”, please? IMHO it does not matter, as the request would be forwarded to the leader anyway. Do I understand correctly that you were just guarding against a common bad practice/temptation, namely an application trying to detect the leader and connect to it explicitly?

Best regards
Cc.

Hi @pekuz,

That is correct. We recommend against trying to connect directly to the leader, as the leader could change at any time.

Jono

Hi all,

I confirm that connecting via the local client agent works well. Probably not a big surprise for those who already know that it is good practice…

My feedback:

  • reading about the consensus and gossip protocols, although interesting, should not be needed for a quick start in the right direction, and

  • the 10,000 foot view image https://www.consul.io/assets/images/consul-arch-420ce04a.png, referenced from several places, tends to be misleading because CLIENT and SERVER appear paired, creating the illusion that they should be installed side by side on a host. I’d recommend using some visual means to highlight that there are many CLIENTs, maybe just adding a fourth CLIENT to break the visual alignment with the three SERVERs.

Hope it helps others too
Cc.

Thanks @pekuz,

Happy to hear everything is working well :smiley:

I’ll look at updating the image that you referenced. Thanks for the feedback regarding your experience with the internals document. Consul is able to do a lot for a large number of use cases, so we recommend having folks go through the internals, as well as revisit them frequently! I’ll look into ways to make that experience more accessible.

Best!
Jono

Hello @jsosulska,

I have a use case where I’m planning to use the KV store + watchers for configuration management, with consul server agents being deployed on k8s.

What is the downside to having a service-endpoint routing to all server agent pods and then connecting directly to the servers via this service-endpoint? If my understanding is correct, requests will always be routed to the leader node and since any failed node will automatically be removed from the service-endpoint, this approach does not have a single point of failure.

The only downside I can think of is that since service-endpoints will round robin between the server pods, requests are not always using the most optimal path. Is there anything I’m missing here?

Another approach we’re thinking of is to run the client agents as a k8s deployment behind a service-endpoint and routing requests via this service-endpoint.

The objective here is to avoid daemonset for client agents since this is a relatively small piece of our infrastructure and we’re not planning on using any of the other features provided by consul.
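
For context on the "KV store + watchers" part: whichever address ends up being used, a watch on the HTTP API is a blocking query, meaning a GET that carries the last seen index and a `wait` duration and only returns when the key changes or the wait expires. Here is a minimal sketch with the JDK HTTP client. The class name `KvWatch`, the 30-second wait, and the 45-second request timeout are illustrative assumptions; the index handling follows the usual guidance to reset when the index goes backwards or is not positive.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical watcher built on Consul blocking queries.
final class KvWatch {
    // URL for a blocking read: returns when the key's index moves past
    // "index" or when "wait" (for example "30s") elapses.
    static URI watchUri(String base, String key, long index, String wait) {
        return URI.create(base + "/v1/kv/" + key + "?index=" + index + "&wait=" + wait);
    }

    // Index to use for the next request. A non-positive or backwards index
    // resets to 0, so the next call returns immediately with fresh state.
    static long nextIndex(long current, long received) {
        if (received <= 0 || received < current) {
            return 0;
        }
        return received;
    }

    // One round of the watch loop: blocks up to the wait time and returns
    // the index for the following round.
    static long pollOnce(HttpClient client, String base, String key, long index)
            throws Exception {
        HttpRequest request = HttpRequest.newBuilder(watchUri(base, key, index, "30s"))
                .timeout(Duration.ofSeconds(45)) // must be longer than the wait
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        long received = response.headers()
                .firstValue("X-Consul-Index")
                .map(Long::parseLong)
                .orElse(0L);
        if (received > index) {
            System.out.println("Key changed: " + response.body());
        }
        return nextIndex(index, received);
    }
}
```

A caller would start with index `0` and feed each returned value into the next `pollOnce` call. The `base` argument can be a local agent or a Service address, so the sketch does not take a side on the question above.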

Thanks,
Ashwin

Hi, @jsosulska

If I understand correctly, even without a client node you can still get stale data by querying a server node directly. Stale mode only determines that any server node may answer the read request. It has nothing to do with whether there is a client.

Thanks !

Hi, @jsosulska

I read your answer carefully. As I understand it, the client mode you described serves three main purposes:

  • distribute read requests;
  • make sure that after a leader election or another cluster state change (the leader is responsible for writes), write requests are still forwarded to the correct leader node;
  • reach the server nodes through the client node (which discovers the servers automatically via the gossip protocol), so that the failure of a single server node is not a problem.

I think all three points can also be solved by putting a load balancer (or a k8s Service) in front of the server nodes, so a client is not necessarily required. Service registration may be a bit special, because without a client the registration is written only to the catalog, but I think this is not a big problem.

Are there any other considerations for using a client? For example, was client mode introduced essentially for service mesh?

Thanks !

Hi @jsosulska

Regarding the question of whether a client is required in the Consul deployment architecture, should it be discussed separately for different scenarios?

The KV use case mentioned by @ashwinsadeep can be solved by using a load balancer directly. Normally there is no need for a client node.

Similarly, when Consul is used as a service registry, is the client unnecessary, given that the problem can be solved by using a load balancer directly or by enhancing the SDK? Is there any disadvantage?

The client is required for service mesh.

Looking forward to your suggestions. :smiley:

Thanks!

Hi @jsosulska @wangyushuai @pekuz

With Consul 1.14 the client agent is being removed, and the architecture is planned to use Consul Dataplane to communicate directly with the servers.

What would change for the use cases discussed above?

Does hitting the Consul servers directly not affect the performance, caching, and stale-data scenarios discussed above?

There are no agents to query at localhost:8500 in the first place.

Hi Magesh,

We are migrating away from Consul, so I have not investigated this. In any case, if the concept of the local agent is removed, its responsibilities have to shift elsewhere, for example to your application. Check the release notes for a migration path or an explanation of why it is no longer needed.

Hope it helps
Cc.