Service instance fallback?

Hello,

If I register 2 instances of a service, let’s say memcached, instances A and B.
I want to use only one at a time.
So A is used and B is on standby.
If A breaks, B is used.
When A comes back, it becomes the new standby.

How would I achieve this with consul? service-resolver?

Thank you.

I would advise using a service-splitter with meta information of primary and standby and corresponding weights.
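Roughly something like this (a sketch only; the subsets primary and standby are placeholder names and would still have to be defined through a service-resolver):

Kind = "service-splitter"
Name = "memcached"
Splits = [
  {
    # send all traffic to the primary subset
    Weight        = 100
    ServiceSubset = "primary"
  },
  {
    # keep the standby at zero weight
    Weight        = 0
    ServiceSubset = "standby"
  },
]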

@Wolfsrudel Thank you, trying it.

Isn’t service-splitter part of L7? If so, is it suited to a TCP-only service such as memcached?

Also, I can’t find where to define a service subset. Is it a meta field or a tag?

Oh, TCP… I overlooked that. Yes, you’re right, for TCP only the resolver works. My fault.

So would the resolver fulfill my use case?

If I interpret the documentation correctly, the resolver with failover could be the right solution. https://www.consul.io/docs/agent/config-entries/service-resolver.html#failover
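The relevant bit would look roughly like this (a sketch based on the linked docs; standby is a placeholder subset name):

Failover = {
  # if no healthy instances remain in the default subset, use the standby subset
  "*" = {
    ServiceSubset = "standby"
  }
}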

OK, will try. Thanks again for your time. :slight_smile:

You’re welcome. The subset is derived from the service’s meta data.

Yes, I spotted that it should be defined with a filter.
The documentation is not as clear for Consul as for other HashiCorp products. Or maybe I’m approaching it with a different philosophy than the one it was designed around.

The service-resolver is also part of L7.
And even after looking hard at the documentation, I can’t seem to get the config right. Not sure what’s wrong.

Can you share your configuration/setup and maybe some log/debug output?

Found it!
I was setting it up as a normal configuration file in /etc/consul.d, whereas consul config write is what I should have used.
It doesn’t work yet, but the config is validated, etc. I’ll dig deeper into it.
Thank you.
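For reference, roughly what I ended up doing (assuming the config entry is saved as memcached-resolver.hcl):

# apply the config entry instead of dropping it into /etc/consul.d
consul config write memcached-resolver.hcl

# verify that the entry was stored
consul config read -kind service-resolver -name memcached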

OK, I think I’m missing one last thing.
If I do a DNS query for memcached.service.consul, I still get both instances and the order changes between requests, as if my config did nothing.

kind = "service-resolver",
name = "memcached",
default_subset = "on",
subsets = {
  "on" = {
    filter = "Service.Meta.taskedwith == on"
  },
  "off" = {
    filter = "Service.Meta.taskedwith == off"
  }
},
failover = {
  "*" = {
    service_subset = "off"
  }
}

and service definition:
(one node has taskedwith: on, the other has taskedwith: off)

{
    "service": {
        "id": "memcached",
        "name": "memcached",
        "meta": {
            "taskedwith": "off"
        },
        "port": 11211,
        "checks": [
            {
                "args": ["nc", "-zv", "<IP>", "11211"],
                "interval": "5s"
            }
        ]
    }
}

My expectation is that the taskedwith: on instance would be used by default and, if it’s down, the taskedwith: off instance would take over.

Any idea what’s wrong?

“The service-resolver config entry kind controls which service instances should satisfy Connect upstream discovery requests for a given service name.”
(https://www.consul.io/docs/agent/config-entries/service-resolver.html)

I guess you’re missing the Consul Connect part.

Try the following (an additional connect entry between port and checks):

{
    "service": {
        "id": "memcached",
        "name": "memcached",
        "meta": {
            "taskedwith": "off"
        },
        "port": 11211,
        "connect": { "sidecar_service": {} },
        "checks": [
            {
                "args": ["nc", "-zv", "<IP>", "11211"],
                "interval": "5s"
            }
        ]
    }
}

and spawn a connect proxy on every node

consul connect proxy -sidecar-for memcached

Now try it again… I couldn’t do a final test in my local lab environment. :frowning:
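On the consuming side, a sketch of what the client registration could look like (app is just a placeholder name): the client declares memcached as an upstream of its own sidecar, so the resolver is actually consulted.

{
    "service": {
        "name": "app",
        "port": 8080,
        "connect": {
            "sidecar_service": {
                "proxy": {
                    "upstreams": [
                        {
                            "destination_name": "memcached",
                            "local_bind_port": 11211
                        }
                    ]
                }
            }
        }
    }
}

The application then talks to localhost:11211 through its proxy instead of resolving memcached.service.consul via DNS.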

@aminancelot I tested your configuration and was able to get it to work in my environment. I don’t see any issues with it.

Are you spawning the proxy with consul connect envoy -sidecar-for memcached?

The Envoy proxy is required to use any of the L7 features in Connect. Be sure to use that over the built-in proxy (i.e., consul connect proxy -sidecar-for …).

Alright.
Thank you guys, I decided to close my evaluation of consul.
I will not recommend its usage.

Reason is the horrendous documentation.
I fully agree with the rant here and sadly think it’s still true as of Feb. 2020: https://www.reddit.com/r/devops/comments/9vnyq9/rant_consul_docs_are_terrible/

And I’m terribly disappointed.
Every HashiCorp product I have interacted with in my career was a pleasure: great products, great docs.

Even if someone crafted the solution to this specific, simple use case for me, I cannot tell a dev team to write their own service definitions knowing it will be a pain and a waste of time.

Consul is supposed to make our lives easier and be easy to understand.

All I found were contradictory examples, none of which worked, unhelpful error messages, etc.

I think the principle of Consul is great: full of great ideas, full of potential. After all, it is a HashiCorp product. But I cannot inflict this pain on my coworkers.

I will look for an alternative.

Thanks a lot for the help, I hope this will get fixed and once it does, I will look at it again with pleasure.

Hi Pierre (@aminancelot),

I’m sorry to hear that you have had such a frustrating experience with Consul’s documentation. From looking at this thread I have a small glimpse into some of the challenges you’ve faced. It sounds like there were quite a few more, and there is a significant opportunity to improve Consul’s documentation.

I am a product manager on the Consul team. I am interested in understanding the specific areas of docs that you found lacking, or containing contradictory information. Would you be willing to speak with me to discuss this in more detail? I would like to ensure we address these problem areas so that you have a more positive experience should you decide to re-evaluate Consul in the future.

Best,

Blake
