Service discovery and fault tolerance

Hello!
First of all, thanks for the great idea! I think we are one of many who have a black hole in application configuration automation, and I look forward to using Habitat for all our apps.

I have not managed to find a good example of service discovery with Habitat, nor of fault tolerance or scalability.
A simple example:
We have an app that depends on another app, and we also have a load balancer (nginx, for example) in front of it.
We need an HA, fault-tolerant, and scalable architecture. Something like:
               DNS (route53)
              /             \
nginx-availabilityzone-1   nginx-availabilityzone-2
              |  \         /  |
          app1    app2  …  appN
            |       |        |
      backend1  backend2 … backendN

We need to load balance incoming traffic across all healthy appN instances, and each appN should have the IP of a healthy backendN in its config.
We also need to automatically add more app nodes under high load, and all configs need to pick up those changes as well.
Is this possible with Habitat, or do we need to use something like Consul + consul-template to have only healthy nodes' IPs in our configs?

The second question is about the Habitat client for Windows :slight_smile: Does anyone know the planned release date? :slight_smile:

Thanks again!
Roman.

Hi Roman. Yes, Habitat is designed to handle scenarios like the one you describe without the need for external service discovery tools like Consul or etcd. Service discovery is built into the process supervisor.

The design is such that all services in Habitat run in service groups, and you then resolve ("bind") a placeholder name in a plan to a real service group at runtime (using the --bind flag to the supervisor). An example templated config is the one in haproxy: https://github.com/habitat-sh/core-plans/blob/master/haproxy/config/haproxy.conf

This section shows how, when binding is enabled, the config will be written out such that haproxy has a backend for each healthy instance in the bound group: https://github.com/habitat-sh/core-plans/blob/master/haproxy/config/haproxy.conf#L18-L23 (You could also use the supervisor’s /health API endpoint on the backend instances as a way of checking health)
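
For a quick manual check, something like this could be used to hit that endpoint on a backend instance. This is only a sketch: the hostname is a placeholder, 9631 is the Supervisor's default HTTP port, and the exact path can differ between Habitat versions.

# Ask the Supervisor running next to a backend instance for its health status.
# backend1.example.internal is a placeholder hostname.
curl -s http://backend1.example.internal:9631/health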

As for the Habitat client for Windows, this month we are working on getting just the "hab" binary to compile and run on Windows. After that we will move on to the design of the Windows process supervisor and build system, which, as you can imagine, is a large undertaking. If you'd like to be involved in giving feedback, please pop into the #windows channel in the Habitat Slack and I'd love to discuss this further with you.

  • Julian

Interesting. I learned something new.

So by looking at https://github.com/habitat-sh/core-plans/blob/master/haproxy/config/haproxy.conf#L18-L19 specifically, you want to specify any bindings at hab start?

Something like this?

hab start core/haproxy --bind backend:someservice-ring
# where backend is a key name defined in the config

That’s exactly it @bdangit - binding resolves the name “backend” to an actual service group name.
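
For reference, in current Habitat plans the consuming package can also declare that bind name up front. A minimal sketch, where the exported key is an assumption (check the real core/haproxy plan for its actual declaration):

# plan.sh of the consuming service (sketch)
# pkg_binds says: "I need a bind named 'backend', and whatever service group
# is bound to it must export a 'port' value."
pkg_binds=(
  [backend]="port"
)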

That looks very promising!!
Thank you very much for your time!

Could you please clarify some more? :blush:

  1. Are other service config variables accessible via bind besides {{ip}} and {{port}}?
    Sorry if I'm getting this wrong; for example, I also have a parameter {{url}} for services in group foo. Could it be accessible at runtime in the services of bar, if bar has been bound to foo like:
    hab start core/bar --bind backend:foo

The goal is to have dynamic configuration of the LB depending on backend application properties, like {{url}}, etc.

  2. Does the supervisor use the same health checks that were implemented for the /health endpoint to get service health at runtime? If so, I did not find where the interval should be configured.
    https://www.habitat.sh/docs/reference/plan-syntax/#hooks

  3. Do you have any timing measurements already? How fast is the reaction to health changes in bound service groups?

  4. And a last, very silly question: does peer mean node (host)?
    For example, if we have 3 web servers behind the LB, how do we start service foo on web servers 1 and 2, but not on 3, in service group foo-servicegroup?

Are other service config variables accessible via bind besides {{ip}} and {{port}}?

It depends... where would you be defining this url parameter? Would it be a config element in the backend's default.toml?
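
Assuming it does live in the backend's default.toml, here is a minimal sketch of how current Habitat plan syntax could expose it to binding consumers; all key names here are hypothetical, and this may differ from how binds behaved at the time of this thread.

# plan.sh of the backend service (sketch)
# pkg_exports maps a name visible to binding consumers ("url") to a key in
# this service's config / default.toml ("app.url").
pkg_exports=(
  [url]="app.url"
)
# A consumer could then require it alongside port:
#   pkg_binds=( [backend]="port url" )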

Does the supervisor use the same health checks that were implemented for the /health endpoint to get service health at runtime? If so, I did not find where the interval should be configured.

Yes, the supervisor does exactly that. I'm not sure what you mean by interval. Do you mean the interval at which a load balancer would poll that endpoint for health? Or...? The supervisor itself does not poll the underlying service.
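
For completeness, the health the Supervisor reports for a service can come from a health_check hook in the plan. A minimal sketch follows; the port and the /health path on the application itself are assumptions.

#!/bin/sh
# hooks/health_check (sketch). The Supervisor maps this hook's exit code to a
# status: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
# {{cfg.port}} is rendered from this service's own config.
if curl -fs "http://localhost:{{cfg.port}}/health" > /dev/null; then
  exit 0   # OK
else
  exit 2   # CRITICAL
fi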

Do you have any timing measurements already? How fast is the reaction to health changes in bound service groups?

We haven't run any detailed measurements yet, but it should be fairly fast, on the order of seconds at most. We would love to know the results of any experiments you run.

And a last, very silly question: does peer mean node (host)?

Yes, when we talk about a peer we mean a node; the reason we call it a peer is that if that supervisor is running in a container, it's not really a "node" in the traditional sense anymore.
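
To make the websrv example concrete, here is a sketch of starting foo on only two of the three hosts; the origin name, group name, and IP are placeholders.

# On websrv1:
hab start myorigin/foo --group foo-servicegroup

# On websrv2, peering with websrv1's Supervisor so the two share one ring:
hab start myorigin/foo --group foo-servicegroup --peer 10.0.0.1

# websrv3 simply never runs foo, so it never becomes a member of foo-servicegroup.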
