Chef Tiered Approach vs Chef HA w/ DRBD

According to this post, running Chef in HA mode at scale isn't recommended.

Why would someone deploy a tiered architecture rather than an HA one?

How many nodes can the single, tiered, and HA approaches each handle, and at what node count does performance degradation become noticeable for each?



It looks like DRBD was deprecated. Does the latest version of Chef recommend HA at scale? I think the answers to the previous questions will help me decide what is best.


HA is basically just two standalone hosts providing basic redundancy with active/passive failover. It offers no scalability improvement; in fact, the DRBD replication degrades performance.

Tiered breaks the services out from the single host, and most cloud solutions allow each of those split services to be made redundant and scalable individually.


The DRBD solution for Chef HA is in fact EOL. The current architecture for Chef Backend HA is more modern.

This sort of setup can handle many thousands of nodes accessing the server at regular intervals. For smaller installations, single instances with robust backup policies are often fine. The tiered architecture allows for additional controls if your hosts are in multiple physical or network locations and you'd prefer to lock down access to the data components. Overall performance also depends on the Chef features you choose to use, such as whether or not you are using Policyfiles (which remove dependency solving from the Chef server).
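To get a rough feel for what "many thousands of nodes at regular intervals" means as server load, here is a back-of-the-envelope sketch. The fleet size, the 30-minute interval, and the requests-per-run figure are all hypothetical placeholders, not measured values; substitute numbers from your own environment.

```python
def estimated_requests_per_second(node_count: int,
                                  interval_seconds: int,
                                  requests_per_run: int) -> float:
    """Average API requests per second, assuming runs are spread evenly over the interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    runs_per_second = node_count / interval_seconds
    return runs_per_second * requests_per_run


# Hypothetical fleet: 10,000 nodes, one run every 30 minutes,
# and an assumed 10 API requests per run.
load = estimated_requests_per_second(10_000, 1800, 10)
print(f"{load:.2f} requests/second on average")
```

This is only an average. If many nodes check in at the same moment, peaks will be well above it, which is why spreading run start times matters.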

If you have a large environment you'd like to run Chef in, our customer success engineering folks can help with sizing and planning.

First I'll note that the DRBD-based HA is no longer supported, and has been removed as of Chef Server 13. Even if you are running older versions, I'd still discourage using it. DRBD/Keepalived works well when you have two servers in the same rack, with multiple Ethernet interfaces tied together with crossover cables; e.g. the datacenter of 2010. It doesn't work well in a modern datacenter with virtualized hosts and more complex network structures. In particular, latency is not your friend with Keepalived, and the HA solution probably provides lower actual uptime and availability.

Tiered with multiple front ends is the preferred route to scale a Chef Server, and provides a little bit of redundancy, with the caveat that the backend data store is a single point of failure. It's important to keep the latency between the front ends and the backend low to maximize throughput; deploying it across datacenters is not recommended. Add a good backup strategy for disaster recovery, and that should work for about 95% of the use cases.

If you need an HA solution (as opposed to disaster recovery), the modern approaches are:

  • BYoDB (Bring Your Own DB), possibly using a cloud-provided Postgres and Elasticsearch service, or running your own and managing the availability issues yourself.

  • Chef Backend, which sets up an HA cluster for Postgres and Elasticsearch.

  • Chef Automate cluster (currently available if you have a support relationship with Chef)

These come with various operational costs and potential provider dependencies, and I'd recommend careful thought before choosing one of those. Once you go beyond BYoDB on a cloud provider, the higher availability comes with the need to be more attentive operationally.