Postgres master/slave replication and failover to hot standby

I am starting to work on master/slave streaming replication for postgres,
based on the database cookbook
https://github.com/opscode/cookbooks/tree/master/database

Please note that I am a total n00b to postgres.

I have to figure out to manage failover from master to slave. I imagine
that I can do it w/ some kind of script to check whether master is
available or not, then promote a slave. However, then I have make sure that
a promoted slave is not demoted on the next chef run.

If anyone has experience w/ these issues, I would much appreciate your
advice.

If anyone is working on a similar cookbook or has already written something
that fulfills my requirements, please let know!

Cheers

BryanWB

I too would be interested in this.

On Mon, Jan 23, 2012 at 8:30 PM, Bryan Berry bryan.berry@gmail.com wrote:

I am starting to work on master/slave streaming replication for postgres,
based on the database cookbook
https://github.com/opscode/cookbooks/tree/master/database

Please note that I am a total n00b to postgres.

I have to figure out to manage failover from master to slave. I imagine that
I can do it w/ some kind of script to check whether master is available or
not, then promote a slave. However, then I have make sure that a promoted
slave is not demoted on the next chef run.

If anyone has experience w/ these issues, I would much appreciate your
advice.

If anyone is working on a similar cookbook or has already written something
that fulfills my requirements, please let know!

Cheers

BryanWB

You could take a look at repmgr: https://github.com/greg2ndQuadrant/repmgr

-----Original Message-----
From: Tim Uckun [mailto:timuckun@gmail.com]
Sent: Monday, 23 January 2012 10:26 p.m.
To: chef@lists.opscode.com
Subject: [chef] Re: postgres master/slave replication and failover to hot standby

I too would be interested in this.

On Mon, Jan 23, 2012 at 8:30 PM, Bryan Berry bryan.berry@gmail.com wrote:

I am starting to work on master/slave streaming replication for
postgres, based on the database cookbook
https://github.com/opscode/cookbooks/tree/master/database

Please note that I am a total n00b to postgres.

I have to figure out to manage failover from master to slave. I
imagine that I can do it w/ some kind of script to check whether
master is available or not, then promote a slave. However, then I have
make sure that a promoted slave is not demoted on the next chef run.

If anyone has experience w/ these issues, I would much appreciate your
advice.

If anyone is working on a similar cookbook or has already written
something that fulfills my requirements, please let know!

Cheers

BryanWB

Think about using traditional HA tools, like Heartbeat/Corosync and Pacemaker to do the actual promotion. No need to re-invent the wheel.

Adam


Opscode, Inc.
Adam Jacob, Chief Customer Officer
T: (206) 619-7151 E: adam@opscode.com

On Jan 22, 2012, at 11:30 PM, Bryan Berry wrote:

I am starting to work on master/slave streaming replication for postgres, based on the database cookbook
https://github.com/opscode/cookbooks/tree/master/database

Please note that I am a total n00b to postgres.

I have to figure out to manage failover from master to slave. I imagine that I can do it w/ some kind of script to check whether master is available or not, then promote a slave. However, then I have make sure that a promoted slave is not demoted on the next chef run.

If anyone has experience w/ these issues, I would much appreciate your advice.

If anyone is working on a similar cookbook or has already written something that fulfills my requirements, please let know!

Cheers

BryanWB

If you're interested in a corosync/pacemaker cookbook I started (but
have yet to finish), here you go:
https://github.com/mattray/barclamp_ha_service/tree/pacemaker_service/chef/cookbooks/pacemaker

It installs pacemaker and configures the master/server, it just
doesn't actually manage the services yet. I wanted to get fancy and
let Chef think it was still managing services even when Pacemaker was
managing them, but I got a little blocked and other stuff took
precedence. Even though it's in a Crowbar barclamp, I was testing it
with just Chef and hadn't touched the other Crowbar stuff (so just
grab that cookbook). There's also an existing drbd cookbook that works
on the community site.

Thanks,
Matt Ray
Senior Technical Evangelist | Opscode Inc.
matt@opscode.com | (512) 731-2218
Twitter, IRC, GitHub: mattray

On Mon, Jan 23, 2012 at 7:46 PM, Adam Jacob adam@opscode.com wrote:

Think about using traditional HA tools, like Heartbeat/Corosync and
Pacemaker to do the actual promotion. No need to re-invent the wheel.

Adam


Opscode, Inc.
Adam Jacob, Chief Customer Officer
T: (206) 619-7151 E: adam@opscode.com

On Jan 22, 2012, at 11:30 PM, Bryan Berry wrote:

I am starting to work on master/slave streaming replication for postgres,
based on the database cookbook
https://github.com/opscode/cookbooks/tree/master/database

Please note that I am a total n00b to postgres.

I have to figure out to manage failover from master to slave. I imagine that
I can do it w/ some kind of script to check whether master is available or
not, then promote a slave. However, then I have make sure that a promoted
slave is not demoted on the next chef run.

If anyone has experience w/ these issues, I would much appreciate your
advice.

If anyone is working on a similar cookbook or has already written something
that fulfills my requirements, please let know!

Cheers

BryanWB

On Tue, Jan 24, 2012 at 8:38 AM, David Leaver dave@smartrak.co.nz wrote:

You could take a look at repmgr: https://github.com/greg2ndQuadrant/repmgr

Does this project take the place of pgpool or are you supposed to use
in conjunction with something like that?

The native postgresql 9 hot standby / streaming replication takes the place of pgpool.
http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling#Features_in_the_Core_of_PostgreSQL

Repmgr has commands for doing the failover part and adding new nodes as slaves (or setting up a previous master to be a slave etc).
This should be much nicer than doing it manually. (I haven't used repmgr yet, setting it up is on my extended todo list).

-----Original Message-----
From: Tim Uckun [mailto:timuckun@gmail.com]
Sent: Tuesday, 24 January 2012 11:05 a.m.
To: chef@lists.opscode.com
Subject: [chef] Re: RE: Re: postgres master/slave replication and failover to hot standby

On Tue, Jan 24, 2012 at 8:38 AM, David Leaver dave@smartrak.co.nz wrote:

You could take a look at repmgr:
https://github.com/greg2ndQuadrant/repmgr

Does this project take the place of pgpool or are you supposed to use in conjunction with something like that?

Repmgr has commands for doing the failover part and adding new nodes as slaves (or setting up a previous master to be a slave etc).
This should be much nicer than doing it manually. (I haven't used repmgr yet, setting it up is on my extended todo list).

According to their documentation Pgpool does SQL routing sending your
reads to the slaves and writes to the master, it also does sharding.
I guess I'll have to play with both to see which one is more suitable
but it looks like they could be used together too.

Interesting.

-----Original Message-----
From: Tim Uckun [mailto:timuckun@gmail.com]
Sent: Tuesday, 24 January 2012 11:05 a.m.
To: chef@lists.opscode.com
Subject: [chef] Re: RE: Re: postgres master/slave replication and failover to hot standby

On Tue, Jan 24, 2012 at 8:38 AM, David Leaver dave@smartrak.co.nz wrote:

You could take a look at repmgr:
https://github.com/greg2ndQuadrant/repmgr

Does this project take the place of pgpool or are you supposed to use in conjunction with something like that?

thanks Adam, I am not familiar w/ those tools so I will take a look at them

On Mon, Jan 23, 2012 at 8:46 PM, Adam Jacob adam@opscode.com wrote:

Think about using traditional HA tools, like Heartbeat/Corosync and
Pacemaker to do the actual promotion. No need to re-invent the wheel.

Adam


Opscode, Inc.
Adam Jacob, Chief Customer Officer
T: (206) 619-7151 E: adam@opscode.com

On Jan 22, 2012, at 11:30 PM, Bryan Berry wrote:

I am starting to work on master/slave streaming replication for postgres,
based on the database cookbook
https://github.com/opscode/cookbooks/tree/master/database

Please note that I am a total n00b to postgres.

I have to figure out to manage failover from master to slave. I imagine
that I can do it w/ some kind of script to check whether master is
available or not, then promote a slave. However, then I have make sure that
a promoted slave is not demoted on the next chef run.

If anyone has experience w/ these issues, I would much appreciate your
advice.

If anyone is working on a similar cookbook or has already written
something that fulfills my requirements, please let know!

Cheers

BryanWB

As a total n00b i haven't thought out all these parts yet. You are correct
that neither repmgr nor pgbouncer handle routing of reads and writes. I
will have to investigate pgpool for the distribution part.

On Tue, Jan 24, 2012 at 1:08 AM, Tim Uckun timuckun@gmail.com wrote:

Repmgr has commands for doing the failover part and adding new nodes as
slaves (or setting up a previous master to be a slave etc).
This should be much nicer than doing it manually. (I haven't used repmgr
yet, setting it up is on my extended todo list).

According to their documentation Pgpool does SQL routing sending your
reads to the slaves and writes to the master, it also does sharding.
I guess I'll have to play with both to see which one is more suitable
but it looks like they could be used together too.

Interesting.

-----Original Message-----
From: Tim Uckun [mailto:timuckun@gmail.com]
Sent: Tuesday, 24 January 2012 11:05 a.m.
To: chef@lists.opscode.com
Subject: [chef] Re: RE: Re: postgres master/slave replication and
failover to hot standby

On Tue, Jan 24, 2012 at 8:38 AM, David Leaver dave@smartrak.co.nz
wrote:

You could take a look at repmgr:
https://github.com/greg2ndQuadrant/repmgr

Does this project take the place of pgpool or are you supposed to use in
conjunction with something like that?

very exciting!

I very well may go w/ pacemaker for both a GFS2 cluster and postgres. In
that case, I may be able to help you finish that cookbook.

Anyone has any recommendations on ocfs2 vs GFS2 vs other? I would love to
hear them

On Mon, Jan 23, 2012 at 8:57 PM, Matt Ray matt@opscode.com wrote:

If you're interested in a corosync/pacemaker cookbook I started (but
have yet to finish), here you go:

https://github.com/mattray/barclamp_ha_service/tree/pacemaker_service/chef/cookbooks/pacemaker

It installs pacemaker and configures the master/server, it just
doesn't actually manage the services yet. I wanted to get fancy and
let Chef think it was still managing services even when Pacemaker was
managing them, but I got a little blocked and other stuff took
precedence. Even though it's in a Crowbar barclamp, I was testing it
with just Chef and hadn't touched the other Crowbar stuff (so just
grab that cookbook). There's also an existing drbd cookbook that works
on the community site.

Thanks,
Matt Ray
Senior Technical Evangelist | Opscode Inc.
matt@opscode.com | (512) 731-2218
Twitter, IRC, GitHub: mattray

On Mon, Jan 23, 2012 at 7:46 PM, Adam Jacob adam@opscode.com wrote:

Think about using traditional HA tools, like Heartbeat/Corosync and
Pacemaker to do the actual promotion. No need to re-invent the wheel.

Adam


Opscode, Inc.
Adam Jacob, Chief Customer Officer
T: (206) 619-7151 E: adam@opscode.com

On Jan 22, 2012, at 11:30 PM, Bryan Berry wrote:

I am starting to work on master/slave streaming replication for postgres,
based on the database cookbook
https://github.com/opscode/cookbooks/tree/master/database

Please note that I am a total n00b to postgres.

I have to figure out to manage failover from master to slave. I imagine
that
I can do it w/ some kind of script to check whether master is available
or
not, then promote a slave. However, then I have make sure that a promoted
slave is not demoted on the next chef run.

If anyone has experience w/ these issues, I would much appreciate your
advice.

If anyone is working on a similar cookbook or has already written
something
that fulfills my requirements, please let know!

Cheers

BryanWB

If you are just looking for traditional HA with PostgreSQL, you might consider not using a clustered filesystem, and instead just doing block level replication and letting Heartbeat manage PostgreSQL as a resource. While clustered filesystems have come a long way, my opinion is that it's still a stretch to recommend them for data-intensive applications (like database masters can be.) If availability is your concern, think about DRBD+Heartbeat.

Adam


Opscode, Inc.
Adam Jacob, Chief Customer Officer
T: (206) 619-7151 E: adam@opscode.com

On Jan 26, 2012, at 12:38 AM, Bryan Berry wrote:

very exciting!

I very well may go w/ pacemaker for both a GFS2 cluster and postgres. In that case, I may be able to help you finish that cookbook.

Anyone has any recommendations on ocfs2 vs GFS2 vs other? I would love to hear them

On Mon, Jan 23, 2012 at 8:57 PM, Matt Ray matt@opscode.com wrote:
If you're interested in a corosync/pacemaker cookbook I started (but
have yet to finish), here you go:
https://github.com/mattray/barclamp_ha_service/tree/pacemaker_service/chef/cookbooks/pacemaker

It installs pacemaker and configures the master/server, it just
doesn't actually manage the services yet. I wanted to get fancy and
let Chef think it was still managing services even when Pacemaker was
managing them, but I got a little blocked and other stuff took
precedence. Even though it's in a Crowbar barclamp, I was testing it
with just Chef and hadn't touched the other Crowbar stuff (so just
grab that cookbook). There's also an existing drbd cookbook that works
on the community site.

Thanks,
Matt Ray
Senior Technical Evangelist | Opscode Inc.
matt@opscode.com | (512) 731-2218
Twitter, IRC, GitHub: mattray

On Mon, Jan 23, 2012 at 7:46 PM, Adam Jacob adam@opscode.com wrote:

Think about using traditional HA tools, like Heartbeat/Corosync and
Pacemaker to do the actual promotion. No need to re-invent the wheel.

Adam


Opscode, Inc.
Adam Jacob, Chief Customer Officer
T: (206) 619-7151 E: adam@opscode.com

On Jan 22, 2012, at 11:30 PM, Bryan Berry wrote:

I am starting to work on master/slave streaming replication for postgres,
based on the database cookbook
https://github.com/opscode/cookbooks/tree/master/database

Please note that I am a total n00b to postgres.

I have to figure out to manage failover from master to slave. I imagine that
I can do it w/ some kind of script to check whether master is available or
not, then promote a slave. However, then I have make sure that a promoted
slave is not demoted on the next chef run.

If anyone has experience w/ these issues, I would much appreciate your
advice.

If anyone is working on a similar cookbook or has already written something
that fulfills my requirements, please let know!

Cheers

BryanWB

I haven't examined the other, more special purpose tools, but this is a
classic scenario for Zookeeper.
i.e. the master holds the lock on a well known node and the potential
masters in waiting queue up for that lock.
If the connection drops then one of those potential masters gets it and can
then configure as the master with confidence that it shouldn't be
un-mastered.

On Thu, Jan 26, 2012 at 9:32 AM, Adam Jacob adam@opscode.com wrote:

If you are just looking for traditional HA with PostgreSQL, you might
consider not using a clustered filesystem, and instead just doing block
level replication and letting Heartbeat manage PostgreSQL as a resource.
While clustered filesystems have come a long way, my opinion is that it's
still a stretch to recommend them for data-intensive applications (like
database masters can be.) If availability is your concern, think about
DRBD+Heartbeat.

Adam


Opscode, Inc.
Adam Jacob, Chief Customer Officer
T: (206) 619-7151 E: adam@opscode.com

On Jan 26, 2012, at 12:38 AM, Bryan Berry wrote:

very exciting!

I very well may go w/ pacemaker for both a GFS2 cluster and postgres. In
that case, I may be able to help you finish that cookbook.

Anyone has any recommendations on ocfs2 vs GFS2 vs other? I would love to
hear them

On Mon, Jan 23, 2012 at 8:57 PM, Matt Ray matt@opscode.com wrote:

If you're interested in a corosync/pacemaker cookbook I started (but
have yet to finish), here you go:

https://github.com/mattray/barclamp_ha_service/tree/pacemaker_service/chef/cookbooks/pacemaker

It installs pacemaker and configures the master/server, it just
doesn't actually manage the services yet. I wanted to get fancy and
let Chef think it was still managing services even when Pacemaker was
managing them, but I got a little blocked and other stuff took
precedence. Even though it's in a Crowbar barclamp, I was testing it
with just Chef and hadn't touched the other Crowbar stuff (so just
grab that cookbook). There's also an existing drbd cookbook that works
on the community site.

Thanks,
Matt Ray
Senior Technical Evangelist | Opscode Inc.
matt@opscode.com | (512) 731-2218
Twitter, IRC, GitHub: mattray

On Mon, Jan 23, 2012 at 7:46 PM, Adam Jacob adam@opscode.com wrote:

Think about using traditional HA tools, like Heartbeat/Corosync and
Pacemaker to do the actual promotion. No need to re-invent the wheel.

Adam


Opscode, Inc.
Adam Jacob, Chief Customer Officer
T: (206) 619-7151 E: adam@opscode.com

On Jan 22, 2012, at 11:30 PM, Bryan Berry wrote:

I am starting to work on master/slave streaming replication for
postgres,
based on the database cookbook
https://github.com/opscode/cookbooks/tree/master/database

Please note that I am a total n00b to postgres.

I have to figure out to manage failover from master to slave. I imagine
that
I can do it w/ some kind of script to check whether master is available
or
not, then promote a slave. However, then I have make sure that a
promoted
slave is not demoted on the next chef run.

If anyone has experience w/ these issues, I would much appreciate your
advice.

If anyone is working on a similar cookbook or has already written
something
that fulfills my requirements, please let know!

Cheers

BryanWB