WordPress / Git management with Chef

I’m currently running a WordPress cluster on AWS for livemocha.com. This
has been interesting (especially overlapping it with the existing legacy
site), but one of the big problems I’ve been facing is keeping the
themes, plugins, etc. consistent across the EC2 nodes. WordPress gets
unhappy if you activate a new plugin on one node and then can’t find it on
another node. Plugin activation state is stored in the DB, not on the
local filesystem, so when WordPress finds a plugin missing it deactivates
it across the board… not helpful.

So what I currently do is put the homepage into maintenance (by shoving all
relevant requests over to the login page for the actual site), point my
hosts file at a build host (which does double duty as the cron host), do my
updates, build a new AMI, spin down the old instances, and spin up new
ones. When everything is copacetic, I take the site out of maintenance.
This is, needless to say, time consuming. The window from queuing an AMI
build to coming out of maintenance is about 20 minutes, not counting the
time taken to update things and test them.

Updates are becoming frequent, so I want to trim the amount of time this
process takes by updating the nodes already in service and saving full AMI
rebuilds for things like major OS updates (kernel updates, etc.). I’ve
thought of a few different ways to handle this, but the one I’m currently
leaning towards is a mix of Git and Chef. Basically I would put the site
into maintenance, make my changes on the build host, and if everything
passes testing, commit the changes into Git. This would essentially be
everything underneath the WordPress wp-content/ directory (if WordPress
core updates become a big enough issue as well, then I can expand it to
the entire web root, but for now I’ll avoid that). Once everything is
pushed up in Git, I’ll use knife ssh to trigger a Chef run across the
cluster that will do a git pull and ideally bounce Apache (or I can use
knife to do that as well; either way, not a big deal). If it’s something
like a new plugin being added, I don’t even have to put the site into
maintenance at all: just commit it in Git and push away with knife.
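
Roughly, I’m picturing the recipe looking something like this, using
Chef’s built-in git resource (the repo URL, paths, and branch here are
placeholders, not our real setup):

    # Sync wp-content from Git; only bounce Apache when the checkout
    # actually changed.
    git '/var/www/html/wp-content' do
      repository 'git@git.example.com:livemocha/wp-content.git'  # placeholder
      revision 'master'
      user 'www-data'
      group 'www-data'
      action :sync
      notifies :restart, 'service[apache2]'
    end

    service 'apache2' do
      action :nothing  # only restarted via the notification above
    end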

Has anyone done something along these lines before? Can you think of a
better way to handle it? I thought about using s3fs to update files in
place, given that APC would (or at least, should…) cache the PHP files so
that the latency of s3fs’s I/O wouldn’t (or at least, shouldn’t) be a
concern. However, I’m not 100% confident that it would perform up to
snuff, and if s3fs errors out, the website goes offline, which my boss and
I agree makes it a non-starter. I imagine I could store all the files on
the Chef server and do a recursive file pull, but I’m thinking Git is
simpler and easier to manage. It’ll also make tracking changes from our
theme designer a bit easier, which is one of the reasons I thought about
this in the first place.
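
For the record, I gather the recursive-pull version would look roughly
like this with Chef’s remote_directory resource (paths are placeholders),
with the downside that every theme tweak means re-uploading the cookbook:

    # Recursively pull everything under files/default/wp-content in
    # the cookbook down onto the node.
    remote_directory '/var/www/html/wp-content' do
      source 'wp-content'
      owner 'www-data'
      group 'www-data'
      mode '0755'
      files_owner 'www-data'
      files_group 'www-data'
      files_mode '0644'
    end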

Assuming folks don’t point out some reason why this is bad, is there a
specific cookbook/LWRP I should use when setting up this recipe? Also, is
there an easy way to configure this so that the chef-client service
doesn’t update on every Git commit (like the theme dev pushing stuff up to
us), but lets me kick it off specifically via knife? If not, then I’ll
just use different branches… the dev will push to a testing branch, and
then I’ll push from there to a release branch that Chef is pulling from. I
just want to make sure he doesn’t make a tweak that suddenly appears on
the site before it’s ready.
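
Concretely, I figure the same git resource pinned to a release branch
would cover that (branch and role names are just examples):

    # Pushes to the testing branch never touch the nodes; only what
    # lands on the release branch gets deployed.
    git '/var/www/html/wp-content' do
      repository 'git@git.example.com:livemocha/wp-content.git'  # placeholder
      revision 'release'
      action :sync
      notifies :restart, 'service[apache2]'
    end

    # Then deploy on demand across the cluster:
    #   knife ssh 'role:wordpress-web' 'sudo chef-client'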

Thanks for all the help!


~~ StormeRider ~~

“Every world needs its heroes […] They inspire us to be better than we
are. And they protect from the darkness that’s just around the corner.”

(from Smallville Season 6x1: “Zod”)

On why I hate the phrase “that’s so lame”… http://bit.ly/Ps3uSS

If you've got multiple nodes that aren't directly sharing the data between
them (hosting from NFS or whatever), you could probably use lsyncd to
handle replication for you. Chef would allow you to easily manage that
too. Lsyncd uses the kernel's inotify hooks, so when a file gets changed
under a monitored directory it triggers an action, such as an rsync off to
the relevant boxes.
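
A rough sketch of what the Chef side could look like, assuming a plain
two-node rsync setup (hostnames and paths are made up, and the config
location varies by distro):

    package 'lsyncd'

    # Minimal lsyncd 2.x config: watch wp-content and rsync changes
    # out to the other web node as they happen.
    file '/etc/lsyncd.lua' do
      content <<-LUA
    sync {
      default.rsync,
      source = "/var/www/html/wp-content",
      target = "web2.example.com:/var/www/html/wp-content",
    }
      LUA
      notifies :restart, 'service[lsyncd]'
    end

    service 'lsyncd' do
      action [:enable, :start]
    end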

Paul


I was doing something like this where I had nginx in front of WordPress,
with lsyncd replicating data from a master server to the read-only slave I
had. If you don't want a release process and just want to be able to
update a cluster of WordPress servers as easily as updating one server,
then you can force admin traffic to SSL in your wp-config.php:

define('FORCE_SSL_ADMIN', true);

Then you need to route HTTPS traffic to only one designated master, and
synchronize off of that to the slaves; I used lsyncd to make updates
practically real-time. That worked, but for my personal WordPress site it
was overkill, and I went back to taking backups and automating builds
rather than maintaining constant high availability.

So, that should make it so that you can do WordPress software and plugin
updates in the admin console, which will then be replicated to the slaves
via lsyncd, which is very snappy. You could probably replace this with Git
if you want to. If you want to test updates first, then make the master
not take external traffic, and instead of lsyncd use git/rsync after
you've QA'd things and decided to push to the slaves... You could even use
a hybrid approach where you have one master production server that uses
lsyncd to sync to the rest of the prod servers, and you push to that
master via git or rsync or Chef or whatever release mechanism you like.

s3fs supports rsync, so you could rsync from the master up to S3, then
rsync down to the clients. That might be 'nice' in also giving you a
backup copy of your data in S3 (and you could timestamp copies so that you
could roll back, etc.). I don't think you'd run into too many I/O problems
with S3, but I don't know how big your dataset is... You probably want to
spend a little time monitoring and writing some automatic recovery scripts
to deal with sick s3fs mountpoints, since I've seen that happen. That
could be as simple as a script that runs once a minute and touches a file
on the s3fs mount; if that errors out, it tries to remount the filesystem,
and if that fails, it kills the software and raises an alert or
something... I don't know what the compelling reasons would be to use git
vs. rsync-over-s3fs or vice versa...
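
Something along those lines, sketched in Ruby (the mount point, timeout,
and alerting are all placeholders):

    #!/usr/bin/env ruby
    # Cron this once a minute: touch a canary file on the s3fs mount;
    # on failure, try a remount, and raise an alert if that fails too.
    require 'fileutils'
    require 'timeout'

    MOUNT  = '/mnt/s3'                    # placeholder mount point
    CANARY = File.join(MOUNT, '.canary')

    def healthy?
      # A hung FUSE mount tends to block rather than error, so wrap
      # the touch in a (best-effort) timeout as well.
      Timeout.timeout(15) { FileUtils.touch(CANARY) }
      true
    rescue Timeout::Error, SystemCallError
      false
    end

    exit 0 if healthy?

    system('umount', '-l', MOUNT)         # lazy-unmount the sick mount
    system('mount', MOUNT)                # assumes an fstab entry exists

    unless healthy?
      # Placeholder alert; wire this into whatever monitoring you use.
      system('logger', '-p', 'user.crit', "s3fs mount #{MOUNT} is sick")
      exit 1
    end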

I'd also most likely keep the scripts that do the synchronization managed
and pushed with Chef, but external to Chef, so that they can be executed
outside of a chef-client run for ease of hitting them with knife ssh and
doing quick pushes (and then also run them within chef-client to ensure
that all servers converge on a schedule).
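
That is, ship the script with a cookbook_file (or template) so Chef owns
it, but leave it runnable on its own (names here are made up):

    # Chef manages the script itself...
    cookbook_file '/usr/local/bin/wp-sync' do
      source 'wp-sync'
      owner 'root'
      mode '0755'
    end

    # ...and runs it on scheduled converges...
    execute 'wp-sync' do
      command '/usr/local/bin/wp-sync'
    end

    # ...but you can also fire it ad hoc for a quick push:
    #   knife ssh 'role:wordpress-web' 'sudo /usr/local/bin/wp-sync'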

I also used lsyncd across the whole root of WordPress to sync the software
as well, and this required quite a bit of hacking up the open-source
wordpress cookbook, since it tends to assume that you're running on one
server, likes to install a specific version of the software, and likes to
think it owns the database (or at least it did; I haven't looked at it in
months...).


Apologies for not answering your question, but I'm facing a very similar
issue shortly, so this is very well timed for me! It'd be good to hear the
solution.

If you could share some of your cookbooks/recipes for managing the WP
cluster, that would help me a great deal as well.

I'd also be very interested in any scripts you have to check WordPress
core, plugin, and theme versions to detect whether an update is required.

Cheers,

Andy

--
Andy Gale

http://twitter.com/andygale
https://alpha.app.net/andygale