Chef server: disaster recovery

Hello,

Let's say I have a Chef server set up on an EC2 instance and Postgres in RDS. The Chef server manages around 10 application nodes.

One day the EC2 instance with the Chef server gets terminated and there is no way to restore it. If I install a new Chef server and point it at the old Postgres in RDS, which still has all the data from the old Chef server, will I be able to recover Chef to the state it was in prior to the termination, or not?

What would my options be?

Thank you.

Is my question too trivial, or does no one know?

Are you storing your Chef Bookshelf on S3? I can't answer authoritatively,
but I think most state exists in Postgres and the Bookshelf. There's data
in Redis and RabbitMQ as well, but I think that data is meant to be
short-lived and may be okay to lose if you're not worried about a
squeaky-clean cutover.

I'll have to ping some of the server maintainers to find the exact current state of things, but what I know is this:

In the past, there were two sources of persistent data: the Postgres DB and the filesystem store for cookbook files. In addition, there is a search index based on Solr, which can be rebuilt from data in the database. If you're going all-in on AWS, you can use RDS for the database and S3 for the file store. If you do that, you can create a new server and at least all of your data will be there.
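If it helps, a rough sketch of the relevant chef-server.rb settings for that layout might look something like the following. The RDS endpoint, credentials, and bucket name are placeholders, so double-check the exact option names against the docs for your Chef Server version:

# External PostgreSQL (e.g. RDS); hostname and credentials are placeholders
postgresql['external'] = true
postgresql['vip'] = 'mychef.xxxxxxxx.us-east-1.rds.amazonaws.com'
postgresql['port'] = 5432
postgresql['db_superuser'] = 'chef_pgsql'
postgresql['db_superuser_password'] = 'CHANGE_ME'

# Cookbook files in S3 instead of the local Bookshelf; bucket name is a placeholder
bookshelf['enable'] = false
bookshelf['vip'] = 's3.amazonaws.com'
bookshelf['external_url'] = 'https://s3.amazonaws.com'
bookshelf['access_key_id'] = 'AWS_ACCESS_KEY'
bookshelf['secret_access_key'] = 'AWS_SECRET_KEY'
opscode_erchef['s3_bucket'] = 'my-cookbook-bucket'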

I know there was also some work done to let you use AWS ElasticSearch (I forget the exact brand name) instead of Solr, which would mean that you wouldn't need to regenerate any data after recovering from a disaster and the Chef Server would be totally stateless. But I don't recall if that work has made it into a release yet.

There's been a lot of work on this kind of stuff over the past few Chef Server releases, so you may find more specific answers by reading the release notes for the past few Chef Server versions.

Thank you for your reply, it did help clear things up, and I did make it work using S3 and PostgreSQL on AWS. I also found in the docs that I can point a VIP for RabbitMQ as well (https://docs.chef.io/config_rb_server.html#rabbitmq). Does that mean I can separate out queueing as well?

@kallistec Have you found out anything about ElasticSearch instead of Solr? That would help as well, as I am already using ElasticSearch for logs and can dedicate some of its power to Chef.

Also, a couple of questions on calculating CCRs (I am using this as a reference: https://docs.chef.io/server_components.html#ccrs-min):

  1. If I put things onto RDS and S3 and keep the rest on the server, how will CCRs be calculated?
  2. The reference I provided above shows figures for a RAM-intensive node; what about a node with 4 CPUs and 4 GB of RAM, what CCR will it support?
  3. Is there a formula to calculate CCR?

Thank you.

Hi,

Have you found out anything about ElasticSearch instead of Solr? That would help as well, as I am using ElasticSearch for logs and can dedicate some of its power to Chef.

To use ElasticSearch as the search backend, you can put the following in chef-server.rb:

opscode_solr4['external'] = true
opscode_solr4['external_url'] = 'http://IP_FOR_ES:PORT_FOR_ES'
opscode_erchef['search_provider'] = 'elasticsearch'
opscode_erchef['search_queue_mode'] = 'batch'

chef-server-ctl reconfigure should create the required index in elasticsearch.

Also I found in docs, that I can point VIP for rabbitmq as-well (chef-server.rb Settings). Does it mean that I can separate Queueing as-well?

While I believe it is possible to run RabbitMQ externally, we don't test this configuration, and thus it likely isn't well supported by chef-server-ctl reconfigure and chef-server-ctl upgrade. If you are using ElasticSearch as your search backend and aren't using Chef Analytics, then it is possible to turn off RabbitMQ completely with the following configuration items:

rabbitmq['enable'] = false
rabbitmq['management_enabled'] = false
rabbitmq['queue_length_monitor_enabled'] = false
opscode_expander['enable'] = false
dark_launch['actions'] = false

  1. If I put things onto RDS and S3 and keep the rest on the server, how will CCRs be calculated?
  2. The reference I provided above shows figures for a RAM-intensive node; what about a node with 4 CPUs and 4 GB of RAM, what CCR will it support?
  3. Is there a formula to calculate CCR?

If you are using S3 for cookbook storage, RDS for PostgreSQL, and ElasticSearch for the search index, I would expect that fewer resources would be needed on the Chef server VM itself. This is especially true if you offload search to ElasticSearch, as a good deal of RAM is used by Solr. Unfortunately, I can't give you hard numbers for what you can expect to see.

If you do use ElasticSearch for your search backend, please do not hesitate to report issues either on GitHub or here. The ElasticSearch support is rather new and we welcome feedback on it.

I hope this helps,

Cheers,

Steven


Thank you for your help, it indeed helped. I will try to go in this direction and see how it works.

Meanwhile, I put PostgreSQL onto RDS and everything else onto the virtual instance. When I destroyed the instance and tried to re-create it, I got the following errors:

  1. I think it was about the opscode_chef database already existing, and that I had to get rid of it.
  2. After I got rid of the databases, it started complaining about users that had to be removed.

Is that expected, or is it because I am not using external Solr/Elasticsearch that I have to drop and create the database again?

I can go through the process one more time if necessary and provide the precise errors.

Hi Vasilij,

Were you able to get this figured out? I am looking to try the same thing
in the near future, and was curious to see where you landed.

Thanks,
Ameir

@ameir I was able to plug external PostgreSQL into Chef, but I failed with S3. I opened tickets and am awaiting a response:

I also opened an issue on GitHub (which I hope will not be closed):

Also, if you are using Ubuntu 15.10, you should probably be aware of this:

And as mentioned above, if your instance/node goes down, you will not be able to re-create it, as it will ask you to clean the DB.

Hope it helps.

Sincerely,
Vasilij

I apologise for the spam, but I have been waiting for two weeks for somebody to just give me a tip about where I am wrong; even in the chef-server repo issues I cannot get a clue as to what might be wrong.

Hi Vsyc,

I had similar problems; here's the config I used that seems to work:

bookshelf['access_key_id'] = 'access_key'
bookshelf['secret_access_key'] = 'secret'
opscode_erchef['s3_bucket'] = 'my-bucket-name'
bookshelf['external_url'] = 'https://s3-eu-west-1.amazonaws.com'
bookshelf['vip'] = 's3-eu-west-1.amazonaws.com'
bookshelf['enable'] = false

I think the key was having https on the bookshelf['external_url'] attribute, but not on bookshelf['vip'].

Cheers
Kieran

Thank you @kdoonan, your way did work for me as well. The reason I had trouble before is that I had added the bucket name to vip and external_url; the reason for that is that my bucket name contained periods, and it was complaining that I had to include the bucket name in the hostname as well, or something like that.

https can be in the vip as well; at least that worked for Chef server 12.4.1.
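To be concrete, I mean something roughly like this also worked for me (the region is just the placeholder from Kieran's snippet above, and the rest of the bookshelf settings stay the same):

# Same scheme in both settings; worked for me on 12.4.1, values are placeholders
bookshelf['external_url'] = 'https://s3-eu-west-1.amazonaws.com'
bookshelf['vip'] = 'https://s3-eu-west-1.amazonaws.com'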