Chef Database Growth

I have a Chef server that has been running for about three weeks. Initially the backups were around 15-25 MB. However, about a week ago, the backups started doubling in size, to the point that I'm now seeing backups in the 1.2 GB range. The server is still in testing and has at most 5 clients at any given time, so the massive growth is unexpected. Out of the many Chef servers I've run, this is the first one I've ever seen this behavior on.

Doing some digging, it appears the growth is in the PostgreSQL database. I've looked for options to prune or clean up the DB but cannot seem to find any. Not sure if it's any help, but this is the path that seems to be consuming the most space:

/var/opt/opscode/postgresql/9.6/data/base/16387

Any suggestions would be appreciated.

Regards,

Jeff

One possible cause of the database growing even though you aren't increasing the number of nodes is that it isn't being vacuumed often enough. See https://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT for an explanation of what vacuuming is for. Normally this happens automatically via autovacuum, but there are a few situations in which that doesn't work properly.

You can run a vacuum manually: connect to the database with sudo chef-server-ctl psql and run VACUUM VERBOSE. This will only (temporarily) stop the database growth, but it will help identify whether this is indeed the problem.
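On a default standalone install, that session looks roughly like this (a sketch, assuming chef-server-ctl psql drops you into the opscode_chef service database as the original post describes):

sudo chef-server-ctl psql
opscode_chef=> VACUUM VERBOSE;   -- prints removable/nonremovable row counts per table as it works
opscode_chef=> \q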

As your Chef server is in testing, you can also try running VACUUM FULL VERBOSE. This will cause downtime on the Chef server because of the locks taken on the database while the vacuum runs, but it will also reclaim the space that has already been used up.
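If you want to confirm that a full vacuum actually reclaims space, you can compare the database's on-disk size before and after from the same psql session (pg_size_pretty and pg_database_size are standard PostgreSQL functions; opscode_chef is the Chef server's database):

opscode_chef=> SELECT pg_size_pretty(pg_database_size('opscode_chef'));  -- size before
opscode_chef=> VACUUM FULL VERBOSE;
opscode_chef=> SELECT pg_size_pretty(pg_database_size('opscode_chef'));  -- size after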

If you find vacuuming is indeed the issue, you can begin investigating why autovacuuming isn’t doing the job, and possibly tweak the autovacuum settings (https://www.postgresql.org/docs/9.6/static/routine-vacuuming.html#AUTOVACUUM). If that fails, you can try setting up a cron job to manually vacuum the database at regular intervals.
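If you do go the cron route, something like the entry below is the general idea. It is only a sketch: the embedded psql path and the opscode-pgsql user are assumptions about a default omnibus install, so check both on your system first.

# /etc/cron.d/chef-postgres-vacuum (hypothetical)
# Plain VACUUM of the Chef server database every Sunday at 03:00
0 3 * * 0  opscode-pgsql  /opt/opscode/embedded/bin/psql -d opscode_chef -c 'VACUUM VERBOSE;'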

Hi Mark,

Thanks for the quick response. I ran VACUUM FULL VERBOSE on all of the databases. It resulted in no change to the backup size.

In the output, this was the biggest item that jumped out to me. Is this normal?

INFO: vacuuming "public.cookbook_version_checksums"
INFO: "cookbook_version_checksums": found 0 removable, 20899968 nonremovable row versions in 342623 pages
DETAIL: 0 dead row versions cannot be removed yet.

Thanks,

Jeff

That does seem extremely high. Having nonremovable row versions is fine; those rows aren't removed because they contain your data. But I wouldn't have expected 20 million of them in cookbook_version_checksums. How many cookbooks (including older versions) do you have uploaded? And how big is the cookbook_version_checksums table (e.g. select count(1) from cookbook_version_checksums;)?
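A few quick checks you can run from the same psql session (pg_size_pretty and pg_total_relation_size are standard PostgreSQL functions; cookbook_versions is my assumption about the companion table's name):

opscode_chef=> SELECT count(1) FROM cookbook_version_checksums;
opscode_chef=> SELECT count(1) FROM cookbook_versions;  -- rough count of cookbook versions held (table name is an assumption)
opscode_chef=> SELECT pg_size_pretty(pg_total_relation_size('cookbook_version_checksums'));  -- table plus index size on disk

From a workstation, knife cookbook list -a will also list every version of every cookbook the server has.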

If you aren't expecting there to be lots of cookbooks, my guess is that something (maybe a CI system) is uploading new versions of cookbooks, or the same cookbook over and over, and that is what's driving that number so high.

Here are the results of the count. There is a CI tool that imports cookbooks in a batch when it's run, so I guess that's the problem.

opscode_chef=> select count(1) from cookbook_version_checksums;
  count
----------
 37747062
(1 row)

How can I clean up this table?

Thanks,

Jeff

If you really do have a lot of cookbooks (and versions of them), then I would suggest using a utility such as this to help groom the cookbooks on your Chef server: https://github.com/majormoses/knife-cookbook-cleanup
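If you want to prune a few offenders by hand before reaching for a plugin, plain knife can do it; the cookbook name, version, and regex below are placeholders:

knife cookbook delete my_cookbook 0.1.0    # delete one specific old version
knife cookbook delete my_cookbook --all    # delete every version of a cookbook
knife cookbook bulk delete '^ci_scratch'   # delete cookbooks whose names match a regex

After a large cleanup it's worth running VACUUM FULL again, since PostgreSQL doesn't return the deleted rows' disk space to the operating system on its own.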

We also have a tool called knife-tidy that will help with this: https://github.com/chef-customers/knife-tidy

A couple of Chef customer engineers will be doing a webinar in a couple of weeks about this tool and how to keep your Chef server data clean: https://pages.chef.io/201803-Webinar-BestPracticesforKeepingYourChefServerTidy_Register.html