Important Compliance Outage Information On Automate 2 April 15th Upgrade

To support some new and upcoming features (control filtering for Compliance, IAMv2), we need to migrate the data for compliance results to a new format.

For most users, this migration will happen quickly and in the background. However, if the system has tens of gigabytes of compliance results for the current day, the migration may take hours.

In order to protect the integrity of the data in the system while the current date's results are being migrated, the system will be unresponsive in the following ways:

  • Compliance APIs and UI (Compliance page, Scan Jobs, Asset Store) will not be responsive
  • Scan jobs and incoming scan reports (from audit cookbook or inspec exec) will not be processed

The process starts by migrating the current day's data, and then works its way back through history from there. Once the current day's data is migrated, the system will be responsive again. At this point the historical data will be migrated in the background. During this time, the results for previous days may appear inconsistent. We expect this process to complete at a rate of about 6 GB of index data per hour.
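As a rough planning aid, the quoted rate can be turned into a back-of-the-envelope estimate. The sketch below is illustrative only: the function name and the 90 GB example size are hypothetical, and it assumes the rate of about 6 GB per hour holds steadily, which will vary with the CPU, IO, and memory available to your environment.

```python
# Approximate rate quoted in this notice; real throughput depends on hardware.
MIGRATION_RATE_GB_PER_HOUR = 6.0


def estimate_migration_hours(index_size_gb, rate_gb_per_hour=MIGRATION_RATE_GB_PER_HOUR):
    """Estimate hours needed to migrate `index_size_gb` of compliance index data.

    Hypothetical helper for planning purposes; not part of Automate.
    """
    if index_size_gb < 0:
        raise ValueError("index_size_gb must be non-negative")
    if rate_gb_per_hour <= 0:
        raise ValueError("rate_gb_per_hour must be positive")
    return index_size_gb / rate_gb_per_hour


if __name__ == "__main__":
    # Example: 90 GB of compliance index data (an arbitrary illustrative size).
    hours = estimate_migration_hours(90)
    print(f"Estimated migration time: {hours:.1f} hours")
```

Treat the result as an order-of-magnitude guide for when results for previous days should stop appearing inconsistent, not as a guarantee.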

To make this process as painless as possible, we will promote this release to Automate's "current" channel at 00:01 UTC (5:01 PM PDT the previous day). For customers with automatic upgrades enabled, this timing means the upgrade begins while there is very little data in the current day's results.

We also recommend that customers upgrading manually do so at 00:01 UTC or shortly after to minimize downtime.
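For those scheduling a manual upgrade, the target time can be worked out from any local clock. The following is a minimal sketch using only the Python standard library; the function name is hypothetical and the example date is arbitrary, chosen only to show the conversion.

```python
from datetime import datetime, time, timedelta, timezone


def next_upgrade_window(now):
    """Return the next 00:01 UTC at or after `now` (a timezone-aware datetime).

    Hypothetical helper for scheduling; not part of Automate.
    """
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), time(0, 1), tzinfo=timezone.utc)
    if candidate < now_utc:
        # Today's 00:01 UTC has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    # Arbitrary example: 5:00 PM in a UTC-7 zone (such as PDT).
    pdt = timezone(timedelta(hours=-7))
    local_now = datetime(2020, 6, 1, 17, 0, tzinfo=pdt)
    window = next_upgrade_window(local_now)
    print("Next window (UTC):  ", window.isoformat())
    print("Next window (local):", window.astimezone(pdt).isoformat())
```

Passing `datetime.now(timezone.utc)` instead of the fixed example gives the next window relative to the current time.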

Customer Impact

All customers will have a period of time where Compliance APIs and UIs (Compliance page, Scan Jobs, Asset Store) are non-responsive and incoming scan reports will not be processed. Chef-client runs will not be affected during this migration, and Event Feed, Client Runs and Settings UI will not be affected. The length of this period is determined solely by the amount of data for the current date and the throughput (CPU, IO, etc.) allocated to your environment. Additionally, some performance impact may continue while older data is migrated to the new format, depending on your hardware profile and the resources assigned to the various Automate services. We recommend taking the following steps to ensure a painless experience:

  1. Ensure that your system has an appropriate amount of heap memory assigned to Elasticsearch: https://automate.chef.io/docs/configuration/#setting-elasticsearch-heap

  2. Schedule the upgrade to occur as close to 00:01 UTC as possible to minimize the time needed to convert the most recent day's data. Customers with automatic upgrades enabled will have this change initiated at 00:01 UTC to minimize downtime. If you have automatic upgrades configured and wish to change this, you can toggle the feature on and off by following the process described at https://automate.chef.io/docs/install/#disable-automatic-upgrades

  3. Test the upgrade in a non-production environment prior to upgrading if you have more than a few GB of data. Monitor your resource consumption to ensure you have enough throughput and, if necessary, allocate more resources to minimize the impact.

  4. Make sure all other resource-intensive processes (such as backups, reindexing, etc.) are disabled or scheduled to run at another time, before or after the upgrade.

  5. If you have problems, notify support to aid you in resolution via https://www.chef.io/support/get-started/