Chef in a Windows monoculture - success examples?

Hi all -

Could anyone here point to sites of over 100 nodes that have successfully used Chef for infrastructure automation when the node install base is over 90% Windows?

This isn’t about the technology (I know chef-client works just fine on Windows), but about whether folks in a Windows monoculture, with experience working in PowerShell and running Win2012r2, are going to embrace a Ruby-based system with servers running on Linux.

I’d like to point some skeptics I’m working with to success stories where Windows shops have embraced Chef, run their own Chef Server, and succeeded with it. If you can point to your own success (anonymously) or folks you’ve worked with directly, that would be a big help.

Thanks in advance!

Peter

I work for a SaaS provider with 100% of the front end running on Windows servers. The back end is a mix of Windows and Linux, with Linux growing in popularity.
We have well over 100 Windows 2012 machines that are fully managed with Chef, including joining an Active Directory domain. We destroy and recreate all web servers on a monthly basis using knife.

I’m happy to go into more details about our infrastructure if you wish to contact me privately.

Also, Nordstrom is a heavy Windows + Chef user. You can check out their GitHub page to see the cookbooks they’ve open-sourced.

How do you use Chef to join to AD? How do you handle the reboot part of the domain join process?

I always thought AD join should be part of a provisioning step instead of in config management.

There are a couple of active directory join cookbooks.
https://supermarket.chef.io/cookbooks?utf8=✓&q=active+directory&platforms[]=

Chef has a native reboot resource for Windows.

We wrote this ad-join cookbook that lays down a scheduled task that runs chef again after the reboot.
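As a rough illustration of the scheduled-task idea (not the cookbook's actual code), the one-shot task could be registered with a schtasks command along these lines. The task name and the chef-client path are assumptions here; adjust both for your install.

```ruby
# Sketch only: build a schtasks command that re-runs chef-client at next boot.
# The task name and chef-client path are assumed values, not taken from the cookbook.
def rerun_chef_task_command(task_name: 'chef-after-domain-join',
                            chef_client: 'C:\opscode\chef\bin\chef-client.bat')
  [
    'schtasks', '/Create',
    '/TN', %("#{task_name}"),   # task name
    '/TR', %("#{chef_client}"), # program the task runs
    '/SC', 'ONSTART',           # trigger at system startup
    '/RU', 'SYSTEM',            # run without an interactive logon
    '/F'                        # overwrite a leftover task of the same name
  ].join(' ')
end

puts rerun_chef_task_command
```

The post-reboot run would also need to remove the task once it succeeds (for example with `schtasks /Delete /TN "chef-after-domain-join" /F`), otherwise it fires on every boot.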

Yikes that’s a lot of handling.

Not to hijack this post, but I’m confused why this wouldn’t be better served as a provisioning step.

It would be rare for a computer to ever be disjoined, and considering this is a one-time “config”, it doesn’t make sense to me to have it in CM. If a computer is disjoined in production, it could be related to serious issues worth investigating; I’m not sure I would want Chef to kick off and rejoin it.

I’m using Chef with Windows myself and I’m new to this stuff, so I’m curious why you chose to domain join with Chef.

Does anyone have an example of the ad-join cookbook using chef-vault to secure the credentials? I’m using Chef for Windows builds in AWS. Any help would be greatly appreciated!

Best-John

Thanks for weighing in, Spuder. I'm glad to hear that Chef is working out for you w/ the FE Windows servers. My question isn't whether Chef works in a Windows environment as a technology, but whether it succeeds in a Windows culture.

To make an analogy, if 10 years ago you'd come to me in my Linux shop and asked me to trust my automation to a tool running on, and built for, Windows, I would've been deaf and blind to any strengths of the tech as such, because the culture around operations would have been so foreign to me.

I worry that Chef brought in by management or a few enthusiasts to Windows-centered shops will fail because the dominant culture will be infertile ground, perhaps despite the best intentions of the dev&ops teams.

So, I'd like to know if there are case studies where Chef has succeeded in shops with essentially zero Unix footprint. I posted the same question on Twitter and have two examples:

1 - MSN https://channel9.msdn.com/Shows/DevOps-Dimension/3--MSN-and-Universal-Store-Combining-PaaS-with-Configuration-Management (Thanks to Matt Ray)

2 - NCR Corp (Thanks to Michael Hedgepeth)

Are those the only two? And what needed to happen internally for Chef to succeed? Michael already noted:

@pburkholder I've found @chef to be an easy sell to development org, especially if I show them powershell integration
@pburkholder we are 90% windows - the *unix acceptance came before this - the traditional windows culture was the biggest barrier

Other ideas on how to break through traditional Windows culture?

Thanks! -Peter

This is a great topic, about which I have a lot of thoughts. I have spent most of my career in a Microsoft technologies-dominant culture and so I’m well aware of the thoughts within that culture.

For me it all boils down to the business case. Do we need to do configuration management with Chef or do we need to use System Center or some other related technology? It’s a great question. People will likely want to use the “Microsoft” approved solution because, frankly, they’ve built their careers around using the Microsoft-approved solution! So don’t fight that.

Instead, find a way to demonstrate that the current process isn’t working. Are you using GPOs for managing configuration state in Active Directory? Let’s do a compliance scan against your nodes and see how they line up with the CIS benchmark for Windows Server 2012. Oh wait…it’s total chaos. Why? Hint: people are using remote desktop to make your system unmanageable by making one-off changes…everywhere. Another hint: this is absolute insanity.

Can you get a machine up and running quickly? Why not? Would Chef help with that?

If you need to configure a third party tool like monitoring or logging, can you do that effectively? Sure it’s great when all you do is Microsoft and it all fits together nicely, but is that realistic?

What happens to your operations costs when we take away the UI when looking at the MSFT stack (or even Windows Server 2016)?

Do you want to go to azure? Do you realize that going to azure without an automation plan is like buying a tank, driving around a city (your business) and pushing random buttons? It’s going to cause damage if you don’t have a radical change towards automation. In other words, the problems you have been facing related to scale do not have anything to do with the fact that you had to call Dell before to get hardware racked. It’s everything after that too! So will System Center help you there?

The answer to all these questions, as with many technologies, is…maybe Microsoft is the best way, but usually not. That’s another quite irritating aspect of Microsoft stuff. It can do everything. It solves everyone’s problems. So when you’re in this environment, look at the results! Don’t let the MSFT sales person or the single excited Microsoft fan get you sucked into ignoring common sense for your business. If the tools you are using don’t drive you to the outcomes you want, then consider changing the tools and the culture behind those tools (the people).

The real question is what level of support do you need to get these things done? I think Microsoft is a fantastic platform for enterprise-level development and they have an excellent cloud solution for enterprises. But they also have a long legacy and entire culture centered around the message that you can do IT with little training and a few button clicks. By the way, this is the exact culture that Jeffrey Snover has fought for years and years. Snover has done great things, but it’s important to remember that the culture still exists and is going strong.

So as a business, who do you want to align with? Sure, you have a strong and great history with Microsoft and an entire staff that knows about it. But you also need a partnership with another company to get you to where you want to be in the above opportunities. Chef is an excellent choice in this regard. You have a whole group of people at Chef Inc. who really get Microsoft (like Matt Wrock, Steve Murawski, Stuart Preston (a partner), Jessica DeVita, and Trevor Hess (a partner), to name a few). This core brings Windows into the Chef ecosystem as a first-class citizen. They advocate for DSC and align themselves with PowerShell/Snover. It’s a fantastic Windows configuration management platform.

Also for a large 100+ node organization the other sell is that having a relationship with Chef gets you access to those best practices and people to accelerate the transformation. The consulting I’ve gotten from Chef regarding my approach is probably more valuable than the software itself because it has been absolutely critical to get us to the point where we can take advantage of the software.

OK so to not ignore one more question you had: what about linux?

If you focus on the business outcomes, create an early adopter groundswell of support, then the linux question should solve itself. If it doesn’t someone is being an asshole. If that’s true take them to lunch and understand their needs, then incorporate that into your overall strategy. If they still don’t listen then by this point they’re clearly being an asshole, so make that reality visible to leadership and work towards getting around that person. The fact of the matter is that a company whose leadership is incapable of taking advantage of fantastic strategic and ROI business opportunities because of a few people who can’t handle learning another OS is not one with a bright future. Someone at some level should be able to see this.

If they fail to see it after all that, then you are indeed on a sinking ship. That would be quite depressing if there weren’t so many non-sinking ships all around you that will embrace and love what you’re doing. :)


I agree with everything you have to say here, Michael.

I’d still love to find more examples besides NCR and MSN.

Anyone??

Just adding our use case.

~40% of our fleet is on Windows.

We use Chef for everything (Windows installation, domain join, Windows updates, first application deployment, …) and have found this method pretty reliable so far.
Full automation has allowed us to scale from 2k Windows nodes to >7k.

Main difficulties:

  • Windows sometimes sets registry keys requesting a reboot (these are read by the reboot_point resource from the windows cookbook)
  • theoretical debates between full Chef vs DSC vs domain GPOs. It was much harder to evangelize our Windows admins than the Linux sysadmins.
  • Chef self-upgrade, and anything else where files are modified by Chef while in use
  • time to install a node (around 6 hours because of Windows updates + reboots)

So far we are pretty happy about it.

@Gregiore_Seux,

We’re a company where 98% of our fleet is Windows. We’re still early on in our adoption of Chef. Currently, we’re only configuring the base operating system with Chef. Would you mind sharing more details about how you go about updating Windows with Chef? We haven’t explored patching with Chef yet, and currently our patching process takes place outside of the Chef configuration process.

Thanks!

Seth

Hi All,
I’m new to Chef (July ’16) but have been doing Windows admin at an enterprise level for a while. My organization is strong on Linux but migrating to AD from LDAP (yahoo). This will be a global AD, and in addition we also have a few enterprise application environments on Windows. We are truly hybrid cloud/on-prem and have a mandate to do all systems configuration programmatically. Our devops stack is Jenkins/Terraform/Chef/Git/Artifactory; we’re also looking at Ansible.

So me being the only Windows guy embracing devops has been an interesting challenge. I’ve got my base Windows server deploying to AWS from Terraform (with a Packer-built AMI) and am currently testing domain-join and package installs. Definitely running into some tough issues, especially around packages. Is anyone else deploying from internal repos? If so, are you using chocolatey or choco_package? Also, any tips around domain_join? I’m using chef-vault for passing credentials but keep getting random failures…have yet to get domain join to work.

Be great to get an online chef/windows subgroup going!

Cheers-John Dito

On the topic of packages, we’ve been utilizing the built-in windows_package resource to install .exe and .msi packages. We deliver the installation files from an internal repo (a Windows Server hosting files through IIS). We deliver packages to the system with the remote_file resource prior to installing them with windows_package.

To do domain joins, we use the execute resource and call the NETDOM JOIN command. Here’s a scrubbed snippet of our resource. We deliver sensitive information such as the service account name and password through encrypted data bags.

execute "join #{domainName}" do
  command "NETDOM JOIN #{hostName} /domain:#{domainName} /OU:\"#{OU}\" /USERO:\"#{domainServiceAccount}\" /PASSWORDO:\"#{domainServiceAccountPassword}\""
  # keep the credentials out of chef-client output and logs
  sensitive true
  # 'reboot[domain_join_reboot]' must be declared elsewhere in the recipe
  notifies :request_reboot, 'reboot[domain_join_reboot]', :immediately
  # skip the join when the machine already verifies against the domain
  not_if "NETDOM VERIFY #{hostName} /domain:#{domainName} /OU:\"#{OU}\" /USERO:\"#{domainServiceAccount}\" /PASSWORDO:\"#{domainServiceAccountPassword}\""
end

Once again, this resource is scrubbed to demonstrate how we perform domain joins and doesn’t represent the actual resource we execute.
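Since the join and verify commands repeat the same arguments, one way to keep them in sync is a small helper. This is only a sketch in plain Ruby: the helper name and hash keys are hypothetical, the hash stands in for whatever the encrypted data bag item returns, and the NETDOM flags are copied from the snippet above rather than checked against the NETDOM documentation.

```ruby
# Sketch only: build NETDOM JOIN / VERIFY commands from one credentials hash.
# Helper name and hash keys are hypothetical; flags mirror the scrubbed snippet.
def netdom_command(verb, host, creds)
  "NETDOM #{verb} #{host} /domain:#{creds['domain']} /OU:\"#{creds['ou']}\" " \
    "/USERO:\"#{creds['user']}\" /PASSWORDO:\"#{creds['password']}\""
end

# Placeholder values; in a recipe these would come from the encrypted data bag.
creds = {
  'domain'   => 'corp.example.com',
  'ou'       => 'OU=Web,DC=corp,DC=example,DC=com',
  'user'     => 'svc-join',
  'password' => 'not-a-real-password'
}

join_cmd   = netdom_command('JOIN', 'web01', creds)
verify_cmd = netdom_command('VERIFY', 'web01', creds)

# Mask the password before printing anything for debugging.
puts join_cmd.sub(creds['password'], '********')
puts verify_cmd.sub(creds['password'], '********')
```

In the execute resource, `join_cmd` would go in `command` and `verify_cmd` in `not_if`, with `sensitive true` left in place.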

Seth, thanks for that. If domain_join continues to be pesky, I’ll have to try that. Would you mind including a sample of your windows_package usage? Does remote_file work with a URL? So could I specify a URL that would download the binary, move it to a location, and then execute it?

Thanks!-John

Yes, we use Artifactory, remote_file, and package to do Windows installs.

@freimer Thanks!!

@chainsawbuddha, here’s an example of how we use remote_file and windows_package:

remote_file 'C:\chef\agents\[Agent Name]' do
  source 'http://[URL to IIS Repo]/[Agent Name]'
end

windows_package 'install [Agent Name]' do
  source 'C:\chef\agents\[Agent Name]'
  options '[Options]'
end

I use an EC2 User Data script that uses PowerShell to do the following on a newly spun up/bootstrapped EC2 instance: 1) rename the hostname to an EC2 tag value, 2) set the DNS server values, 3) join the node to an AD domain, 4) install chef-client and do the initial run. I had problems using cookbooks to rename hostnames and add to domains (problems with incorrect FQDNs breaking the communication), so for me, installing chef-client after renaming hostnames, joining domains, etc. was the best solution. It’s kind of a Rube Goldberg machine, as it creates a scheduled task that is set to run immediately after reboot, which runs a script to install/configure chef-client (and then deletes the scheduled task). My department is an all-Windows setup and I’m responsible for managing the 400+ nodes in all environments with Chef.

To any of you renaming Windows hostnames or joining AD domains via cookbooks: don’t you have issues with it breaking communication with the Chef server because the FQDN has changed? And/or, what are you using for a Chef node name? If you install chef-client before the machine is in a domain, what do you use? Currently, I add standalone EC2 nodes as “hostname.ec2.internal”.

We don’t rename hostnames, but we do domain joins, so the FQDN does change. We mitigate this by putting a node_name entry in client.rb that represents the FQDN of the system being built. This way the system calls into the chef server using the same name regardless of whether it’s attached to a domain or not.
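To make that concrete, here is a minimal sketch of rendering a client.rb with node_name pinned to the FQDN the machine will have after the join. The helper name, hostname, domain, and server URL are all placeholders, not our real values.

```ruby
# Sketch only: render a client.rb that pins node_name to the eventual FQDN,
# so the node keeps one identity before and after the domain join.
# Helper name and all values passed in are placeholders.
def render_client_rb(hostname:, domain:, chef_server_url:)
  fqdn = "#{hostname}.#{domain}".downcase
  <<~CLIENT_RB
    chef_server_url '#{chef_server_url}'
    node_name       '#{fqdn}'
  CLIENT_RB
end

puts render_client_rb(
  hostname: 'WEB01',
  domain: 'corp.example.com',
  chef_server_url: 'https://chef.example.com/organizations/example'
)
```

Because node_name is set explicitly, chef-client does not fall back to the detected FQDN, which is what changes when the machine joins the domain.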
