SANS Penetration Testing

Using Let's Encrypt in Google Compute Engine...


...Or How I Learned to Stop Worrying and Love the Cloud

By Daniel Pendolino

The Issue

While working on last year's SANS Holiday Hack Challenge, we came to the point where the target systems were stable and the mad rush had ebbed. At this point, I started focusing on long-term stability (read: I want Holiday Hack 2016 to maintain itself so I can start on other projects without being interrupted by failures in critical North Pole infrastructure systems). As you might expect, the North Pole runs on Google Cloud Platform, managed via the gcloud command (Santa seems to have a thing for cloud providers over the past couple of Holiday Hack Challenges... wait 'til you see what he does with his 2017 infrastructure... but I digress...)

To support my plans for long-term stability and minimized maintenance, I had all the non-stateful machines auto-revert to a known stable state on a regular schedule. That would make each machine fresh for all the hax0rs trying to save Christmas from the nefarious villain.

Here's an excerpt from the auto-revert script:

gcloud compute instances delete "$vm" --zone "$ZONE" -q

gcloud compute disks create "$vm" --zone "$ZONE" --source-snapshot "$vm-$SNAP"

gcloud compute instances create "${vm}" --boot-disk-auto-delete --zone "$ZONE" --disk name="${vm}",boot=yes,auto-delete=yes --address "${vm}" --machine-type "$MACHINE" --tags=http-server,https-server

Fantastic! This year we were able to use Let's Encrypt for all of our SSL certs, which let us SSL-enable targets without incurring significant costs. The downside? The certificates are only valid for 90 days, so they need renewing every three months. Renewing by hand would be OK if it only came up once a year, but every three months starts to stack up. Clearly, manual intervention is not ideal, and Let's Encrypt supports automatic renewal for this exact reason. BUT... when your box reverts daily or weekly, those renewed certs are lost with every revert, and spamming the nice (and free) Certificate Authority with a fresh request after each revert isn't viable either.

The Solution?

So how can you have a virtual machine in Google's Compute Engine automatically revert while protected via a Let's Encrypt SSL Certificate, but still have that certificate renew every 90 days?

MULTIPLE DISKS! By adding another disk to each machine with SSL enabled, that disk can persist through the reverts, while the rest of the system goes back to a nice stable state. Here is the rough process to convert a machine:

    1. Create a standard disk and attach it to the instance (manually this time)
    2. Remote into the machine and format the disk (I used ext4)
    3. Mount the disk to a temporary location
    4. Copy current SSL configuration there
    5. Edit /etc/fstab so the new disk mounts where the SSL config is traditionally stored
    6. Back up the old configuration to a completely different machine, just in case!
    7. Stop the web server and remount the new disk onto the SSL configuration directory
    8. PROFIT! (..hopefully)
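
The steps above can be sketched roughly as follows. This is a minimal sketch, not the script we actually ran: the disk size, the /etc/letsencrypt path (certbot's default), the nginx service name, the temporary mount point, and the example instance and zone names are all assumptions, so adjust them to your setup.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-7. Names and paths marked "assumption" are not from the post.
set -euo pipefail

vm="${vm:-example-vm}"                  # assumption: hypothetical instance name
ZONE="${ZONE:-us-central1-a}"           # assumption: hypothetical zone
ssl_disk="${vm}-disk-ssl"               # same disk name the revert command uses
SSL_DIR="${SSL_DIR:-/etc/letsencrypt}"  # assumption: certbot's default directory
WEB_SERVICE="${WEB_SERVICE:-nginx}"     # assumption: your web server's service name

# Step 1 (run from your workstation): create a standard disk and attach it.
create_and_attach() {
  gcloud compute disks create "$ssl_disk" --zone "$ZONE" --size 10GB --type pd-standard
  gcloud compute instances attach-disk "$vm" --zone "$ZONE" \
    --disk "$ssl_disk" --device-name "$ssl_disk"
}

# Build the /etc/fstab line for step 5 from a filesystem UUID and a mount point.
# "nofail" keeps the machine booting even if the disk is ever missing.
fstab_line() {
  printf 'UUID=%s %s ext4 defaults,nofail 0 2\n' "$1" "$2"
}

# Steps 2-7 (run on the VM as root).
migrate_ssl_config() {
  local dev="/dev/disk/by-id/google-${ssl_disk}"  # GCE names devices google-<device-name>
  mkfs.ext4 -F "$dev"                             # step 2: format (destroys data on $dev)
  mkdir -p /mnt/ssl-tmp
  mount "$dev" /mnt/ssl-tmp                       # step 3: temporary mount
  cp -a "$SSL_DIR/." /mnt/ssl-tmp/                # step 4: copy config, keeping symlinks
  fstab_line "$(blkid -s UUID -o value "$dev")" "$SSL_DIR" >> /etc/fstab  # step 5
  tar -czf /root/ssl-config-backup.tar.gz -C "$SSL_DIR" .  # step 6: then copy this off-box
  systemctl stop "$WEB_SERVICE"                   # step 7: stop web server, swap mounts
  umount /mnt/ssl-tmp
  mount "$SSL_DIR"                                # uses the new fstab entry
  systemctl start "$WEB_SERVICE"
}
```

Run create_and_attach from wherever you run gcloud, then migrate_ssl_config on the VM itself. Since the /etc/fstab edit lives on the boot disk, the snapshot used for reverts presumably needs to be retaken afterwards so the mount survives the next revert.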

If everything works as expected, the site should be up and running after the web server is restarted. Now the commands to revert the machine will look something like the following:

gcloud compute instances delete "$vm" --zone "$ZONE" -q
gcloud compute disks create "$vm" --zone "$ZONE" --source-snapshot "$vm-$SNAP"
gcloud compute instances create "${vm}" --boot-disk-auto-delete --zone "$ZONE" --disk name="${vm}",boot=yes,auto-delete=yes --disk name="${vm}-disk-ssl",auto-delete=no --address "${vm}" --machine-type "$MACHINE" --ta

*** Pay special attention to 'auto-delete=no' - it is crucial not to auto-delete the new disk with the SSL configuration on it.
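
If you want a belt-and-braces check before the very first scripted revert, one option is to clear the auto-delete flag explicitly on the manually attached disk. The sketch below is an assumption on my part rather than part of the original revert script; the run helper and the DRY_RUN variable are hypothetical names that let you preview the command before running it for real.

```shell
#!/usr/bin/env bash
# Sketch: explicitly mark the SSL disk as "do not auto-delete" before deleting the VM.

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: protect_ssl_disk <instance-name> <zone>
# Assumes the SSL disk follows the "<instance>-disk-ssl" naming used in the revert command.
protect_ssl_disk() {
  local vm="$1" zone="$2"
  run gcloud compute instances set-disk-auto-delete "$vm" \
    --zone "$zone" --disk "${vm}-disk-ssl" --no-auto-delete
}
```

For example, `DRY_RUN=1 protect_ssl_disk "$vm" "$ZONE"` prints the gcloud command without touching anything; drop DRY_RUN to apply it.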


And from here on out, Let's Encrypt's auto-renew process will keep the SSL certificate valid, while the machine itself reverts to a (mostly) pristine state whenever needed =D.
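
A quick sanity check after a revert might look like the sketch below. It is only an illustration: the function name, the 30-day threshold, and the certificate path in the usage note are assumptions, and it needs the mountpoint and openssl tools on the VM.

```shell
#!/usr/bin/env bash
# Sketch: confirm the SSL directory is on its own disk and the cert is not about to expire.

# Usage: post_revert_check <ssl-config-dir> <path-to-cert.pem>
post_revert_check() {
  local ssl_dir="$1" cert="$2"

  # If the persistent disk did not mount, we are looking at the reverted boot disk.
  if ! mountpoint -q "$ssl_dir"; then
    echo "FAIL: $ssl_dir is not a separate mount"
    return 1
  fi

  # -checkend exits non-zero if the cert expires within the given number of seconds.
  if ! openssl x509 -noout -checkend "$((30 * 86400))" -in "$cert" >/dev/null; then
    echo "WARN: $cert expires within 30 days (or could not be read)"
    return 2
  fi

  echo "OK"
}
```

With certbot's default layout (an assumption; other Let's Encrypt clients differ) you would call something like `post_revert_check /etc/letsencrypt /etc/letsencrypt/live/your.domain/cert.pem`, and `certbot renew --dry-run` is a handy way to confirm that renewal itself still works.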

Yay Automation!

Have fun!

-Daniel Pendolino
Counter Hack Ops Team


Posted May 26, 2017 at 4:29 PM

Kevin W

Seems like a lot of work (slow and dependent on cloud build time) to build and destroy the entire VM each time. Any reason you're not using docker containers which could also go through continuous integration and treat it more like a microservice?
This is still good info and I'm already eager to see what you guys come up with this year!
