How to Automate Your Cloud Resource Cleanup – for Good
Finding leaky resources, managing quota limits, detecting drift, and cleaning up: it’s the stuff that no engineer wants to deal with, especially as today’s cloud environments reach unmanageable levels of complexity.
Cleanup usually means hand-jamming resources with shell scripts or in your cloud console. But new open source technologies allow you to achieve a constant state of visibility and cleanup, automatically. This not only saves precious engineering time today; it also makes cleanup (and resource efficiency) truly scalable.
Well-Architected is a Two-Edged Sword
A well-architected environment separates resources and workloads into multiple AWS accounts. And the more accounts you have, the harder it becomes to navigate your infrastructure. You have to sift through every single account to find what you’re looking for.
Cloud Infrastructure Has Changed
Cloud-native architectures have caused a change in the way companies operate their infrastructure. What used to be manual processes and ITIL practices is now infrastructure-as-code and self-service automation.
Developers are in the driver’s seat: they determine what cloud services they want to use, and spin up resources as needed. The goal is feature velocity, so that new digital products ship faster. But the larger and more distributed an organization gets, the more challenges these dynamic environments create.
CI/CD pipelines and teardown jobs fail. Things break and don’t get cleaned up, and the result is “drift” and a rising cloud bill. Despite garbage collection, “stuff” leaks. Artifacts get left behind, and the number of orphaned resources grows. It’s not just servers, databases, and VPCs. It’s also things like accounts, SSH keys, IAM policies, roles and certificates, volumes and their snapshots, and dynamic IP addresses. It’s a long tail of resources.
Drift = Downtime
Drift can cause all sorts of downtime issues. Today’s infrastructure management tools don’t offer a solution. They do a good job of managing resources they know about. But they do a poor job managing resources they didn’t create.
For SREs in charge of maintaining cloud infrastructure, it’s a drag. Developers typically outnumber them by a factor of 50 to 80. At the same time, SREs are expected to build features that increase development velocity. Over time, that widens the “gap” between the desired state and the actual state of the infrastructure.
A New Approach to Mapping Your Cloud Infrastructure
Your cloud infrastructure is really a graph, with lots of dependencies. So why do all the cloud tools out there give you a “rows & columns” view of your resources? What if there were a way to visualize that graph and automate the cleanup of orphaned resources?
Some new cloud solutions automatically clean up drift that shouldn’t be there and close the gap. They can index resources, capture dependencies, and map out your infrastructure in a graph that a human being can understand, all from the command line.
The graph contains attributes for each resource. Developers and SREs can search the graph with a query language, and create event-based workflows that automate high-value but also high-effort tasks such as deleting unused resources or documenting cloud inventory for audit purposes. The cleanup workflows run on a schedule—you can just sit back, relax, and let the workflows do the heavy lifting.
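As a rough illustration of the graph idea, the sketch below models a handful of resources as nodes with attributes and their dependencies as edges, then treats anything that cannot be reached from an active workload as a cleanup candidate. The resource IDs, the attribute names, and the helper functions (search, find_orphans) are all hypothetical; this is not the query language or API of any particular tool.

```python
from collections import deque

# Hypothetical inventory: resource id -> attributes (illustrative values only).
resources = {
    "vpc-1": {"kind": "vpc", "age_days": 400},
    "instance-1": {"kind": "instance", "age_days": 12},
    "volume-1": {"kind": "volume", "age_days": 12},
    "volume-2": {"kind": "volume", "age_days": 90},
    "snapshot-2": {"kind": "snapshot", "age_days": 85},
}

# Directed edges: a resource points at the resources it depends on.
edges = {
    "instance-1": ["vpc-1", "volume-1"],
    "snapshot-2": ["volume-2"],
}


def search(resources, **criteria):
    """Return ids of resources whose attributes match every criterion."""
    return sorted(
        rid
        for rid, attrs in resources.items()
        if all(attrs.get(key) == value for key, value in criteria.items())
    )


def reachable(roots, edges):
    """Breadth-first walk of the dependency graph starting at the roots."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for dep in edges.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def find_orphans(resources, edges, roots):
    """Resources that no active workload (root) depends on."""
    live = reachable(roots, edges)
    return sorted(rid for rid in resources if rid not in live)


volumes = search(resources, kind="volume")
orphans = find_orphans(resources, edges, roots=["instance-1"])
print(volumes)
print(orphans)
```

Here the only active workload is assumed to be instance-1, so the old volume and its snapshot fall out as candidates. A scheduled workflow would run a check like this on each fresh inventory snapshot.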
Cleanup Automation: The Practical Side
It doesn’t matter whether you use Amazon Web Services (AWS), Google Cloud Platform (GCP), or both: this is a horizontal product that supports both. Even better, you do it all via a command-line interface (CLI), which engineers are comfortable working with.
It all started with collecting metrics from the AWS and GCP services most relevant to users, and then expanded to more supported services based on feedback.
This solution collects bare-metal information, because hardware specifications differ even for the same instance type. It gives developers estimates of the fastest and/or cheapest hardware per region, and once they have an instance, they can see exactly what they got. It also integrates with Terraform and shows the difference between the planned state and the current state of infrastructure in the CLI.
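The comparison between planned state and current state can be pictured as a simple diff. The sketch below is a minimal illustration that assumes both states have already been loaded into plain dictionaries keyed by resource ID. It does not parse real Terraform output, and the function name diff_states and the sample data are hypothetical.

```python
def diff_states(planned, current):
    """Compare two {resource_id: attributes} mappings and classify the drift."""
    planned_ids = set(planned)
    current_ids = set(current)
    return {
        # Declared in the plan but not found in the cloud account.
        "missing": sorted(planned_ids - current_ids),
        # Found in the cloud account but not declared anywhere.
        "unmanaged": sorted(current_ids - planned_ids),
        # Present on both sides, but with different attributes.
        "changed": sorted(
            rid for rid in planned_ids & current_ids if planned[rid] != current[rid]
        ),
    }


# Illustrative data only.
planned = {
    "instance-1": {"type": "m5.large"},
    "bucket-1": {"versioning": True},
    "db-1": {"engine": "postgres"},
}
current = {
    "instance-1": {"type": "m5.xlarge"},
    "bucket-1": {"versioning": True},
    "volume-9": {"size_gb": 100},
}

drift = diff_states(planned, current)
for category, ids in drift.items():
    print(category, ids)
```

The "unmanaged" bucket is the interesting one for cleanup: those are the resources that the infrastructure-as-code tooling never created and therefore does not track.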
On top of the CLI, there is a UI that makes navigating the resource graph intuitive. You can search, navigate, and click on resources in the graph, and annotate and build workflows across all resources. There is also an optional “approval workflow” within these tools (“I will delete these resources. Are you OK with that?”). This is basically everything that a developer would want!
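An approval step of this kind can be sketched as a gate in front of the delete call. The version below is only an assumption about how such a workflow might be wired: run_cleanup, ask_on_terminal, and the delete callback are hypothetical names, and the delete callback stands in for whatever cloud API call would actually remove a resource.

```python
def run_cleanup(candidates, delete, approve):
    """Delete the candidates only if the approver agrees; return what was deleted."""
    if not candidates or not approve(candidates):
        return []
    deleted = []
    for resource_id in candidates:
        delete(resource_id)  # placeholder for a real cloud API call
        deleted.append(resource_id)
    return deleted


def ask_on_terminal(candidates):
    """Interactive approver: anything other than 'y' or 'yes' means no."""
    answer = input(
        f"I will delete these resources: {', '.join(candidates)}. "
        "Are you OK with that? [y/N] "
    )
    return answer.strip().lower() in ("y", "yes")


# Non-interactive demo with an in-memory stand-in for the delete call.
removed = []
result = run_cleanup(["volume-2", "snapshot-2"], removed.append, lambda c: True)
print(result)
```

Passing ask_on_terminal as the approve argument gives the interactive behavior. A scheduled job could instead pass a function that checks a ticket or a chat approval.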
Bottom Line – Is Cleanup Automation Made for You?
If you want to:
- Build a resilient cloud infrastructure in a growing, cloud-native company
- Clean up drift from aborted Terraform runs and broken CI/CD pipelines
- Bring order to large multi-account structures
- Reduce cloud spend in a low-touch, zero-cost manner, to the point where your CFO leaves you alone for good
This is the tool for you. Contact us for implementation that’s quicker than this 3-min read.