2nd October, 2023
12 Min read
Amazon is locked in a race against rising energy and infrastructure costs. Its per-instance pricing has halted its freefall after years of relentless reductions; at the same time, external pressures are threatening a squeeze on organizations’ budgets. Identifying and understanding AWS spend is largely an analytical exercise – controlling it is a cultural one. Bringing the two together requires a fluency with FinOps that many organizations still need to develop.
AWS and other hyperscalers drive an immense amount of innovation, with total cloud expenditure now predicted to account for $84.6 billion of corporate spend. Having placed itself at the cornerstone of emerging technologies such as artificial intelligence, AWS grants teams the tools to incorporate its immense compute power into everyday business operations. Despite the performance increases and reduced latency on offer, however, even AWS’ market dominance does not make it immune to external factors.
Heavy investment in infrastructure around Europe sits uncomfortably close to the war in Ukraine, highlighting a dependency on Russian energy supplies. And even after significant investment in its high-efficiency Graviton3 chip – which Amazon CFO Brian Olsavsky credited with a billion-dollar drop in expenses in the first quarter of 2022 – international tensions surrounding Taiwan’s chip industry still cast a shadow over cost prospects.
With cloud vendors battling rising costs of their own, it’s up to the customer to navigate increasingly complex pricing models. Across more than 160 cloud services, the sheer variety on offer – alongside the nitty-gritty demands of each unique integration – can derail even the best-intentioned cost-saving effort.
There are three fundamental drivers of cost within AWS: compute, storage, and outbound data transfer. Understanding the impact of each is one of the primary ways that organizations can begin to reclaim control over their cloud cost.
The first cost driver within AWS is compute: Elastic Compute Cloud (EC2), for example, provides secure, resizable compute to developers, offering a variety of virtual hardware across CPU, memory, storage, and network capabilities. Each instance type carries its own price, which also varies with the geographic region it’s deployed in – pricing reflects the energy and hardware requirements of the underlying instances, alongside the overall demand currently being placed on that data center.
While the wide selection of instance types gives DevOps teams unmatched flexibility, it also offers the first clue to how AWS costs can climb. With DevOps granted direct access to cloud compute power – and therefore cost – a siloed development team can easily disregard budget constraints in favor of higher availability or performance. Without guardrails on cloud spending, this newfound autonomy can result in unexpected and unexplainable costs. Compute is one area that quickly accrues extra cost: partly because of the pace at which AWS introduces new product families – building up a technical debt of outdated instance types – and partly because of how quickly DevOps can spin up and release new projects.
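To make those per-instance price differences concrete, here is a minimal sketch (not an official AWS tool or a GlobalDots product) that queries the public AWS Price List API with boto3 to compare on-demand rates before a team spins up a new instance type. The instance types and region are illustrative, and the parsing assumes the standard Price List JSON layout.

```python
# Sketch: compare on-demand EC2 rates via the AWS Price List API.
# Assumes credentials with pricing:GetProducts; the Pricing API is served from us-east-1.
import json
import boto3

pricing = boto3.client("pricing", region_name="us-east-1")

def on_demand_price(instance_type: str, location: str = "US East (N. Virginia)") -> float:
    """Return the hourly on-demand USD price for a Linux, shared-tenancy instance."""
    resp = pricing.get_products(
        ServiceCode="AmazonEC2",
        Filters=[
            {"Type": "TERM_MATCH", "Field": "instanceType", "Value": instance_type},
            {"Type": "TERM_MATCH", "Field": "location", "Value": location},
            {"Type": "TERM_MATCH", "Field": "operatingSystem", "Value": "Linux"},
            {"Type": "TERM_MATCH", "Field": "tenancy", "Value": "Shared"},
            {"Type": "TERM_MATCH", "Field": "preInstalledSw", "Value": "NA"},
            {"Type": "TERM_MATCH", "Field": "capacitystatus", "Value": "Used"},
        ],
        MaxResults=1,
    )
    product = json.loads(resp["PriceList"][0])
    on_demand_term = next(iter(product["terms"]["OnDemand"].values()))
    dimension = next(iter(on_demand_term["priceDimensions"].values()))
    return float(dimension["pricePerUnit"]["USD"])

# Illustrative comparison of an Intel-based and a Graviton-based instance type.
for itype in ("m5.large", "m6g.large"):
    print(itype, on_demand_price(itype))
```

A check like this, run before provisioning, is one lightweight guardrail a team can put in place without waiting for a full FinOps rollout.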
With compute power outsourced, many organizations choose to make further use of AWS’ data storage. Amazon Simple Storage Service (S3) grants endpoints the ability to access stored files from any location. Now an established backbone of cloud-native applications and AI data lakes, S3 lends incredible configurability to data storage. One of the more intuitive cost drivers within S3 is the sheer volume of data being stored; this facet of cloud cost is often simply assumed to be a cost of doing business.
However, many DevOps teams overlook or underestimate the importance of access frequency. AWS offers storage classes priced according to how often data is accessed – S3 Standard’s low latency and high throughput make it perfect for rapid-access workloads such as content distribution, gaming applications, and big data analytics. At the other end of the range sits Glacier Deep Archive, built for low-cost storage of data that is very rarely accessed – for example, backups held in case of widespread compromise.
Lifecycle policies can be implemented to automatically transition objects between storage classes, and S3 Intelligent-Tiering will even do it for you. Treating storage classes as set-and-forget can be a major driver of cost.
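As a concrete illustration of a lifecycle transition, the following minimal sketch uses boto3 to attach a rule that moves objects under a hypothetical logs/ prefix to Glacier Deep Archive after 180 days. The bucket name, prefix, and threshold are placeholders, not recommendations.

```python
# Sketch: a lifecycle rule that archives cold objects instead of leaving
# the storage class as set-and-forget.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-finops-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-cold-logs",
                "Filter": {"Prefix": "logs/"},   # hypothetical prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 180, "StorageClass": "DEEP_ARCHIVE"}
                ],
            }
        ]
    },
)
```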
With storage and compute costs accounted for, the last major consideration is AWS data transfer fees. One of the more overlooked areas of cost impact, the quantity of data being transferred is as important as its destination. Keep in mind that inbound data transfer is generally free; it’s the outbound gigabytes leaving Amazon’s network that incur fees.
The first ‘layer’ of cost impact is based on whether your data is going from AWS to the internet or to another Amazon-based workload. Compute is more than just remote servers: once the cloud-side server has processed a request, each byte must be sent back to the user’s device over the public internet. As the standard AWS compute service, outbound EC2 pricing sheds some light on the sheer variety of cost. Up to one gigabyte can be transferred for free every month. From there, the following 9.9 terabytes in that month are charged at $0.108 per GB. Given that the average AWS customer manages a total of 883 terabytes of data, today’s enterprise demands are light years beyond even the most inexpensive tier. To reflect this, AWS offers economy-of-scale tiered pricing, with each tier offering a better price per GB of transferred data. This is yet another factor in the sheer unpredictability of AWS cost: even if you’re relying on one service in one region, changes in demand mean that your cloud spend is subject to constant fluctuation.
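To illustrate how tiered egress pricing compounds, here is a back-of-the-envelope sketch. The first two tiers follow the figures quoted above (1 GB free, then $0.108 per GB up to 10 TB); the higher-tier rates are illustrative placeholders rather than published AWS prices.

```python
# Sketch: walk egress traffic through pricing tiers, charging each slice at its own rate.
TIERS_GB = [
    (1, 0.0),             # first 1 GB each month is free
    (10_240, 0.108),      # up to 10 TB, per the figure quoted above
    (51_200, 0.085),      # illustrative next-tier rate
    (float("inf"), 0.07), # illustrative top-tier rate
]

def monthly_egress_cost(total_gb: float) -> float:
    """Charge each tier's slice of traffic at that tier's per-GB rate."""
    cost, remaining, floor = 0.0, total_gb, 0.0
    for ceiling, rate in TIERS_GB:
        slice_gb = min(remaining, ceiling - floor)
        if slice_gb <= 0:
            break
        cost += slice_gb * rate
        remaining -= slice_gb
        floor = ceiling
    return cost

# 883 TB, the average data footprint cited above, expressed in GB.
print(f"${monthly_egress_cost(883 * 1024):,.2f}")
```

Even with the made-up upper tiers, the exercise shows why a small shift in traffic volume or destination can move the monthly bill by thousands of dollars.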
As we zoom out and start to look at services within their surrounding networks, the view of data transfer costs becomes even more cluttered. Consider the architectural best practice of running across multiple Availability Zones: with a primary RDS database in place, both the ingress and egress of data to the second Availability Zone are charged. This means that – should your organization suddenly have to rely on the secondary Availability Zone – getting data to your consumers could suddenly cost a great deal more.
Cost optimization starts with visibility. In this way, controlling AWS cost differs significantly from the traditional approach used for physical servers and on-premises software licenses. The traditional spending model can be neatly segmented into a few major roles: finance teams authorize budgets; procurement teams oversee vendor relationships; and IT teams handle installation and provisioning. In contrast, cloud computing has enabled virtually any end user from any business sector to independently and rapidly acquire technology resources. As IT and development teams are pushed to the end of this procurement chain, there’s very little incentive left to optimize resources that have already been acquired.
AWS cost optimization recognizes that this approach leaves IT teams chronically unprepared. One on-the-ground consequence is the popularity of on-demand instances – the highest-cost form of resource fulfillment. If siloed teams are the barrier to individual cost responsibility, then democratized access to real-time cost information becomes a major part of the answer. While efficient DevOps teams help the organization remain agile and competitive, a solid foundation of Financial DevOps (FinOps) is vital as cloud service adoption expands. Without it, an aspiring cost optimization project risks slipping back into cloud confusion.
When applied to AWS, FinOps breaks complex runtime environments and their disparate pricing models down into four key principles: See, Save, Plan, and Run. Each of these areas can be further bolstered with intuitive policies and innovative solutions.
Thanks to the highly distributed nature of AWS spend, understanding where each cost is coming from can be one of the hardest challenges. Visibility into each resource, therefore, is at the core of Cloud Financial Management (CFM). Not only does this allow development teams to take ownership of their impact on the budget, it also grants the finance department a greater degree of insight, building a foundation for cross-department collaboration.
In AWS, a tag is a label assigned to a specific resource. Each tag is made up of a key and an optional value, both of which are defined by the Dev team when spinning up that resource. Without a strict tagging policy, teams trying to save time tend to fall back on generic, near-identical tags, making reports incredibly muddy. Instead, organizations need to start realizing the potential of AWS tags: by clearly assigning the purpose, owner, and environment of every resource, your cloud environment can benefit from the same degree of visibility as on-prem knowledge management systems.
Tag guidelines need to highlight the importance of standardized, case-sensitive formatting, and be applied consistently across all resource types. A multifaceted DevOps team should also keep in mind that tag guidelines can support other critical areas such as resource access control, automation, and organization.
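A tagging standard is only useful if it’s checked. The sketch below shows one illustrative way to audit EC2 instances against a set of required cost-allocation keys; the project, owner, and environment keys are assumptions drawn from the guidelines above, not an AWS-defined schema.

```python
# Sketch: flag EC2 instances missing the tag keys a tagging policy might mandate.
# Tag keys are case-sensitive, so the check compares them exactly as written.
import boto3

REQUIRED_TAGS = {"project", "owner", "environment"}  # illustrative policy keys

ec2 = boto3.client("ec2")
paginator = ec2.get_paginator("describe_instances")

for page in paginator.paginate():
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            present = {tag["Key"] for tag in instance.get("Tags", [])}
            missing = REQUIRED_TAGS - present
            if missing:
                print(f"{instance['InstanceId']} is missing tags: {sorted(missing)}")
```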
Many organizations beginning their FinOps journey face a significant tagging backlog: manually updating every tag would be an overwhelmingly time-consuming and costly process. Furthermore, tagging doesn’t necessarily work for every business use case.
GlobalDots’ keen eye for innovation grants organizations new ways of battling these challenges. In this case, third-party tools can provide a new degree of visibility. By uncoupling your tagging process from AWS’ own demands, it becomes possible to automatically associate resources with projects – even those that span multiple cloud providers. Furthermore, containerized environments can now be associated with specific owners, and once-untaggable shared resources can be included in cost reports.
With previously established resources now placed into context, your DevOps teams need to be supported with guardrails that maintain this standard for newly created resources. AWS CloudFormation provides a foundation for all your infrastructure resources and allows you to enforce policies on tag creation. Useful CFM tags include the project that the resource is associated with; the owner (i.e., the developer responsible for spinning it up); and the customer the broader project is for. With resource ownership defined – preferably with the aid of cross-functional perspectives – your cost allocation reports are ready to be transformed.
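One illustrative way to apply this with CloudFormation is stack-level tagging: tags supplied at stack creation are propagated to supported resources in the stack, keeping the project, owner, and customer keys consistent for everything a team spins up. In the sketch below, the stack name and template URL are hypothetical.

```python
# Sketch: create a CloudFormation stack with stack-level CFM tags that propagate
# to the resources the template creates.
import boto3

cfn = boto3.client("cloudformation")
cfn.create_stack(
    StackName="checkout-service",  # hypothetical stack name
    TemplateURL="https://example-bucket.s3.amazonaws.com/checkout.yaml",  # hypothetical template
    Tags=[
        {"Key": "project", "Value": "checkout"},
        {"Key": "owner", "Value": "dev-team-a"},
        {"Key": "customer", "Value": "internal"},
    ],
)
```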
AWS’ default reporting structure is the monthly cost allocation report. Activating tags on your cost allocation report requires navigating to your billing preferences in the AWS management console and then choosing which tags to include in the report. While this lends brand-new visibility into the costs associated with the technical and security dimensions of every application, it’s only the first step toward optimizing AWS costs.
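For accounts that prefer automation over console clicks, the Cost Explorer API also exposes cost allocation tag activation. The sketch below assumes the tag keys have already been applied to resources and have surfaced in your billing data as user-defined tags.

```python
# Sketch: activate user-defined cost allocation tags programmatically
# instead of through the billing console.
import boto3

ce = boto3.client("ce")
ce.update_cost_allocation_tags_status(
    CostAllocationTagsStatus=[
        {"TagKey": "project", "Status": "Active"},
        {"TagKey": "owner", "Status": "Active"},
        {"TagKey": "customer", "Status": "Active"},
    ]
)
```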
With sustainable visibility achieved, it’s the right moment to delve into cost savings. This extends beyond just paying less for the resources you use; it requires a full assessment of your cloud usage, and a fine-tuning of IT investments to fully realize your business objectives. At the core of AWS cost are the pricing models themselves. While on-demand is seen as the default, it’s worth keeping in mind that other purchasing options can better fit your business’ requirements while also saving money.
Steady-state usage is one area that can benefit from AWS’ Savings Plans. The basis of the Savings Plans model is commitment: by agreeing to pay for a specified amount of compute over a one- or three-year period, you’re offered a discount on every hour’s worth of compute. This consistent, flexible pricing allows for immediate cost savings and predictability – two major FinOps wins. Savings Plans come in three types: Compute Savings Plans, EC2 Instance Savings Plans, and Amazon SageMaker Savings Plans. The segmented options allow for a staircase approach when first starting out, where specific on-demand instances can slowly be phased out as Savings Plan coverage is increased.
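One way to start that staircase is to ask Cost Explorer what a commitment would actually cover before signing it. The sketch below requests a recommendation for a one-year, no-upfront Compute Savings Plan based on the last 30 days of usage; the term, payment option, and lookback window are illustrative choices, not advice.

```python
# Sketch: pull a Compute Savings Plan purchase recommendation from Cost Explorer.
import boto3

ce = boto3.client("ce")
resp = ce.get_savings_plans_purchase_recommendation(
    SavingsPlansType="COMPUTE_SP",
    TermInYears="ONE_YEAR",
    PaymentOption="NO_UPFRONT",
    LookbackPeriodInDays="THIRTY_DAYS",
)
summary = resp["SavingsPlansPurchaseRecommendation"].get(
    "SavingsPlansPurchaseRecommendationSummary", {}
)
print(summary)  # estimated hourly commitment and monthly savings, if any
```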
Alongside the consistent-usage discounts available through Savings Plans, Reserved Instances (RIs) offer a similar commitment-focused pricing strategy for instances of a specific type within a specific region. As the name suggests, this upfront payment essentially reserves instance capacity for the year at a significant discount. Because the commitment is tied to instance type, this option is best suited to services such as EC2, RDS, and ElastiCache. Unused EC2 reservations can be bought and sold on the Reserved Instance Marketplace, further helping drive rightsizing.
The final pricing model is Amazon EC2 Spot. Spot lets you take advantage of unused EC2 capacity at up to a 90% discount – with the caveat that EC2 may reclaim that capacity with two minutes’ notice. As a result, Spot workloads need to be fault tolerant. This makes Spot a perfectly viable, savings-heavy approach for containerized, CI/CD, and machine learning workloads.
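For fault-tolerant workloads, requesting Spot capacity is a small change to an ordinary launch call. The sketch below marks a single instance as a one-time Spot request that terminates on interruption; the AMI ID and instance type are hypothetical placeholders.

```python
# Sketch: launch a one-time Spot instance via run_instances, accepting that
# AWS can reclaim the capacity with two minutes' notice.
import boto3

ec2 = boto3.client("ec2")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical AMI
    InstanceType="c5.large",          # illustrative instance type
    MinCount=1,
    MaxCount=1,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": "one-time",
            "InstanceInterruptionBehavior": "terminate",
        },
    },
)
```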
The sheer scale of many organizations’ AWS architecture can itself seem a significant roadblock to rightsizing. This is where automation lends a new approach to dynamic and adjustable commitments. Automated RI optimization, for example, leverages machine learning algorithms that analyze your on-demand workload usage and patterns. With this, the full set of suitable workloads is covered with RIs bought both from AWS and from the marketplace at the most lucrative discounts.
Drilling down into the architecture of instance generations, there are further optimizations to be made. For example, deploying on the latest instance types allows you to take advantage of AWS’ increasingly powerful capacity – AWS Graviton and AMD-based instances are two of the latest examples. These processors offer considerably better price performance than older Intel-based cores; AMD-powered instances now benefit from a 35% improvement over the previous-generation M5a instances.
Furthermore, modern microservice architectures can offer increased agility at lower cost. Serverless options such as AWS Lambda bill compute by the millisecond and can cut development time as well. As one saving builds atop another, it’s vital to maintain the earlier visibility and track exactly how much each action is saving. This is where the third component kicks in.
Less than one in five organizations can accurately predict their cloud expenses, with most unable to stay within a 5% variance between their prediction and the financial reality. Part of this is due to a lack of cloud cost data and optimization insight. Now, however, we’re at a stage where accurate forecasting is achievable. When forecasting and planning, it’s essential to keep two distinct cases in mind: existing workloads and new workloads, as each presents its own set of unique considerations.
Existing workloads hold a wealth of historical information. A retrospective analysis of past spending and usage patterns will help you determine the relationship between their growth and cost. Add to this by establishing whether current growth will continue, collaborating closely with technical stakeholders. The pace of these growth trends – whether slowing, accelerating, or continuing – plays a major role in future costs. As cost optimization measures mature over time, KPIs become a lifeline for forecasting models. One example is the percentage of cloud spend allocated to each workload – highly actionable for engineering teams and another pillar of cost-over-time analysis. Other focal points can be the number of instances taking advantage of Spot and Reserved pricing.
To streamline the quantity of data, a particularly useful tool is AWS Cost Explorer. By ingesting your existing workloads’ past demand and spend, it forecasts costs over a defined future time range. This prediction is based on machine learning and rule-based models, and further integrates with budget alerts that flag any predicted cost overruns. After establishing your trend-based forecast with the assistance of AWS Cost Explorer, the next step is to employ the AWS Pricing Calculator. This tool allows you to model your AWS use case and anticipate future costs by considering factors such as expected usage metrics (e.g., traffic and requests per second), the necessary Amazon Elastic Compute Cloud (Amazon EC2) instances, and more.
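Programmatically, the same trend-based forecast can be pulled with a single Cost Explorer call. The sketch below requests an unblended-cost forecast for roughly the next three months; the window and metric are illustrative choices.

```python
# Sketch: request a cost forecast from Cost Explorer for the next ~90 days.
from datetime import date, timedelta
import boto3

ce = boto3.client("ce")
start = date.today() + timedelta(days=1)   # forecast windows must start in the future
end = start + timedelta(days=90)

forecast = ce.get_cost_forecast(
    TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
    Metric="UNBLENDED_COST",
    Granularity="MONTHLY",
)
print(forecast["Total"]["Amount"], forecast["Total"]["Unit"])
```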
New workloads offer a blank slate for FinOps prediction. The choice between lift-and-shift and a complete re-architecture needs to be closely analyzed, further taking into account the time demands on technical teams. As migration picks up pace, performance indicators can be put in place that support cost management. Transforming the migration business case into a budget plan can be achieved with AWS Budgets, helping keep every turning cog of a migration in financial sync.
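Turning that business case into guardrails can be scripted as well. The sketch below creates a monthly cost budget with an alert when the forecast crosses 80% of the limit; the budget name, amount, and notification address are hypothetical.

```python
# Sketch: create an AWS Budgets cost budget with a forecast-based alert.
import boto3

account_id = boto3.client("sts").get_caller_identity()["Account"]
budgets = boto3.client("budgets")
budgets.create_budget(
    AccountId=account_id,
    Budget={
        "BudgetName": "migration-wave-1",                  # hypothetical name
        "BudgetLimit": {"Amount": "50000", "Unit": "USD"}, # hypothetical monthly ceiling
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "FORECASTED",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "finops@example.com"}  # hypothetical
            ],
        }
    ],
)
```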
The final component of a FinOps strategy recognizes that the entire process is iterative. Regular review cycles help keep everyone in the loop and actively reviewing the efficiency of every project. Reporting on the metrics of cost visibility and reduction – alongside the cost of achieving them – helps foster the collaborative culture that FinOps thrives in. This also lends a greater degree of agility to your FinOps culture, as negative trends can be addressed far earlier – and positive results become team wins that deserve promotion across the organization. Pulling in representatives from across the application development, finance, and management spectrum further helps foster a communicative and proactive culture.
When GlobalDots partnered with a major eCommerce giant, their cloud operations were spread across 74 different accounts, and inefficient development habits had snowballed into a multi-million-dollar cloud spend. GlobalDots spent several months consolidating the client’s cloud resources, doubling the number of machines running on reservation contracts, and systematically streamlining outdated architecture. In this FinOps case study, the eCommerce giant went on to enjoy a 20% reduction in its cloud bill. At the same time, it grew by a third thanks to increased feature development and web traffic.
Large-scale cultural changes can feel glacially slow at times. However, FinOps adoption sits at the divide between analysis and action – with just a few major stakeholders on board, it becomes possible to transform your organization’s approach from the inside out.
Schedule a call with our experts. Discover new technology and get recommendations to improve your performance.