Scott Carey
Managing Editor, News

How enterprises are bringing pandemic-driven cloud costs under control

feature
Jun 28, 2021 | 12 mins
Cloud Computing, Cloud Management

Cloud cost optimization is rapidly moving up the agenda as the dust settles on the technology decisions driven by the COVID-19 pandemic. Here are the key tools and principles to help developers stop cloud costs outpacing revenue.


To the concern of finance officers everywhere, the swift issuance of stay-at-home orders in the early days of the COVID-19 pandemic gave IT teams the authority to scale up cloud usage almost overnight to meet a rapidly changing set of employee and customer needs. In fact, cloud spending outpaced on-premises investments for the first time ever in 2020, marking a tipping point in enterprise consumption of cloud services.

Now, as the end of the pandemic hopefully comes into view, the party might be coming to an end, and cloud cost optimization will be on the agenda for many organizations as purse strings tighten.

“At the start of the pandemic, people were focused on getting stuff working. The cost of cloud was on the back burner,” 451 Research analyst Owen Rogers told InfoWorld. “Over the past three months there has been increased interest in managing that cost and optimizing for the future. The thing I am less sure about is when the dust will fully settle for enterprises.”

Where that cloud-cost level finally rests will vary from organization to organization, but according to Eugene Khvostov, vice president for product and engineering at cost-optimization specialist Apptio, “The core pains are the same: Who is responsible for what cost, how do I make them aware and accountable, and how do I scale that across the organization.”

What is for certain is that cloud costs can’t continue to grow out of line with revenue. This is less a story about cutting cloud costs than it is about right-sizing for your environment. The great cloud correction is upon us.

Why finops may help manage cloud costs

“Over the years, we’ve heard the same stories over and over again. Engineering teams spend more than they need to on cloud, with little understanding of cost efficiency. Meanwhile, finance teams struggle to understand and keep up with what teams are spending. Then, to top it off, leadership doesn’t have enough input into company spending—and sometimes doesn’t even show a willingness to influence priorities,” JR Storment and Mike Fuller wrote in their 2020 book Cloud Finops.

“Finops” is a divisive term, but it essentially describes a new model for technology cost governance, in which organizations use a set of techniques and tools to better plan, budget, and forecast cloud spending requirements. As Storment and Fuller wrote:

In the simplest terms, finops brings financial accountability to the variable spend model of cloud. But that description merely hints at the outcome. The cultural change of running in cloud moves ownership of technology and financial decision-making out to the edges of the organization. It flips long-held, forward-looking capacity planning methodology on its head to become rate-optimization analysis for technology that’s already been used. And it forces IT, finance, and business professionals to work together in unfamiliar ways. It’s an acceptance that the old ways of managing infrastructure aren’t just ineffective in cloud; they are irrelevant.

As Storment and Fuller alluded, a new hub-and-spoke model for cloud cost management is emerging, where a central group is tasked with finding optimization opportunities and negotiating the best rates with vendors, while engineers and product owners must take ownership of their own cloud costs.

As you would expect, online grocery delivery company Ocado saw a huge spike in demand for its services during the pandemic, and cost savings weren’t the priority. Instead, “it was scaling to meet demand,” Alex Howard Whitaker, principal cloud engineer at Ocado Technology, told InfoWorld.

To help manage that increased usage, Ocado tasked its central platform team with identifying company-wide savings opportunities, such as using spot instances for its more stateless applications and shutting off environments on the weekends. The central observability team also now incorporates cost in the information it provides individual product owners, who are increasingly responsible for the cost of their applications.

Why finops may not really help contain cloud costs

“Finops is flexible and built into the processes of the business, but I don’t think enterprises are at that stage yet, where cloud is still consumed on an ad hoc basis,” said 451’s Rogers. “Investigating the use of reserved instances, cutting out waste, and bringing in optimization tooling are the interesting things, as not many organizations will have the guts to make a cultural change of this scale while there is still so much uncertainty.”

In a recent survey of 304 IT and business decision-makers, cloud-management provider CloudCheckr found 63% of respondents still see cost management as a key area of improvement when it comes to their public cloud usage, with only 31% reporting that they monitor and optimize public cloud costs effectively. Or, as cloud billing consultant Corey Quinn told InfoWorld, “everyone says they have a handle on [cloud costs], ‘kind of, but we aren’t doing it well.’”

That’s because the scale of cultural change can be off-putting for established enterprises with entrenched ways of working. This is why Quinn isn’t completely sold on the organization-wide approach espoused by finops advocates. “There is no golden path, because you are talking about changing your culture in engineering, finance, and how they communicate, and that is a heavy lift,” he said.

Instead, Quinn advises organizations to get better at the small things, like switching off what they don’t need and taking advantage of committed use discounts. However, “even the easy stuff doesn’t always get done, because inertia is a powerful thing. … Companies tell themselves after a sprint completes they will make good decisions, like resizing and using higher level managed services, but they don’t.”

The CloudCheckr survey backs up Quinn’s assertions. Respondents most commonly cited a lack of right-sizing (43%), unused instances that are never turned off (35%), low usage of cloud instances (34%), and application migrations that don’t adjust architecture for the cloud (32%) as key areas where cloud cost optimization could be improved.

How HSBC, Sainsbury’s, Airbnb, and Spotify are managing cloud costs

Still, regardless of finops’s suitability, enterprises of all kinds are starting to get serious about tackling their rising cloud costs.

Global bank HSBC had to shift several major applications onto cloud infrastructure during the pandemic. “We saw huge increases in online banking services, hundreds of government aid and benefit schemes to be implemented, and some very dynamic markets. This accelerated our deployments to the cloud. And, for existing workloads, we were able to react to some very dynamic demand,” said Ian Haynes, CTO for global cloud services at HSBC, during a recent AWS event. Now, the bank is “at the start of our [finops] journey and need to make it a part of everyone’s job.”

Like many enterprises today, cloud investment at British retailer Sainsbury’s has been focused on building new features and digital capabilities, which led to a rapid escalation in cloud service consumption. “Somewhere down the line, the operations team was trying to keep a lid on spend,” group CIO Phil Jordan told InfoWorld.

Following an intensive four-month change process throughout the pandemic (complete with training for suppliers, finance, staffers, and service teams), Sainsbury’s shifted the entire technology function into a new operating model called “end-to-end product life cycle management.” It essentially pushes accountability for a product or service out to the engineering teams, including cost, vulnerability, risk, and partner management.

These teams continue to be supported by a central platform team, which is responsible for activities such as building reusable assets, eliminating waste, finding cost savings opportunities, negotiating discounts, and identifying more efficient service use with vendors and partners.

“We did this for lots of reasons, and one was to make engineers fully accountable for total cost of ownership, including third-party costs and how we use partners. As we consume from the cloud almost by default, optimization becomes part of it,” Jordan said. “At its core, cloud is a consumption model, and consumption has been rife, so we needed to find some control and manage things more proactively.”

Even well-funded, cloud-first companies like Spotify and Airbnb have recently gone public in their efforts to bring some cost rigor to their engineering teams, with each company recognizing the time and effort this requires, even for highly mature cloud organizations with excellent engineering teams.

Both companies faced similar issues: Their monthly cloud bills were growing faster than company revenue. “We had a problem, but we lacked an in-depth understanding of how teams use AWS resources, and how planned architectural and infrastructure changes would impact our future AWS costs,” Airbnb engineers Jen Rice and Anna Matlin wrote in a company blog post.

As a result, they set out to build the cost-attribution data needed to show their data-driven developer community just how big a problem they were facing, and so gain some buy-in. Now, a centralized cost efficiency team takes a bird’s-eye view of the entire Airbnb ecosystem to make more efficient infrastructure purchasing decisions, and incentives have been extended to engineering teams to make their services more cost-effective.

At Spotify, as RedMonk analyst James Governor recently wrote, the approach was similar: Give engineers the data and tools required to make better cost decisions for their applications. “The idea is that engineers and engineering teams are incentivized to take more responsibility for the costs associated with the products they’re building. Modeling cost becomes part of the engineering process, rather than being a separate process for finance teams to manage,” Governor wrote.

Maybe you shouldn’t use the cloud so much

There is also a growing school of thought that in certain cases the cloud isn’t the answer at all. “As industry experience with the cloud matures—and we see a more complete picture of cloud life cycle on a company’s economics—it’s becoming evident that while cloud clearly delivers on its promise early on in a company’s journey, the pressure it puts on margins can start to outweigh the benefits, as a company scales and growth slows,” Andreessen Horowitz partners Sarah Wang and Martin Casado wrote in a recent blog post.

Dropbox provides the most prominent example of a company that has swum against the industry tide here, shifting away from its roots as a storage company built on AWS by building its own infrastructure, reportedly saving about $75 million over two years as a result. But, as Quinn wrote in his popular newsletter, this figure doesn’t fully account for the sheer amount of operational overhead involved in maintaining, patching, and updating servers this decision added, nor the cost of missed innovation.

And Quinn noted that while Dropbox was nonetheless right to bring the infrastructure in-house, the cloud storage provider is not a good model for most enterprises. In his newsletter, Quinn wrote, “It was reportedly a single, very large, very well-understood workload: storing user files. That use case wasn’t scaling in dramatic swings up and down, the capacity growth was easy to predict, and Dropbox engineers certainly understood it start to finish.” This cannot be said for most enterprises, whether they were born in the cloud, cloud-native, starting their cloud journey, or still running things on mainframes.

Get started with better cloud cost governance

Whether you adopt finops or use simpler cloud-optimization principles, there are several ways to start.

For finops, wrote Storment and Fuller in their book, “A single person might be assigned to manage commitments to cloud providers. They set out to implement an initial account, label, and tagging hierarchy. From there, as the practice gets larger, each part of the process can be scaled up.”

Visibility is the key first step to finops maturity. Once everyone is working from the same data, education can begin, helping an organization scale up its cost optimization practices around a small team of experts. “Gain visibility into what’s happening in your cloud environment and do the hard—but important—work of cleaning up your allocation so that you know who is truly responsible for what before you start making changes,” Storment and Fuller wrote.
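As a rough illustration of the allocation clean-up described above, the following Python sketch rolls up billing line items by an owner tag and reports how much spend is still unallocated. The line-item structure, the `team` tag key, and all figures are hypothetical; real cloud billing exports use their own schemas, so treat this as a sketch under those assumptions rather than a recipe for any particular provider.

```python
from collections import defaultdict

# Hypothetical billing line items; field names and figures are illustrative only.
LINE_ITEMS = [
    {"service": "compute", "cost": 1200.0, "tags": {"team": "checkout"}},
    {"service": "storage", "cost": 300.0, "tags": {"team": "search"}},
    {"service": "compute", "cost": 450.0, "tags": {}},  # untagged resource
    {"service": "database", "cost": 800.0, "tags": {"team": "checkout"}},
]


def allocate_by_tag(items, tag_key="team", unallocated="UNALLOCATED"):
    """Sum cost per owner tag; items missing the tag go to an unallocated bucket."""
    totals = defaultdict(float)
    for item in items:
        owner = item.get("tags", {}).get(tag_key, unallocated)
        totals[owner] += item["cost"]
    return dict(totals)


def allocation_coverage(totals, unallocated="UNALLOCATED"):
    """Return the fraction of total spend that has an identified owner."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return 1.0 - totals.get(unallocated, 0.0) / total


totals = allocate_by_tag(LINE_ITEMS)
coverage = allocation_coverage(totals)

for owner, cost in sorted(totals.items()):
    print(f"{owner}: ${cost:,.2f}")
print(f"Allocated share of spend: {coverage:.1%}")
```

A low allocated share would suggest that tagging needs attention before any optimization decisions are made, which is the ordering Storment and Fuller recommend.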

More broadly, “come in with a plan,” said Apptio’s Khvostov. “Learn the frameworks like finops and consume the free resources from the cloud providers themselves. Don’t come in and wing it by downloading the cloud usage report from AWS and going from there. Make this a consistent approach and iterate and get better at how you analyze, optimize and control cloud costs as you go.”

451’s Rogers advised starting with your cloud vendor’s own cost-optimization tools, or turning to a third-party vendor like Apptio or VMware CloudHealth if you are running a multicloud environment.

Next, identify the right business metric to measure your cloud costs. “The business metric is important, because it allows you to change the conversation from one that is just about dollars spent to one about efficiency and the value of cloud spend. Being able to say ‘It costs $X to serve customers who bring in $Y in revenue’ brings a context that helps you make the decision whether $X and $Y are reasonable for the organization. Then, as services evolve or change entirely with new features, companies are able to measure the impact of these changes via these business metrics,” Storment and Fuller wrote.
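To make the “$X to serve customers who bring in $Y” framing concrete, here is a minimal Python sketch that computes a cost-per-customer figure and a cost-to-revenue ratio, then checks whether cloud cost grew faster than revenue between two months. The monthly figures and function names are invented for illustration; which business metric is appropriate will depend on the organization.

```python
# Hypothetical monthly figures; none of these numbers come from a real company.
MONTHS = [
    {"month": "2021-01", "cloud_cost": 100_000.0, "revenue": 1_000_000.0, "customers": 50_000},
    {"month": "2021-02", "cloud_cost": 120_000.0, "revenue": 1_100_000.0, "customers": 55_000},
]


def cost_per_customer(cloud_cost, customers):
    """Cloud cost divided by the number of customers served in the period."""
    if customers <= 0:
        raise ValueError("customers must be positive")
    return cloud_cost / customers


def cost_to_revenue_ratio(cloud_cost, revenue):
    """Share of revenue consumed by cloud spend in the period."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    return cloud_cost / revenue


def growth(previous, current):
    """Relative change from the previous value to the current one."""
    return (current - previous) / previous


prev, curr = MONTHS
unit_costs = [cost_per_customer(m["cloud_cost"], m["customers"]) for m in MONTHS]
ratios = [cost_to_revenue_ratio(m["cloud_cost"], m["revenue"]) for m in MONTHS]

cost_growth = growth(prev["cloud_cost"], curr["cloud_cost"])
revenue_growth = growth(prev["revenue"], curr["revenue"])
cost_outpacing_revenue = cost_growth > revenue_growth

for m, unit, ratio in zip(MONTHS, unit_costs, ratios):
    print(f"{m['month']}: ${unit:.2f} per customer, {ratio:.1%} of revenue")
print(f"Cost growth {cost_growth:.0%} vs. revenue growth {revenue_growth:.0%}")
```

In this made-up example the bill grows 20% while revenue grows 10%, the same pattern of cloud costs outpacing revenue that the article describes.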

This may all sound intimidating and complicated, but as cloud makes up a bigger and bigger chunk of an organization’s technology bills, it’s only a matter of time before these practices become mainstream and every engineer will need to know their way around a cloud bill.