Leveraging a live customer example (example.com, anonymized data)
At zOpt.ai, we have been helping dozens of customers improve the return on their cloud investment (ROI). Along the way, we have learned a lot about customer priorities and struggles. This blog post highlights a few of these learnings, which we have incorporated into our operations and processes so that we can better help customers.
Background:
Example.com (we will refer to this company as example.com in this blog) is a successful tech company that has been using the public cloud for many years. They have developed in-house cloud expertise to manage their cloud operations and deployments in a systematic manner. Example.com had leveraged Savings Plans and Reserved Instances for ~80% of their cloud.
Example.com is growing, seeing spikes in demand, and actively deploying newer product features and components. As a result, their cloud spend is increasing month over month. The Q3 2023 board meeting defined cloud spend control as a priority. They have been working with us since Q4 2023, for more than 12 months now.
Inside story: Example.com had tried cloud optimization on their own several times before: 1/ setting up a war room, 2/ setting up part-time FinOps teams, 3/ running a "cloud optimization day of the month" process, and 4/ using third-party services. These attempts produced some cost optimizations, but at times they also caused production issues and outages, which led to halting all such attempts.
Start of the engagement:
The zOpt.ai team started working with example.com and identified optimization opportunities from the get-go. The first wave included a series of easy optimizations that should have been a no-brainer to implement. In addition, zOpt.ai removed the need for DevOps bandwidth by implementing these optimizations automatically through our Human Vetted Automation engine.
It took 2 months for example.com to act on these initial recommendations. Even then, example.com acted on only 40% of them, taking a very cautious approach: each change was implemented in a test-dev environment before anything was changed in production.
Why: Earlier cloud optimization efforts had harmed their production environment, and no one in the organization wanted to take any chances.
Next phase of engagement:
The zOpt.ai team was happy to see the customer building trust in the identified optimizations. Over the next 4 months, continuous opportunity identification kept adding newer optimization opportunities. The total identified recommendations amounted to ~15% of cloud spend and promised performance improvements of ~10% in the critical system components. But, to our surprise, example.com did not act on any of these recommendations. Example.com was leaving many performance improvements and cost savings on the table.
Why: Imagine taking a fall while riding a bike, or getting into a car accident. The incident leaves a scar, and you always worry about repeating the mistake and ending up in the same mess. Example.com did not want to touch any of their mission-critical system components. The word ‘automation’ is scary for most DevOps teams, and they want complete control over any and every change in their cloud environment.
Once we understood the reason for the inaction, we had to build confidence within example.com’s DevOps leadership team. Five things worked to our advantage: 1/ every optimization recommendation comes with a performance guarantee, 2/ automated remediation removes the need for DevOps bandwidth, 3/ any downtime needed is called out clearly, and changes can be scheduled in a specific maintenance window, 4/ we do not take over the customer's infrastructure, but let them schedule, review, and approve changes for specific resources, and 5/ if an infrastructure change goes wrong, it is rolled back automatically.
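To make these controls concrete, here is a minimal Python sketch of how approval, a maintenance window, and automatic rollback might fit together. It is an illustration only, not zOpt.ai's actual engine: the `Change` class, `run_change` function, the health check, and the sample resource are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Callable


@dataclass
class Change:
    """One proposed infrastructure change (hypothetical structure)."""
    resource_id: str
    apply: Callable[[], None]         # performs the change
    rollback: Callable[[], None]      # restores the previous state
    health_check: Callable[[], bool]  # True if the resource is healthy afterwards
    approved: bool = False            # set by a human reviewer


def in_window(now: datetime, start: time, end: time) -> bool:
    """Return True if `now` falls inside a same-day maintenance window."""
    return start <= now.time() < end


def run_change(change: Change, now: datetime, start: time, end: time) -> str:
    """Apply an approved change inside the window; roll back on any failure."""
    if not change.approved:
        return "skipped: not approved"
    if not in_window(now, start, end):
        return "skipped: outside maintenance window"
    try:
        change.apply()
        if change.health_check():
            return "applied"
        reason = "health check failed"
    except Exception as exc:
        reason = f"error: {exc}"
    change.rollback()
    return f"rolled back: {reason}"


# Example: an approved resize whose health check fails, so it is reverted.
state = {"size": "large"}
change = Change(
    resource_id="db-hypothetical-1",
    apply=lambda: state.update(size="medium"),
    rollback=lambda: state.update(size="large"),
    health_check=lambda: False,
    approved=True,
)
result = run_change(change, datetime(2024, 1, 7, 2, 30), time(2, 0), time(4, 0))
print(result, state)
```

The point of the design is that nothing happens without explicit approval, nothing happens outside the agreed window, and a failed change never leaves the resource in its modified state.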
The example.com team appreciated these well-thought-out controls and the ease of use. They finally started acting on the recommendations, in a cautious manner: they implemented each optimization in a test-dev environment and confirmed it delivered the promised improvement before implementing it in the production environment.
Learning: Cloud infrastructure optimization is like changing a flat tire on your car while the car is in motion. The changes need to be made in a precise and careful manner, using the right guard rails. Example.com started trusting the guard rails that zOpt.ai provides out of the box.
Current stage of engagement:
After going through the full cycle of automated identification, performance guarantee, and automated implementation, example.com put together an internal process for reviewing and implementing changes every 4 weeks. The automated implementations of performance improvements, cloud debt retirement, and cost optimizations are reviewed regularly and acted upon during the maintenance window, typically once a month.
The chart above shows the forecasted cloud spend, the actual spend with the help of zOpt.ai, and the effective savings realized over the course of the engagement. It does not show performance improvements or cloud debt retirement, as it is hard to put a dollar value on these benefits.
Summary:
Example.com has saved ~18% of their cloud spend over the last 12 months (more than a million dollars). They have retired cloud debt and migrated to the latest price-performant hardware, software, and managed services. This has resulted in a 15-20% improvement in the reliability and performance of their critical system components.
Key takeaways - customer perspective:
DevOps bandwidth is at a premium. Everyone dreads production issues. Optimization with a performance guarantee is a must. Automated implementation with human override is critical. Building trust in an optimization platform takes time, and it takes longer if the customer has faced production issues due to ‘wrong’ optimizations. Continuous cloud optimization is the only way to maximize ROI from cloud investment.