How to Get the Most from your Public Cloud Infrastructure Investment
Liran Zvibel. August 16, 2019
Public cloud is all the rage, but there is a lot of confusion around the economics of switching to it. Organizations often hear that they must move to the public cloud to save costs, but are then shocked when a surprise cloud bill arrives. This shock comes from a major gap between expectations and reality. Don’t get stuck with a surprise public cloud bill; instead, consider the best use of the public cloud.
The reality is that the public cloud should not be used as if it were simply a bunch of servers in an on-premises data center. The best way to gain the economic advantage of the cloud is to leverage its elasticity through on-demand instances, often taking advantage of spot instances. In absolute dollars, the lifetime cost of on-premises infrastructure is lower than that of the public cloud, but the economics look very different from a utilization perspective. Traditional corporate data centers run at very low utilization, frequently in the 25-30% range. At that rate, most of the up-front investment is wasted, and the on-premises economics do not look good at all.
How can that be improved by using the public cloud? We have already established that replacing on-premises servers with cloud instances one-to-one, with no change in business process, is not the right answer, and will actually increase spending. We must dig into the underlying reasons why enterprise IT utilization is so low.
First, HA (High Availability) and DR (Disaster Recovery) usually require a DR site, which will often sit idle until it needs to take over for the main site because of an outage. This already reduces theoretical utilization to 50%, as half of the infrastructure needs to be procured up-front but is in stand-by mode.
Next, organizations do not run the same volume of workloads all the time, and must account for infrastructure demand spikes, usually set off by periodic analyses and reports (e.g., weekly, monthly, or quarterly reports). To cope with these spikes, many organizations keep spare compute resources as a standby buffer. Because those resources remain mostly unused, the active site runs at only 50-60% of peak capacity, further reducing the effective utilization to 25-30% (50% DR factor × 50-60% active-site utilization).
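The compounding effect of those two factors can be sketched as a quick back-of-the-envelope calculation. The 50% DR factor and 50-60% active-site utilization are the figures from the text above; the function itself is just illustrative arithmetic:

```python
# Back-of-the-envelope: effective utilization of a traditional data center.
# An idle DR site halves usable capacity, and headroom kept for periodic
# demand spikes holds the active site to 50-60% of its peak capacity.

def effective_utilization(dr_factor: float, active_site_utilization: float) -> float:
    """Fraction of total purchased capacity doing useful work."""
    return dr_factor * active_site_utilization

low = effective_utilization(0.5, 0.5)    # active site at 50% of peak
high = effective_utilization(0.5, 0.6)   # active site at 60% of peak
print(f"Effective utilization: {low:.0%}-{high:.0%}")  # prints "Effective utilization: 25%-30%"
```

Moving the DR site to the cloud removes the 0.5 factor entirely, which is why the later claim that "IT efficiency almost doubles" follows directly from this multiplication.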
How does Weka solve this infrastructure challenge? Weka not only runs natively on premises, but also bursts to the public cloud. The same product runs in both locations and functions in exactly the same way, providing a seamless bridge between on-premises demands and public cloud elastic demand. This is completely different from legacy storage solutions that address these demands with separate and distinct products.
Also, at Weka, we realized that object storage is the most cost-effective and scalable way to store large amounts of data with incredible durability. Using our Snap-to-Object functionality, we can save the state of the storage (“snapshot”) to an object storage bucket in such a way that any other Weka cluster can access the snapshot and continue to run from its point in time. If that cluster then makes some changes and saves them, the original cluster can download the incremental changes and review the results as if the completed work had been performed locally.
Many companies have shied away from the cloud because of security issues and concerns that data in flight between on-premises and the public cloud is vulnerable. Weka has the option to encrypt the data that is sent to the cloud, allowing the cloud instances to use the same on-premises Key Management Service (KMS) solution to access the data. In this way, the data is safe even though it sits on a public cloud environment.
Further, a Weka cluster can expand easily while the system is live, with capacity or performance increasing linearly with the added resource. This makes initial sizing of the infrastructure easy, as you only have to predict the near future rather than trying to plan for 5 years out. Once new hardware is available, it can be seamlessly leveraged to expand the same cluster.
The most obvious use of the cloud for elasticity is moving the DR site to the cloud. By leveraging the cloud for DR, customers can eliminate almost all of the up-front DR infrastructure cost. You still pay for object storage in the cloud and, depending on the required Recovery Time Objective (RTO), possibly some compute resources as well. However, once cloud DR is employed, the efficiency of the remaining on-premises infrastructure almost doubles.
A second great use of the cloud is for spillover of workload (“cloud bursting”). Customers should plan to run their on-premises IT at 80-90% utilization, and when extra compute resources are needed, burst the extra work to the public cloud. In many cases the extra work is batch-oriented; if so, spot instances offer a very economical model, slashing cloud expenses by up to 90%, so long as the workload is agile enough to tolerate spot interruptions. Now the economics of both platforms start to make sense: on-premises infrastructure no longer needs to be sized and architected for peak usage, so system ROI can increase dramatically, while the cloud serves as the agile platform for peak demand.
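The bursting economics described above can be illustrated with a hypothetical comparison: sizing on-premises hardware for peak demand versus sizing it for the baseline and sending the spikes to spot instances. All prices and capacity figures below are made-up placeholders; only the "up to 90%" spot discount comes from the text:

```python
# Illustrative only: compare buying on-premises capacity for the worst case
# against buying for the baseline and bursting spikes to spot instances.
# All unit prices and capacities are hypothetical.

ON_PREM_COST_PER_UNIT = 100.0   # hypothetical lifetime cost per capacity unit
ON_DEMAND_CLOUD_COST = 120.0    # hypothetical on-demand cloud price per unit
SPOT_DISCOUNT = 0.90            # "up to 90%" savings cited in the text

peak_capacity = 100             # units needed only during periodic spikes
baseline_capacity = 60          # units needed the rest of the time

# Option A: size on-premises hardware for peak demand.
sized_for_peak = peak_capacity * ON_PREM_COST_PER_UNIT

# Option B: size for the baseline; burst the difference to spot instances.
burst_units = peak_capacity - baseline_capacity
sized_for_baseline = (baseline_capacity * ON_PREM_COST_PER_UNIT
                      + burst_units * ON_DEMAND_CLOUD_COST * (1 - SPOT_DISCOUNT))

print(f"Sized for peak:           {sized_for_peak:.2f}")
print(f"Baseline + spot bursting: {sized_for_baseline:.2f}")
```

Under these assumed numbers, option B spends roughly 35% less, even though the bursted units run on cloud infrastructure with a higher list price, because the spot discount and the smaller on-premises footprint dominate.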
So, don’t be stuck with a surprise public cloud bill, and don’t be disappointed by unrealistic expectations. Use the public cloud for elasticity, cloud bursting, and long-term durable storage. These techniques of cloud DR and compute elasticity enable our customers to enjoy cost savings and the greatest advantages of both on-premises and public cloud infrastructures!