Amazon last week launched a contest for companies to show their Spot Instance pricing strategies, with $5,000 in AWS credits going towards the best use cases and $3,000 in credits going to the runner-up. The second year of the contest is as good a time as any to look at the often-mysterious beast that is AWS Spot Instances.
While not often used, they are an important element in Amazon’s bag of tricks as well as something that startups are using to save tens of thousands on certain workloads. I’ve spoken with several companies to understand the tips, tricks and strategies involved in playing the AWS spot market.
Why Amazon’s into Spot
Amazon launched Spot pricing in 2009. The service lets companies bid for access to a compute instance for one hour at a set price. This is in contrast to a reserved instance, where you buy reserved capacity in advance for a set time at what will often also be a lower price than an on-demand instance. Spot Instances are more volatile than on-demand instances, where a developer spins up a virtual instance at the going rate. And they don’t appear to be taking the world by storm just yet.
Cloudyn, a cloud management platform, estimates that spot use is only 3 percent to 5 percent of instance use on AWS. It did say, however, that spot use grew by 40 percent from July to August and that growth was consistent for September as well.
However, for Amazon, Spot Instances fill an important role. Matt Wood, general manager, data science at AWS, explains that Spot Instances are a way to help Amazon use all of its capacity, keeping its server utilization as high as possible. When there’s a lot of capacity, the price of a Spot Instance goes down, and theoretically the market comes in ready to use it. When capacity is taken up by other compute jobs, especially the higher reserved instances, the spot capacity diminishes and prices might rise.
In some ways the economics of the spot market are like a tuning knob for the AWS margins. If you think about the costs of offering AWS, much of it is in the infrastructure. Most of the revenue is generated by products that have a set price. But Spot Instances do more than help Amazon make sure as much capacity is used as possible: the availability of that capacity can also force prices higher. Given that the servers are a sunk cost, spot pricing helps make sure those servers are used at capacity, and it can even be tweaked to make people pay more for them.
Approaching the spot market
For developers hoping to take advantage of the spot market, it can be totally worth it. It’s possible to use this to save significant chunks of money. For example, Cycle Computing ran an 11-hour job for a pharmaceutical client that used 10,600 servers. The infrastructure cost in traditional IT would have been $44 million (and it would have taken longer), but Cycle Computing used Spot Instances and paid $4,372.00, said Jason Stowe, the CEO of the high performance computing company. This doesn’t just cut costs, it enables a company to do things that were previously cost prohibitive.
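To put those figures in per-unit terms, here is a rough back-of-the-envelope calculation. It assumes, purely for illustration, that all 10,600 servers ran for the full 11 hours; the article does not say so, and the real server-hour count may well have been lower.

```python
# Figures quoted in the article for the Cycle Computing run.
servers = 10_600
hours = 11
spot_bill = 4_372.00            # dollars paid on Spot Instances
traditional_cost = 44_000_000   # dollars, quoted traditional IT estimate

# Assumption: every server ran for the whole job (an upper bound).
server_hours = servers * hours
cost_per_server_hour = spot_bill / server_hours
ratio = traditional_cost / spot_bill

print(f"{server_hours:,} server-hours")
print(f"${cost_per_server_hour:.4f} per server-hour")
print(f"traditional estimate is roughly {ratio:,.0f}x the Spot bill")
```

Under that assumption the bill works out to a few cents per server-hour, which is the scale of saving the example is meant to convey.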
The key is understanding your workloads. Spot Instances are lost if the price goes above your bid rate, which means if you are trying to use Spot Instances to get computing on the cheap you have to be prepared for highly volatile instances.
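The rule that an instance is lost once the price rises above the bid can be sketched in a few lines. This is a toy model, not AWS's actual billing logic: the function name `simulate_spot_run`, the hourly granularity, the made-up prices, and the assumption that each completed hour is charged at that hour's market price are all illustrative.

```python
def simulate_spot_run(hourly_prices, bid):
    """Walk an hourly price history and stop when the price exceeds the bid.

    Returns (hours_completed, total_cost, interrupted).
    Assumption: each completed hour is charged at that hour's market
    price rather than at the bid.
    """
    hours_completed = 0
    total_cost = 0.0
    for price in hourly_prices:
        if price > bid:
            # Price moved above the bid: the instance is lost.
            return hours_completed, total_cost, True
        hours_completed += 1
        total_cost += price
    return hours_completed, total_cost, False


prices = [0.06, 0.07, 0.07, 0.25, 0.08]  # made-up dollars per hour

print(simulate_spot_run(prices, bid=0.10))  # cut off by the price spike
print(simulate_spot_run(prices, bid=0.30))  # survives the whole history
```

With the lower bid the run is cut short at the spike in the fourth hour, which is why the workload has to tolerate losing an instance mid-job.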
Stowe says he uses it for massively parallel workloads that can flee from a decayed instance to another without disrupting any of the processing work. Monte Carlo simulations would work too. Ooyala, an online video software and services provider, uses Spot Instances for transcoding. Basically, if you can use a GPU, Spot Instances are probably good for you. But they would be problematic for transactions, web hosting or anything to do with production databases.
Once you have the workload, you have to think about the other technical costs of using Spot Instances. For example, if you’ve got an app that can handle a revolving door of new virtual machines, that’s only part of your criteria. You also have to make sure your applications can start running on those ephemeral instances quickly, explains Mike Tung, CEO at Diffbot.
Tung says that he’s broken down the latency between the bid getting accepted and the instance running into two blocks: one Amazon controls, the other he does. Amazon has to get the machine booted up and running, but after that Diffbot’s software needs to configure the instance and load the job onto it.
“The bidding and allocating the machines is pretty fast from Amazon,” Tung says. “Some providers will take minutes, but on Amazon, the time on their side is within 2-3 minutes booted and then it is up to us to get app running as soon as possible.”
But he cautions that with Spot Instances, getting the machine up and running can take longer than with on-demand, so you must consider that when you are building a model for buying Spot Instances versus general on-demand instances.
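One simple way to fold startup time into such a model is to treat it as paid-for but idle time and compute the cost per hour of useful work. The sketch below is an assumption-laden illustration: the function name `effective_hourly_cost`, the idea that startup minutes are billed, and every number are hypothetical rather than taken from Tung or from AWS.

```python
def effective_hourly_cost(price_per_hour, startup_minutes, useful_minutes):
    """Cost per hour of useful work when startup time is billed but idle.

    Assumption: the instance is charged for startup_minutes + useful_minutes,
    but only useful_minutes produce results.
    """
    if useful_minutes <= 0:
        raise ValueError("useful_minutes must be positive")
    billed_minutes = startup_minutes + useful_minutes
    return price_per_hour * billed_minutes / useful_minutes


# Made-up example: a cheap instance with a slow start versus a pricier
# instance with a fast start, each billed for one hour.
spot = effective_hourly_cost(0.07, startup_minutes=8, useful_minutes=52)
on_demand = effective_hourly_cost(0.58, startup_minutes=3, useful_minutes=57)

print(f"spot:      ${spot:.3f} per useful hour")
print(f"on-demand: ${on_demand:.3f} per useful hour")
```

The longer the startup relative to the job, the more the nominal hourly price understates the real cost, so short jobs are hit hardest.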
The importance of the bid
But when it comes to Spot Instances, after finding the right workload, the biggest challenge is figuring out a bidding strategy. There are several resources that delve into this, but generally you can bid high as a means of ensuring you get instances with less volatility, or bid lower to optimize your costs and have software that sends the overflow to on-demand or reserved instances.
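The trade-off between bidding high and bidding low with an on-demand fallback can be compared on a price history. This is a minimal sketch under stated assumptions: `blended_cost` is a hypothetical helper, the prices are invented, Spot hours are charged at the market price, and the cost of moving work between instances is ignored.

```python
def blended_cost(hourly_prices, bid, on_demand_price):
    """Cost of keeping one instance's worth of work running every hour.

    Hours where the spot price is at or below the bid run on Spot at the
    market price; all other hours overflow to on-demand at a fixed rate.
    Returns (total_cost, spot_hours).
    """
    total_cost = 0.0
    spot_hours = 0
    for price in hourly_prices:
        if price <= bid:
            total_cost += price
            spot_hours += 1
        else:
            total_cost += on_demand_price
    return total_cost, spot_hours


prices = [0.06, 0.07, 0.07, 0.25, 0.08, 0.90]  # made-up dollars per hour
ON_DEMAND = 0.58                               # made-up fixed rate

for bid in (0.10, 0.30, 1.00):
    cost, spot_hours = blended_cost(prices, bid, ON_DEMAND)
    print(f"bid ${bid:.2f}: {spot_hours} spot hours, total ${cost:.2f}")
```

On this invented history the middle bid is cheapest: the lowest bid pays the on-demand rate twice, while the highest bid never loses its instance but absorbs the large spike.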
Vittaly Tavor, a co-founder at Cloudyn, noted that in a spot check of Spot Instances running on Monday, October 7, about 46 percent were c1.xlarge instances, which makes sense given the compute-intensive nature of many of the workloads people currently run on Spot Instances. The second-largest category, at almost 26 percent, was the t1.micro, which looks like it might be useful for testing databases.
Tavor also noted that while the longest continuously running Spot Instance is a c1.xlarge running since August 9, 2012 in the Amazon U.S. East region, that region is actually the most volatile of regions, in part because it’s the largest and the capacity there is most in demand. But in the long run, you can save a bundle using Spot Instances if you:
- match them to the right workload;
- have baseline on-demand and perhaps reserved instance capacity to fall back on;
- figure out a bidding strategy that works for your economic and compute goals;
- write software that manages your compute acquisition strategy and takes Spot pricing into the equation;
- and, think about buying computing like you would any other commodity.
That last bit is what I find most fascinating about the way companies are using spot pricing. Many of these companies recognize that computing is their raw material and have built applications that take that into consideration. For example, Tung’s business intelligence software that bids on Spot Instances is a competitive advantage and not something he’d open source. All of the people I spoke with who use Spot Instances have built their apps with a “ready-to-fail” mindset that assumes computing is fungible and the app just needs to continue.
And so far Amazon is winning over the savviest of developers with Spot Instances, which, while not hugely popular, are a competitive advantage so far in the race to offer compute infrastructure. Sure, this is a way for Amazon to use up unused capacity, but it’s also a way to offer lower pricing to highly sophisticated users — exactly the clients Amazon wants to keep on its cloud.
This story was updated at 1:25 pm to correct the characterization of Reserved Instances.