Summary:

Brace yourself. For established workloads, dedicated hardware is often a better and cheaper option than straight-up cloud deployment.

Using cloud infrastructure is the natural starting point for any new project because unknown requirements are one of its two ideal use cases; the other is elasticity – running workloads at large scale for short periods, or absorbing traffic spikes. The problem comes months later, once you know your baseline resource requirements.

Let’s consider a high-throughput database as an example. Most web applications have a database storing customer information behind the scenes, and whatever the project, the requirements are very similar – you need a lot of memory and high-performance disk I/O.

Evaluating pure cloud

Looking at the costs for a single instance illustrates the requirements. In the real world you would need multiple instances for redundancy and replication, but we’ll work with a single instance for now:

Amazon EC2 c3.4xlarge (we can’t consider the m2.2xlarge because it is not SSD-backed)
= 30GB RAM, 320GB SSD storage
= $1.20/hr or $3726 + $0.298/hr heavy utilization reserved

Rackspace Cloud 30GB Performance
= 30GB RAM, 300GB SSD storage
= $1.36/hr

Databases also tend to exist for a long time and so don’t generally fit the elastic model. This means you can’t take advantage of the hourly or per-minute pricing that makes cloud infrastructure cheap in short bursts.

So let’s extend those costs to an annual basis:

Amazon EC2 c3.4xlarge heavy utilization reserved
= $3,726 + ($0.298 * 24 * 365)
= $6,336.48
Rackspace Cloud 30GB Performance
= $1.36 * 24 * 365
= $11,913.60
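If you want to sanity-check that arithmetic, here is a minimal Python sketch of the annualization (the prices are the late-2013 list rates quoted above):

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

def annual_on_demand(hourly):
    """Annual cost of running an instance 24/7 at an on-demand hourly rate."""
    return hourly * HOURS_PER_YEAR

def annual_reserved(upfront, hourly):
    """Annual cost of a reserved instance: a one-off upfront fee plus a
    discounted hourly rate, still billed for every hour."""
    return upfront + hourly * HOURS_PER_YEAR

# Amazon EC2 c3.4xlarge, heavy utilization reserved
print(annual_reserved(3726, 0.298))  # -> 6336.48

# Rackspace Cloud 30GB Performance (hourly pricing only)
print(annual_on_demand(1.36))        # -> 11913.6
```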

Another issue with databases is that they tend not to behave nicely if you’re contending for I/O on a busy host, so both Rackspace and Amazon let you pay for dedicated instances — on Amazon this has a separate fee structure, and on Rackspace you effectively have to buy their largest instance type. Calculating those costs for our annual database instance looks like this:

Amazon EC2 c3.4xlarge dedicated heavy utilization reserved
= $4099 + ($0.328 + $2.00) * 24 * 365
= $24,492.28
Rackspace Cloud 120GB Performance
= $5.44 * 24 * 365
= $47,654.40

(The extra $2 per hour on EC2 is charged once per region.)
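The dedicated figures follow the same pattern; a sketch, again using the list rates quoted above:

```python
HOURS_PER_YEAR = 24 * 365

# Amazon: upfront reserved fee, plus the dedicated hourly rate, plus the
# flat $2/hr dedicated fee charged once per region.
amazon_dedicated = 4099 + (0.328 + 2.00) * HOURS_PER_YEAR
print(amazon_dedicated)  # -> 24492.28

# Rackspace: the 120GB Performance flavor is the smallest that is
# effectively dedicated.
rackspace_dedicated = 5.44 * HOURS_PER_YEAR
print(rackspace_dedicated)  # -> 47654.4
```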

Note that because we have to go for the largest Rackspace instance, the comparison isn’t direct — you’re paying Rackspace for 120GB RAM and four 300GB SSDs. On one hand this isn’t a fair comparison because the specs are entirely different; on the other, Rackspace doesn’t have the flexibility to give you a dedicated 30GB instance.

Consider the dedicated hardware option…

Given the annual cost of these instances, the next logical step is to consider dedicated hardware, where you rent the resources and the provider is responsible for upkeep. At my company, Server Density, we use SoftLayer, now owned by IBM, and have dedicated hardware for our database nodes. IBM is becoming very competitive with Amazon and Rackspace, so let’s add a similarly spec’d dedicated server from SoftLayer, at list prices:

To match a similar spec we can choose the Dual Processor Hex Core Xeon 2620 – 2.0GHz Sandy Bridge with 32GB RAM, a 32GB system disk and a 400GB secondary disk. This costs $789/month or $9,468/year. That is 80 percent cheaper than Rackspace and 61 percent cheaper than Amazon before you add data transfer costs – SoftLayer includes 5,000GB of data transfer per month, which would cost $600/month on both Amazon and Rackspace, a saving of $7,200 a year.
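To make those percentages explicit, a quick sketch against the annual dedicated figures from earlier:

```python
softlayer = 789 * 12   # $9,468/yr for the SoftLayer dedicated server
amazon = 24492.28      # EC2 c3.4xlarge dedicated, heavy utilization reserved
rackspace = 47654.40   # Rackspace 120GB Performance

def percent_cheaper(ours, theirs):
    """How much cheaper `ours` is than `theirs`, as a percentage."""
    return (1 - ours / theirs) * 100

print(round(percent_cheaper(softlayer, amazon)))     # -> 61
print(round(percent_cheaper(softlayer, rackspace)))  # -> 80
```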

… or buy your own

There is another step you can take as you continue to grow — purchasing your own hardware and renting data center space i.e. colocation. We’ll look into the tradeoffs on that scenario in a post to come.

David Mytton is founder and CEO of Server Density, which offers a SaaS tool to provision and monitor infrastructure. He can be reached at david@serverdensity.com or followed on Twitter: @davidmytton

Comments

  1. ballychohandubai Friday, November 29, 2013

    good…

  2. Well, quite intuitive! The figures are well calculated, but data analysis from Amazon CloudWatch and implementing auto-scaling could make a difference. In my opinion, consulting professionals to benchmark and cut costs would be a better idea, as those strategies can hold for longer than renting hardware for a start-up, which would definitely not be universally beneficial for start-ups.

    1. Auto scaling works well for elastic workloads like processing or web servers (which are usually stateless), but for databases it’s not really that useful. Databases tend to be around for a long time and take more effort than just spinning up new servers, with the one exception perhaps of read slaves. They also usually have a known resource requirement, i.e. memory and disk space, which is more difficult to optimise based on metrics.

      The point is that the cloud is great for those elastically scalable use cases, but not so much for the long running permanent instances like databases.

  3. Indeed, the rented-out setup could definitely work for a set of people, but for a startup, I agree it’s better to have a solution-oriented long-run approach rather than just a crunch of short-term numbers.

    1. Cloud works well for startups to begin with because you don’t know what kind of traffic you’ll get, but agreed, you then need to plan for the long term, and that’s where the cost savings of dedicated (and maybe later, colo) come in.

  4. It doesn’t seem like there’s a lot of information for comparison here. And maybe there are some architectural considerations? I wonder why you’d choose the expensive dedicated instances? Of course, without the dedicated instances, it seems that AWS would be cheaper. How about the provisioned IOPS available on AWS’s EBS volumes? Why are you only considering instances with enough local storage? It seems like you’re not taking advantage of multi-tier architectures? For example, running a SharePoint environment can utilize a small web front end, an application tier, and a database tier. It doesn’t all have to be on one giant instance. Because of this, you could have one dedicated DB server that other app servers connect to? How about AWS’s RDS service?

    How about the data transfer pricing? You note the price to transfer OUT 5k GB of data from AWS to the general internet. Does Softlayer’s 5k GB free tier only count data OUT? Or does data IN count against that 5k GB tier? If so, it’s important to note that AWS doesn’t generally charge for data IN. How about Softlayer transferring data within its regions? Is that against the 5k GB cap? The most expensive in-region transfer for AWS is $0.01/GB, which works out to about $50/month.

    I’m not sure we’re getting an apples-to-apples comparison? A little more info would help me understand those benefits.

    1. As mentioned in the article, you would choose a dedicated instance to avoid host contention. The first set of prices looks at the non-dedicated option.

      This is also only considering local instance storage, which gives you the best performance, backed by SSDs. This makes it fair to compare against Rackspace and AWS. Block storage and EBS/provisioned IOPS are not considered.

      This also looks only at an instance suitable for a database, but the same principle applies to any long running instance. RDS is also not considered because it’s more difficult to translate this to dedicated or colo, because it’s a service. The point was to compare “raw” compute.

      Data transfer was ignored but mentioned in passing because Softlayer include it. They do not charge for data in nor do they charge for transfer between their data centers on the private network.

      These points mean the comparison is legitimate and pretty much “apples to apples”.

      1. Do you expect a measurable performance benefit by utilizing the dedicated instances? Is there indication that an EC2 instance experiences host contention that needs to be neutralized by going with a dedicated instance?

        What is Softlayer’s charge for Data OUT, which is what you threw in in passing for AWS? The data transfer number, at least, is not at all apples-to-apples based on the article and subsequent follow up.

        1. There is definitely a measurable performance benefit from using a dedicated instance and it’s something we’ve seen in real-world production environments. The problem is random contention on the host, particularly with I/O and CPU, and it’s the randomness that really hits performance. The hypervisor has some overhead but that’s usually nothing compared to noisy neighbours.

          And if anything, data transfer is the easiest thing to compare because it’s a simple per GB calculation. Softlayer charge nothing for data in, and for data in/out on their private network across any of their data centres. They include 5000GB out as part of their pricing and then you pay extra for anything above that. For example, 6000GB transfer is $50/m on Softlayer. If you take that 5000GB off then that’s $50/m for 1000GB, which works out to $0.05 per GB, significantly cheaper than Amazon’s $0.12 per GB.
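          (That per-GB arithmetic as a small Python sketch, using only the rates quoted in this thread:)

          ```python
          # SoftLayer bundles 5,000GB of outbound transfer per month; the
          # example above says 6,000GB total costs an extra $50/month.
          included_gb = 5000
          total_gb = 6000
          overage_fee = 50.0

          per_gb_softlayer = overage_fee / (total_gb - included_gb)
          print(per_gb_softlayer)  # -> 0.05 per GB over the included allowance

          per_gb_amazon = 0.12  # Amazon's quoted outbound rate, for comparison
          ```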

          This is raw compute being compared in the article. It’s the easiest thing to look at, especially given the use case. It gets more complex when you add in IOPS, API requests, variable traffic, etc, which is why I picked this as the thing to compare. It’s also a workload that everyone can understand – most apps have databases behind them.

  5. For a correct comparison you also have to consider hardware and software maintenance costs. I guess this means insurance for the hardware. When you use a cloud server and the machine blows up, you don’t have to replace it. And you don’t need a part-time or full-time IT person to install and maintain software on it…

    1. Same applies with dedicated – you’re paying for the data center to replace parts and have teams in the data center. Your comment is really referring to colocation, which will be considered in the followup article next week (GigaOM wanted to split them although it was originally a single article).

      1. No, colo is different. If you are not using AWS or Rackspace public cloud, you will need data center space, heating, cooling, personnel and so on: costs you need to account for. Unless you add those costs, you are not getting an apples-apples comparison. Also keep in mind that for a true apples-apples comparison you need similar levels of security, durability, etc. in your own data center as in AWS data centers, for example.

        When you factor in all those costs (which is the only real apples-apples comparison) public clouds win out by a huge margin. There is a reason why so many companies, including large enterprises now, are moving to AWS.

        1. Sure, if you build and run your own data centers, but that’s not colo. Colo is where you rent space in a facility that has all these features already, and everything is included in the pricing. That makes it easy to compare directly.

          (Not that pricing is necessarily easy when it comes to calculating power, but cloud pricing has its own difficulties, particularly around figuring out IOPS and API call usage.)

            1. Well, you are forgetting the cost of IT personnel to set up and manage the compute, network, storage, databases etc. All of that adds up, especially if you want the facility quality and caliber of the folks at AWS.

              Also – in general AWS has spent a tremendous amount of engineering effort driving hardware costs down. It uses cheap commodity hardware, but with a phenomenal layer of software on top that makes it reliable and durable. Check out, for example, James Hamilton’s talk at re:Invent for some of the details. Also, Amazon is very different from other companies in that it passes on the cost savings to customers in the form of low prices (not necessarily altruistic – more to drive competitors out of business). With the combination of extensive engineering to drive down its cost basis and the low, low margins it lives off, it would be very difficult for hardware/software from higher-margin vendors to compete. Hence it is difficult for me to buy into your math or logic. You really need to do a comparison that is fully apples-apples, and I can’t see how someone outside of AWS can do that (given its penchant for secrecy).

            1. Colo is specifically not discussed in this post – it’ll be the topic of a followup article next week (GigaOM split what was originally a single article into two). Buying your own hardware directly is so much cheaper, in fact, that the amount you save can be used to buy extra hardware for durability and hire all the staff you need.

              This post is about cloud vs dedicated and how the latter is significantly cheaper. You pay the likes of Softlayer to handle all the staffing at the data center. For designing the architecture, ordering the right servers, and building your application, you need the same engineering abilities that you would on the cloud, so that’s not up for discussion.

              You’re welcome to present your own calculations to counter my argument. Happy to be proven wrong with actual figures.

            2. Well, all cloud vendors have case studies showing how cloud saves over buying your own hardware or colo. AWS’s is more compelling, hence I am including the link here: http://aws.amazon.com/solutions/case-studies/enterprise-it/. Rackspace has similar case studies too.

              Again, you have to do an apples-apples comparison. If you are a small startup and just want to use a few servers and don’t care about S3 level durability, DynamoDB scale etc, then you *might* find it more affordable to just buy the couple of servers and manage it yourself. But as you start increasing scale, you can’t steal time from existing employees to manage, and then the costs start increasing over pure cloud as you have to hire dedicated employees just to manage your data centers. Especially as you start getting into enterprise use cases where you need the multiple services AWS offers at the same levels of scale, durability and quality, AWS will come out far cheaper. Think about it – who else has such a low cost structure, and is so comfortable with low low margins.

              So yes, for some small startups that are just getting started, can manage their infra with existing employees and don’t care about the AWS level of quality, durability and reliability, buying a few boxes might work out cheaper. But as you grow and start having enterprise level needs, AWS will be much cheaper on a TCO basis. Comparison has to be apples-apples.

            3. That’s completely the opposite of what I’m trying to say. If you’re a small startup just getting going then you definitely do not want to buy your own hardware – it’s going to be a large outlay and you have no idea what your traffic patterns are going to be. This is a great use case for cloud like AWS or Rackspace because it’s cheap in the short term.

              The whole point of this article is to show that long term instances are not cheaper on the cloud, and the followup will show that colo is even cheaper than going dedicated. When you know your traffic patterns and what demand you have then you can make significant savings by handling that capacity with purchased hardware. You could still use the cloud to handle spikes or any other elastic workload – which is what it is designed for.

              The cloud does not magically solve your technical sysadmin requirements. You might not need staff who can rack servers but you still have to design around failure, hook up load balancers and elastic volumes, deal with internal networking, firewalls, databases, etc. That expertise is needed whether you use the cloud, dedicated or colo.

              But again, you’re discussing colo, which is not mentioned in this article. Wait until my next one this coming week, which has figures for all those things you mention.

            4. Ah, if that is what you are trying to say then I think you are definitely wrong. There is probably a case for a small startup to get savings buying its own hardware, but as their needs grow and they want enterprise-class offerings and functionality, cloud is cheaper. Just take the use case of EBS volumes being automatically backed up into S3. If you need similar durability, you will need a SAN, and then costs shoot up.

              Anyways, will wait till your article next week.

  6. David-

    I agree with your points for a small startup.
    I did the calculations back in ’09 and came to the same conclusions, so I went with dedicated hardware instances. I’m also a systems engineer, so I opt for the minimally managed hosts without the bells and whistles. Having a few dedicated instances allows me to scale enough. Remember that as a startup you only want to scale when needed and not needlessly scale before necessary.

    One unique advantage that SoftLayer, Rackspace, and other hybrid providers have is that they offer a mix of dedicated and cloud, which enables you to keep your DBs on dedicated hardware whilst scaling your front end via cloud instances. The backend fiber network that SoftLayer provides between data centers worldwide is also invaluable, because it allows you to place hosts anywhere you want without the region-type restrictions that Amazon has. For instance, try migrating an RDS instance or failing over between regions. You will also run into issues hitting IOPS limits on RDS and experience erratic performance on EBS. It’s a known issue that you need to design around.

    The notion that Amazon is somehow more awesome at scaling, which many commenters present, may or may not be true, but their publicly noted region outages do not reflect that notion. The cloud is not some magical place that solves all your scaling issues. Cloud or no cloud, you will always need to write solid code, properly de-couple the pieces of your systems, design a good architecture, and do capacity planning. To not do so is folly. I look forward to your next few articles.

    1. Agreed – the private networking side of things is often overlooked, and it’s been great to have it just work without any cost or effort when we’re hosted with Softlayer. This is more complex when you run from colo data centres, as big providers like Equinix will often give you city/region metro connect fibre services, but across regions, e.g. London to Amsterdam, they’re carrier-neutral, so you have to set up your own redundant connections.

  7. I think it’s easier/cheaper to alter the architecture to fit around the cloud rather than paying exorbitant fees to avoid contention. If you track CPU steal time and have a distributed database, you can respond to noisy neighbors by simply re-provisioning those nodes. Going the “one single big instance” route for a database is really not going to translate well to a cloud no matter what. Taking archaic centralized application architecture and trying to just put it on the cloud will always be a misfit.
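    (On Linux, steal time is exposed via /proc/stat; a minimal Python sketch of the kind of check described above, where the 10% threshold and the re-provisioning response are purely illustrative:)

    ```python
    def cpu_steal_fraction():
        """Fraction of CPU time stolen by the hypervisor since boot, read
        from the aggregate 'cpu' line of /proc/stat (Linux only). A real
        monitor would sample twice and diff the counters."""
        with open("/proc/stat") as f:
            counters = [int(v) for v in f.readline().split()[1:]]
        steal = counters[7]  # user nice system idle iowait irq softirq *steal*
        return steal / sum(counters)

    # Illustrative policy: treat sustained steal above 10% as a noisy
    # neighbour and re-provision the node.
    if cpu_steal_fraction() > 0.10:
        print("noisy neighbour detected; re-provision this node")
    ```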

    I think a better title for this would be: Applications not architected for the cloud may *gasp* not do well on the cloud

    1. Re-provisioning nodes with large volumes of data is not so simple because of the time it takes to complete the resync. You could spread it out over many smaller instances but that has its overheads too.

      1. Again, it’s about designing the application for the platform. If you design around single large instances then yes, re-provisioning will not bode well. If, however, you segment your data and build the application around that type of architecture, then CPU steal time is a minor nuisance. The $17k/year cost for dedicated hardware is probably good for those in the middle of transitioning architecture, but ultimately if you are paying for dedicated hardware on the cloud you are “doing it wrong”.

        I’m not suggesting that all applications belong on the cloud, far from it, but this article vastly oversimplifies and dismisses the fact that it’s really about how you want to architect your application. Most applications can be architected to be distributed, and it’s more than just shoehorning it in; it should be built like that from the beginning.

        The idea of just putting something as-is into the cloud, expecting it to be cheaper and more reliable, and then being disappointed is a common cautionary tale that people seem to take the wrong lesson from. The lesson IS NOT that the cloud is expensive and problematic; it’s that you can’t solve problems the same way on infrastructure like EC2.

        Yes, of course there are different problems that come with distributed architecture… there is no golden hammer.

        1. It’s not as simple as just allowing for CPU steal. The biggest issue I hear anecdotally is with databases, because they’re very I/O sensitive and this is very difficult to architect around. When you share the host with other users you have no predictability or guarantees about performance, and it’s the variability and randomness that causes the biggest headaches.

          You can rearchitect your application as much as you want but you’ll still find the database is a big area of concern when you have no control over the underlying infrastructure. It’s easy to rebalance HTTP traffic to other nodes when response times start to increase but it’s much more challenging to do so with databases.

          You can spend huge amounts of time trying to engineer a solution, or you could just not use the cloud for something it is not designed for and which is much more expensive anyway. Even if you use the standard non-dedicated EC2 instances, it’s far more expensive than buying dedicated hardware.

          1. As far as I/O goes, as long as you are not using EBS it’s not a huge issue. I think the biggest problem is when people rely on EBS and expect consistent performance. Stick to ephemeral storage and it’s not nearly as big of an issue.

            You make a big assumption:

            “It’s easy to rebalance HTTP traffic to other nodes when response times start to increase but it’s much more challenging to do so with databases.”

            and

            “You can spend huge amounts of time trying to engineer a solution”

            Pricing points aside the point is there are MANY technologies designed to solve exactly these problems. You don’t have to spend huge amounts of time recreating the wheel.

            If you’re just talking about databases, HBase and Cassandra are quite adept at the distributed data store thing. You also have tons of open source projects dealing with cluster coordination (Paxos and Raft consensus in things like etcd, ZooKeeper, Doozer) and such for other distributed services. There’s a ton of off-the-shelf solutions for solving these problems. You just have to utilize them.

            Sure it’s a huge amount of work for people not used to distributed systems, but that doesn’t mean the task is insurmountable. Anything unfamiliar will seem difficult.

            Single monolithic databases don’t escape these problems of the cloud either. At some point you will have to start splitting up your database as well. The only difference is that the point where you need to split your data into smaller nodes comes sooner in the cloud than it does on dedicated hardware. So it’s not a new problem, just a problem you need to solve sooner. IMHO it’s better to start off distributed on smaller nodes and work on horizontal scaling than to scale vertically hoping you never run out of “runway”, whether you are in the cloud or not.

            I stand by my point that a single-node monolithic database is a terribly inefficient way to utilize the cloud. You also more than double the AWS pricing by using dedicated servers based on anecdotal reports about I/O contention.

            I suspect if your application was designed to run on a Hadoop infrastructure you could match the costs, gain failure tolerance, and gain flexibility for architectural changes and quick scale out.

  8. Hey David

    Could you share the numbers and the math you used to get to the 60% and 80% cheaper outcome? I couldn’t quite duplicate it myself, so was wondering if you could share.

    1. The figures from the article:

      Softlayer price: $9,468/yr
      Amazon price: $24,492.28/yr (Softlayer is 61 percent cheaper)
      Rackspace price: $47,654.40/yr (Softlayer is 80 percent cheaper)

  9. Hello,

    I fail to see how this calculation is correct:

    Amazon EC2 c3.4xlarge
    = $1.20/hr or $3726 + $0.298/hr heavy utilization reserved

    $1.20/hr or $3726? Shouldn’t it be $1.20/hr = $10,512/year?
    Or am I missing something?

    Thanks.

    1. At $1.20/hr the annual price is $10,512, but if you’re going to run it for a year you should pay for the reserved instance, so the actual best price is

      $3726 + $0.298/hr heavy utilization reserved
      = $3,726 + ($0.298 * 24 * 365)
      = $6,336.48
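      (The break-even point between on-demand and reserved is easy to derive; a sketch with those same rates:)

      ```python
      on_demand = 1.20     # $/hr, c3.4xlarge on demand
      reserved_hr = 0.298  # $/hr under heavy utilization reserved pricing
      upfront = 3726.0     # one-off reserved fee

      # The upfront fee is paid back by the per-hour saving; past this many
      # hours of continuous running, reserved works out cheaper.
      break_even_hours = upfront / (on_demand - reserved_hr)
      print(break_even_hours)       # -> ~4131 hours
      print(break_even_hours / 24)  # -> ~172 days, i.e. under six months
      ```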

  10. So I followed your advice and looked into services from Rackspace and SoftLayer. I don’t have quotes yet, but here’s my problem:

    The total 3-year cost of a large server at Amazon is $32,000. It’s a cr1.8xlarge. This is a backend server, so there are no data transfer costs. Disk is actually quite small.

    The same thing at list price from SoftLayer is $153,000. I appreciate that they’re willing to negotiate on the initial sale, and maybe they undercut Amazon and sell it for less than wholesale cost. But this is the cloud. In a year, I may want to replicate to a 3rd or 4th global region… I’m not paying list price and I have no tolerance for renegotiating.

    There’s no way I’d enter an agreement with uncertainty about what I’m paying in a year. With Amazon, I know prices are going one direction: down. A company that can’t compete on wholesale prices cannot compete in a scalable cloud. And why on earth would I want to start a discounting dance with a sales team when I need to scale?

    My Rackspace conversation was equally difficult. They wanted to assure me that when my dedicated server RAID failed, an employee would watch over it and make sure the RAID restored correctly. For heaven’s sake, this is the cloud. Just give me a different server and let me bootstrap it myself! I can have a server integrated into my cloud architecture in the time it takes RAID to repair. Fix the RAID on your own time.

    I’ll see what they finally have to say, but this is not going well.

    1. Here’s the followup 50 days later… Both companies abandoned the sales process. They agreed that my situation should have been well suited to this approach: a large, CPU and memory intensive always-on app server in 3 global regions and the rest of the infrastructure made of VMs. But for various small reasons, neither Rackspace nor Softlayer could put together a working architecture for anywhere near Amazon’s prices, let alone beat Amazon.
