Blog Post

Which is less expensive: Amazon or self-hosted?

Updated. Amazon Web Services (AWS), as the trailblazing provider of Infrastructure as a Service (IaaS), has changed the dialog about computing infrastructure. Today, instead of simply assuming that you’ll be buying and operating your own servers, storage and networking, you can always consider AWS as an option, and for many new businesses, it’s simply the default choice.

I’m a huge fan of cloud computing in general and AWS in particular. But I’ve long had an instinct that the economics of the choice between self-hosted and cloud provider had more texture to it than the patently attractive sounding “10 cents an hour,” particularly as a function of demand distribution. As a case in point, Zynga has made it known that for economic reasons, they now use their own infrastructure for baseline loads and use Amazon for peaks and variable loads surrounding new game introductions.

An analysis of the load profiles

To tease out a more nuanced view of the economics, I’ve built a detailed Excel model that analyzes the relative costs and sensitivities of AWS versus self-hosted in the context of different load profiles. By “load profiles,” I mean the distribution of demand over the day/month as well as relative needs for bandwidth versus compute resources. The load profile is the key factor influencing the economic choice because it determines what resources are required and how heavily these resources are utilized.

The model provides a simple way to analyze various load profiles and allows one to skew the load between bandwidth-heavy, compute-heavy or any combination. In addition, the model presents the cost of operating 100 percent on AWS, 100 percent self-hosted as well as all hybrid mixes in between.

In a subsequent post, I will share the model and describe how you can use it for scenarios of interest to you. But for this post, I will outline some of the conclusions that I’ve derived from looking at many different scenarios. In most cases, the analysis illustrates why intuition is right (for example, that a highly variable compute load is a slam dunk for AWS). In other cases, certain high-sensitivity factors become evident and drive the economic answer. There are also cases where a hybrid infrastructure is at least worthy of consideration.

To frame an example analysis, here is the daily distribution of a typical Internet application. In the model, traffic distribution is an input from which bandwidth requirements are computed. The distribution over the day reflects the behavior of the user base (in this case, one with a high U.S. business-hour activity peak). Computing load is assumed to follow traffic according to a linear relationship, i.e. higher traffic implies higher compute load.
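The linear traffic-to-compute assumption can be sketched in a few lines of Python. This is only an illustration, not the Excel model itself: the hourly percentages, the daily transfer volume and the per-instance throughput (`gb_per_instance_hour`) are hypothetical placeholder values, not figures from the analysis.

```python
import math

# Hypothetical share of daily traffic per hour, in percent (24 values).
# Shaped to peak in business hours; these numbers are illustrative only.
HOURLY_PERCENT = [2] * 8 + [7] * 10 + [3] * 4 + [1] * 2
assert sum(HOURLY_PERCENT) == 100


def hourly_demand(daily_gb, gb_per_instance_hour):
    """Return a list of (average_mbps, instances) tuples, one per hour."""
    demand = []
    for percent in HOURLY_PERCENT:
        gb = daily_gb * percent / 100
        # GB moved in one hour -> average megabits per second (decimal units).
        mbps = gb * 8 * 1000 / 3600
        # Linear relationship: compute need grows in proportion to traffic.
        instances = math.ceil(gb / gb_per_instance_hour)
        demand.append((mbps, instances))
    return demand


if __name__ == "__main__":
    for hour, (mbps, instances) in enumerate(hourly_demand(10000, 50)):
        print(f"{hour:02d}:00  {mbps:8.1f} Mbps  {instances:3d} instances")
```

The per-hour instance counts produced this way are the input that a cost comparison would then price out for each alternative.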

Note that while labor costs are included in the model, I am leaving them out of this example for simplicity. Because labor is a mostly fixed cost for each alternative, it will tend not to impact the relative comparison of the two alternatives. Rather, it will impact where the actual break-even point lies. If you use the model to examine your own situation, then of course I would recommend including the labor costs on each side.

For this example, to compute costs for Amazon, I have assumed Standard Extra Large instances and ELB load balancer for the Northern California region. The model computes the number of instances required for each hour of the day. Whenever the economics dictate it, the model applies as many AWS Reserved Instances (capacity contracts with lower variable costs) as justified and fills in with on-demand instances as required. Charges for data are computed according to the progressive pricing schedule that Amazon publishes. To compute costs for self-hosting, I assume co-location with the peak number of Std-XL-equivalent servers required, each loaded to no more than 80 percent of capacity. The costs of hardware are amortized over 36 months. Power is assumed to be included with rackspace fees. Bandwidth is assumed to be obtained on a 95th percentile price basis.
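One way to read the costing rules above is sketched below, under stated assumptions. A reserved instance is modeled as a fixed monthly charge plus a lower hourly rate, and an instance "slot" is reserved only when that beats paying on-demand for the hours the slot is actually needed. All rates and hardware prices are hypothetical parameters, not Amazon's published prices, and the 80 percent rule is interpreted as dividing the peak requirement by 0.80.

```python
import math


def aws_compute_cost(hourly_instances, on_demand_rate, reserved_rate,
                     reserved_fixed_monthly, days=30):
    """Return (monthly_cost, reserved_count) for a 24-value daily profile."""
    total = 0.0
    reserved = 0
    for slot in range(1, max(hourly_instances) + 1):
        # Hours per month in which at least `slot` instances are running.
        hours = days * sum(1 for n in hourly_instances if n >= slot)
        on_demand = hours * on_demand_rate
        with_reservation = reserved_fixed_monthly + hours * reserved_rate
        if with_reservation < on_demand:
            reserved += 1
            total += with_reservation
        else:
            total += on_demand
    return total, reserved


def self_hosted_servers(hourly_instances, max_load=0.80):
    """Servers needed to carry the peak hour at no more than max_load each."""
    return math.ceil(max(hourly_instances) / max_load)


def self_hosted_hardware_monthly(servers, price_per_server, months=36):
    """Straight-line amortization of the hardware purchase."""
    return servers * price_per_server / months


if __name__ == "__main__":
    profile = [4] * 8 + [14] * 10 + [6] * 4 + [2] * 2  # hypothetical
    print(aws_compute_cost(profile, 0.70, 0.30, 150.0))
    print(self_hosted_servers(profile))
```

The slot-by-slot comparison shows why flat loads favor reservations (or owned hardware): slots that run nearly all month clear the fixed charge easily, while slots needed only at peak are cheaper on demand.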

Now let’s look at a sensitivity analysis. Notice in the above example that a bit more than half of the total cost for each alternative is for bandwidth/data transfer charges ($35,144 for self-hosted at $8/Mbps and $36,900 for AWS). This is important because while Amazon pricing is fixed and published, 95th percentile pricing is highly variable and competitive.
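For readers unfamiliar with 95th percentile billing, here is a minimal sketch of the usual scheme: usage is sampled at regular intervals over the month, the top 5 percent of samples are discarded, and the highest remaining sample is the billed rate. The nearest-rank method used here is an assumption; individual carriers may compute the percentile slightly differently.

```python
def percentile_95(samples_mbps):
    """Billed rate: the sample at the 95th percentile (nearest-rank method)."""
    if not samples_mbps:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_mbps)
    # ceil(0.95 * n) in integer arithmetic to avoid float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def bandwidth_bill(samples_mbps, price_per_mbps):
    """Monthly charge: billed rate times the negotiated price per Mbps."""
    return percentile_95(samples_mbps) * price_per_mbps


if __name__ == "__main__":
    # Short bursts in the top 5 percent of samples do not raise the bill.
    samples = [4393] * 95 + [9000] * 5
    print(bandwidth_bill(samples, 8))
```

The $35,144 self-hosted figure at $8/Mbps corresponds to a billed rate of 4,393 Mbps (35,144 / 8).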

The chart above shows total costs as a function of co-location bandwidth pricing. AWS costs are independent of this and thus flat. What this chart shows is that self-hosting costs less for any bandwidth price under about $9.50 per Mbps per month. And if you can negotiate a price as low as $4, you’d be saving more than 40 percent by self-hosting. I’ll leave discussion of the hybrid to another post.
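The sensitivity calculation behind a chart like this is simple enough to sketch. Self-hosted cost is treated as a fixed part plus billed bandwidth times the negotiated price, while the AWS total is a flat line. The 4,393 Mbps figure follows from the $35,144 at $8/Mbps quoted earlier; the fixed cost and AWS total in the example are hypothetical placeholders, so the printed break-even will not match the chart.

```python
def self_hosted_total(fixed_monthly, billed_mbps, price_per_mbps):
    """Monthly self-hosted cost: non-bandwidth costs plus bandwidth."""
    return fixed_monthly + billed_mbps * price_per_mbps


def break_even_price(aws_total, fixed_monthly, billed_mbps):
    """Price per Mbps at which self-hosting costs the same as AWS."""
    return (aws_total - fixed_monthly) / billed_mbps


def savings_fraction(aws_total, self_total):
    """Fraction saved by self-hosting relative to the AWS total."""
    return 1 - self_total / aws_total


if __name__ == "__main__":
    BILLED_MBPS = 4393          # derived from $35,144 at $8/Mbps
    FIXED_MONTHLY = 20000.0     # hypothetical: servers, rack space, etc.
    AWS_TOTAL = 63930.0         # hypothetical flat AWS monthly total
    print(break_even_price(AWS_TOTAL, FIXED_MONTHLY, BILLED_MBPS))
    for price in (4, 6, 8, 10):
        total = self_hosted_total(FIXED_MONTHLY, BILLED_MBPS, price)
        print(price, total, round(savings_fraction(AWS_TOTAL, total), 3))
```

Because bandwidth is such a large share of the total in this scenario, small changes in the negotiated price move the break-even point noticeably.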

This should provide a bit of a feel for how I’ve been conducting these analyses. Above is a visual summary of how different scenarios tend to shake out. The analysis confirms the intuitive conclusion that the spikier the load, the better the economics of the AWS on-demand solution. Similarly, the flatter or less variable the load distribution, the more self-hosting appears to make sense. And if you’ve got a situation that uses a lot of bandwidth, you need to look more closely at the self-hosted savings that could be feasible with negotiated bandwidth price reductions.

Update (Feb. 14): This post has garnered a lot of much appreciated attention. From the comments, I see that two clarifications would be helpful:

  1. The key point here is that a comparison of the cost of cloud hosting versus self-hosting needs to be based on the profile of your load. It is not that Amazon (or any other provider) is more expensive than self-hosting, as this is often not the case. Rather, it depends on the profile of your load. Moreover, it’s not so important where exactly your breakeven point is but rather it is most important to know the main sensitivities (e.g. bandwidth cost, CPU load, storage, etc.) for your situation so that you can understand which differences could flip the decision. The results here are for this example only and other examples will produce different results, some in favor of cloud and some in favor of self-hosting.
  2. The specific use case I’ve chosen is for a business that’s pretty far along. But some people have been wondering how this example applies to startups. That’s a great question.

While I’ve referred to “spiky” loads, there’s another way to say that which is “variable,” “unknown” or “unpredictable,” which describes the situation that a startup (or other new business endeavor) usually finds itself in. In those cases, the fact that you cannot forecast very well is a reason why it’s highly unlikely you’ll save money by self-hosting…because you’re very unlikely to buy the right amount of capacity. You’ll either overprovision and waste money on unused capacity, or you’ll buy too little and compromise the business. So while you might not call your startup load “spiky,” the fact that it’s unpredictable gives it a similar profile in the model and hence the economic conclusion would tell you to go with the cloud infrastructure route.

Another not-strictly-economic respect that needs to be considered for startups (and others) is the benefit of focusing one’s attention on primary value-creating activities versus commodity activities (relative to the business) that one might not be very good at anyway. In addition, AWS and other cloud providers give us the highly valuable ability to experiment with little downside. This is especially important for the highly iterative and trial-and-error nature of building successful Internet businesses.

The point of this particular example is that if you have a significant amount of load that is well known and predictable then you may be able to save some money by bringing a portion or all of that inside.

Charlie Oppenheimer is a serial-CEO and currently an executive-in-residence at venture-capital firm Matrix Partners. His most recent company, Digital Fountain, was acquired by Qualcomm, and his previous company, Aptivia, was acquired by Yahoo. He blogs at stratamotion.com

117 Responses to “Which is less expensive: Amazon or self-hosted?”

  1. Great post! Thanks for clarifying the “startup” angle. I’d like to add a couple of observations. The AWS model (or that of any similar vendor) is making the launch of new startups much more likely. This model boosts innovation because someone can just provision what is needed, then develop and deploy with minimal costs and logistics. The second one is related to the labor costs. If the organization already has a team of sys admins, network admins, database admins, etc., it makes sense to compare. However, we are using AWS and require minimal labor to administer the entire platform. If you count each resource with an average cost of $100K/year (possibly more based on location and benefits), this adds up quickly. You mention that labor is in the model, but not included in the metrics shown. I would think that labor skews the results quite a bit – especially for startups and small businesses. A hosting service like AWS makes it possible for small businesses to host their applications efficiently.
    Thanks for posting this! Some valuable info.

  2. coppenheimer

    (from the author…)

    Terrific discussion here which I greatly appreciate.

    There’s one angle that I’ve neglected to make as clear as necessary particularly for early stage businesses. The post here generally is about a way to look at the comparison -on economic factors only-. The specific use case I chose was clearly for a business that’s pretty far along. But I see that some people are directly or tangentially thinking about how this applies to startups.

    Allow me to clarify because this is important.

    While I’ve referred to “spiky” loads, there’s another way to say that which is “variable”, “unknown” or “unpredictable” which describes the situation that a startup (or other new business endeavor) usually finds itself in. In those cases, the fact that you cannot forecast very well is a reason why it’s highly unlikely you’ll save money by self-hosting…because you’re very unlikely to buy the right amount of capacity. You’ll either overprovision and waste money on unused capacity or you’ll buy too little and compromise the business. So while you might not call your startup load “spiky”, the fact that it’s unpredictable gives it a similar profile in the model and hence the economic conclusion would tell you to go with the cloud infrastructure route.

    Another not-strictly-economic respect that needs to be considered for startups (and others) is the benefit of focusing one’s attentions on primary value-creating activities vs commodity activities (relative to the business) that one might not be very good at anyway. In addition, AWS and other cloud providers give us the highly valuable ability to experiment with little downside. This is especially important for the highly iterative and trial-and-error nature of building successful Internet businesses.

    The point here is that if you have a significant amount of load that is well known and predictable then you may be able to save some money by bringing a portion or all of that inside.

    Charlie Oppenheimer

  3. Tony Hanly

    Thanks for the analysis. I use a combination of self-hosted and in the cloud. However, I have full-time IT resources and the hardware and systems and expertise to go the self-hosted route. The majority of businesses simply want to put content somewhere and the cloud solution is becoming the most obvious solution for them. I think scale of operations is an important part of the model.

  4. At Infraserve we have always maintained that hybrid cloud is the way to go, not only from a cost perspective but because there are good operational reasons such as security of data or latency.

    Secondly it is great to see someone showing where AWS sits in the cloud computing world. Namely as a good service for highly elastic cloud computing. If you have steady workloads then AWS is not cost effective. Just like you don’t go to a hire car company if you need the same type of car month in and month out. You buy or lease one.

  5. It is apparent the data was wrong to begin with which would explain how self-hosting is cheaper. Unless the assumption is you are a senior network engineer savvy enough to setup your own servers and maintain them, while running your business / taking care of clients, fine. That is a big expense missed, and only one of them.

  6. This argument was made and discredited several years ago so it’s odd to see it resurface on a respected blog – http://cloudshaping.com/2010/06/07/is-the-cloud-more-or-less-expensive-than-co-location/ and others have shown that self hosting is NEVER less expensive than public cloud. There may be other reasons that companies like Zynga have their own datacenter, like control, isolation from other tenants etc., but cost differentials for hardware are virtually never a good reason to make a commitment to either cloud or private.

    • Daniel Golding

      It was not discredited – unless you are a “CloudCEO”, I suppose. At small scales, cloud makes sense. At medium scales cloud sometimes makes sense. At large scales, cloud almost never makes sense. The math here is pretty basic, but a 50k server installation is not going to be cheaper in AWS than it is in dedicated. And dedicated hosting for 50k servers will be more expensive than colo. There are grey zones, but at significant scale, cloud doesn’t work well.

  7. I Am OnDemand

    The analysis is nice but it is ignoring so many things. All the comments are great but I miss the strategic point of view on cloud adoption. You can try to quantify all the benefits you get from the cloud and compare them to the traditional world but it is just like analyzing any other utility evolution based change – it just makes more sense to utilize the resources as a service. One thing that I do agree on is that mega service vendors (Zynga, Facebook ...) should consider having their own optimized cloud, exactly like mega factories that generate their own electricity or close their own customer deals with electricity vendors.

  8. We have thousands of users who run their servers on Amazon using one of our AMIs (http://bitnami.org). Whether EC2 is more or less expensive than the alternatives is a complex topic highly dependent on your particular requirements. Having said that, our perspective is that you need to consider the overall value you get for your money. There are features like the ability to take incremental snapshots of your entire machine for backup or cloning purposes that have no match by other hosting providers. That feature alone saves us innumerable hours of development and system administration. Our staging servers are not set up like our production servers; they can *be* exact copies of your production servers, on demand. How many bugs/issues do you think we avoid with that? For us, enough of them that any price difference with a “traditional” setup is not worth it.

    • Tony Lucas

      Without wanting to go off topic, the suggestion that features like snapshots are unique to EC2 is wildly incorrect, (happy to talk more in private if you like) but I do get the point that you are making overall.

      I don’t think anywhere near enough value has been placed on the human resource element or the additional tools and functionality that a cloud platform with full orchestration capabilities can provide compared to traditional in-house IT. One of the key elements that got mentioned briefly in another comment was all around time to delivery. Being able to provision servers, networks, disks etc. in seconds can make a substantial difference to a company’s effectiveness.

  9. The comments on this post explain exactly why we need these sorts of discussions. It gets people thinking about cloud costs. We believe that not many people and companies fully understand or evaluate their costs on using a specific deployment option of a specific cloud.

    We have created a free tool which helps users do exactly what you have done here. First create a deployment of a user’s requirements of the cloud, then use the latest prices from cloud providers to run their system through a simulation and give a detailed cost report. They can then change their deployment to see how their choices (and even cloud providers) will impact costs.

    I would love to get your thoughts on the tool: http://www.ShopForCloud.com

    Hassan Hosseini,
    ShopForCloud.com

  10. Greg Arnette

    My 5 year old start-up Sonian http://sonian.com has been running in the AWS cloud since day one. We have one of the best use cases for cloud computing: variable workload and we store lots of large data on S3. Trying to replicate our infrastructure needs in a non-cloud environment would make the business model unviable.

    I have blogged about the challenges of managing costs in the cloud here http://goo.gl/7YeOm : Gaming the Cloud, the Need for “Cost Aware” applications, and Having the Right use Case for Cloud Computing.

    And here’s a post The Secret Life of the Cloud Cost Czar – http://goo.gl/OJimJ

    The cloud can work if you know going in the need for new ways of thinking about managing costs and designing systems to work with that requirement.

  11. One conclusion from this post can be that pure cloud is not the answer for everything and that organizations need to understand more variables than price alone when making infrastructure decisions.
    One of these variables, absent in the analysis, is the people cost, which can be significant. For some, the first cost could be to find and train the skills they need to manage infrastructure (Sys Admin, DBA, security, etc).

    Even if you have a team available, setting up, maintaining, patching and optimizing infrastructure has very clear costs that must be taken into account. Who’s watching your servers at 3 AM in the morning?

  12. Dan Farfan

    As a startup ( artcollectorgame.com ) I couldn’t find a vendor more suited to my needs than Rackspace.com. Their entry point is low $$. Their servers stay up. Their pricing is linear. (I know what each 100 simultaneous players costs me, first 100 = 5th 100). I haven’t read anything anywhere that makes me think moving to AWS once I’m as large as the company modeled in this analysis would be a good idea.
    I think it’s an interesting choice for Zynga to move to self-hosted. That’s what big companies do. They use their profits to lower costs and grow profits. That’s not what a gamer’s company would do. They’d understand self-hosting puts you partially out of the game business and into the data center business – EXACTLY how buying a PBX made you an accidental phone company. But, a lot of folks did it for a few decades (probably also MBAs). I guess MBA schools still don’t have case studies to teach the calamity that can follow entering an internal business accidentally. I know I have no interest in it at all, no matter how big I get.
    @DanFarfan

  13. Interesting article…. I have a few additional points with respect to self hosting
    (a) Cost of human operations, whether in a data center or in co-location, is not covered.
    (b) Even though people give SLA on availability, there are many options on AWS to bring up servers in very short time. That may be a challenge in self hosting in terms of investment as well as customer displeasure.
    (c) There are various techniques to save money by reducing to minimum number of servers during off peak time & scaling up as required. This may be a challenge in self hosting.
    (d) Availability of various production related services (viz. DNS, IP, Storage, Content Delivery, Caching, Database, Bandwidth, alerts, monitoring & many more) readily available for integrations with applications. Availability of all these will be a challenge.
    (e) Apart from direct costs I feel people also should think of Indirect costs & maintenance challenges which are going to be incurred during self hosting.

  14. Yaron Raps

    I always questioned the cost saving Amazon AWS provides for medium-large infrastructure. I think this proves that. Amazon is cost effective providing compute, storage, and network to small/startup companies. Working with large firms, I never really had a discussion with any CIO who is considering a move to Amazon cloud services. The alternatives were (all do it yourself) more around private cloud and IaaS solutions.

  15. Juan José Vázquez Rubio

    Thanks. Finally someone in our camp is thinking about the customer, not about the comparison between tech models, which have to deliver value, not comparative savings.

  16. Mark Simmons

    Agree your analysis is interesting but the devil is in the detail. Each situation is unique so I would be wary of extrapolating these results.
    One cost you have ignored is the capital cost.

    IMHO you need to include a cost of capital, not just the amortised cost. Using the capital elsewhere may provide a better return to an organisation than investing in infrastructure.

  17. Muath Barakat

    Nice. We should not forget the capital cost that you will have when you are uncertain about the traffic that your service will generate, especially in large scale projects, as the Amazon solution will be pretty beneficial in such cases.

  18. Only at that amount of traffic you would be a complete idiot if you don’t run your own autonomous system. Sizing it for this kind of bandwidth you would need to count on about $ 2,500/month connection fees for which you’d get a multihomed N+1 redundant network with at least 3 10 Gbps upstreams.
    It would be reasonable to change the 60K figure for self hosted into 30K.