
Which is less expensive: Amazon or self-hosted?

Updated. Amazon Web Services (AWS), as the trailblazing provider of Infrastructure as a Service (IaaS), has changed the dialog about computing infrastructure. Today, instead of simply assuming that you’ll be buying and operating your own servers, storage and networking, AWS is always an option to consider, and for many new businesses, it’s simply the default choice.

I’m a huge fan of cloud computing in general and AWS in particular. But I’ve long had an instinct that the economics of choosing between self-hosting and a cloud provider had more texture to it than the patently attractive-sounding “10 cents an hour,” particularly as a function of demand distribution. As a case in point, Zynga has made it known that for economic reasons, they now use their own infrastructure for baseline loads and use Amazon for peaks and variable loads surrounding new game introductions.

An analysis of the load profiles

To tease out a more nuanced view of the economics, I’ve built a detailed Excel model that analyzes the relative costs and sensitivities of AWS versus self-hosted in the context of different load profiles. By “load profiles,” I mean the distribution of demand over the day/month as well as relative needs for bandwidth versus compute resources. The load profile is the key factor influencing the economic choice because it determines what resources are required and how heavily these resources are utilized.

The model provides a simple way to analyze various load profiles and allows one to skew the load between bandwidth-heavy, compute-heavy or any combination. In addition, the model presents the cost of operating 100 percent on AWS, 100 percent self-hosted as well as all hybrid mixes in between.

In a subsequent post, I will share the model and describe how you can use it for scenarios of interest to you. But for this post, I will outline some of the conclusions that I’ve derived from looking at many different scenarios. In most cases, the analysis illustrates why intuition is right (for example, that a highly variable compute load is a slam dunk for AWS). In other cases, certain high-sensitivity factors become evident and drive the economic answer. There are also cases where a hybrid infrastructure is at least worthy of consideration.

To frame an example analysis, here is the daily distribution of a typical Internet application. In the model, traffic distribution is an input from which bandwidth requirements are computed. The distribution over the day reflects the behavior of the user base (in this case, one with a high U.S. business-hour activity peak). Computing load is assumed to follow traffic according to a linear relationship, i.e., higher traffic implies higher compute load.
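To make the traffic-to-compute step concrete, here is a minimal Python sketch of how an hourly traffic profile could be turned into an hourly instance count. It is not the author's Excel model: the 24-hour traffic numbers and the parameters of the linear relationship (`MBPS_PER_UNIT`, `BASELINE_UNITS`, `UNITS_PER_INSTANCE`) are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical 24-hour traffic profile (Mbps), peaking during U.S. business hours.
# Illustrative values only; these are not the numbers behind the post's chart.
HOURLY_TRAFFIC_MBPS = [
    120, 100, 90, 85, 85, 95, 140, 220, 320, 400, 440, 460,
    470, 465, 450, 430, 400, 350, 300, 260, 220, 190, 160, 140,
]

# Hypothetical parameters of the linear traffic-to-compute relationship.
MBPS_PER_UNIT = 10.0       # each compute unit of load is driven by 10 Mbps of traffic
BASELINE_UNITS = 2.0       # load present even with zero traffic
UNITS_PER_INSTANCE = 4.0   # compute units one instance can carry


def compute_load(traffic_mbps: float) -> float:
    """Linear model: compute load grows in proportion to traffic, plus a fixed baseline."""
    return traffic_mbps / MBPS_PER_UNIT + BASELINE_UNITS


def instances_per_hour(traffic: list[float]) -> list[int]:
    """Round each hour's load up to a whole number of instances."""
    return [math.ceil(compute_load(t) / UNITS_PER_INSTANCE) for t in traffic]


if __name__ == "__main__":
    counts = instances_per_hour(HOURLY_TRAFFIC_MBPS)
    print(counts)
    print("peak:", max(counts), "trough:", min(counts))
```

With these made-up inputs the peak hour needs 13 instances and the overnight trough needs 3; the gap between those two numbers is what drives the rest of the comparison.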

Note that while labor costs are included in the model, I am leaving them out of this example for simplicity. Because labor is a mostly fixed cost for each alternative, it will tend not to impact the relative comparison of the two alternatives. Rather, it will impact where the actual break-even point lies. If you use the model to examine your own situation, then of course I would recommend including the labor costs on each side.

For this example, to compute costs for Amazon, I have assumed Standard Extra Large instances and an Elastic Load Balancing (ELB) load balancer in the Northern California region. The model computes the number of instances required for each hour of the day. Wherever the economics justify it, the model applies as many AWS Reserved Instances (capacity contracts with lower variable costs) as are warranted and fills in with on-demand instances as required. Data charges are computed according to the progressive (tiered) pricing schedule that Amazon publishes. To compute costs for self-hosting, I assume co-location with the peak number of Std-XL-equivalent servers required, each loaded to no more than 80 percent of capacity. Hardware costs are amortized over 36 months. Power is assumed to be included in rack space fees. Bandwidth is assumed to be purchased on a 95th percentile pricing basis.
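The sizing rules above can be sketched in a few lines of Python. This is one simple way to decide how many Reserved Instances are "justified," not necessarily the rule the author's model uses: the fleet is treated as stacked instance "layers," and a layer is reserved only if its yearly usage pays back the upfront fee. The hourly counts, the prices, and the per-year treatment of the upfront fee are all hypothetical simplifications, not AWS list prices or terms.

```python
import math

# Hypothetical instance counts per hour for a representative day (not from the post's model).
HOURLY_INSTANCES = [4, 3, 3, 3, 3, 3, 4, 6, 9, 11, 12, 12,
                    13, 13, 12, 12, 11, 10, 8, 7, 6, 6, 5, 4]

# Hypothetical prices for illustration; these are NOT AWS list prices.
ON_DEMAND_PER_HOUR = 0.60
RESERVED_PER_HOUR = 0.25
RESERVED_UPFRONT_PER_YEAR = 1500.0  # upfront fee, spread evenly over each year of the term


def reserved_instance_count(hourly: list[int], days: int = 365) -> int:
    """Reserve the k-th instance layer only while its yearly usage pays back the upfront fee."""
    reserved = 0
    for k in range(1, max(hourly) + 1):
        hours_per_year = sum(1 for n in hourly if n >= k) * days
        on_demand_cost = hours_per_year * ON_DEMAND_PER_HOUR
        reserved_cost = RESERVED_UPFRONT_PER_YEAR + hours_per_year * RESERVED_PER_HOUR
        if reserved_cost >= on_demand_cost:
            break  # higher layers run even fewer hours, so they cannot pay off either
        reserved += 1
    return reserved


def self_hosted_servers(hourly: list[int], max_utilization: float = 0.8) -> int:
    """Buy enough servers for the peak hour, with each loaded to at most 80 percent."""
    return math.ceil(max(hourly) / max_utilization)


def monthly_hardware_cost(servers: int, price_per_server: float, months: int = 36) -> float:
    """Straight-line amortization of the purchase price over 36 months."""
    return servers * price_per_server / months


if __name__ == "__main__":
    print("reserved:", reserved_instance_count(HOURLY_INSTANCES))
    servers = self_hosted_servers(HOURLY_INSTANCES)
    print("self-hosted servers:", servers)
    print("monthly hardware:", monthly_hardware_cost(servers, 3600.0))  # hypothetical $3,600/server
```

With these invented prices, the 7 layers that are busy at least 12 hours a day get reservations and the rest of the peak is filled on demand, while the self-hosted side needs 17 servers to keep a 13-instance peak under 80 percent load.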

Now let’s look at a sensitivity analysis. Notice that in the above example, a bit more than half of the total cost for each alternative is for bandwidth/data transfer charges ($35,144 for self-hosted at $8/Mbps and $36,900 for AWS). This matters because while Amazon’s pricing is fixed and published, 95th percentile pricing is highly variable and competitive.
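The two bandwidth charges are computed on different bases: co-location bills on a high-water mark of the rate (Mbps), while a tiered schedule bills on volume (GB). A rough Python sketch of both follows. It assumes one common 95th-percentile convention (sort the usage samples, drop the top 5 percent, bill the highest remaining sample) and a made-up tier schedule; neither is Amazon's or any particular co-location provider's actual billing rule.

```python
def percentile_95(samples_mbps: list[float]) -> float:
    """One common 95th-percentile convention: sort the usage samples (often 5-minute
    averages), drop the top 5 percent, and bill at the highest remaining sample."""
    ordered = sorted(samples_mbps)
    kept = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer arithmetic
    return ordered[kept - 1]


def colo_bandwidth_cost(samples_mbps: list[float], price_per_mbps: float) -> float:
    """Monthly co-location bandwidth bill under 95th-percentile pricing."""
    return percentile_95(samples_mbps) * price_per_mbps


# Hypothetical tiered schedule: (GB in this tier, $ per GB); None means "everything beyond".
# The tier sizes and prices are illustrative, not Amazon's published rates.
TIERS = [(10_000, 0.12), (40_000, 0.09), (100_000, 0.07), (None, 0.05)]


def tiered_transfer_cost(total_gb: float, tiers=TIERS) -> float:
    """Monthly data-transfer bill where each successive block of GB is cheaper."""
    cost, remaining = 0.0, total_gb
    for size, price in tiers:
        chunk = remaining if size is None else min(remaining, size)
        cost += chunk * price
        remaining -= chunk
        if remaining <= 0:
            break
    return cost
```

A consequence of the percentile rule is that short bursts in the top 5 percent of samples are free, whereas under per-GB pricing every byte is billed; that asymmetry is one reason the negotiated $/Mbps price is such a sensitive input.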

The chart above shows total costs as a function of co-location bandwidth pricing. AWS costs are independent of this and thus flat. What this chart shows is that self-hosting costs less for any bandwidth price below about $9.50 per Mbps per month. And if you can negotiate a price as low as $4, you’d save more than 40 percent by self-hosting. I’ll leave discussion of the hybrid to another post.
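Because the AWS line is flat while self-hosted cost rises linearly with the bandwidth price, the break-even point is a one-line calculation. The sketch below assumes all non-bandwidth self-hosted costs can be lumped into a single hypothetical `other_costs` figure; the dollar amounts in the example are invented and are not the post's totals.

```python
def self_hosted_total(bandwidth_price: float, other_costs: float, billed_mbps: float) -> float:
    """Self-hosted cost: everything except bandwidth, plus 95th-percentile Mbps times price."""
    return other_costs + billed_mbps * bandwidth_price


def break_even_price(aws_total: float, other_costs: float, billed_mbps: float) -> float:
    """Bandwidth price ($/Mbps/month) at which self-hosting costs exactly as much as AWS."""
    return (aws_total - other_costs) / billed_mbps


def savings_vs_aws(bandwidth_price: float, aws_total: float,
                   other_costs: float, billed_mbps: float) -> float:
    """Fraction saved by self-hosting at a given bandwidth price (negative means AWS is cheaper)."""
    return 1 - self_hosted_total(bandwidth_price, other_costs, billed_mbps) / aws_total


if __name__ == "__main__":
    # Hypothetical monthly figures, not the post's actual totals.
    aws, other, mbps = 60_000.0, 20_000.0, 4_000.0
    print(break_even_price(aws, other, mbps))  # → 10.0
    print(savings_vs_aws(4.0, aws, other, mbps))
```

The same structure explains why the savings percentage climbs quickly as the price falls: every dollar shaved off the $/Mbps rate is multiplied by the full billed bandwidth.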

This should provide a bit of a feel for how I’ve been conducting these analyses. Above is a visual summary of how different scenarios tend to shake out. The analysis confirms the intuition that the spikier the load, the better the economics of the AWS on-demand solution. Similarly, the flatter or less variable the load distribution, the more sense self-hosting appears to make. And if your situation uses a lot of bandwidth, you need to look more closely at the self-hosted savings that negotiated bandwidth price reductions could make feasible.
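The intuition can be shown with a toy comparison: on-demand cost tracks the area under the load curve (total instance-hours), while self-hosted cost tracks the peak, because servers are bought for the busiest hour. The Python sketch below compares two hypothetical profiles with the same 240 instance-hours per day; the peak-to-average ratio as a spikiness measure and both prices are illustrative assumptions, not values from the author's model.

```python
import math


def peak_to_average(hourly: list[float]) -> float:
    """A simple spikiness measure: peak hour divided by the mean hour (1.0 = perfectly flat)."""
    return max(hourly) * len(hourly) / sum(hourly)


def on_demand_daily_cost(hourly: list[float], rate_per_instance_hour: float) -> float:
    """On-demand cloud cost follows the area under the curve (total instance-hours)."""
    return sum(hourly) * rate_per_instance_hour


def self_hosted_daily_cost(hourly: list[float], cost_per_server_day: float,
                           max_utilization: float = 0.8) -> float:
    """Self-hosted cost follows the peak, because servers are bought for the busiest hour."""
    return math.ceil(max(hourly) / max_utilization) * cost_per_server_day


if __name__ == "__main__":
    flat = [10] * 24                 # 240 instance-hours spread evenly ...
    spiky = [5] * 20 + [35] * 4      # ... or concentrated in a 4-hour burst
    for name, profile in (("flat", flat), ("spiky", spiky)):
        print(name,
              round(peak_to_average(profile), 2),
              on_demand_daily_cost(profile, 0.60),   # hypothetical $/instance-hour
              self_hosted_daily_cost(profile, 8.0))  # hypothetical $/server-day
```

With these invented prices, the flat profile is cheaper to self-host ($104 versus $144 per day), while the spiky profile is far cheaper on demand ($144 versus $352), even though both consume the same number of instance-hours.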

Update (Feb. 14): This post has garnered a lot of much-appreciated attention. From the comments, I see that two clarifications would be helpful:

  1. The key point here is that a comparison of the cost of cloud hosting versus self-hosting needs to be based on the profile of your load. It is not that Amazon (or any other provider) is more expensive than self-hosting, as this is often not the case. Rather, it depends on the profile of your load. Moreover, the exact location of your break-even point matters less than knowing the main sensitivities (e.g., bandwidth cost, CPU load, storage) for your situation, so that you understand which differences could flip the decision. The results here are for this example only; other examples will produce different results, some in favor of cloud and some in favor of self-hosting.
  2. The specific use case I’ve chosen is for a business that’s pretty far along. But some people have been wondering how this example applies to startups. That’s a great question.

While I’ve referred to “spiky” loads, another way to put it is “variable,” “unknown” or “unpredictable,” which describes the situation a startup (or other new business endeavor) usually finds itself in. In those cases, the fact that you cannot forecast very well makes it highly unlikely that you’ll save money by self-hosting, because you’re very unlikely to buy the right amount of capacity. You’ll either overprovision and waste money on unused capacity, or you’ll buy too little and compromise the business. So while you might not call your startup load “spiky,” the fact that it’s unpredictable gives it a similar profile in the model, and hence the economic conclusion would tell you to go the cloud infrastructure route.
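The overprovision-or-underprovision trap can be illustrated with a tiny Python sketch: capacity is fixed up front, actual demand turns out to be one of several equally likely scenarios, and we measure the average idle capacity and average unmet demand. The demand scenarios are hypothetical and the equal-likelihood assumption is a simplification for illustration.

```python
def provisioning_outcome(capacity: float, demand_scenarios: list[float]) -> tuple[float, float]:
    """Average idle capacity and average unmet demand when a fixed capacity is bought up
    front and actual demand turns out to be any one of the equally likely scenarios."""
    n = len(demand_scenarios)
    idle = sum(max(0.0, capacity - d) for d in demand_scenarios) / n
    unmet = sum(max(0.0, d - capacity) for d in demand_scenarios) / n
    return idle, unmet


if __name__ == "__main__":
    # Hypothetical startup: demand could plausibly land anywhere from 20 to 200 units.
    scenarios = [20, 50, 100, 200]
    for capacity in (50, 100, 200):
        idle, unmet = provisioning_outcome(capacity, scenarios)
        print(f"buy {capacity}: avg idle {idle:.1f}, avg unmet {unmet:.1f}")
```

No single up-front purchase avoids both problems: buying 50 units leaves an average of 50 units of demand unmet, while buying 200 leaves an average of 107.5 units idle. On-demand capacity sidesteps the trade-off by tracking demand as it materializes.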

Another not-strictly-economic consideration for startups (and others) is the benefit of focusing one’s attention on primary value-creating activities rather than on activities that are commodities relative to the business and that one might not be very good at anyway. In addition, AWS and other cloud providers give us the highly valuable ability to experiment with little downside. This is especially important given the highly iterative, trial-and-error nature of building successful Internet businesses.

The point of this particular example is that if you have a significant amount of load that is well known and predictable, then you may be able to save some money by bringing a portion or all of that in-house.
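Bringing "a portion" of the load in-house is the Zynga-style hybrid mentioned at the top of the post: self-host a steady baseline and burst to the cloud above it. A minimal Python sketch of choosing that baseline follows. It tries every baseline from all-cloud to all-self-hosted and keeps the cheapest; the hourly counts and both prices are hypothetical, and the flat per-instance daily cost for in-house capacity is a simplifying assumption.

```python
# Hypothetical hourly instance counts for a representative day (not the post's data).
HOURLY_INSTANCES = [4, 3, 3, 3, 3, 3, 4, 6, 9, 11, 12, 12,
                    13, 13, 12, 12, 11, 10, 8, 7, 6, 6, 5, 4]

SELF_HOSTED_PER_INSTANCE_DAY = 8.0  # hypothetical all-in daily cost of one in-house instance-equivalent
ON_DEMAND_PER_HOUR = 0.60           # hypothetical cloud on-demand rate


def hybrid_daily_cost(baseline: int, hourly: list[int]) -> float:
    """Self-host a fixed baseline and send every instance-hour above it to the cloud."""
    burst_hours = sum(max(0, n - baseline) for n in hourly)
    return baseline * SELF_HOSTED_PER_INSTANCE_DAY + burst_hours * ON_DEMAND_PER_HOUR


def best_baseline(hourly: list[int]) -> tuple[int, float]:
    """Try every baseline from all-cloud (0) to all-self-hosted (peak) and keep the cheapest."""
    return min(((b, hybrid_daily_cost(b, hourly)) for b in range(max(hourly) + 1)),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    baseline, cost = best_baseline(HOURLY_INSTANCES)
    print("best baseline:", baseline, "daily cost:", round(cost, 2))
```

With these invented numbers, all-cloud costs $108 per day and all-self-hosted $104, while self-hosting a baseline of 6 instances and bursting the rest costs about $82.80. The underlying rule is that an instance layer is worth owning when the hours it is busy, times the on-demand rate, exceed its daily in-house cost.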

Charlie Oppenheimer is a serial CEO and currently an executive-in-residence at venture-capital firm Matrix Partners. His most recent company, Digital Fountain, was acquired by Qualcomm, and his previous company, Aptivia, was acquired by Yahoo. He blogs at stratamotion.com.

117 Responses to “Which is less expensive: Amazon or self-hosted?”

  1. I Am OnDemand

    I agree with Jeff S that this one is a bit misleading. I also think it does not include all the considerations. I also find that some of these comparisons lead to the conclusion that the comparison itself is not relevant and that the enterprise will need to make some strategic long-term decisions. Anyway, I suggest you test-drive your costs with Newvem – https://www.savings.com – give it a try.

    Ofir.
    @iamondemand

  2. The comparison is flawed because the two options don’t have the same starting point. How long would it take me to acquire, build, set up and staff a working data centre to get it to the comparison point with Amazon? Would I get capital approval for that? What is the time-to-value for what I’m trying to do? What if the business environment changes one year later and my asset utilization drops to uncompetitive levels – do I write down my investment?

    With Amazon I have all the options to turn on a dime, realize my ROI and stop tomorrow if the world changes.

    At the risk of caricaturing the point, it’s as if you’re trying to build a cost model for someone buying (or should I say building) his own car because under certain very specific conditions – assumed to be long-term and unchanging – it would be cheaper than renting a car.

  3. Sujitra vasudev

    Quite an interesting analysis. We had hosted a multi-player game on AWS and found that the loading and computing times were reasonably slow. It’s true it does not work for all scenarios. Is there any better choice for startups than relying on cloud services?

  4. Kenny Young

    Popular post, lots of great conversation. Basic logic says every analysis can have a wide variety of interpretation.
    With my clients I see mostly startups, and medium-sized (and smaller) businesses make up the majority of our clientele. For those, a cloud-only solution is fitting in many cases. The nimble will be able to take advantage without having to worry about the breadth (and hence the lack of depth) of their network engineering and administration teams. Some of our cloud customers do not have an IT department.
    Our larger customers are starting off with a hybrid solution. For those in the larger size range, I find they carry more risk: overburdened by projects, complex environments, sensitive data, and so on. Many of these customers need natural opportunities such as hardware replacement cycles, and they move smaller pieces of larger projects where it makes sense.
    One thing to add: it would be nice to see an analysis of a storage scenario. SAN versus cloud storage is where large savings live.

  5. Sujoy Gupta

    The labor costs of operating an AWS-based system are much lower than for a self-hosted one. This is because a lot of the things ops folks spend time on when self-hosting simply disappear when you are using AWS. This is even more salient when using a database hosted on AWS. When you add this higher labor cost to the self-hosted solution, AWS wins in both the flat and the spiky categories.

  6. thefatbrain

    Oftentimes people look at the cost of running some application over the course of 3 years and say, oh, self-hosting is cheaper. But I would look at it from this angle: if AWS is charging me “X” dollars every month, am I able to achieve the same level of availability that AWS gives me by spending $X/month self-hosted?
    The answer is often no.

  7. Robert Gray

    While this analysis may or may not be valid for large and varying workloads, it does not apply to the low end of startups and mobile apps pulling from the cloud and requiring less than a full server instance. At the “entry end” of the market it is a “no-brainer” to subscribe to something like AWS, with its lack of capital deployment, lack of infrastructure management and scalable fees. Our At-Hand Guides (New England Day Trips At-Hand) iPhone app pays all of $1.50/month in S3 fees for our 5,000-photo gallery stored at multiple resolutions. And we have just moved beyond Heroku’s FREE server platform (also running on AWS) for a fee of $15 per month. Try doing either of those with co-hosting or your own IT shop.

  8. Bruce Coleman

    Charlie seems to have missed a large part of the costs. There is a great deal of work to keeping systems, software, security etc going. Good use can transfer much of this to the cloud provider.

  9. John Shepard

    If all Amazon customers had a high load of requests at the same time, the cloud service wouldn’t meet demand. The analysis shown is based on the most typical scenario.
    The biggest problem for infrastructure solutions is knowing the actual cause of bottlenecks and how to solve them fast, as well as the availability of resources. The hosting model in this case doesn’t matter.

  10. Ozgur Akan

    I am not sure it is possible to put all aspects into the picture when deciding which one is cheaper than the other.

    I think it is about what I want to focus on.

    If I want to focus on my business, I might benefit more from IaaS, since I will not have to deal with hardware issues or plan for and deal with the many details that come with the evolution of hardware and datacenters. This would let me focus more on the service I provide.

    If technology (hardware) is the core of my business, then I would eventually focus on hardware as that would be the tool which enables my services. If I am Google, I would do my own hosting. That is what they do now.

    Cost is evaluated against value. A higher cost in either of these choices may bring proportionally higher value as well. It may be very misleading to go with a side-by-side cost comparison.

    OZ

    • thefatbrain

      I second this opinion. I still do not believe there is a crystal ball that can give us a true apples-to-apples comparison of which option (IaaS or self-hosted) is more economical, because most analyses are missing bits and pieces and there is no bulletproof way to come up with such an analysis. And chances are that doing a similar analysis in different regions and countries would give a different result. However, putting one’s business focus ahead of $$ is a good way of determining which way one should go.

  11. Jim Stikeleather

    I have been playing around with a number of back-of-the-envelope calculations across the different providers, and they seem to suggest 2 criteria (although hybrid is probably the end game for everyone). As pointed out, spikiness is clearly a cloud indicator. The other one that seems to appear is scale from an adjacency perspective: backup and recovery, security, quality-of-service requirements, etc. Until the volume reaches a certain point, the public cloud offers the best cost on these adjacency issues for any reasonable level of service.

  12. Interesting analysis, but totally missing the mark on why you would want to do cloud in the first place, which isn’t cost. It’s operational expenditure (OPEX) over capital expenditure (CAPEX).

  13. Yes, this question comes up constantly, and I look at it case by case. But my question about the analysis is whether we should restrict it to bandwidth and compute only, which customers often don’t see as a direct issue unless they are an e-commerce company, where spikes are the real challenge to deal with. So, in my opinion, going beyond this, the number of servers in your data centre, the size of data transfers, load balancers, and the size of the customer base will all determine the economics of the cloud. Indeed, this is an insightful article to reference too!

  14. Dave Dopson

    This article is _wildly_ misleading. Labor and engineering cost isn’t even factored in and you are only saving 10k….. No way in hell you could pay me to maintain 150 servers for a year for only 10k. 100% guaranteed that >> 1 of them WILL break and you will be stuck dealing with RMA and other BS.

    Not to mention the tacit assumption that you 100% know the exact capacity you are trying to buy …. and that you really really want to buy it with upfront hard cash. So in the “roll your own” scenario, the up-front costs are pretty substantial with $0 back if you accidentally over-provision. It’s not about “spiky” load … it’s about the multi-week lead time to acquire new hardware and the resulting need to buy more than you need to ensure no downtime. Besides, as blogged elsewhere, given that you are buying physical servers and amortizing them over 3 years, it’s highly disingenuous not to use 3yr AWS reserved instances which have most of the cost advantages and far more flexibility versus purchasing outright.

    But in the end it comes back to all the things you will have to do yourself (like predict capacity, finance up-front hardware purchases, RMA dead hardware, …. oh yeah, and BUY RACKS/SWITCHES/ROUTERS and debug that stuff) …. There is no way on earth you will beat AWS pricing at this level of scale. Your scale is off by several orders of magnitude for what is needed to start thinking about “rolling your own”. The NRE (non-recurring-engineering costs) alone will absolutely swamp out any savings you might *think* you are getting.

  15. Another simple analysis would be to base it on predictability. If your business and traffic is predictable then self host. Otherwise use a public cloud. For most startups cloud is the best since both traffic and business would be unpredictable when you start.

  16. Brian McCallion

    I like the work done to examine costs. And I think the question of costs is interesting. However, I question the focus on “cost” as a path to insight. Mostly I find that the focus on costs results in obfuscation and may distract from more compelling opportunities for insight. While Zynga seems to be the case often cited when challenging the assumption of Amazon AWS as the “default” for startups, I wonder whether, in the case of the startup, monthly service costs are really what venture capitalists are concerned about when they mandate Amazon AWS. Certainly no VC wants to see money spent on clearly wasteful services. Yet why hand over a substantial amount of capital at all? Clearly VCs expect a startup to focus on proving and growing the specific opportunity for which the funding was provided. In Zynga’s case, there may be good reasons to go to all the trouble of hosting applications and managing them. However, in many startups I think Amazon AWS is not just a cost-effective option; it’s a service that keeps the startup founders and staff from “tinkering” and building services, processes, and applications that do not visibly result in forward progress in executing on the idea the VC and founders seek to capture in the first place. How much are the distraction and the focus of hard-to-recruit, often unique individuals worth when compared to hosting fees and web services fees? In my opinion, the greatest value Amazon AWS and other cloud platforms offer is the ability for a firm to maintain a laser-like focus on the business opportunity that will create real value. Certainly there are outliers, but startups and enterprises alike have much to lose by focusing on a “dismal” question of costs while ignoring why the company is spending any money at all on technology services: to grow the business.

  17. If you change bandwidth provider to Cogent in the Self hosted model you save about $30,000 a month. Cogent for those requirements can be had for $5,000 a month… Sorry Amazon – you’re too expensive to compete with self hosted.