
Which is less expensive: Amazon or self-hosted?


Updated. Amazon Web Services (AWS), as the trailblazing provider of Infrastructure as a Service (IaaS), has changed the dialog about computing infrastructure. Today, instead of simply assuming that you’ll be buying and operating your own servers, storage and networking, AWS is always an option to consider, and for many new businesses, it’s simply the default choice.

I’m a huge fan of cloud computing in general and AWS in particular. But I’ve long had an instinct that the economics of the choice between self-hosted and cloud provider had more texture to it than the patently attractive sounding “10 cents an hour,” particularly as a function of demand distribution. As a case in point, Zynga has made it known that for economic reasons, they now use their own infrastructure for baseline loads and use Amazon for peaks and variable loads surrounding new game introductions.

An analysis of the load profiles

To tease out a more nuanced view of the economics, I’ve built a detailed Excel model that analyzes the relative costs and sensitivities of AWS versus self-hosted in the context of different load profiles. By “load profiles,” I mean the distribution of demand over the day/month as well as relative needs for bandwidth versus compute resources. The load profile is the key factor influencing the economic choice because it determines what resources are required and how heavily these resources are utilized.

The model provides a simple way to analyze various load profiles and allows one to skew the load between bandwidth-heavy, compute-heavy or any combination. In addition, the model presents the cost of operating 100 percent on AWS, 100 percent self-hosted as well as all hybrid mixes in between.

In a subsequent post, I will share the model and describe how you can use it for scenarios of interest to you. But for this post, I will outline some of the conclusions that I’ve derived from looking at many different scenarios. In most cases, the analysis illustrates why intuition is right (for example, that a highly variable compute load is a slam dunk for AWS). In other cases, certain high-sensitivity factors become evident and drive the economic answer. There are also cases where a hybrid infrastructure is at least worthy of consideration.

To frame an example analysis, here is the daily distribution of a typical Internet application. In the model, traffic distribution is an input from which bandwidth requirements are computed. The distribution over the day reflects the behavior of the user base (in this case, one with a high U.S. business-hour activity peak). Computing load is assumed to follow traffic according to a linear relationship, i.e. higher traffic implies higher compute load.
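To make the linear traffic-to-compute assumption concrete, here is a minimal Python sketch. It is not the Excel model itself: the hourly traffic figures and the MBPS_PER_SERVER constant are hypothetical values chosen only for illustration, and the function names are invented for this example.

```python
import math

# Hypothetical hourly traffic profile in Mbps (24 values, business-hours peak).
# Illustrative numbers only; these are NOT the data behind the post's chart.
HOURLY_MBPS = [
    900, 800, 700, 700, 800, 1100,
    1600, 2400, 3300, 4000, 4300, 4400,
    4400, 4300, 4200, 4000, 3600, 3000,
    2500, 2100, 1800, 1500, 1200, 1000,
]

# Assumed constant of proportionality: traffic one server can handle.
MBPS_PER_SERVER = 40.0


def servers_needed(mbps, mbps_per_server=MBPS_PER_SERVER):
    """Linear load model: compute demand rises in proportion to traffic."""
    return math.ceil(mbps / mbps_per_server)


def hourly_servers(profile):
    """Servers required in each hour of the day for a traffic profile."""
    return [servers_needed(mbps) for mbps in profile]


def peak_to_average(profile):
    """A simple 'spikiness' measure: peak demand divided by mean demand."""
    return max(profile) / (sum(profile) / len(profile))
```

A flat profile has a peak-to-average ratio of 1; the higher the ratio, the more capacity a self-hosted deployment sized for the peak leaves idle during the rest of the day.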

Note that while labor costs are included in the model, I am leaving them out of this example for simplicity. Because labor is a mostly fixed cost for each alternative, it will tend not to impact the relative comparison of the two alternatives. Rather, it will impact where the actual break-even point lies. If you use the model to examine your own situation, then of course I would recommend including the labor costs on each side.

For this example, to compute costs for Amazon, I have assumed Standard Extra Large instances and ELB load balancer for the Northern California region. The model computes the number of instances required for each hour of the day. Whenever the economics dictate it, the model applies as many AWS Reserved Instances (capacity contracts with lower variable costs) as justified and fills in with on-demand instances as required. Charges for data are computed according to the progressive pricing schedule that Amazon publishes. To compute costs for self-hosting, I assume co-location with the peak number of Std-XL-equivalent servers required, each loaded to no more than 80 percent of capacity. The costs of hardware are amortized over 36 months. Power is assumed to be included with rackspace fees. Bandwidth is assumed to be obtained on a 95th percentile price basis.
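The reserved-versus-on-demand mix described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the model's actual logic: the hourly rates and the monthly amortized fee are hypothetical placeholders rather than Amazon's published prices, a 30-day month is assumed, and the reserved hourly rate is assumed to be charged only for hours in which the instance actually runs.

```python
def monthly_compute_cost(hourly_need, n_reserved, od_rate, res_rate,
                         res_fee_month, days=30):
    """Monthly cost of serving `hourly_need` (instances required in each
    hour of a representative day) with `n_reserved` reserved instances,
    topping up with on-demand instances whenever demand exceeds them."""
    cost = n_reserved * res_fee_month  # amortized up-front reservation fees
    for need in hourly_need:
        on_reserved = min(need, n_reserved)
        on_demand = need - on_reserved
        cost += days * (on_reserved * res_rate + on_demand * od_rate)
    return cost


def best_reserved_count(hourly_need, od_rate, res_rate, res_fee_month):
    """Try every reservation count from 0 up to peak demand and return
    the one with the lowest total monthly cost."""
    candidates = range(max(hourly_need) + 1)
    return min(
        candidates,
        key=lambda n: monthly_compute_cost(
            hourly_need, n, od_rate, res_rate, res_fee_month
        ),
    )


# Hypothetical prices, for illustration only.
OD_RATE = 0.68         # on-demand, dollars per instance-hour
RES_RATE = 0.24        # reserved, dollars per instance-hour
RES_FEE_MONTH = 100.0  # reservation fee amortized per month

flat_load = [10] * 24          # ten instances around the clock
spiky_load = [0] * 23 + [10]   # ten instances for one hour a day

flat_choice = best_reserved_count(flat_load, OD_RATE, RES_RATE, RES_FEE_MONTH)
spiky_choice = best_reserved_count(spiky_load, OD_RATE, RES_RATE, RES_FEE_MONTH)
```

With these placeholder prices a reservation pays off only for an instance that runs more than about 227 hours a month (the fee divided by the hourly saving), so the flat load reserves everything and the spiky load reserves nothing.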

Now let’s look at a sensitivity analysis. Notice in the above example that a bit more than half of the total cost for each alternative is for bandwidth/data transfer charges ($35,144 for self-hosted at $8/Mbps and $36,900 for AWS). This is important because while Amazon pricing is fixed and published, 95th percentile pricing is highly variable and competitive.

The chart above shows total costs as a function of co-location bandwidth pricing. AWS costs are independent of this and thus flat. What this chart shows is that self-hosting costs less for any bandwidth pricing under about $9.50 per Mbps/Month. And if you can negotiate a price as low as $4, you’d be saving more than 40 percent to self-host. I’ll leave discussion of the hybrid to another post.
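The bandwidth sensitivity can also be explored numerically. The sketch below uses only the two bandwidth figures quoted in this post ($35,144 at $8/Mbps for self-hosting and $36,900 for AWS data transfer). The non-bandwidth totals are not stated in the text, so they are left as function parameters, and any values passed for them are hypothetical.

```python
# Figures quoted in the post.
SELF_HOSTED_BW_COST = 35_144.0  # dollars per month at the base price
BASE_PRICE_PER_MBPS = 8.0       # dollars per Mbps per month
AWS_DATA_COST = 36_900.0        # dollars per month

# Billable 95th-percentile bandwidth implied by the two self-hosted figures.
BILLABLE_MBPS = SELF_HOSTED_BW_COST / BASE_PRICE_PER_MBPS


def self_hosted_bandwidth_cost(price_per_mbps):
    """Monthly self-hosted bandwidth bill at a negotiated price per Mbps."""
    return BILLABLE_MBPS * price_per_mbps


def bandwidth_parity_price():
    """Price per Mbps at which the self-hosted bandwidth bill alone
    equals the AWS data transfer charge."""
    return AWS_DATA_COST / BILLABLE_MBPS


def break_even_price(aws_total, self_hosted_other):
    """Price per Mbps at which total self-hosted cost equals total AWS cost.

    `aws_total` is the full monthly AWS bill and `self_hosted_other` is the
    self-hosted cost excluding bandwidth; both must be supplied by the
    reader because this post does not state them.
    """
    return (aws_total - self_hosted_other) / BILLABLE_MBPS
```

The quoted figures imply about 4,393 billable Mbps, so at $4 per Mbps the self-hosted bandwidth bill would fall to $17,572. Bandwidth alone reaches parity with the AWS data charge at roughly $8.40 per Mbps; where the overall break-even lands also depends on the non-bandwidth costs on each side.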

This should provide a bit of a feel for how I’ve been conducting these analyses. Above is a visual summary of how different scenarios tend to shake out. The analysis confirms the intuitive conclusion that the spikier the load, the better the economics of the AWS on-demand solution. Similarly, the flatter or less variable the load distribution, the more self-hosting appears to make sense. And if you’ve got a situation that uses a lot of bandwidth, you need to look more closely at the self-hosted savings that could be feasible with negotiated bandwidth price reductions.

Update (Feb. 14): This post has garnered a lot of much appreciated attention. From the comments, I see that two clarifications would be helpful:

  1. The key point here is that a comparison of the cost of cloud hosting versus self-hosting needs to be based on the profile of your load. It is not that Amazon (or any other provider) is more expensive than self-hosting, as this is often not the case. Moreover, where exactly your breakeven point lies matters less than knowing the main sensitivities (e.g. bandwidth cost, CPU load, storage) for your situation, so that you can understand which differences could flip the decision. The results here are for this example only; other examples will produce different results, some in favor of cloud and some in favor of self-hosting.
  2. The specific use case I’ve chosen is for a business that’s pretty far along. But some people have been wondering how this example applies to startups. That’s a great question.

While I’ve referred to “spiky” loads, there’s another way to say that which is “variable,” “unknown” or “unpredictable,” which describes the situation that a startup (or other new business endeavor) usually finds itself in. In those cases, the fact that you cannot forecast very well is a reason why it’s highly unlikely you’ll save money by self-hosting…because you’re very unlikely to buy the right amount of capacity. You’ll either overprovision and waste money on unused capacity, or you’ll buy too little and compromise the business. So while you might not call your startup load “spiky,” the fact that it’s unpredictable gives it a similar profile in the model and hence the economic conclusion would tell you to go with the cloud infrastructure route.

Another not-strictly-economic aspect that needs to be considered for startups (and others) is the benefit of focusing one’s attention on primary value-creating activities versus commodity activities (relative to the business) that one might not be very good at anyway. In addition, AWS and other cloud providers give us the highly valuable ability to experiment with little downside. This is especially important for the highly iterative and trial-and-error nature of building successful Internet businesses.

The point of this particular example is that if you have a significant amount of load that is well known and predictable then you may be able to save some money by bringing a portion or all of that inside.

Charlie Oppenheimer is a serial-CEO and currently an executive-in-residence at venture-capital firm Matrix Partners. His most recent company, Digital Fountain, was acquired by Qualcomm, and his previous company, Aptivia, was acquired by Yahoo. He blogs at

117 Responses to “Which is less expensive: Amazon or self-hosted?”

  1. IMHO, your example must elaborate on labor cost when you are trying to compare the cost of self-hosted vs. cloud hosting. Consider hiring 20 people to manage different aspects of a self-hosted environment vs. hiring 2 to 3 people to manage your cloud infrastructure. The cost of labor by itself makes a substantial difference for the example you have put up.

  2. I’m not saying your analysis isn’t right, but it does leave out at least one important point: AWS gives you a lot of infrastructure that you simply don’t get, at all, with self-hosting.

    For example, your self-hosting analysis doesn’t take into account that you need staff in every city where you have a data center. For a small business or startup, the complexity of maintaining a presence in multiple physical data centers in a self-hosted or even leased solution is considerable, to say the least. With Amazon’s abstractions, deploying worldwide with multiple availability zones is relatively simple, especially with ELB. We, for example, survived an entire data center outage at Amazon simply by using its availability zones as recommended…

    Then add in S3, Queue Service, Amazon RDS, EBS volumes, etc. Besides replacing physical hardware, Amazon also takes care of a lot of the complex infrastructure for you, which is a huge value-add.

    Admittedly, you can still use those services in a hybrid or even self-hosted solution, but it’s not nearly as convenient or cohesive.

    Food for thought.

  3. Bryan Beal

    This article is very informative, but I think it neglects to take the value of Agility into the equation. For some organizations “cost” is more complicated than simply the expense of Amazon vs the expense of “self hosting.” It could also include things like lost revenue, lost productivity, etc. If Amazon (or any other public cloud) gives an organization the ability to turn services up more quickly and efficiently it may “cost” them less, even if the line item bill from Amazon is higher than the bill to self-host.

  4. George Gamble

    No labor costs? Those will be a lot higher in a self-hosted environment. Also, if you are amortizing over 3 years why not take advantage of the 3 year discount on AWS? It’s pretty large – 15-20% I think. At that point the pricing comes to par. And if you are pushing that type of bandwidth, you better be making some serious moola which means this is all moot anyway…

  5. Ivan Kedrin

    Great to see someone else’s models and assumptions. Thanks for sharing.

    Most CTO’s have these comparisons & projections internally — it’s great to compare notes. Have to remember to update the models as prices keep dropping e.g. S3 storage pricing was recently lowered.

  6. … another important aspect (this time, in favor of outsourcing to a reliable cloud provider) – geographic redundancy. Based on the space costs, this looks like a single-location self hosting. While parts of AWS have been known to go down, a properly implemented (zones etc) AWS hosting setup will be more resilient when there’s a regional fiber cut, or a peering dispute between backbones. Then there’s tsunamis, earthquakes, fires, etc… And what’s the cost of your core location being offline for a day? i.e. it appears as though your “self-hosting” math doesn’t have any “insurance”.

  7. Charlie: regarding your charts – because of your nice traffic shape, I am actually very surprised you are only saving $10k (14%) compared to AWS. At my video CDN, the majority of our global delivery and storage network has always been self-hosted, but we used AWS for some non-critical aspects.
    When we took some of our storage, application, and delivery components in-house, we were seeing cost reduction close to 50% in some cases.

  8. And that makes perfect sense – for a company that knows what it’s doing (that’s important), self-hosting is always cheaper than Amazon. The beauty of AWS (or any outsourced “cloud” solution for that matter) is capacity “on tap”. You have a major live event coming up, or you just ran an ad on national TV? Spin up 100 servers to absorb the spike, take them down when done. And you didn’t need to provision additional cabinets, power, or sign new bandwidth commits for a year.

    Other than that, if you run a service characterized by steady traffic patterns and predictable load – it’s almost always cheaper to run your own infrastructure.

    • Well the other value is that AWS isn’t just virtualized servers. You get a lot of other infrastructure. It lets you hit the ground running as a dev shop instead of focusing on reinventing Amazon’s wheel.

  9. Terrific discussion here which I greatly appreciate.

    One point that several people are raising directly and tangentially is that the decision is not just about the numbers. This is an important point which is worth acknowledging and amplifying because it speaks to the reasons why we are all so excited about having on-demand cloud infrastructure. There are many situational variables that need to be considered and economics alone do not provide a well-reasoned decision.

    For example, let’s take the situation of an early-stage business. While I’ve referred to “spiky” loads, there’s another way to say that which is “variable”, “unknown” or “unproven” loads which describes the situation that a startup usually finds itself in. In those cases, the fact that you cannot forecast very well is the reason why it’s unlikely you’ll save money by self-hosting…because you’re very unlikely to buy the right amount of capacity. You’ll either overprovision and waste money on over-capacity or you’ll buy too little and compromise the business. Another respect that’s important to consider for startups (and many others) is the benefit of focusing one’s attentions on primary value-creating activities vs commodity (relative to your business) activities that we might not be very good at anyway. And finally, AWS (and other cloud providers) give us the highly valuable ability to experiment with little downside. If you’re a fan of the Lean Startup methodology, this is essential.

    Charlie Oppenheimer

  10. ☆ Sean Lindsay

    I’ve worked in startups that scaled colo hosting and spent the last 4 years scaling my current company on AWS. And while I can appreciate the subtlety in some of your considerations, I fear this will be misinterpreted by many (investors specifically) in much the same way I’ve seen quantified analysis of the hard cost-savings of offshoring misapplied. There are many valuable soft benefits, especially for early stage companies, and the analysis you’ve done seems to apply most at stable scale.

    I’d hate to see decision makers take away the simplified message and draw entirely the wrong conclusion, especially for small companies where I’d argue in nearly every case the leverage of a good cloud hosting solution is so high.

    • Daniel Golding

      In the case of small companies, a cloud option usually makes sense. But colo hosting makes more sense with database, I/O, and super-bandwidth heavy apps. Self-hosting (non colo) rarely ever makes sense.

  11. Bill de hÓra

    Interesting analysis.

    1: You mentioned AWS ‘wins’ for spiky workloads. Are you talking about load variation on a single workload, having to cater for variation across multiple (possibly conflicting) workloads, both?

    2: If you are bandwidth dominated on reads you should be offloading serving to a CDN provider, which changes the cost model. The exceptional workload perhaps is a lot of users accessing their own private content (ie heavy bandwidth usage is distributed across many media objects instead of users accessing fewer hotter files).

    Btw it assumes the cost of infrastructure and obtaining system qualities is zero. I would be interested in seeing the non-recurring + employee/organizational costs to build out a “private” cloud that provides the reliability and availability guarantees AWS does, plus tooling, monitoring and so on. Also what the opportunity cost to the business would be while waiting for all that stuff to get provisioned.

  12. Steve Gorton

    Interesting scenario selection. For every scenario favoring self-hosting, one could find one favoring AWS.
    Also, not factoring build/provisioning/support costs into the self-hosted side masks some costs.
    For me, comparison is good, but not apples-for-apples.

  13. This analysis is misleading at best. There are several hidden costs in enterprise-grade self-hosting. There are the obvious expenses such as real estate, cooling and security (physical and cyber). There are also the less obvious ones, such as the technicians needed to make sure that the hardware is working and to replace failing parts whenever needed. Factor all that in and you’ll see how public cloud hosting is a lot cheaper.

    There are also the impossible-to-calculate costs, such as unused capacity. Companies usually purchase hardware 5 years in advance, which means that in the worst scenario they always have 5 years’ worth of wasted capacity, which was the reason for Amazon to start leasing that wasted capacity in the first place.

    In my opinion the best approach is for each company to assess its needs, with Large companies owning their own private cloud and overleasing it to their internal customers.

    • Daniel Golding

      To simply declare that “public cloud hosting is a lot cheaper” as a blanket statement is tough to understand. There is a place in every operating continuum where cloud, dedicated, colocated, and completely self-hosted options make sense. Should Facebook be in a public cloud? It makes no sense.

      Many folks move from one end of the spectrum to the other as they scale up. There is no single solution – especially in public cloud – that will meet all options.

      True “self-hosted” solutions, with your own data center, only make sense at the many 10^5 server level and above. And sometimes not even then.

  14. Giuseppe Miriello

    I tend to agree with your analysis. I work in a datacenter and I have noticed that people transition from shared cloud to self-hosting (or dedicated cloud) if they have few or no traffic spikes.

    Many of them remain self-hosted and, like Zynga, use AWS to absorb traffic/computation spikes.

  15. Daniel Golding

    There are a bunch of choices – cloud (NOT only AWS), dedicated hosting, colocation with self/hosting.

    I certainly agree that an AWS-only approach is not good. However, when did AWS become the only cloud? And when did dedicated server offerings go away? AWS is not affordable for many base loads, but there are a number of options.

    • “However, when did AWS become the only cloud?”

      They might not be the only player, but they are leagues ahead of any other cloud provider in terms of offerings (they also offer PaaS, SaaS and CDN, not just IaaS), APIs, capacity and so forth. If you were to compare solely on the IaaS aspect Rackspace would be the only option and their pricing is not very “elastic” to meet some types of demand/usage.

    • Tarun Dua

      True, but I think 8USD/Mbps at 5Gbps usage is very high priced bandwidth; USD 2-3/Mbps or less is easily achievable at that scale with self-hosting/dedicated server hosting. 30K – 35K per month savings can be significant. As Daniel points out below, server amortization is not the only alternative to the cloud: one can very well rent dedicated servers for cheap.

  16. Kent Langley

    I found myself wondering how reserved instance utilization would affect the pricing model for AWS over a 36 month period. If you know you’re going to have a stable core set of instances it seems this could make a significant difference. Amazon claims, “Reserved Instances can provide savings of nearly 60% compared to using On-Demand Instances.”

    This is something I’m exploring for some of my AWS using clients.

    • Bill Jackson

      Two factors that seem really interesting are:
      – how to select the number of reserved instances, and
      – how discount schedules alter costs

      What happens when one is willing to over-reserve, and can negotiate a sweetheart deal with Amazon? That could move the “Amazon Hosted” line significantly.

      • On reserved instances, the model calculates the number of reserved instances required (if any) to minimize total costs. It uses the load distribution to figure this number out so that the total, ((#-reserved * reserved-per-hour) + amortized-reserved-fee) + (#-on-demand * on-demand-per-hour), is minimized.

        Discount schedules are applied according to the published volume schedules.

        If you can negotiate a better deal than published, then those new numbers go into the model.

        Charlie Oppenheimer

    • Exactly my thought as well. What about the Reputation/Executive Risk of a cloud provider. I will be content proposing use of Amazon as the cloud vendor, not so much for the lesser knowns.

    • Ghazenfer Mansoor

      We are using AWS reserved instances and it’s quite a savings. The good thing is, it’s not tied to a specific instance. It’s tied to the instance type. If you reserved a small instance, as long as you have a small instance, you will get the savings.

  17. Jeff Schneider

    It isn’t clear how you’re accounting for the self-service aspects of cloud computing. How much money did I save when I pushed a button to launch an XL server in 60 seconds? What about assigning it an Elastic IP in 15 seconds? Or attaching storage in 10 seconds?

    The point that Zynga, Google and others have made about long-running workloads typically applies to running *arrays of the same workload* (1 application running on 500 servers). The cloud excels when you run many different applications and need on-demand agility (500 different workloads running across 350 servers).

    This analysis is misleading, IMHO.

    • you didn’t read the part about spiky workloads. any time you have variation in demand, you want to take advantage of some kind of pooling. that’s the cloud’s main pitch: that it pools demand across many companies so they can share a fixed pool. there’s no reason that fixed pool should be AWS, of course: depending on scale, you could obtain the same effects purely internally, by aggregating demand between groups within a single company.

    • Also, where is the cost in the personnel to maintain the self-hosted server capacity? It’s bundled in the Cloud / dedicated hosting price.

      How about the cost of spares for quick replacement of self-hosted server capacity that fails? All hardware fails.

      Where’s the cost in downtime as your expensive admins rush from home or a vacation to fix your self-hosted solution, or do you have enough staff to cover 24*7*365?

      Where’s the cost in your admins figuring out how to bring back online your self-hosted solution as they are not doing so day-in-day out?

      Lots of left out costs here.

      • Absolutely. This was the first thing that comes to mind after reading the article.

        This kind of analysis is common for very young startups on a shoestring budget, with a bunch of do-it-all know-it-alls wearing all hats, including sysadmin. However, they would typically not have workloads of the dimension discussed. Exceptions, of course, may exist.

      • Robert de Bock

        I agree, self-hosting requires staff. But; putting applications on EC2 requires some work as well, though less, because the virtual machines have to be created, loaded with applications, etc.

        But in general; I estimate the cost for sys-admins to be far higher than amazon-hosted.

      • Kind of more replying to the “You don’t have Sysadmins looking after your EC2 instances?” comment.

        Look at the numbers: the above analysis saves you $10k/month, or $120k/year. For 131 servers!

        So you’ve got the money for an extra one to one and a half sysadmins. That’s not enough to cover the difference in handling 131 machines with all their maintenance, and 131 machines in the cloud.

        No question, this is NOT a savings.

    • Absolutely agree, how do you factor in disaster recovery, cost of labor, service level requirements, etc. in the simplistic cost breakout being used. One example: the $12,227 per month space calc – which includes power – is way too low. The cost of power alone eats probably 60-70% of that figure – if you factor the cost of multiple FTE’s handling sys admin, storage admin, general support, etc. – the dollar per month goes significantly above the per month cost allocated. In addition the self hosted cost doesn’t even account for a refresh of those servers, which increases the monthly cost by 25% if you count one refresh over 6 years. The cost of the hosted solution will decrease as the number of instances to handle the core computing requirements is reduced as the price for storage, IO, etc. goes down as you move forward in time.

  18. adrian cockcroft

    Interesting analysis, however if bandwidth costs are dominant you should be factoring a CDN based solution into the model, regardless of whether it’s datacenter or cloud hosted compute.

    • @adrian: Not necessarily. While static content can certainly be CDN hosted quite easily, more dynamic content cannot. This is especially true of user-specific data. In that case, direct peering with the big boys is your best bet.