Analyst Report: Five features Amazon Web Services must fix


Amazon Web Services (AWS) customers frequently wake up to a shiny new feature or service announced by Amazon evangelist Jeff Barr on the AWS blog. Some of these features are significant, and a few go unnoticed. Yet while Amazon is appreciated for the pace at which new features are added to the stack, longtime AWS customers complain that the most requested features are not prioritized.

Based on feedback from AWS users, we have compiled a list of five Amazon EC2 issues that are not only annoying but force customers to look for alternatives.

1. Shared EBS volumes. Elastic Block Store (EBS) provides persistent storage to Amazon EC2 instances. Launched in 2008, it was a highly appreciated feature of Amazon EC2 because it removed the dependency on Amazon S3, which was slow.

Many engineers immediately attach an EBS volume as soon as an Amazon EC2 instance is launched and move the data that needs persistence. But after four years, the most-asked-for feature of EBS is yet to come: attaching the same EBS volume to multiple EC2 instances. AWS encourages running multiple Amazon EC2 instances behind a load balancer to get optimal performance. In fact, it is not a good idea to run the application on just one EC2 instance. Most content-management systems and media-driven applications rely on shared storage. When these systems are migrated to AWS and put behind an ELB, there is no easy option to share the content across the fleet of EC2 instances running the same application.

For example, an end user might upload a new image to one of the content servers, picked at random by the load balancer. Replicating this image across all the running servers is left to the developers. While AWS recommends using Amazon S3 for storing static content, many popular CMS frameworks expect the content to be available on the local file system. To ensure that all the servers share the latest content, developers typically end up implementing a shared or distributed file system such as NFS or GlusterFS. This requires advanced skills and involves launching a dedicated VM to run the file server. It also makes the deployment fragile, with the file server becoming a single point of failure.

If Amazon were to support sharing the same EBS volume across multiple EC2 instances, it would avoid the need for a dedicated file server and additional configuration on each server. This is not a complex feat: Google Compute Engine supports mounting the persistent disks simultaneously on multiple instances. Though only one instance will have read/write access to the file system, all the instances will immediately gain access to the content. Still in technology preview, Google Compute Engine is aiming at leapfrogging Amazon EC2 in performance and features. Early benchmarks reveal that GCE will be a viable alternative to Amazon EC2.

2. Provisioned ELB traffic. Elastic Load Balancing (ELB) provides a mechanism to distribute the traffic evenly across multiple Amazon EC2 instances. Amazon positions ELB almost as a magical service that provides high uptime and scalability. According to the official description of ELB, “It enables you to achieve even greater fault tolerance in your applications, seamlessly providing the amount of load balancing capacity needed in response to incoming application traffic.”

The promise of a seamless increase in load-balancing capacity is misleading, because ELB is designed to scale up and scale out gradually as traffic grows steadily. This is fine for ecommerce portals or ticket sales that start with little traffic and grow over time. But if a website running behind an ELB experiences a sudden spike or flash traffic, there is a significant drop in performance. This pattern is common with websites that announce exam results or news portals that post breaking news. To prepare the ELB for this kind of sudden spike, AWS customers are expected to subscribe to a support plan that costs a minimum of $49 per month and raise a support ticket requesting ELB prewarming. While there is enough guidance on addressing this issue, it is buried in AWS's massive documentation. Like the Provisioned IOPS feature of EBS, Amazon should enable provisioned traffic for ELB, where customers can choose the traffic pattern beforehand to get assured scalability.
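To make the gradual-scaling problem concrete, here is a toy model in Python. It is a sketch under invented assumptions and not a description of how ELB works internally: the `simulate` function, the 50 percent growth cap per tick, and the 25 percent headroom are all hypothetical values chosen only to illustrate the pattern.

```python
def simulate(traffic, start_capacity=100, max_growth=0.5, headroom=1.25):
    """Return the requests dropped at each tick of a toy load balancer.

    All parameters are illustrative assumptions, not ELB internals:
    capacity chases recent demand plus some headroom, but can grow by at
    most `max_growth` (a fraction of current capacity) per tick.
    """
    capacity = start_capacity
    dropped = []
    for demand in traffic:
        dropped.append(max(0, demand - capacity))
        target = max(capacity, int(demand * headroom))
        # Grow toward the target, but no faster than the growth cap allows.
        capacity = min(target, int(capacity * (1 + max_growth)))
    return dropped


gradual = [80, 90, 100, 110, 120]            # steady growth
spike = [80, 1000, 1000, 1000, 1000, 1000]   # flash traffic

print("gradual:  ", simulate(gradual))
print("spike:    ", simulate(spike))
print("prewarmed:", simulate(spike, start_capacity=1250))
```

In this model the gradual pattern drops nothing, the spike keeps dropping requests for several ticks while capacity catches up, and starting with enough capacity (the analogue of prewarming or provisioned traffic) removes the drops.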

3. Per-minute billing model. Amazon EC2 customers pay for the instances they run by the hour. This means that even if an instance runs for only a few minutes, Amazon charges for the whole hour. When AWS launched EC2 in 2008, hourly billing was considered a groundbreaking innovation based on self-service and on-demand availability of compute resources. Fast-forward to 2013, and this is an unreasonable way of pricing VMs. Many customers would save money on short-lived workloads if Amazon switched to a per-minute billing model. Of course, AWS deserves credit for having the most innovative purchase plan in the form of spot instances. But with major competitors like Windows Azure and Google Compute Engine offering per-minute billing, customers are waiting to see a change in Amazon’s billing models.
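The difference is easy to see with a little arithmetic. The Python sketch below rounds usage up to the next billing increment; the `billed_cost` helper and the $0.06 hourly rate are hypothetical, and the model ignores any minimum charge a provider may apply.

```python
import math


def billed_cost(runtime_minutes, hourly_rate, granularity_minutes):
    """Cost when usage is rounded up to the next billing increment."""
    increments = math.ceil(runtime_minutes / granularity_minutes)
    return increments * granularity_minutes * hourly_rate / 60


HOURLY_RATE = 0.06  # hypothetical price in dollars per instance-hour

for minutes in (5, 61, 90):
    hourly = billed_cost(minutes, HOURLY_RATE, 60)     # billed by the hour
    per_minute = billed_cost(minutes, HOURLY_RATE, 1)  # billed by the minute
    print(f"{minutes:>3} min: hourly ${hourly:.3f}, per-minute ${per_minute:.3f}")
```

Under these assumptions a five-minute job costs twelve times as much with hourly billing as with per-minute billing, and a 61-minute job is charged for two full hours.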

4. Improved CloudWatch metrics. Amazon CloudWatch provides metrics related to various AWS services including Amazon EC2, Amazon RDS, and Amazon DynamoDB. While it supports an array of services, the metrics for Amazon EC2 leave a lot to be desired. Only basic metrics related to CPU, disk, and network are tracked, and only at the hypervisor level, which is not enough to understand what is happening inside an instance. Despite signing up and paying for Amazon CloudWatch, customers still need to rely on external services like Pingdom to track basic metrics like site availability. For monitoring advanced services based on a web server or a database server, customers are forced to set up an agent-based infrastructure like Nagios or Zabbix. Though CloudWatch supports custom metrics, publishing them is quite a bit of work, with no out-of-the-box support for advanced metrics.

Windows Azure recently added end-point monitoring that offers basic website uptime monitoring. Rackspace acquired Cloudkick, which was known for robust monitoring capabilities, and integrated it with Rackspace Cloud Servers. Amazon can easily embed an agent in every EC2 instance to track and report metrics that are granular and accurate. In fact, Amazon EC2 instances launched under AWS Elastic Beanstalk already use an agent-driven monitoring engine to track the health of servers. Amazon should extend this agent from AWS Elastic Beanstalk to all Amazon EC2 instances to track and report meaningful metrics.
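For a sense of the work that custom metrics leave to the customer, here is a minimal Python sketch of reporting one in-instance metric through the CloudWatch PutMetricData call, using the boto3 SDK as one possible client. The helper names, the metric name, the namespace, and the instance ID are all made up for illustration, and collecting the value on a schedule (the part an agent would do) is not shown.

```python
def build_metric(name, value, unit, instance_id):
    """Build one entry for the MetricData list of a PutMetricData request."""
    return {
        "MetricName": name,
        "Value": float(value),
        "Unit": unit,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
    }


def publish(namespace, metrics):
    """Send metrics to CloudWatch; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the payload builder works without it

    client = boto3.client("cloudwatch")
    client.put_metric_data(Namespace=namespace, MetricData=metrics)


if __name__ == "__main__":
    # Hypothetical metric, value, and instance ID.
    metric = build_metric("MemoryUsedPercent", 71.5, "Percent", "i-0123456789abcdef0")
    print(metric)
    # publish("Example/App", [metric])  # uncomment on a machine with credentials
```

Even in this small form, the customer still has to measure the value, schedule the reporting, and manage credentials on every instance, which is the gap a built-in agent would close.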

5. Dynamic VM sizing. If you thought Microsoft always confuses customers with multiple flavors and versions of Windows, you haven’t seen the number of Amazon EC2 instance types.

There are 18 types of Amazon EC2 instances grouped into 6 families. Each instance type is suited for a specific kind of workload. If you are not already lost looking at the detailed description of these instance types, you are expected to choose the right type for your application. There are hi-CPU, hi-memory, hi-storage, cluster compute, general purpose, and more instance types to choose from. Despite this choice, do customers get what they want? Not really. Most of the time, the mapping between an on-premises physical server and an Amazon EC2 instance type doesn’t come close. In some cases it is the memory that falls short, and in others it is the CPU. And after all this, the actual performance rarely matches the rated power of the instance type. Is it hard to achieve? It may not be, as one recent IaaS entrant, ProfitBricks, offers dynamic configuration of virtual servers. ProfitBricks also claims that it delivers better performance because it uses InfiniBand interconnects with SSD storage. It is time for Amazon to switch to dynamic instance types, where customers are allowed to drag sliders to select the memory, CPU cores, and disk. This will simplify dealing with Amazon EC2 and put customers in control of the server configuration. They can stop, tweak the configuration, and relaunch Amazon EC2 instances until the performance is satisfactory.
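The mismatch between fixed instance types and real servers can be sketched in a few lines of Python. The catalog below is entirely hypothetical (the names, sizes, and prices are not real EC2 instance types), and `cheapest_fit` and `overprovision` are illustrative helpers, not part of any AWS tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: int
    hourly_rate: float


# Hypothetical catalog; not real EC2 instance types or prices.
CATALOG = [
    InstanceType("general.medium", 1, 4, 0.10),
    InstanceType("general.large", 2, 8, 0.20),
    InstanceType("general.xlarge", 4, 16, 0.40),
    InstanceType("highmem.xlarge", 2, 16, 0.35),
    InstanceType("highcpu.xlarge", 8, 8, 0.50),
]


def cheapest_fit(vcpus, memory_gib, catalog=CATALOG):
    """Return the cheapest type that covers both requirements, or None."""
    candidates = [t for t in catalog
                  if t.vcpus >= vcpus and t.memory_gib >= memory_gib]
    return min(candidates, key=lambda t: t.hourly_rate, default=None)


def overprovision(vcpus, memory_gib, chosen):
    """Return the (vCPUs, GiB of memory) paid for but not needed."""
    return chosen.vcpus - vcpus, chosen.memory_gib - memory_gib


# An on-premises server that needs 3 cores and 6 GiB of memory.
choice = cheapest_fit(3, 6)
print(choice.name, overprovision(3, 6, choice))
```

With this made-up catalog, a server that needs 3 cores and 6 GiB lands on a 4-core, 16 GiB type and pays for 10 GiB it never uses. With slider-style sizing, the customer would simply request 3 cores and 6 GiB.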

These are just a few features based on Amazon EC2, but there are many issues with Amazon RDS that need attention. We will cover them in future posts.

What other issues do you want the IaaS pioneer to fix?

Source: flickr user Steve Hankins
