Facebook on Friday detailed how it uses a custom-built load-balancing controller to distribute network traffic across its servers and cut the company’s power consumption.
The social networking giant found that when its web servers are idle and not taking user requests, they need little compute to function and therefore draw relatively little power. As the servers handle more network traffic, they use more CPU resources and consume more energy.
Interestingly, Facebook found that during relatively quiet periods like midnight, lightly loaded servers drew more power than idle ones, and nearly as much as servers processing a medium amount of traffic during busier hours. In other words, a server is at its least efficient under light load: it’s better for Facebook to keep each server either idle or running as it would during busier times. The traffic just needs to be routed so that some servers can be left idle while the others run at medium capacity.
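The arithmetic behind that observation can be sketched in a few lines of Python. The wattage figures and the server counts below are invented purely for illustration and are not Facebook’s measurements; the sketch also assumes that the light traffic spread across every server could instead be served by a smaller group running at medium load.

```python
# Hypothetical per-server power draw in watts. These are illustrative
# assumptions, not figures published by Facebook.
IDLE_W = 60.0
LOW_LOAD_W = 130.0     # lightly loaded: well above idle
MEDIUM_LOAD_W = 150.0  # medium load: only slightly above light load


def spread_power(total_servers: int) -> float:
    """Power used when every server takes a small share of the traffic."""
    return total_servers * LOW_LOAD_W


def concentrated_power(total_servers: int, active_servers: int) -> float:
    """Power used when only `active_servers` take traffic at medium load
    and the remaining servers sit idle."""
    if not 0 <= active_servers <= total_servers:
        raise ValueError("active_servers must be between 0 and total_servers")
    idle_servers = total_servers - active_servers
    return active_servers * MEDIUM_LOAD_W + idle_servers * IDLE_W


def saving_fraction(total_servers: int, active_servers: int) -> float:
    """Fraction of power saved by concentrating traffic instead of spreading it."""
    spread = spread_power(total_servers)
    return (spread - concentrated_power(total_servers, active_servers)) / spread


if __name__ == "__main__":
    # Assumed scenario: 100 servers, with 40 able to carry the quiet-hour traffic.
    print(f"spread:       {spread_power(100):.0f} W")
    print(f"concentrated: {concentrated_power(100, 40):.0f} W")
    print(f"saving:       {saving_fraction(100, 40):.1%}")
```

The saving comes entirely from the gap between idle and light-load power draw; if lightly loaded servers drew close to idle power, concentrating traffic would gain little.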
This is where the load-balancing controller, dubbed Autoscale, comes into play. Autoscale acts as an intermediary between groups of Facebook web servers and the company’s in-house load balancers, which distribute the network traffic to the servers. The controller has the load balancers keep each active group of web servers at a medium level of traffic, so that power isn’t wasted on lightly loaded machines. During periods when there’s not much network traffic, Autoscale concentrates the traffic on a smaller number of servers, keeping them running at medium capacity, and leaves the rest either inactive or handling batch-processing tasks, neither of which consumes much energy.
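To make the idea concrete, here is a minimal Python sketch of how an active pool might be sized. It is not Facebook’s implementation: the function name, the per-server target rate and the minimum pool size are all hypothetical, and it simply assumes each active server should receive a fixed “medium” number of requests per second.

```python
import math


def active_pool_size(
    total_rps: float,
    target_rps_per_server: float,
    total_servers: int,
    min_active: int = 1,
) -> int:
    """Return how many servers should receive traffic so that each one
    runs near the assumed medium-load target.

    All parameters are hypothetical: `target_rps_per_server` stands in for
    whatever request rate corresponds to medium capacity.
    """
    if target_rps_per_server <= 0:
        raise ValueError("target_rps_per_server must be positive")
    # Round up so that no active server is pushed above the target rate.
    needed = math.ceil(total_rps / target_rps_per_server)
    # Never go below the safety floor or above the servers that exist.
    return max(min_active, min(total_servers, needed))


if __name__ == "__main__":
    # Quiet period: a fraction of the pool is enough.
    print(active_pool_size(12_000, 300, 100))
    # Peak period: demand exceeds the pool, so every server is active.
    print(active_pool_size(50_000, 300, 100))
```

Servers outside the active pool would be the ones left idle or handed batch work. A fixed division like this reacts instantly to every traffic blip, which is one reason a feedback controller is a more natural fit for the real system.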
Here’s a technical breakdown of how Facebook gets Autoscale to do its trick, per the blog post detailing the system:
For this to work, we employ the classic control theory and PI controller to get the optimal control effect of fast reaction time, small overshoots, etc. To apply the control theory, we need to first model the relationship of key factors such as CPU utilization and request-per-second (RPS). To do this, we conduct experiments to understand how they correlate and then estimate the model based on experimental data.
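For readers unfamiliar with PI (proportional-integral) controllers, the following Python sketch shows the general technique applied to this kind of problem. It is a toy under stated assumptions, not the model Facebook built: the gains, the 50 percent CPU target and the linear relationship between requests per second and CPU utilization are all invented for illustration, whereas the blog post says Facebook estimated its model from experiments.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Textbook PI controller: output = kp * error + ki * accumulated error."""
    kp: float
    ki: float
    setpoint: float
    integral: float = 0.0

    def update(self, measured: float, dt: float = 1.0) -> float:
        # Positive error means CPU is above target, so more servers are needed.
        error = measured - self.setpoint
        self.integral += error * dt
        return self.kp * error + self.ki * self.integral


def simulate(total_rps: float, rps_at_full_cpu: float,
             total_servers: int, steps: int = 60):
    """Drive the active server count until average CPU nears the setpoint.

    Assumed plant model (hypothetical): CPU utilization is proportional to
    requests per second per active server.
    """
    controller = PIController(kp=20.0, ki=40.0, setpoint=0.5)  # assumed gains
    active = float(total_servers)  # start with every server taking traffic
    history = []
    for _ in range(steps):
        cpu = total_rps / (active * rps_at_full_cpu)
        history.append((active, cpu))
        adjustment = controller.update(cpu)
        # Controller output is an offset from the full pool, clamped to valid sizes.
        active = min(float(total_servers), max(1.0, total_servers + adjustment))
    return history


if __name__ == "__main__":
    for step, (active, cpu) in enumerate(simulate(12_000, 600, 100)):
        if step % 10 == 0:
            print(f"step {step:2d}: active={active:6.2f}  cpu={cpu:.3f}")
```

The proportional term reacts to the current gap between measured and target CPU, while the integral term keeps shrinking the pool until that gap closes. The sketch keeps the server count as a fraction for simplicity; a real system would round to whole servers and would likely limit how far the integral term can accumulate while the output is clamped.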
Facebook claims that Autoscale makes its data centers much more energy efficient than the company’s older method of having every server run at low capacity when there’s little network traffic. Using Autoscale on one particular cluster of servers, Facebook said it cut energy consumption by 27 percent around midnight. The company said that on an average day, Autoscale saves 10 to 15 percent in energy, with the figure varying between servers.
Post and thumbnail images courtesy of Shutterstock user dolphfyn.