Facebook engineers have come up with a way to turbo-charge PHP, a programming language preferred by web developers. The Palo Alto, Calif.-based social networking and identity provider is now open sourcing the technology, called HipHop for PHP: a source code transformer that programmatically converts PHP into highly optimized C++ and then uses g++ to compile it. HipHop for PHP was developed internally to boost the performance of Facebook applications while also lowering hardware costs. From the Facebook blog:
With HipHop we’ve been able to reduce the CPU usage on our web servers on average by about fifty percent depending on the page. Less CPU means fewer servers. This project is incredible, has had a tremendous impact on Facebook and we are releasing it as open source in hope that it brings a new focus toward scaling large complex websites with PHP.
HipHop for PHP is the latest in a long list of products that were developed internally by Facebook and have been open sourced to the world. PHP (aka Hypertext Preprocessor) is a scripting language, much like Perl, Python and Ruby, that was created by Rasmus Lerdorf in 1995. In terms of CPU and memory demands, scripting languages are less efficient than compiled languages such as C++. PHP is currently used by large, popular, dynamic web sites such as Facebook, WordPress.com and Digg. It was viewed as one of the hottest new technologies in 2005, when venture capitalist Marc Andreessen, co-founder of Netscape and current Facebook board member, told the Wall Street Journal that “PHP is to 2005 what Java was to 1995.”
Of course, the web has scaled many times over since then, and as a result some severe shortcomings in PHP have been exposed. Many folks have subsequently looked for alternatives, opting for languages that have cleaner syntax, a more vibrant community and perhaps even better frameworks. Well-known PHP programmer Terry Chay recently noted in a blog post defending PHP:
Obviously, I think PHP is very frequently the right choice. The reason I choose PHP is that it is a web-based templating language that is simple, scalable, and pragmatic. Choices have consequences. Everyone knows what consequences are. If not, there’d be a One Language to rule them all. And, we’re not Java developers. ;-) One consequence of PHP is that it is now stuck between a rock and a hard place.
On one end, the ubiquity of rich, Ajax-driven, web sites means the inherent advantage of a templating language is no longer there, having been replaced by a much larger demand for design….On the other end, social networks have sped up demands of data, so that they live more in RAM in the form of memcached than on disk in the form of a relational database. When making a web page was tied to a disk-bound database, performance discussions are pushed to database performance discussions, which really is a discussion of disk performance. …web performance is now a complicated beast.
Facebook, which bet on PHP early on, has faced that quandary for a long time. It had no option but to innovate around PHP and its resource-hogging habits. PHP can be so resource-intensive because it is interpreted on the fly.
You can make it perform better by using tricks such as output caching, recycling the compiled code that’s generated at runtime (aka opcode caching) and writing extensions that are essentially bits compiled as C++. However, all these techniques have their own set of issues. Given that I am no programmer, here is my understanding of what Facebook has done: It came up with a way to analyze the PHP code and convert it to optimized C++ code, which is in turn compiled to very fast machine-specific code. HipHop benefits from the maturity of g++, a C++ compiler.
The Facebook blog post also explains why scaling the site is hard:
Scaling Facebook is particularly challenging because almost every page view is a logged-in user with a customized experience. When you view your home page we need to look up all of your friends, query their most relevant updates (from a custom service we’ve built called Multifeed), filter the results based on your privacy settings, then fill out the stories with comments, photos, likes, and all the rich data that people love about Facebook. All of this in just under a second. HipHop allows us to write the logic that does the final page assembly in PHP and iterate it quickly while relying on custom back-end services in C++, Erlang, Java, or Python to service the News Feed, search, Chat, and other core parts of the site.
Earlier this week I spoke with Haiping Zhao, a Facebook senior software engineer; Scott MacVicar, the company’s open-source developer advocate; and David Recordon, a senior programs manager. They explained that the company first started working on its scaling problems in 2007 and tried to fix them using various methods before coming up with the current solution.
Recordon said that the company wasn’t done optimizing but wanted to open source its code, mostly because it wants other people to use it and help extend it. He was confident that there would be many takers for HipHop for PHP, especially among enterprises looking to save on their hardware spending. “When you can slash your hardware costs by half, that is significant,” said Recordon.
In addition, Facebook’s engineers believe that performance gains should help PHP re-attract developers who might have opted for the more fashionable programming languages such as Ruby and Python. Any switchers would help solve Facebook’s more pressing problem: a desperate need for more and more developers in order to keep growing its web empire.