The rise of servers powered by cell phone chips, with hundreds of them whirring away on a problem while using less power, has become an almost commonplace idea in the last few years. But over at Linux Magazine, Editor Douglas Eadline points out the enormous problem with that vision: namely, the software can’t cut it yet (hat tip to Inside HPC). Eadline writes about May’s Law, a corollary to Moore’s Law (the observation that the number of transistors on a chip doubles roughly every 18 months).
David May postulated that software efficiency halves every 18 months, offsetting the gains of Moore’s Law. Basically, faster hardware makes software more complicated, so it doesn’t run as well, and it takes time for the software to catch up to hardware gains. Now, as high-performance computing gurus and webscale data center operators contemplate a wholesale change of architecture inside their data centers by adding ARM-based servers, the issue of software complexity must be addressed. Eadline suggests abstracting the runtime environments for such systems. From his article:
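To see what May’s Law implies, it helps to multiply the two trends together. The sketch below is a toy model using the article’s stylized numbers (doubling hardware and halving software efficiency per 18-month period); the function name and the alternative decay rate are illustrative assumptions, not measurements.

```python
# Toy model of May's Law offsetting Moore's Law: if transistor counts
# double every 18 months while software efficiency halves over the same
# span, the product is flat -- the effective performance never improves.

def effective_performance(generations, hw_growth=2.0, sw_decay=0.5):
    """Relative performance after `generations` 18-month periods."""
    return (hw_growth * sw_decay) ** generations

# Hardware alone after three generations: 8x faster.
print(2.0 ** 3)                      # 8.0

# With May's Law applied in full, the net gain is 1x -- no progress.
print(effective_performance(3))      # 1.0

# If software instead lost only 20% efficiency per generation
# (a hypothetical milder decay), some of the gain would survive.
print(round(effective_performance(3, sw_decay=0.8), 3))  # 4.096
```

The point of the exercise is that the software term multiplies, rather than adds to, the hardware term, so even modest per-generation software losses compound quickly.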
As software progress crawls along, I am convinced that future large-scale HPC applications will include dynamic fault-tolerant runtime systems. The user needs to be lifted away from low-level responsibility so they can focus on the application and not the complexity of the next hardware advance.
This sounds similar to the issues driving the creation and adoption of platforms as a service on top of various clouds, only Eadline is arguing for a runtime environment that enables high-performance computing on top of different hardware architectures, be they x86 chips, graphics processors or ARM-based chips. We’ve covered that idea before, and I still think it has promise. As performance and power efficiency become more important, figuring out how to put the best hardware on the job without having to rewrite all your code is a problem computer science must solve.