Parallel processing isn’t just for supercomputers or GPUs anymore. Computer makers are throwing multiple cores at everything from servers to your printer. But the focus on horsepower misses a crucial problem associated with adding more processors: to really take advantage of them, you have to rewrite your code.

As anyone who’s ever hosted a demolition party well knows, you can only throw so many workers at a problem before people start to linger at the edges, swill your alcohol and generally stop helping. You need not just manpower but a good way to organize those workers, so that someone, say, preps a drop cloth before your walls get taken out, and others prep for cleanup while the plaster is flying.

Silicon doesn’t tend toward drunken destruction, but if you’re putting the cores in place, it would be great to give them better instructions. Otherwise the promise of performance is just a promise, which is why Microsoft and Intel recently pledged $20 million to two universities trying to figure out an easy way to translate the billions of lines of code into an instruction set for multicore chips.

Others are pushing Erlang as a potential solution to parallel programming, while those in the supercomputing industry are warning of a performance drop caused by applications not keeping up with the cores. Software startup VirtualLogix is trying to use virtualization software to govern how multicore chips run applications by making the programs think they’re running on one processor.

Last week, during the launch of the iPhone, Steve Jobs told the New York Times that the next generation of the Apple OS will not focus on new features, but will instead solve the problem of writing software for multicore processors. Apple has code-named the technology Grand Central and based it on a programming language called OpenCL, which extends C so that code can be parallelized across graphics processors.

Besides investing millions of research dollars into the search for a magic compiler or reviving an older language, chip vendors are coming up with stopgaps, although unfortunately those stopgaps are focused solely on their own silicon. Nvidia has released a tool called CUDA to help translate C code into parallel instructions that Nvidia’s GPUs can execute for scientific computing. (Apple’s OpenCL looks similar to CUDA.) AMD has its own effort, called Stream.
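To get a feel for the kind of rewriting these tools demand, here is a minimal sketch in the CUDA style (a toy vector addition of my own, not code taken from Nvidia’s toolkit): the loop that would run serially on a single core becomes a kernel that the GPU executes across thousands of lightweight threads, while the host program explicitly shuttles data to and from the card. Apple’s OpenCL kernels read much the same way.

    // vector_add.cu: toy illustration of the CUDA programming style (not Nvidia sample code)
    #include <cstdio>
    #include <cuda_runtime.h>

    // Each GPU thread computes one element of the result.
    __global__ void vector_add(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            c[i] = a[i] + b[i];
        }
    }

    int main() {
        const int n = 1 << 20;                  // one million elements
        const size_t bytes = n * sizeof(float);

        // Host-side buffers.
        float *ha = (float *)malloc(bytes);
        float *hb = (float *)malloc(bytes);
        float *hc = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

        // Device-side copies of the buffers.
        float *da, *db, *dc;
        cudaMalloc((void **)&da, bytes);
        cudaMalloc((void **)&db, bytes);
        cudaMalloc((void **)&dc, bytes);
        cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

        // Launch enough 256-thread blocks to cover all n elements.
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        vector_add<<<blocks, threads>>>(da, db, dc, n);

        cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %f\n", hc[0]);           // expect 3.0

        cudaFree(da); cudaFree(db); cudaFree(dc);
        free(ha); free(hb); free(hc);
        return 0;
    }

The same source runs whether the card has 16 stream processors or 240, because the work is expressed as a grid of blocks sized to the data rather than to the hardware; that decoupling is what the kernel-style rewrite buys you.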

Freescale on Monday announced a set of multicore embedded processors that come with software support in the form of a simulator that ships before the chips do. As a result, users can start their development efforts and test their multicore code weeks ahead of time. “Customers are not looking for suppliers to offer them a chip and then leave them to program it themselves,” explained Steve Cole, a systems architect for Freescale. “There’s a certain amount of support and market knowledge that we need to have to help our customers.”

With all the work it takes to rewrite code, it’s no wonder everyone from startups to established companies is desperately searching for the programming equivalent of a Babel fish to solve the problem. The one that succeeds will be responsible for taking computing to its next jump in speed.

  1. Not all problems are parallelizable, unfortunately. Also, see http://en.wikipedia.org/wiki/Amdahl's_law (the formula is spelled out after the comments).

  2. It was interesting to me that most of the video compression tools took up multicore, even the free tools, from the get-go. It’s fairly easy to divvy up a compression job into regions for distribution to a compression engine. Compute tasks can be very vexing, however: one never knows which variables will be called or locked when work is dispatched across various cores, artificially broken out of its event loop.

    Even older optimizing compilers in the mainframe business (where machines have had multiple CPUs for ages) had to have their code output tested very carefully before deployment. I remember several sessions back in the day where the compiler would spit out very well-optimized code that would run for weeks in regression testing before some race condition would hang us. Thank goodness for specialized testing tools that did nothing but test for these pending locks – but back then we had months and sometimes years to get a program working.

    Today we just don’t have that luxury; with multicore becoming a commodity, the tools have to catch up.

  3. Stacey Higginbotham Thursday, June 19, 2008

    Rajeev, they are fast, but they could be even faster. Who wouldn’t want that?

  4. This problem is a bit of a red herring at the moment, since the CPU is not the bottleneck for most consumer apps.

    Also, when you’ve found a problem for which the solution is a new programming language, you now have two problems.

  5. Funny thing is, the brain is a lock-free, massively parallel system with the ability to produce sequential output, also called speech.
    “Smart” applications have always required a parallel approach; the problem is that we have concentrated on faster but dumber, bloated applications.
    The other problem is that Boolean logic is not well suited to smart parallel applications.
    I always highlight it this way:
    What is the brain really good at, pattern processing or following rules?
    So why should the brain use Boolean rules in one part when it can use abstract patterns to encode rules?
    Or try to teach/program a Boolean system to learn “all”; the brain has no problem learning “all” with marbles and immediately applying it to gummy bears, for example.

    Why is this important for parallel computing? Well, patterns can easily be processed in parallel. Put another way, Mother Nature is pretty darn smart.

  6. Surprised no mention of PeakStream (acquired by Google) and RapidMind up in Canada.

  7. Here’s an interview with one of multi-core’s early thought leaders, Prof. Anant Agarwal at MIT: http://sramanamitra.com/2007/08/20/the-next-big-innovation-in-microprocessors-anant-agarwal-part-1/
    His new startup is called Tilera.

  8. Since Google purchased Grand Central a year ago, perhaps Apple could have chosen a better name for their new product.

    http://googleblog.blogspot.com/2007/07/all-aboard.html

  9. The problem is knowing how many cores there are; if you knew there were two cores and only two cores, then it would be easy.

    I think a solution like a load balancer is the one that will win out.

  10. Great article. I’d add another approach to the mix: Cilk++, a set of language extensions and a runtime system that take C++ apps into the multicore realm.

    I bet the programming tools/methodologies that win out will be ones that maintain the serial semantics of the existing legacy apps, and don’t require a great deal of code restructuring to multicore-enable an app.

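For reference, here is the standard statement of Amdahl's law mentioned in the first comment. If a fraction P of a program's work can be parallelized and the remaining 1 - P must run serially, the best possible speedup on N cores is

\[ S(N) = \frac{1}{(1 - P) + \frac{P}{N}} \]

so a program that is 90 percent parallelizable (P = 0.9) can never run more than 10 times faster no matter how many cores it gets, because S(N) approaches 1/(1 - P) as N grows.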
