Multicore's Not-So-Secret Problem

Parallel processing isn’t just for supercomputers or GPUs anymore. Computer makers are throwing multiple cores at everything from servers to your printer. But the focus on horsepower misses a crucial problem that comes with adding more processors: to really take advantage of them, you have to rewrite your code.

As anyone who’s ever hosted a demolition party well knows, you can only throw so many workers at a problem before people start to linger at the edges, swill your alcohol and generally stop helping. You need not just manpower but a good way to organize those workers, so that someone, say, preps a drop cloth before your walls get taken out, and others prep for cleanup while the plaster is flying.

Silicon doesn’t tend toward drunken destruction, but if you’re putting the cores in place, it would be great to give them better instructions. Otherwise the promise of performance is just a promise, which is why Microsoft and Intel recently pledged $20 million to two universities trying to figure out an easy way to translate the billions of lines of code into an instruction set for multicore chips.

Others are pushing Erlang as a potential solution to parallel programming, while those in the supercomputing industry are warning of a performance drop caused by applications not keeping up with the cores. Software startup VirtualLogix is trying to use virtualization software to govern how multicore chips run applications by making the programs think they’re running on one processor.

Last week, during the launch of the iPhone, Steve Jobs told the New York Times that the next generation of the Apple OS will not focus on new features, but will instead tackle the problem of writing software for multicore processors. Apple has code-named the technology Grand Central; alongside it, a C-based programming language called OpenCL will let programs farm parallel work out to graphics processors.

Besides the millions of research dollars being poured into the search for a magic compiler or the revival of an older language, chip vendors are coming up with stopgaps. Unfortunately, these stopgaps are focused solely on their own silicon. Nvidia has released a tool called CUDA that lets developers write C code that runs as parallel instructions on Nvidia’s GPUs for scientific computing. (Apple’s OpenCL looks similar to CUDA.) And AMD has its own effort, called Stream.

Freescale on Monday announced a set of multicore embedded processors that come with software support in the form of a simulator that ships before the chips do. As a result, users can start their development efforts and test their multicore code weeks ahead of time. “Customers are not looking for suppliers to offer them a chip and then leave them to program it themselves,” explained Steve Cole, a systems architect for Freescale. “There’s a certain amount of support and market knowledge that we need to have to help our customers.”

With all the work it takes to rewrite code, it’s no wonder everyone from startups to established companies is desperately searching for the programming equivalent of a Babel fish to solve the problem. The one that succeeds will be responsible for taking computing to its next jump in speed.

28 Responses to “Multicore's Not-So-Secret Problem”

  1. John Keane

    It is a tough one. Just remember that computing is still in its infancy. Shame on Microsoft and Intel. I do hope those universities are not offended by the loose change offered.

  2. whoopie

    the future is in functional programming, and shared-nothing concurrency. google is already there – look at the tutorials for map/reduce. they are in haskell.

    imperative programming is dead; most imperative tools will end up performing worse on massively multicore systems utilizing relatively cheap processors
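The shared-nothing map/reduce style the commenter points to can be sketched in a few lines of plain Python: each mapper works on its own chunk with no shared state, and an associative reduce step merges the partial results. (A toy word-count sketch, not Google's actual framework; the real thing distributes the chunks across machines, but the programming model is the same.)

```python
from functools import reduce
from collections import Counter

def mapper(chunk):
    # Each mapper sees only its own chunk of text: shared-nothing.
    return Counter(chunk.split())

def reducer(a, b):
    # Merging partial counts is associative, so merge order is free --
    # exactly what lets the runtime schedule work across cores.
    return a + b

chunks = ["to be or not", "to be that is", "the question"]
partials = map(mapper, chunks)           # each call could run on its own core
total = reduce(reducer, partials, Counter())
```

Because neither function touches shared mutable state, the `map` step can be handed to any number of workers without locks.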

  3. ronald

    @ twyrick
    Sorry, the brain doesn’t do fuzzy logic.
    And the pattern logic is built upon _1_ simple algorithm which generates different models in different regions and layers of the brain. This is what makes it so resilient to errors, where Boolean logic is really prone to errors. This also enables what you call fuzzy, but my guess is it’s just an evolutionary approach to handle new or defective brain regions/layers.
    I don’t think this will replace hard-core Boolean systems for math modeling, or the GUI for example. I see Boolean more as a co-processor for math, while decisions are made on the other side. Best of both worlds.
    If you know what Intelligence is let me know.

  4. twyrick

    Ronald, you make several very good points, but I’m not sure people really *want* to use a computer system that uses “pattern recognition” and fuzzy logic based on abstracts, vs. Boolean logic?

    It may be correct that a multi-core processor architecture turns out to be best suited for working along the lines of how the human brain does. But unless you’re tinkering with A.I. and simulations of human thought – I’m not sure it’s the “most desirable” computing model?

    Humans traditionally found computers very helpful in the areas where our brains fell short, such as faultless computations of complex mathematical equations. With a non-Boolean system, based on pattern recognition and fuzzy logic, I’d envision computers that are “usually” capable of getting correct answers. They might generate results with great processor efficiency, using “known to be previously correct” data to speed things along on a large spreadsheet. Yet the operator couldn’t have 100% confidence in precisely correct answers across the board.

  5. Great article. I’d add another approach to the mix: Cilk++, a set of language extensions and a runtime system, which take C++ apps into the multicore realm.

    I bet the programming tools/methodologies that win out will be ones that maintain the serial semantics of the existing legacy apps, and don’t require a great deal of code restructuring to multicore-enable an app.

  6. The problem is knowing how many cores there are; if you knew there were two cores and only two cores, then it would be easy.

    I think a solution like a load balancer is the one that will win out.

  7. ronald

    Funny thing is, the brain is a lock-free, massively parallel system with the ability for sequential output, also called speech.
    “Smart” applications have always required a parallel approach; the problem is we have concentrated on fast and dumber bloated applications.
    The other problem is, Boolean logic is not well suited for smart parallel applications.
    I always highlight it this way.
    What is the brain really good at, pattern processing or following rules?
    So why should the brain use Boolean rules in one part when it can use abstract patterns to encode rules?
    Or try to teach/program a Boolean system to learn “all”: the brain has no problem learning “all” with marbles and applying it immediately to Gummy bears, for example.

    Why is this important for parallel computing? Well, patterns can easily be processed in parallel. Or: mother nature is pretty darn smart.

  8. This problem is a bit of a red herring at the moment since CPU is not the bottleneck for most consumer apps.

    Also, when you’ve found a problem for which the solution is a new programming language, you now have two problems.

  9. It was interesting to me that most of the video compression tools took up multicore, even the free tools, from the get-go. It’s fairly easy to divvy up a compression job into regions for distribution to a compression engine. Compute tasks can be very vexing, however: one never knows what variables will be called or locked when tasks are artificially broken out of their event loop and dispatched between various cores.

    Even older optimizing compilers in the mainframe business (which has had multiple CPUs for ages) had to have their code output tested very carefully before deployment. I remember several sessions back in the day where the compiler would spit out very well-optimized code that would run for weeks in regression testing before some race condition would hang us. Thank goodness for specialized testing tools that did nothing but test for these pending locks – but back then we had months and sometimes years to get a program working.

    Today we just don’t have that luxury; with multicore being a commodity, the tools have to catch up.