Why Watson and SPSS Are IBM’s Big Data Yin and Yang

After all the talk over the past few weeks about what IBM’s Watson victory on Jeopardy! means and doesn’t mean, about what the system is and isn’t capable of doing, it’s becoming clear that Watson is not HAL of 2001: A Space Odyssey notoriety. But when used in concert with predictive analytics software, technologies like Watson can become part of a complete big-data architecture that could change the data-analysis game for everything from medical diagnoses to sentiment analysis on Twitter. However, the promise of this holistic data environment doesn’t stop with IBM.

As I reported in a recent post explaining the limits of Watson as a machine-learning platform, its ability to process and answer questions based on natural language is a big deal, but the system as currently constituted is largely relegated to answering specific questions based on the very specific data loaded into it. However, thanks to a $14 billion investment in analytics acquisitions over the past several years, IBM has a robust portfolio of products with which to complement Watson’s impressive capabilities. According to IBM VP of Predictive Analytics Deepak Advani, SPSS (the predictive piece of IBM’s analytics puzzle, which it bought for $1.2 billion in 2009) might just be Watson’s ideal mate.

The health care industry, where IBM already has confirmed Watson’s technology will be applied next, is a great example of what Advani is talking about. Using SPSS software, he explained, a doctor might be able to mine years of patient records, as well as other relevant data sources, to predict the likelihood of a particular patient developing a serious cardiovascular condition, for example. The next step is determining the best treatment plan, and that’s where Watson comes in.

Doctors have a limited amount of time to read medical journals, maybe just five hours a month, Advani said, so Watson’s ability to store and process the information contained in, essentially, all available medical journals as they’re released becomes very important. Asked a question about the best-possible treatment plan for that particular patient, given the patient’s particular background, Watson could scour its database to suggest a plan, or plans, for the doctor to consider. Considering the myriad variables affecting the success of any given treatment plan, Watson might be able to select a plan that the doctor could not possibly have suggested.

The Watson-plus-SPSS combo also could be very effective in deriving insights from social media data, albeit via a different setup. In predictive analytics, Advani explained, the more data users put in, the more accurate their results, which makes social media a prime target for software such as SPSS. After all, he noted, companies can conduct all the surveys they want, but consumers are offering up those same insights for free on Twitter and other sites. He described how television companies are already using SPSS to do sentiment analysis on various social media sites and to change their programming in real time based on what they determine users like and don’t like.

However, even with the ability to detect slang, context and other potentially confounding factors, Advani acknowledged that mining social media data might not be entirely precise off the bat, which is why he recommends a step-by-step approach. Essentially, he says, the process is to listen to the data, take measured actions in accordance with it, write predictive algorithms based on those results, and then refine the algorithms. If used correctly, predictive software should ultimately let users determine, based on data gleaned from Twitter and elsewhere, what the public likes and doesn’t like, and what the results of acting in any specific manner might be.
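For readers who think in code, the listen-act-refine loop Advani describes might be sketched roughly as follows. This is a toy illustration in Python, not SPSS or anything IBM ships: the seed word lists, the `score` and `refine` functions, and the labeled posts are all hypothetical, and the labels stand in for outcomes observed after taking some measured action.

```python
from collections import Counter

# Hypothetical seed lexicon; real sentiment models are far richer than this.
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "boring", "awful"}


def score(post, positive, negative):
    """Listen: count positive words minus negative words in one post."""
    words = post.lower().split()  # naive tokenizer, ignores punctuation
    return sum(w in positive for w in words) - sum(w in negative for w in words)


def refine(positive, negative, labeled_posts, min_count=2):
    """Refine: learn new slang from posts whose outcome is already known.

    labeled_posts is a list of (text, label) pairs, where label is +1 or -1.
    A word is added only if it appears in at least min_count posts of one
    class and in no posts of the other class.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for text, label in labeled_posts:
        target = pos_counts if label > 0 else neg_counts
        target.update(set(text.lower().split()))  # count each word once per post

    known = positive | negative
    new_pos = {w for w, n in pos_counts.items()
               if n >= min_count and w not in neg_counts and w not in known}
    new_neg = {w for w, n in neg_counts.items()
               if n >= min_count and w not in pos_counts and w not in known}
    return positive | new_pos, negative | new_neg


# Hypothetical feedback gathered after acting on the first round of results.
labeled = [
    ("that finale was sick", 1),
    ("sick plot twist", 1),
    ("meh episode", -1),
    ("meh pacing", -1),
]

before = score("sick finale", POSITIVE, NEGATIVE)
positive2, negative2 = refine(POSITIVE, NEGATIVE, labeled)
after = score("sick finale", positive2, negative2)
print(before, after)
```

The point of the sketch is the shape of the loop rather than the scoring method: the first pass misses slang such as "sick" used as praise, and the feedback from acting on early results is what lets the next pass catch it.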

In the social media realm, Watson might come into the picture earlier, when trying to process consumer sentiment toward any particular subject. In a recent Forbes article on the future of the Watson technology, IBM VP of Emerging Technologies Rod Smith discussed the use case of an unnamed customer “combing through months worth of Twitter chatter to extract a picture of what consumers think about the customer’s products.” Of course, one needn’t use the exact Watson technology stack for data mining at such an epic scale (Hadoop applications, including IBM’s InfoSphere BigInsights software, could do the job, too), but Watson’s natural-language abilities could make the process quite a bit easier and, possibly, more accurate.

In the same article, Smith made one more observation that drives home how relatively accessible Watson, or Watson-like, technology could become once IBM begins formally turning it into a product. Whereas Jeopardy! required Watson to answer a question in a matter of seconds, such real-time performance would be unnecessary in all but the most-demanding business environments, which cuts down on the infrastructure required. As Smith noted, if a customer could wait two minutes for an answer, it wouldn’t need an entire room full of servers dedicated just to that cause.

I should point out, though, that IBM is hardly the only vendor or organization targeting these types of workloads; it just has the deep pockets required to have them all under its banner, and the marketing skills to make Watson a household name. SAS, for example, is a major player in predictive analytics, and the Apache Mahout project is dedicated to the types of machine-learning algorithms that make Watson so smart. And, as alluded to by Smith, it doesn’t require an IBM Power7-based supercomputer to churn out answers in less than real time; commodity machines might work just fine. The real story is about the advent and democratization of advanced data-processing techniques combined with advanced predictive analytics to achieve the most accurate and actionable insights possible. It’s a story worth telling, and selling.

To hear more about what everyone is doing in the big data space, be sure to attend our Structure Big Data conference March 23 in New York City. IBM’s Jeff Jonas will be speaking on “geospatial data as analytic super-food,” and other speakers span the spectrum from EMC to Revolution Analytics to InfoChimps.
