Natural language user interfaces – think Siri – promise to revolutionize everything from call centers to hospitals, but what about their data sources? How do the companies and organizations deploying them ensure real choice, rather than just assuming a certain one or two sources will do the job? These are questions that came up today at GigaOM’s Structure:Data conference in New York and, according to Nuance Communications CTO Vlad Sejnoha, the answer may lie in the semantic web.
The semantic web, a term coined by Tim Berners-Lee, is an initiative being developed under the auspices of the World Wide Web Consortium (W3C). It aims, through the use of standardized tags and formats, to build a framework where content on the web has machine-understandable meaning, rather than simply being searchable by keyword.
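To make the idea concrete, here is a rough sketch in Python of what machine-understandable content looks like in practice. The semantic web expresses content as subject-predicate-object "triples" (as in the W3C's RDF model) using shared vocabulary; the particular entities and predicates below are invented for illustration, not an actual published ontology.

```python
# Content published as subject-predicate-object triples (as in RDF),
# so software can query relationships rather than match keywords.
# The vocabulary and data here are invented for illustration.
triples = [
    ("mercy_hospital", "type", "Hospital"),
    ("mercy_hospital", "locatedIn", "Boston"),
    ("dr_smith", "type", "Physician"),
    ("dr_smith", "worksAt", "mercy_hospital"),
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the given pattern (None = wildcard)."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# "Who works at Mercy Hospital?" is answerable directly, because the
# relationship is explicit in the data rather than buried in free text.
print(query(predicate="worksAt", obj="mercy_hospital"))
# → [('dr_smith', 'worksAt', 'mercy_hospital')]
```

A keyword search engine would need the literal words to co-occur on a page; a triple store like this can answer the question from the structure alone, which is the promise Sejnoha invokes below.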
Asked by GigaOM Research analyst George Gilbert how best to tap a variety of information repositories without having to “hardwire” the language interface into each one separately, Sejnoha said the answer lies in an open approach:
“The conversation stack… has to interact with content sources, other services, applications and devices. Today, integrating those interfaces into those resources is a one-off job. Some applications on the market make choices on behalf of the user, and this brings important questions about openness. I do think the promise of the semantic web remains very important there.
“I hope we get to the point where people who have important services or content on the web publish them in standard formats that we can connect to and [use] almost automatically… I am hopeful that this will gain greater support in the industry as these folks realize without something like this the interface might become opaque to new entrants.”
However, Gilbert demurred, suggesting that incumbents have “no incentive to make a common interface.”
Check out the rest of our Structure:Data 2013 coverage here, and a video embed of the session follows below:
Session Name: The Future Impacts of Big Data Insights on your Organization.
Speakers: George Gilbert, Vlad Sejnoha, Currie Boyle
GEORGE GILBERT 00:14
Okay. So, I think we have something a little different for everyone today. I like to think of it as something mainstream and familiar, but with a little black magic underneath. And the objective of the panel is to talk about some products that most of us are familiar with: Siri, Google Now as it’s emerging, Nuance speech recognition as it’s applied to healthcare or [inaudible] for individual applications, IBM Watson. And the idea is that even though they accomplish many different tasks, they actually have surprisingly common underpinnings – not exactly the same, but since we’re moving to an era where these are going to be more critical technologies, we thought it worth exploring. And I want both panelists to introduce themselves, and then Vlad, put into context why this area of the natural language interface and the ultimate conversational user interface is so important.
VLAD SEJNOHA 01:29
Thank you. I’m Vlad Sejnoha, CTO of Nuance Communications, and the focus of the company is this notion that we are entering a period when the way we interact with applications and assistants, and access information and content, is going to undergo a dramatic transformation – primarily because of the maturation of natural language understanding, conversation management, and even higher-level AI reasoning. We are overwhelmed with the amount of structured data, and there are vast amounts of unstructured data yet to yield their value. And we think that these techniques are going to make a dramatic difference in the way we take advantage of all of that.
GEORGE GILBERT 02:15
Okay. And one part of the magic I hope we get to later is the “why now.” So, Currie, tell us a little about your role at IBM and how Watson has evolved since that great Jeopardy match.
CURRIE BOYLE 02:35
Great, thanks. So, I’m Currie Boyle, working on natural language processing and natural language understanding, and Watson and Watson-like technologies. And I completely agree with you – there’s actually been a lot of focus yesterday on the changes that need to happen in terms of accessing structured data as well as unstructured data, and a big focus on the user interface yesterday, and I think this echoes the change that we both feel is happening in the industry. The business I’m in is dialog management systems. So, unlike systems that you make a request from and then they provide you an answer – the average query is, whatever, 2.4 words, and that really doesn’t make a query specific or complete enough for you to actually take any action on it.
CURRIE BOYLE 03:15
So, we work on non-scripted solutions to do dialog management. The simple way to look at it is: try to determine the intent of the user who’s seeking a person, or a process, or information, and the intent of the author who wrote the information. And even though they may have no words in common, they have a common intent – and to match those.
GEORGE GILBERT 03:37
Okay. So, given that we have 20 minutes to cover about 40 years of research, let’s prioritize what we want to skim over. So, at the three highest levels there’s: how does this work when a user is actually operating it – the run time? Then a little bit on the design time – where do the smarts come from? And if we have time in our copiously expanded format here, some of the business implications – how some of the vendors are going about doing it.
GEORGE GILBERT 04:15
So, the most mainstream layer is the natural language processing and understanding. And I want to skim over that a little bit because everyone’s been exposed to it through products like Siri, or Nuance. But underneath that there’s this layer where the software has to figure out what the user’s trying to do. And then it has to call on these underlying data services and web services – some to just pull up information and some to actually do things. So, let’s take an example of healthcare, actually, since both of you have practices or developing practices in this area. Vlad, why don’t we start with: what goes on when a doctor or a provider or payer or someone like that in the healthcare food chain is using Nuance, and how does it enhance their productivity, how does it change how they interact with systems?
VLAD SEJNOHA 05:41
Sure. So, I think this is a great example of the power of this technology to really unlock value and leverage the information in the data that’s associated with the clinical workflow, as it were, which begins with a doctor documenting a patient encounter. And typically doctors like to do that in freeform narrative dictation, which traditionally used to be transcribed by humans. Today it’s converted into text using speech recognition. But more importantly and excitingly, we are now able to, in real time, extract the medically [inaudible] facts and their attributes and relationships from that dictation. So, you’re probably familiar with the government-mandated meaningful use legislation: by a certain date – not so far in the future – all hospitals have to start using electronic health records for quality and continuity of care.
VLAD SEJNOHA 06:38
It’s proven very difficult to get doctors moved off the old style of documentation into these point-and-click interfaces. So, extracting information and automatically populating these e-charts is a big step, a big savings in cost and time. But we can then immediately perform additional processing on top of that information, including checking whether sufficient detail was provided, whether certain procedures were applied as well, given the context of the patient’s situation. And then moving further upstream and actually assigning automatic billing codes. So, that really kind of shrinks the whole documentation process, and unlocks a lot of data that can be mined and exploited to improve patient care, for example.
GEORGE GILBERT 07:21
So, just to distill it: it sounds like you’re streamlining the workflow, and because you’re adding structure you can make more effective use of the data for decisions – is that fair?
VLAD SEJNOHA 07:35
Exactly. And once you have these higher-quality electronic health records, it opens up the possibility for other applications. For example, a physician dealing with a particular case might ask, “Have there been other patients in the system, possibly even nationwide, with similar characteristics? What were the outcomes, what were the interesting findings?” And help inform that particular–
GEORGE GILBERT 07:58
And I want you to hold that thought, because that’s where we get down to what I assume is the magic layer. So, Currie, the layers as I understand them in Watson are somewhat similar, but you’re trying to solve a slightly different problem. Can you walk us through the food chain of what happens, and then what comes out at the other end?
CURRIE BOYLE 08:32
Okay. I’ll expand beyond Watson, which is an ongoing, evolving project. But if you look at just general customer demands, people are looking for the ability in, sort of, four different areas. They’re looking for things like Siri – in-device, in-vehicle kinds of systems – only they’re looking for them to be conversational, and they’re looking for them to remember context. The second major area really is something we’ve talked about, which is to find structured data. Said a different way: how many times do we have an extended warranty attached to a new pickup truck that we sold in Eastern Europe, or something, right? And so, to be able to actually access structured data, but to use a natural language interface.
CURRIE BOYLE 09:23
The third sort of key area is in professionals who are trying to design things, or dealer support and business partner relations, which is looking for maybe a person or a process: have we ever done this before? Which isn’t necessarily looking for structured data, but has anybody ever built this part before? Have we ever designed this wing before? And then the last one’s probably more traditional for both of us, which is in contact centers. How do you provide intelligent agents, either directly through web or chat without human intervention, or how do you assist an agent to be more effective?
GEORGE GILBERT 09:56
Can you tie that back to the healthcare example, where you might not be just extracting structured meaning – okay, so what’s the diagnosis, what are the symptoms? – but where, as I understand it, you currently take it, or anticipate taking it, a step further in terms of predicting something?
CURRIE BOYLE 10:23
Right. So, there was a speaker yesterday talking about genomics and personalized medicine down to the genomic level. But I think this is a nice intermediary area that you’re talking about, which is patient similarity, either in diagnosis or treatment. So, if we know their blood pressure and we know the kind of cancer that they have and what stage it’s at, that’s one set of information that the healthcare practitioner can use to make a diagnosis. But there’s the idea of being able to look at the unstructured things in the doctor’s notes that say the person was pale and they had yellow fingers, they said they didn’t smoke but they probably really do, and they don’t seem to be following their diet. So, those things that are captured in unstructured ways – to be able to put structure around those, add those into the diagnosis, and therefore enrich what would normally have been sort of a numeric or structured data analysis to be structured data plus the unstructured data with context put around it, so that you have a deeper meaning of it.
GEORGE GILBERT 11:19
Okay. So, this is a grand objective for the industry: to be able to get farther and farther, not just to sort of understand the language, but then to sort of add flesh to the bones of that.
CURRIE BOYLE 11:39
To determine the best action to take, right?
GEORGE GILBERT 11:42
The next best action. Is that something on Nuance’s roadmap, or…
VLAD SEJNOHA 11:48
Very much so, and I think we share the same vision of what a complete system would look like. And I think the transformative dimension here starts with understanding the intent of either text or something that’s spoken by a user, extracting the meaning of that, then determining the best action, and down the line actually engaging in a collaborative dialog. Because we’ve been talking about applications which are kind of really one-shot: you have a request, you want to do something, and it happens. But certain kinds of information will probably require sort of a question-answer sequence that will involve disambiguation, offering of choices, weighing of different sources of information.
VLAD SEJNOHA 12:40
And so, we as well as IBM and others are hard at work at building this full stack. And that’s really going all the way to what’s traditionally called reasoning, which is artificial intelligence – where you encode the overall goals of a task and the system dynamically determines how to best advance the transaction. And this is useful not just to access structured data more efficiently, but also to interact with unstructured data and accomplish a great variety of tasks. So, that’s why I referred to it really as a complete transformation of the user experience.
CURRIE BOYLE 13:14
Right. And just to add – I’m sure we agree on this – the idea that it’s specific to you and the context you’re in. So, rather than just understanding the intent that you have and executing it, it’s understanding, given where you are and your preferences, what’s right for you in this particular case. That level of personalization, I think, is really differentiating for how people like you will take things to market: not just “this person asked this question and you’re trying to return the inferences of it,” but given all of the context that you knew about them yesterday, a dialog that you custom-generated specifically for them, for today.
GEORGE GILBERT 13:47
So, let’s actually take that, I think, one more level deeper. First you need context, which is information about the user that may have come from other actions, locations, questions, things like that. But then you want this layer where you can ask information repositories without having to hardwire to each repository – like a common way of representing a disease and its symptoms, or, to take a completely different example, a common way of booking a flight and all of the special requests for seating, for food. What are the different ways that might get done? Because the alternative is every conversational assistant has to hardwire to each one – to Yelp, to Expedia. How do you do it in a more general-purpose layer? Who are the different potential players, and how might they do it?
VLAD SEJNOHA 15:00
I think that’s a really great point. So, the conversational stack I described obviously has to interact with content sources, other services, applications, devices. And today, integrating these interfaces into those resources is kind of a one-off job. It’s a custom integration, and as a result you’re seeing some of these applications on the market that have made choices on behalf of the user of where certain kinds of information are brought in from. And I think it really opens an important question about openness: as we evolve the UI to include natural language and direct access to specific desired destinations, how are we going to offer the user a choice, an ability to prefer one source over another? And in fact, some of the larger search portals are now straddling the fence of being content providers as well as the honest-broker search engine, so that’s an interesting theme as well. I do think the promise of the semantic web remains very important there. I am hopeful that we’ll get to a point where people who have important content or services on the web publish them in standard formats, in terms of ontologies and known terms, that we can then connect to and interact with more or less automatically.
GEORGE GILBERT 16:22
So, let me just try and make it concrete. Expedia, hotels.com, and Priceline all have their different ways of representing a reservation for their core entity – OpenTable, whatever. Who might be in a position to help make that common layer, for either of you?
VLAD SEJNOHA 16:51
Well, there are W3C initiatives around this. And I’m hopeful this will gain greater support in the industry as these folks realize that without something like this, the interface might become opaque to new entrants. It might become very difficult to become integrated into somebody’s virtual assistant.
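As a loose sketch of what such a common layer could buy – everything here is hypothetical; the field names are invented for illustration and are not an actual W3C vocabulary – a single assistant-side client could consume offers from any provider publishing in a shared schema, with no custom integration per provider:

```python
# Hypothetical shared schema for flight offers. If every provider
# published in this one format, the assistant needs exactly one client,
# not a hardwired adapter for each of Expedia, Priceline, etc.
def book_flight(provider_offers, origin, destination):
    """Pick the cheapest offer matching the request, from any provider."""
    matches = [
        o for o in provider_offers
        if o["origin"] == origin and o["destination"] == destination
    ]
    return min(matches, key=lambda o: o["price"]) if matches else None

# Two fictional "providers" publishing in the same format; the client
# code above has no knowledge of which is which.
offers = [
    {"provider": "ExampleAir", "origin": "JFK", "destination": "SFO", "price": 420},
    {"provider": "DemoTravel", "origin": "JFK", "destination": "SFO", "price": 385},
]
print(book_flight(offers, "JFK", "SFO")["provider"])  # → DemoTravel
```

The point of the sketch is exactly the openness question raised above: whoever defines the shared schema decides which providers the user can choose between without a one-off integration.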
GEORGE GILBERT 17:13
Yeah. But the incumbents have no incentive to make a common interface with their existing competitors, or potential competitors. So, my question is: is there someone above them who can kind of push them in the direction of saying, I want to make a plane reservation, I don’t care about the underlying service, or there will be just little differences at the edges?
VLAD SEJNOHA 17:38
I think that’s more of an open question, and I’m not sure we have a perfect answer to it. But I think there’s something tactical for the audience here today, which is: how can you move toward conversational language interfaces for the solutions that you offer, today? So, people are seeking data, or they’re seeking visualization, they’re seeking a mixture of structured and unstructured data you can provide with your tooling. What are things that you could do with firms like ours to be able to differentiate yourselves, by offering both speech- and text-based natural language systems to allow people to use your systems more effectively? Just to echo some of the things that happened in the conference today – that’s available today, as things that are out there. And I don’t know about you, but a lot of the people that we talk to have a very limited point of view of what’s doable. And I think actually some organizations can deliver much more than what the industry expects, in terms of being able to understand an intent and turn it into an action – that could be used, for instance, to just go and access a report or multiple reports, or merge data together from multiple servers.
GEORGE GILBERT 18:52
So, let me just try and sum up in our last 20 and a half seconds. It sounds to me like interpreting users’ intent is technically a very difficult problem, but putting a common interface on the web services and data repositories that they might want to access is more of a sort of inside-baseball, more of a political problem? Would that be fair?
VLAD SEJNOHA 19:20
It’s true to some degree. Connecting a natural language system to some new content is not a trivial matter of integration, but it can certainly be aided by standardizing the description of the resources. And the applications are really broad – there’s a lot of content out there, for example movie and television catalogs that are impossible to navigate today. And today’s state of the technology of natural language understanding allows you to really drill down and get what you want. And so, it’s really going to maximize the value of a lot of what’s out there.
GEORGE GILBERT 19:53
Okay. On that note, guys, I think we’re out of time. But thanks for the discussion – I think–
GEORGE GILBERT 20:00
–shed a lot of light–