In this episode, Byron and Peter talk about defining intelligence, Venn diagrams, transfer learning, image recognition, and Xiaoice.
Byron Reese: This is Voices in AI, brought to you by Gigaom. I’m Byron Reese. Today our guest is Peter Lee. He is a computer scientist and corporate Vice President at Microsoft Research. He leads Microsoft’s New Experiences and Technologies organization, or NExT, with the mission to create research-powered technology and products and advance human knowledge through research. Prior to Microsoft, Dr. Lee held positions in both government and academia. At DARPA, he founded a division focused on R&D programs in computing and related areas. Welcome to the show, Peter.
Peter Lee: Thank you. It’s great to be here.
I always like to start with a seemingly simple question which turns out not to be quite so simple. What is artificial intelligence?
Wow. That is not a simple question at all. I guess the simple, one line answer is artificial intelligence is the science or the study of intelligent machines. And, I realize that definition is pretty circular, and I am guessing that you understand that that’s the fundamental difficulty, because it leaves open the question: what is intelligence? I think people have a lot of different ways to think about what is intelligence, but, in our world, intelligence is, “how do we compute how to set and achieve goals in the world.” And this is fundamentally what we’re all after, right now in AI.
That’s really fascinating because you’re right, there is no consensus definition on intelligence, or on life, or on death for that matter. So, I would ask that question: why do you think we have such a hard time defining what intelligence is?
I think we only have one model of intelligence, which is our own, and so when you think about trying to define intelligence it really comes down to a question of defining who we are. There’s fundamental discomfort with that. That fundamental circularity is difficult. If we were able to fly off in some starship to a far-off place, and find a different form of intelligence—or different species that we would recognize as intelligent—maybe we would have a chance to dispassionately study that, and come to some conclusions. But it’s hard when you’re looking at something so introspective.
When you get into computer science research, at least here at Microsoft Research, you do have to find ways to focus on specific problems; so, we ended up focusing our research in AI—and our tech development in AI, roughly speaking—in four broad categories, and I think these categories are a little bit easier to grapple with. One is perception—that’s endowing machines with the ability to see and hear, much like we do. The second category is learning—how to get machines to get better with experience? The third is reasoning—how do you make inferences, logical inferences, commonsense inferences about the world? And then the fourth is language—how do we get machines to be intelligent in interacting with each other and with us through language? Those four buckets—perception, learning, reasoning and language—they don’t define what is intelligence, but they at least give us some kind of clear set of goals and directions to go after.
Well, I’m not going to spend too much time down in those weeds, but I think it’s really interesting. In what sense do you think it’s artificial? Because it’s either artificial in that it’s just mechanical—or that’s just a shorthand we use for that—or it’s artificial in that it’s not really intelligence. You’re using words like “see,” “hear,” and “reason.” Are you using those words euphemistically—can a computer really see or hear anything, or can it reason—or are you using them literally?
The question you’re asking really gets to the nub of things, because we really don’t know. If you were to draw the Venn diagram, you’d have a big circle and call that intelligence, and now you want to draw a circle for artificial intelligence—we don’t know if that circle is the same as the intelligence circle, whether it’s separate but overlapping, whether it’s a subset of intelligence… These are really basic questions that we debate, and people have different intuitions about, but we don’t really know. And then we get to what’s actually happening—what gets us excited and what is actually making it out into the real world, doing real things—and for the most part that has been a tiny subset of these big ideas; just focusing on machine learning, on learning from large amounts of data, models that are actually able to do some useful task, like recognize images.
Right. And I definitely want to go deep into that in just a minute, but I’m curious… So, there’s a wide range of views about AI. Should we fear it? Should we love it? Will it take us into a new golden age? Will it do this? Will it cap out? Is an AGI possible? All of these questions.
And, I mean, if you ask, “How will we get to Mars?” Well, we don’t know exactly, but we kind of know. But if you ask, “What’s AI going to be like in fifty years?” it’s all over the map. And do you think that is because there isn’t agreement on the kinds of questions I’m asking—like people have different ideas on those questions—or are the questions I’m asking not really even germane to the day-to-day “get up and start building something”?
I think there’s a lot of debate about this because the question is so important. Every technology is double-edged. Every technology has the ability to be used for both good purposes and for bad purposes, has good consequences and unintended consequences. And what’s interesting about computing technologies, generally, but especially with a powerful concept like artificial intelligence, is that in contrast to other powerful technologies—let’s say in the biological sciences, or in nuclear engineering, or in transportation and so on—AI has the potential to be highly democratized, to be codified into tools and technologies that literally every person on the planet can have access to. So, the question becomes really important: what kind of outcomes, what kinds of possibilities happen for this world when literally every person on the planet can have the power of intelligent machines at their fingertips? And because of that, all of the questions you’re asking become extremely large, and extremely important for us. People care about those futures, but ultimately, right now, our state of scientific knowledge is we don’t really know.
I sometimes talk in analogy about way, way back in the medieval times when Gutenberg invented mass-produced movable type, and the first printing press. And in a period of just fifty years, they went from thirty thousand books in all of Europe, to almost thirteen million books in all of Europe. It was sort of the first technological Moore’s Law. The spread of knowledge that that represented, did amazing things for humanity. It really democratized access to books, and therefore to a form of knowledge, but it was also incredibly disruptive in its time and has been since.
In a way, the potential we see with AI is very similar, and maybe even a bigger inflection point for humanity. So, while I can’t pretend to have any hard answers to the basic questions that you’re asking about the limits of AI and the nature of intelligence, it’s for sure important; and I think it’s a good thing that people are asking these questions and they’re thinking hard about it.
Well, I’m just going to ask you one more and then I want to get more down in the nitty-gritty.
If the only intelligent thing we know of in the universe, the only general intelligence, is our brain, do you think it’s a settled question that that functionality can be reproduced mechanically?
I think there is no evidence to the contrary. Every way that we look at what we do in our brains, we see mechanical systems. So, in principle, if we have enough understanding of how our own mechanical system of the brain works, then we should be able to, at a minimum, reproduce that. Now, of course, the way that technology develops, we tend to build things in different ways, and so I think it’s very likely that the kind of intelligent machines that we end up building will be different than our own intelligence. But there’s no evidence, at least so far, that would be contrary to the thesis that we can reproduce intelligence mechanically.
So, to take the opposite position for a moment. Somebody could say there’s absolutely no evidence to suggest that we can, for the following reasons. One, we don’t know how the brain works. We don’t know how thoughts are encoded. We don’t know how thoughts are retrieved. Aside from that, we don’t know how the mind works. We don’t know how it is that we have capabilities that seem to be beyond what a hunk of grey matter could do—we’re creative, we have a sense of humor and all these other things. We’re conscious, and we don’t even have a scientific language for understanding how consciousness could come about. We don’t even know how to ask that question or look for that answer, scientifically. So, somebody else might look at it and say, “There’s no reason whatsoever to believe we can reproduce it mechanically.”
I’m going to use a quote here from, of all people, a non-technologist, Samuel Goldwyn, the old movie magnate. And I always reach to this when I get put in a corner like you’re doing to me right now, which is, “It’s absolutely impossible, but it has possibilities.”
Our current understanding is that brains are fundamentally closed systems, and so we’re learning more and more, and in fact what we learn is loosely inspiring some of the things we’re doing in AI systems, and making progress. How far that goes? It’s really, as you say, it’s unclear because there are so many mysteries, but it sure looks like there are a lot of possibilities.
Now to get kind of down to the nitty-gritty, let’s talk about difficulties and where we’re being successful and where we’re not. My first question is, why do you think AI is so hard? Because humans acquire their intelligence seemingly simply, right? You put a little kid in playschool and you show them some red, and you show them the number three, and then, all of a sudden, they understand what three red things are. I mean, we, kind of, become intelligent so naturally, and yet my frequent flyer program that I call in can’t tell, when I’m telling it my number if I said 8 or H. Why do you think it’s so hard?
What you said is true, although it took you many years to reach that point. And even a child that’s able to do the kinds of things that you just expressed has had years of life. The kinds of expectations that we have, at least today—especially in the commercial sphere for our intelligent machines—sometimes there’s a little bit less patience. But having said that, I think what you’re saying is right.
I mentioned before this Venn diagram; so, there’s this big circle which is intelligence, and let’s just assume that there is some large subset of that which is artificial intelligence. Then you zoom way, way in, and a tiny little bubble inside that AI bubble is machine learning—this is just simply machines that get better with experience. And then a tiny bubble inside that tiny bubble is machine learning from data—where the models that are extracted, that codify what has been learned, are all extracted from analyzing large amounts of data. That’s really where we’re at today—in this tiny bubble, inside this tiny bubble, inside this big bubble we call artificial intelligence.
What is remarkable is that, despite how narrow our understanding is—for the most part all of the exciting progress is just inside this little, tiny, narrow idea of machine learning from data, and there’s even a smaller bubble inside that, which is learning in a supervised manner—even from that we’re seeing tremendous power, a tremendous ability to create new computing systems that do some pretty impressive and valuable things. It is pretty crazy just how valuable that’s become to companies, like Microsoft. At the same time, it is such a narrow little slice of what we understand of intelligence.
The simple examples that you mentioned, for example, like one-shot learning, where you can show a small child a cartoon picture of a fire truck, and even if that child has never seen a fire truck before in her life, you can take her out on the street, and the first real fire truck that goes down the road the child will instantly recognize as a fire truck. That sort of one-shot idea, you’re right, our current systems aren’t good at.
While we are so excited about how much progress we’re making on learning from data, there are all the other things that are wrapped up in intelligence that are still pretty mysterious to us, and pretty limited. Sometimes, when that matters, our limits get in the way, and it creates this idea that AI is actually still really hard.
You’re talking about transfer learning. Would you say that the reason she can do that is because at another time she saw a drawing of a banana, and then a banana? And another time she saw a drawing of a cat, and then a cat. And so, it wasn’t really a one-shot deal.
How do you think transfer learning works in humans? Because that seems to be what we’re super good at. We can take something that we learned in one place and transfer that knowledge to another context. You know, “Find, in this picture, the Statue of Liberty covered in peanut butter,” and I can pick that out having never seen a Statue of Liberty in peanut butter, or anything like that.
Do you think that’s a simple trick we don’t understand how to do yet? Is that what you want it to be, like an “a-ha” moment, where you discover the basic idea? Or do you think it’s a hundred tiny little hacks, and transfer learning in our minds is just, like, some spaghetti code written by some drunken programmer who was on a deadline, right? What do you think that is? Is it a simple thing, or is it a really convoluted, complicated thing?
Transfer learning turns out to be incredibly interesting, scientifically, and also commercially for Microsoft, turns out to be something that we rely on in our business. What is kind of interesting is, when is transfer learning more generally applicable, versus being very brittle?
For example, in our speech processing systems, the actual commercial speech processing systems that Microsoft provides, we use transfer learning, routinely. When we train our speech systems to understand English speech, and then we train those same systems to understand Portuguese, or Mandarin, or Italian, we get a transfer learning effect, where the training for that second, and third, and fourth language requires less data and less computing power. And at the same time, each subsequent language that we add onto it improves the earlier languages. So, training that English-based system to understand Portuguese actually improves the performance of our speech systems in English, so there are transfer learning effects there.
In our image recognition tasks, there is something called the ImageNet competition that we participate in most years, and the last time that we competed was two years ago in 2015. There are five image processing categories. We trained our system to do well on Category 1—on the basic image classification—then we used transfer learning to not only win the first category, but to win all four other ImageNet competitions. And so, without any further kind of specialized training, there was a transfer learning effect.
Transfer learning actually does seem to happen. In our deep neural net, deep learning research activities, transfer learning effects—when we see them—are just really intoxicating. It makes you think about what you and I do as human beings.
At the same time, it seems to be this brittle thing. We don’t necessarily understand when and how this transfer learning effect is effective. The early evidence from studying these things is that there are different forms of learning, and that somehow the one-shot ideas that even small children are very good at, seem to be out of the purview of the deep neural net systems that we’re working on right now. Even this intuitive idea that you’ve expressed of transfer learning, the fact is we see it in some cases and it works so well and is even commercially-valuable to us, but then we also see simple transfer learning tasks where these systems just seem to fail. So, even those things are kind of mysterious to us right now.
It seems—and I don’t have any evidence to support this, but it seems, at a gut level to me—that maybe what you’re describing isn’t pure transfer learning, but rather what you’re saying is, “We built a system that’s really good at translating languages, and it works on a lot of different languages.”
It seems to me that the essence of transfer learning is when you take it to a different discipline, for example, “Because I learned a second language, I am now a better artist. Because I learned a second language, I’m now a better cook.” That, somehow, we take things that are in a discipline, and they add to this richness and depth and dimensionality of our knowledge in a way that they really impact our relationships.
I was chatting with somebody the other day who said that learning a second language was the most valuable thing he’d ever done, and that his personality in that second language is different than his English personality. I hear what you’re saying, and I think those are hints that point us in the right direction. But I wonder if, at its core, it’s really multidimensional, what humans do, and that’s why we can seemingly do the one-shot things, because we’re taking things that are absolutely unrelated to cartoon drawings of something relating to real life. Do you have even any kind of a gut reaction to that?
One thing, at least in our current understanding of the research fields, is that there is a difference between learning and reasoning. The example I like to go to is, we’ve done quite a bit of work on language understanding, and specifically in something called machine reading—where you want to be able to read text and then answer questions about the text. And a classic place where you look to test your machine reading capabilities is parts of the verbal part of the SAT exam. The nice thing about the SAT exam is you can try to answer the questions and you can measure the progress just through the score that you get on the test. That’s steadily improving, and not just here at Microsoft Research, but at quite a few great university research areas and centers.
Now, subject those same systems to, say, the third-grade California Achievement Test, and the intelligence systems just fall apart. If you look at what third graders are expected to be able to do, there is a level of commonsense reasoning that seems to be beyond what we try to do in our machine reading system. So, for example, one kind of question you’ll get on that third-grade achievement test is, maybe, four cartoon drawings: a ball sitting on the grass, some raindrops, an umbrella, and a puppy dog—and you have to know which pairs of things go together. Third-graders are expected to be able to make the right logical inferences from having the right life experiences, the right commonsense reasoning inferences to put those two pairs together, but we don’t actually have the AI systems that, reliably, are able to do that. That commonsense reasoning is something that seems to be—at least today, with the state of today’s scientific and technological knowledge—outside of the realm of machine learning. It’s not something that we think machine learning will ultimately be effective at.
That distinction is important to us, even commercially. I’m looking at an e-mail today that someone here at Microsoft sent me to get ready to talk to you today. The e-mail says, it’s right in front of me here, “Here is the briefing doc for tomorrow morning’s podcast. If you want to review it tonight, I’ll print it for you tomorrow.” Right now, the system has underlined, “want to review tonight,” and the reason it’s underlined that is it’s somehow made the logical commonsense inference that I might want a reminder on my calendar to review the briefing documents. But it’s remarkable that it’s managed to do that, because there are references to tomorrow morning as well as tonight. So, making those sorts of commonsense inferences, doing that reasoning, is still just incredibly hard, and really still requires a lot of craftsmanship by a lot of smart researchers to make real.
It’s interesting because you say, you had just one line in there that solving the third-grade problem isn’t a machine learning task, so how would we solve that? Or put another way, I often ask these Turing Test systems, “What’s bigger, a nickel or the sun?” and none of them have ever been able to answer it. Because “sun” is ambiguous, maybe, and “nickel” is ambiguous.
In any case, if we don’t use machine learning for those, how do we get to the third grade? Or do we not even worry about the third grade? Because most of the problems we have in life aren’t third-grade problems, they’re 12th-grade problems that we really want the machines to be able to do. We want them to be able to translate documents, not match pictures of puppies.
Well, for sure, if you just look at what companies like Microsoft, and the whole tech industry, are doing right now, we’re all seeing, I think, at least a decade, of incredible value to people in the world just with machine learning. There are just tremendous possibilities there, and so I think we are going to be very focused on machine learning and it’s going to matter a lot. It’s going to make people’s lives better, and it’s going to really provide a lot of commercial opportunities for companies like Microsoft. But that doesn’t mean that commonsense reasoning isn’t crucial, isn’t really important. Almost any kind of task that you might want help with—even simple things like making travel arrangements, shopping, or bigger issues like getting medical advice, advice about your own education—these things almost always involve some elements of what you would call commonsense reasoning, making inferences that somehow are not common, that are very particular and specific to you, and maybe haven’t been seen before in exactly that way.
Now, having said that, in the scientific community, in our research and amongst our researchers, there’s a lot of debate about how much of that kind of reasoning capability could be captured through machine learning, and how much of it could be captured simply by observing what people do for long enough and then just learning from it. But, for me at least, I see what is likely is that there’s a different kind of science that we’ll need to really develop much further if we want to capture that kind of commonsense reasoning.
Just to give you a sense of the debate, one thing that we’ve been doing—it’s been an experiment ongoing in China—is we have a new kind of chatbot technology in China that takes the form of a person named Xiaoice. Xiaoice is a persona that lives on social media in China, and actually has a large number of followers, tens of millions of followers.
Typically, when we think about chatbots and intelligent agents here in the US market—things like Cortana, or Siri, or Google Assistant, or Alexa—we put a lot of emphasis on semantic understanding; we really want the chatbot to understand what you’re saying at the semantic level. For Xiaoice, we ran a different experiment, and instead of trying to put in that level of semantic understanding, we instead looked at what people say on social media, and we used natural language processing to pick out statement-response pairs, and templatize them, and put them in a large database. And so now, if you say something to Xiaoice in China, Xiaoice looks at what other people say in response to an utterance like that. Maybe it’ll come up with a hundred likely responses based on what other people have done, and then we use machine learning to rank order those likely responses, trying to optimize the enjoyment and engagement in the conversation, optimize the likelihood that the human being who is engaged in the conversation will stick with a conversation. Over time, Xiaoice has become extremely effective at doing that. In fact, for the top, say, twenty million people who interact with Xiaoice on a daily basis, the conversations are taking more than twenty-three turns.
What’s remarkable about that—and fuels the debate about what’s important in AI and what’s important in intelligence—is that at least the core of Xiaoice really doesn’t have any understanding at all about what you’re talking about. In a way, it’s just very intelligently mimicking what other people do in successful conversations. It raises the question, when we’re talking about machines and machines that at least appear to be intelligent, what’s really important? Is it really a purely mechanical, syntactic system, like we’re experimenting with Xiaoice, or is it something where we want to codify and encode our semantic understanding of the world and the way it works, the way we’re doing, say, with Cortana?
These are fundamental debates in AI. What’s sort of cool, at least in my day-to-day work here at Microsoft, is we are in a position where we’re able, and allowed, to do fundamental research in these things, but also build and deploy very large experiments just to see what happens and to try to learn from that. It’s pretty cool. At the same time, I can’t say that leaves me with clear answers yet. Not yet. It just leaves me with great experiences and we’re sharing what we’re learning with the world but it’s much, much harder to then say, definitively, what these things mean.
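The retrieve-then-rank idea described for Xiaoice can be illustrated with a toy sketch. This is not Microsoft's implementation: the `Pair` record, the word-overlap `similarity` function, the `engagement` score, and the example data are all hypothetical stand-ins for the natural language processing and learned ranking mentioned in the conversation. It only shows the shape of the approach: look up stored statement-response pairs that resemble the user's utterance, keep up to a hundred candidates, and pick the one expected to keep the conversation going.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One mined statement-response pair (hypothetical record layout)."""
    statement: str     # what someone said
    response: str      # what someone else replied
    engagement: float  # assumed score: how often the conversation continued


def tokens(text: str) -> set[str]:
    """Lowercased words with simple punctuation stripped."""
    return {w.strip(".,!?").lower() for w in text.split()} - {""}


def similarity(a: str, b: str) -> float:
    """Jaccard word overlap; a crude stand-in for real utterance matching."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def reply(utterance: str, pairs: list[Pair], k: int = 100) -> str | None:
    """Retrieve up to k similar pairs, then rank them by expected engagement."""
    # Step 1: retrieval. Keep the k stored statements closest to the utterance.
    scored = [(similarity(utterance, p.statement), p) for p in pairs]
    matches = [sp for sp in scored if sp[0] > 0]
    candidates = sorted(matches, key=lambda sp: sp[0], reverse=True)[:k]
    if not candidates:
        return None
    # Step 2: ranking. A real system would use a learned model here;
    # this sketch just multiplies similarity by the stored engagement score.
    best = max(candidates, key=lambda sp: sp[0] * sp[1].engagement)
    return best[1].response


# Made-up example data, for illustration only.
EXAMPLE_PAIRS = [
    Pair("I want a new phone", "How big a screen do you want?", 0.9),
    Pair("I want a new phone", "Ok.", 0.1),
    Pair("Where should I stay in Paris", "Try a hotel near the river.", 0.5),
]

if __name__ == "__main__":
    print(reply("I want a new phone that fits my purse", EXAMPLE_PAIRS))
```

Nothing in this sketch models what a phone or a purse is, which is the point made above: the reply is chosen by matching and ranking what other people have said, not by semantic understanding.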
You know, it’s true. In 1950 Alan Turing said, “Can a machine think?” And that’s still a question that many can’t agree on because they don’t necessarily agree on the terms. But you’re right, that chatbot could pass the Turing Test, in theory. At twenty-three turns, if you didn’t tell somebody it was a chatbot, maybe it would pass it.
But you’re right that that’s somehow unsatisfying that this is somehow this big milestone. Because if you saw it as a user in slow motion—that you ask a question, and then it did a query, and then it pulled back a hundred things and it rank ordered them, and looked for how many of those had successful follow-ups, and thumbs up, and smiley faces, and then it gave you one… It’s that whole thing about, once you know how the magic trick works, it isn’t nearly as interesting.
It’s true. And with respect to achieving goals, or completing tasks in the world with the help of the Xiaoice chatbot, well, in some cases it’s pretty amazing how helpful Xiaoice is to people. If someone says, “I’m in the market for a new smartphone, I’m looking for a larger phablet, but I still want it to fit in my purse,” Xiaoice is amazingly effective at giving you a great answer to that question, because it’s something that a lot of people talk about when they’re shopping for a new phone.
At the same time, Xiaoice might not be so good at helping you decide which hotels to stay in, or helping you arrange your next vacation. It might provide some guidance, but maybe not exactly the right guidance that’s been well thought out. One more thing to say about this is, today—at least at the scale and practicality that we’re talking about—for the most part, we’re learning from data, and that data is essentially the digital exhaust from human thought and activity. There’s also another sense in which Xiaoice, while it passes the Turing Test, it’s also, in some ways, limited by human intelligence, because almost everything it’s able to do is observed and learned from what other people have done. We can’t discount the possibility of future systems which are less data dependent, and are able to just understand the structure of the world, and the problems, and learn from that.
Right. I guess Xiaoice wouldn’t know the difference, “What’s bigger, a nickel or the sun?”
That’s right, yes.
Unless the transcript of this very conversation were somehow part of the training set, but you notice, I’ve never answered it. I’ve never given the answer away, so, it still wouldn’t know.
We should try the experiment at some point.
Why do you think we personify these AIs? You know about Weizenbaum and ELIZA and all of that, I assume. He got deeply disturbed when people were relating to ELIZA, knowing it was a chatbot. He got deeply concerned that people poured out their heart to it, and he said that when the machine says, “I understand,” it’s just a lie. That there’s no “I,” and there’s nothing that “understands” anything. Do you think that somehow confuses relationships with people and that there are unintended consequences to the personification of these technologies that we don’t necessarily know about yet?
I’m always internally scolding myself for falling into this tendency to anthropomorphize our machine learning and AI systems, but I’m not alone. Even the most hardened, grounded researcher and scientist does this. I think this is something that is really at the heart of what it means to be human. The fundamental fascination that we have and drive to propagate our species is surfaced as a fascination with building autonomous intelligent beings. It’s not just AI, but it goes back to the Frankenstein kinds of stories that have just come up in different guises, and different forms throughout, really, all of human history.
I think we just have a tremendous drive to build machines, or other objects and beings, that somehow capture and codify, and therefore promulgate, what it means to be human. And nothing defines that more for us than some sort of codification of human intelligence, and especially human intelligence that is able to be autonomous, make its own decisions, make its own choices moving forward. It’s just something that is so primal in all of us. Even in AI research, where we really try to train ourselves and be disciplined about not making too many unfounded connections to biological systems, we fall into the language of biological intelligence all the time. Even the four categories I mentioned at the outset of our conversation—perception, learning, reasoning, language—these are pretty biologically inspired words. I just think it’s a very deep part of human nature.
That could well be the case. I have a book coming out on AI in April of 2018 that talks about these questions, and there’s a whole chapter about how long we’ve been doing this. And you’re right, it goes back to the Greeks, and the eagle that allegedly plucked out Prometheus’ liver every day, in some accounts, was a robot. There’s just tons of them. The difference of course, now, is that, up until a few years ago, it was all fiction, and so these were just stories. And we don’t necessarily want to build everything that we can imagine in fiction. I still wrestle with it, that, somehow, we are going to conflate humans and machines in a way which might be to the detriment of humans, and not to the ennobling of the machine, but time will tell.
Every technology, as we discussed earlier, is double-edged. Just to strike an optimistic note here—to your last comment, which is, I think, very important—I do think that this is an area where people are really thinking hard about the kinds of issues you just raised. I think that’s in contrast to what was happening in computer science and the tech industry even just a decade ago, where there’s more or less an ethos of, “Technology is good and more technology is better.” I think now there’s much more enlightenment about this. I think we can’t impede the progress of science and technology development, but what is so good and so important is that, at least as a society, we’re really trying to be thoughtful about both the potential for good, as well as the potential for bad that comes out of all of this. I think that gives us a much better chance that we’ll get more of the good.
I would agree. I think the only other corollary to this, where there’s been so much philosophical discussion about the implications of the technology, is the harnessing of the atom. If you read the contemporary literature written at the time, people were like, “It could be energy too cheap to meter, or it could be weapons of colossal destruction, or it could be both.” There was a precedent there for a long and thoughtful discussion about the implications of the technology.
It’s funny you mentioned that because that reminds me of another favorite quote of mine which is from Albert Einstein, and I’m sure you’re familiar with it. “The difference between stupidity and genius is that genius has its limits.”
And of course, he said that at the same time that a lot of this was developing. It was a pithy way to tell the scientific community, and the world, that we need to be thoughtful and careful. And I think we’re doing that today. I think that’s emerging very much so in the field of AI.
There’s a lot of practical concern about the effect of automation on employment, and these technologies on the planet. Do you have an opinion on how that’s all going to unfold?
Well, for sure, I think it’s very likely that there’s going to be massive disruptions in how the world works. I mentioned the printing press, the Gutenberg press, movable type; there was incredible disruption there. When you have nine doublings in the spread of books and printing presses in the period of fifty years, that’s a real medieval Moore’s Law. And if you think about the disruptive effect of that, by the early 1500s, the whole notion of what it meant to educate your children suddenly involved making sure that they could read and write. That’s a skill that takes a lot of expense, and years of formal training and it has this sort of destructive impact. So, while the overall impact on the world and society was hugely positive—really the printing press laid the foundation for the Age of Enlightenment and the Renaissance—it had an absolutely disruptive effect on what it meant and what it took for people to succeed in the world.
AI, I’m pretty sure, is going to have the same kind of disruptive effect, because it has the same sort of democratizing force that the spread of books has had. And so, for us, we’ve been trying very hard to keep the focus on, “What can we do to put AI in the hands of people, that really empowers them, and augments what they’re able to do? What are the codifications of AI technologies that enable people to be more successful in whatever they’re pursuing in life?” And that focus, that intent by our research labs and by our company, I think, is incredibly important, because it takes a lot of the inventive and innovative genius that we have access to, and tries to point it in the right direction.
Talk to me about some of the interesting work you’re doing right now. Start with the healthcare stuff, what can you tell us about that?
Healthcare is just incredibly interesting. I think there are maybe three areas that just really get me excited. One is just fundamental life sciences, where we’re seeing some amazing opportunities and insights being unlocked through the use of machine learning, large-scale machine learning, and data analytics on the data that’s being produced increasingly cheaply through, say, gene sequencing, and through our ability to measure signals in the brain. What’s interesting about these things is that, over and over again, in other areas, if you put great innovative research minds and machine learning experts together with data and computing infrastructure, you get this burst of unplanned and unexpected innovations. Right now, in healthcare, we’re just getting to the point where we’re able to arrange the world in such a way that we’re able to get really interesting health data into the hands of these innovators, and genomics is one area that’s super interesting there.
Then, there is the basic question of, “What happens in the day-to-day lives of doctors and nurses?” Today, doctors are spending an average—there are several recent studies about this—of one hundred and eight minutes a day just entering health data into electronic health record systems. This is an incredible burden on those doctors, though it’s very important because it’s managed to digitize people’s health histories. But we’re now seeing an amazing ability for intelligent machines to just watch and listen to the conversation that goes on between the doctor and the patient, and to dramatically reduce the burden of all of that record keeping on doctors. So, doctors can stop being clerks and record keepers, and instead actually start to engage more personally with their patients.
And then the third area which I’m very excited about, but maybe is a little more geeky, is determining how we can create a system, how can we create a cloud, where more data is open to more innovators, where great researchers at universities, great innovators at startups who really want to make a difference in health, can provide a platform and a cloud where we can supply them with access to lots of valuable data, so they can innovate, they can create models that do amazing things.
Those three things just all really get me excited because the combination of these things I think can really make the lives of doctors, and nurses, and other clinicians better; can really lead to new diagnostics and therapeutic technologies, and unleash the potential of great minds and innovators. Stepping back for a minute, it really just amounts to creating systems that allow innovators, data, and computing infrastructure to all come together in one place, and then just having the faith that when you do that, great things will happen. Healthcare is just a huge opportunity area for doing this, that I’ve just become really passionate about.
I guess we will reach a point where you can have essentially the very best doctor in the world in your smartphone, and the very best psychologist, and the very best physical therapist, and the very best everything, right? All available at essentially no cost. I guess the internet always provided, at some abstract level, all of that information if you had an infinite amount of time and patience to find it. And the promise of AI, the kinds of things you’re doing, is that it was that difference, what did you say, between learning and reasoning, that it kind of bridges that gap. So, paint me a picture of what you think, just in the healthcare arena, the world of tomorrow will look like. What’s the thing that gets you excited?
I don’t actually see healthcare ever getting away from being an essentially human-to-human activity. That’s something very important. In fact, I predict that healthcare will still be largely a local activity where it’s something that you will fundamentally access from another person in your locality. There are lots of reasons for this, but there’s something so personal about healthcare that it ends up being based in relationships. I see AI in the future relieving senseless and mundane burden from the heroes in healthcare—the doctors, and nurses, and administrators, and so on—that provide that personal service.
So, for example, we’ve been experimenting with a number of healthcare organizations with our chatbot technology. That chatbot technology is able to answer—on demand, through a conversation with a patient—routine and mundane questions about some health issue that comes up. It can do a, kind of, mundane textbook triage, and then, once all that is done, make an intelligent connection to a local healthcare provider, summarize very efficiently for the healthcare provider what’s going on, and then really allow the full creative potential and attention of the healthcare provider to be put to good use.
Another thing that we’ll be showing off to the world at a major radiology conference next week is the use of computer vision and machine learning to learn the habits and tricks of the trade for radiologists who are doing radiation therapy planning. Right now, radiation therapy planning involves, kind of, a pixel by pixel clicking on radiological images that is extremely important; it has to be done precisely, but also has some artistry. Every good radiologist has his or her different kinds of approaches to this. So, one nice thing about machine learning-based computer vision today is that you can actually observe and learn what radiologists do, their practices, and then dramatically accelerate and relieve a lot of the mundane efforts, so that instead of two hours of work that is largely mundane with only maybe fifteen minutes of that being very creative, we can automate the noncreative aspects of this, and allow the radiologists to devote that full fifteen minutes, or even half an hour to really thinking through the creative aspects of radiology. So, it’s more of an empowerment model rather than replacing those healthcare workers. It still relies on human intuition; it still relies on human creativity, but hopefully allows more of that intuition, and more of that creativity to be harnessed by taking away some of the mundane, and time-consuming aspects of things.
These are approaches that I view as very human-focused, very humane ways to, not just make healthcare workers more productive, but to make them happier and more satisfied in what they do every day. Unlocking that with AI is just something that I feel is incredibly important. And it’s not just us here at Microsoft that are thinking this way, I’m seeing some really enlightened work going on, especially with some of our academic collaborators in this way. I find it truly inspiring to see what might be possible. Basically, I’m pushing back on the idea that we’ll be able to replace doctors, replace nurses. I don’t think that’s the world that we want, and I don’t even know that that’s the right idea. I don’t think that that necessarily leads to better healthcare.
To be clear, I’m talking about the great, immense parts of the world where there aren’t enough doctors for people, where there is this vast shortage of medical professionals, to somehow fill that gap, surely the technology can do that.
Yes. I think access is great. Even with some of the health chatbot pilot deployments that we’ve been experimenting with right now, you can just see that potential. If people are living in parts of the world where they have access issues, it’s an amazing and empowering thing to be able to just send a message to a chatbot that’s always available and ready to listen, and answer questions. Those sorts of things, for sure, can make a big difference. At the same time, the real payoff is when technologies like that then enable healthcare workers (really great doctors, really great clinicians) to clear enough off their plate that their creative potential becomes available to more people; and so, you win on both ends. You win both on an instant access through automation, but you can also have a potential to win by expanding and enhancing the throughput and the number of patients that the clinics and clinicians can deal with. It’s a win-win situation in that respect.
Well said and I agree. It sounds like overall you are bullish on the future, you’re optimistic about the future and you think this technology overall is a force for great good, or am I just projecting that on to you?
I’d say we think a lot about this. I would say, in my own career, I’ve had to confront both the good and bad outcomes, both the positive and unintended consequences of technology. I remember when I was back at DARPA—I arrived at DARPA in 2009—and in the summer of 2009, there was an election in Iran where the people in Iran felt that the results were not valid. This sparked what has been called the Iranian Twitter revolution. And what was interesting about the Iranian Twitter revolution is that people were using social media, Friendster and Twitter, in order to protest the results of this election and to organize protests.
This came to my attention at DARPA, through the State Department, because it became apparent that US-developed technologies to detect cyber intrusions and to help protect corporate networks were being used by the Iranian regime to hunt down and prosecute people who were using social media to organize these protests. The US took very quick steps to stop the sale of these technologies. But the thing that’s important is that these technologies, I’m pretty sure, were developed with only the best of intentions in mind: to help make computer networks safer. So, the idea that these technologies could be used to suppress free speech and freedom of assembly was, I’m sure, never contemplated.
This really, kind of, highlights the double-edged nature of technology. So, for sure, we try to bring that thoughtfulness into every single research project we have across Microsoft Research, and that motivates our participation in things like the Partnership on AI that involves a large number of industry and academic players, because we always want to have the technology, industry, and the research world be more and more thoughtful and enlightened on these ideas. So, yes, we’re optimistic. I’m optimistic certainly about the future, but that optimism, I think, is founded on a good dose of reality that if we don’t actually take proactive steps to be enlightened, on both the good and bad possibilities, good and bad outcomes, then the good things don’t just happen on their own automatically. So, it’s something that we work at, I guess, is the bottom line for what I’m trying to say. It’s earned optimism.
I like that. “Earned optimism,” I like that. It looks like we are out of time. I want to thank you for an hour of fascinating conversation about all of these topics.
It was really fascinating, and you’ve asked some of the hardest questions of the day. It was a challenge, and tons of fun to noodle on them with you.
Like, “What is bigger, the sun or a nickel?” Turns out that’s a very hard question.
I’m going to ask Xiaoice that question and I’ll let you know what she says.
All right. Thank you again.
Byron explores issues around artificial intelligence and conscious computers in his upcoming book The Fourth Age, to be published in April by Atria, an imprint of Simon & Schuster.