For nearly 40 years, Ronald Kaplan has been working on better ways to talk with machines.
A former researcher at Xerox PARC and Microsoft, he's now a consulting professor in computational linguistics at Stanford and a senior director at the Silicon Valley research division of Nuance Communications, a Massachusetts company that makes voice-enabled software for cars, smartphones, corporate call centers and medical transcription services.
While its software has been used by Apple, Samsung, Ford and Subaru, among others, Nuance also competes with major tech firms, including Google and Microsoft, that are working on their own voice systems. We recently spoke with Kaplan at the Nuance offices in Sunnyvale; the following was edited for length and clarity.
Q Can you give me a high-level overview of where we are with voice technology?
A The quality of speech recognition (programs) has really improved over the past five years. It used to be you would be surprised when it worked, and now you're kind of surprised when it doesn't. But it's not just getting machines to recognize words. It's getting them to understand and act on speech. That's the work we're doing here.
Typing (commands on a keyboard) doesn't work anymore. Particularly as we see these new, smaller devices and wearables (like smart watches) -- you're not going to be sitting around and typing, or even touching (icons), on a tiny screen.
(In addition,) there are all these devices that people encounter in their daily lives, and they can't control them. Take a television remote: You can turn it on and off, move the channel up and down. But it has all kinds of other buttons that probably do really useful functions that maybe you need once every six months. If you don't use them every day, you probably don't remember how they work. It's the same thing with programmable thermostats or car navigation systems.
Nuance sits in that space between the way that ordinary people say things and the complex functional interfaces that engineers are building into all these devices. Voice technology is one part of filling that gap. We're also working on how to coordinate experiences across devices (so different gadgets will recognize a user's voice and habits or interests).
Q Which is more difficult: teaching a machine to understand language, or getting it to understand intent?
A They're both hard problems. There have been 40 years of work on speech recognition and machine translation. Those are problems that are susceptible to a big data approach: If you get enough examples of inputs and outputs, you can learn the correlation.
But to make inferences about users' underlying intent, and to take action, those are problems of a different sort. That's the contrast between "big data" and "big knowledge." If "big data" is a lot of facts, "big knowledge" is a sensible organization of those facts, one that would allow you to make inferences or generalizations.
Q You once said that voice interaction with computers was similar to talking with a 5- or 6-year-old child. Is that still true?
A I think that we've grown up a bit. With Siri, the convenience has improved. You don't have to push a bunch of buttons (to open the Siri app); you can just say "Hey, Siri." And that (ease of use) is driving further development.
But the advertising claims sometimes overshoot, and that's confusing. People have to understand you can't ask philosophical questions.
Q Is it realistic to think we'll someday converse with machines the way we can speak with adults?
A I think it is. If we go back to the age analogy, from 1 to 3 years, we don't have much conversation. At age 5 to 7, you're beginning to have multiple terms for things. We're seeing the same kind of progression.
And as we're beginning to be exposed to commercial products, the expectation that people will have for these things is going to really move the technology forward. Google now talks about conversational search from one input to the next. (You can ask Google about President Obama and then ask a follow-up question about "him" and the search engine knows you're talking about Obama.)
That's good, but there's much more to do. You might want to go out to dinner and you say, "I want spaghetti. Make a reservation." Yelp already knows about Italian restaurants, but the knowledge that might be missing is that spaghetti is Italian. Or that spaghetti is also pasta.
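(To illustrate the kind of missing knowledge Kaplan describes, here is a minimal sketch in Python. It is not Nuance's or Yelp's method; the facts, restaurant names and function names are all hypothetical, made up for this example. It assumes a tiny hand-written table of "is-a" facts and follows those links so that a request for spaghetti can reach listings that are tagged only by cuisine.)

```python
# Hypothetical, hand-written "is-a" facts. A real system would draw on a
# far larger knowledge base; these three entries are assumptions.
IS_A = {
    "spaghetti": ["pasta"],
    "pasta": ["Italian food"],
    "sushi": ["Japanese food"],
}

# Hypothetical restaurant listings, tagged by cuisine only.
RESTAURANTS = {
    "Luigi's": "Italian food",
    "Sakura": "Japanese food",
}


def categories(term):
    """Return every category reachable from term by following is-a links."""
    found = []
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in IS_A.get(current, []):
            if parent not in found:
                found.append(parent)
                stack.append(parent)
    return found


def restaurants_for(dish):
    """Return restaurants whose cuisine tag is one of the dish's categories."""
    wanted = set(categories(dish))
    return [name for name, cuisine in RESTAURANTS.items() if cuisine in wanted]


if __name__ == "__main__":
    print(categories("spaghetti"))
    print(restaurants_for("spaghetti"))
```

(Without the "spaghetti is pasta" and "pasta is Italian food" entries, the lookup finds nothing, which is the gap between having the listings and having the knowledge to use them.)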
Q What first attracted you to this field?
A I was interested in the psychology of language: How do children learn a language that enables them to speak and understand an infinite number of sentences, and how do they apply it? I thought this was a really interesting question. But getting a large number of 2-year-olds to cooperate for an experiment is really hard compared to writing a new algorithm. So before you knew it, I had fallen off the psychology track and gotten involved in computational linguistics and mathematics.
Contact Brandon Bailey at 408-920-5022. Follow him at Twitter.com/BrandonBailey.
Job: Senior director and distinguished scientist at Nuance Communications
Career: Previously worked as a research fellow in natural language theory at Xerox PARC, chief scientist at Xerox spinoff Microlytics, chief scientist at Powerset and research group leader at Microsoft Bing; he's also currently a consulting professor of computational linguistics at Stanford University
Education: Earned bachelor's degree in mathematics and language behavior from UC Berkeley; master's and doctoral degrees in social psychology from Harvard
Personal: Married, father of two adult sons
Source: Mercury News reporting
Five things to know about Ronald Kaplan
1) He was drawn to study psychology because he was interested in how children learn to speak.
2) As a graduate student, he studied and developed computer models of grammar known as augmented transition networks.
3) He holds 36 patents for inventions in language technology.
4) He's a past president of the Association for Computational Linguistics and co-recipient of the Association for Computing Machinery's Software Systems Award.
5) He and his wife, a clinical psychologist, are fond of walking and watching the San Francisco Giants.
Source: Mercury News reporting