A Centerpiece Conversation with Nuance’s Paul Ricci

Paul Ricci, Chairman and CEO, Nuance Corp.; hosted by Russ Daggatt, General Partner, Denny Hill Capital

RD: Nuance is THE powerhouse in speech and voice technologies. What is the history of Nuance, and how did you get where you are today?

PR: Nuance is structured around a core set of voice recognition and processing technologies, deployed in the healthcare market (doctors dictate information about their patients), customer call center management, and mobility (which has become a major area of focus).

ScanSoft was a scanning software company that did document recognition, but when Ricci joined in 2000 he had doubts about the breadth of that market. He believed speech technology would become ubiquitous, and that few companies would keep a sustained focus on it over a decade or more. ScanSoft did exactly that, and found new emerging markets in the area.

Healthcare technology accounts for about 40% of Nuance's business. The company found it was OEMing its technology to a number of firms, and royalties from this were rising. So it built some products of its own, brought out a version of Dragon for medical applications, and acquired Dictaphone, which gave it a large footprint in hospitals.

Doctors have to capture a very rich narrative about each procedure: dense, unstructured, and impractical to write or type. Over the last decade, there has been enormous interest in digitizing this information.

This has set the stage for the transformation of doctor notes into structured facts to help inform the rest of the hospital’s operations. That’s where the future of this business is for us.

RD: There seems to be even greater value in being able to compare histories and outcomes across these cases.

PR: Managing outcomes and improving their efficacy is the single most important enabler in the healthcare system. For the next three years, doctors will have positive incentives for meeting certain industry goals in moving toward digital systems. After that, they will be penalized for failing to do so.

RD: It struck me that the real potential for IBM’s Watson is in identifying patterns in medical diagnostics.

PR: Nuance has been working with IBM for several years, developing natural language processing for medical applications and commercializing Watson technology to improve diagnoses. The challenge: Watson has a very complex architecture that can bring together a great deal of evidence from a multitude of sources, honor each of those sources, and make sense of them. In healthcare, however, the adaptation will require very different sources and the incorporation of prior knowledge, both textual and structured.

It is an incredibly sophisticated, context-aware system.

RD: On context awareness: how much of your technology is specific to a particular speaker, versus statistical inference from a larger database of speakers?

PR: All of Nuance's systems are speaker-independent: they don’t require training to work, but they improve based on user interaction.

RD: With mobile devices acting as thin clients, the processing takes place in the cloud, whereas on the desktop more of the learning comes from the individual speaker?

PR: All systems take advantage of both the advances in virtual recognition and adaptation to the individual. Computing power on a PC is much greater now. State-of-the-art systems are hybrid systems.

RD: Your initial challenges were with literal translations, whereas now they have moved to contextual translations. What does that shift mean?

PR: Speech recognition has been highly statistical for quite some time now. The focus on data and empiricism in the models has really driven improvement. Attempts at more abstract recognition of language have led to efforts to create data sets that are moving the field forward very quickly. Statistical techniques are now being used to discern intent from this language. The systems we will increasingly see are far more intention-based: able to act on the intent behind the text rather than just identify the text itself. This will draw on location-based data.

RD: Shazam seems to have developed to the point where it is very robust to outside noise. Do you use the same techniques of capturing high-energy moments for voice recognition?

PR: There have been significant challenges in the area of ambient noise, the greatest being the automobile. Some devices now include dual microphones to take advantage of the acoustics.

RD: As the challenge moves toward adding more and more data, does it become somewhat similar to search? To what extent are you converging with search?

PR: Natural language understanding and search are components of a broader class of computing. Watson relies on search tools as one of its array of technologies. I think search will move toward natural language processing rather than the other way around.

Mobile Dragon currently supports about 20 languages, and the challenges are essentially comparable across all of them.

RD: Is translation then a fairly simple logical leap?

PR: There is no canonical representation shared between languages. The state of the art in machine translation today is much more statistically based, relying less on a normative representation model. Nuance has made investments in language translation, though it is not fully there yet.

Machine translation could lend itself to literacy work in the developing world. Nuance has had success with Indian wireless carriers, who are interested in deploying transcription broadly in areas of illiteracy.

RD: How is voice recognition going?

PR: Speaker ID technology has been available for many years, but in the last year or two Nuance has seen far more interest in it than ever before. Securing access is becoming important, and voice verification serves as a means of security.

RD: What about identifying emotion? Is that a huge long-term challenge?

PR: Emotion detection offerings have been available for some time, but they don’t seem to have been a very productive business: a lot of interest and press, but not as much traction in the market.

RD: Nuance’s technology is used in the iPhone 4, and there are rumors that Nuance is an acquisition target for Apple. Where will mobile systems go in the next three to five years?

PR: Search will move toward intentionality: getting at specific information and taking action on it in a very context-aware setting, drawing on your social graph, your location, and your previous queries. Examples include the Siri system and Nuance’s Dragon search technology. This will develop VERY quickly and appear in mobile devices as a standard feature.

Sri from Intuit: Speaking is the easiest way to communicate, but chances are voice will get garbled across multiple translations. To Intuit, voice is still not there as an interface. As a software vendor, how should Intuit think about it?

PR: Ricci believed that in speech recognition, Nuance would make real advances over sustained periods of commitment. The technology might not be the right one to replace the mouse and keyboard, but 50% of physicians in the US use speech recognition because it successfully meets their needs. Nuance has made its technology available to developers throughout the industry, and he believes it will be embedded in more and more modules. But that doesn’t mean it will eliminate other interfaces.

Craig Vashon: People are concerned about IP and a broken patent system. What does Nuance see as a fix?

PR: The patent system has evolved, and will continue to evolve, to help manage that. It is not an area of his expertise.