How will iPhone change the speech industry?

by Jeff Haynie on January 15, 2007

This is a question I have been pondering lately. I’m not as active in the speech industry these days, for several reasons. I won’t list them all here; I’ll save that for another post one day. However, one reason is that I believe the speech industry has stagnated a bit and isn’t enjoying the growth we’re seeing in other technology sectors. I am, however, still watching the industry and still loosely involved with a few projects.

Today, I received my normal email solicitation from Bill Meisel. Bill is a legend in the Speech Recognition world if you don’t already know him. He’s currently an analyst and runs a firm called TMA Associates and publishes a newsletter titled “Speech Strategy News”. Today’s brief email said the following:

Apple’s iPhone is a well-publicized example of the changes in telephony. On the device side, the wireless phone is migrating into an all-purpose portable assistant and entertainment center. Communications infrastructure is also getting a makeover with the integration of the telephone and other communications modalities using Web and network standards. Ad-supported services, including directory assistance and information portals, are making telephony even more analogous to the developments in the Web world.

But the Web analogy can be taken too far. The Graphical User Interface that served the PC and Web world so well has obvious limitations on mobile devices with small screens and clumsy text-input modes. Any phone can call a network-based service and immediately have a Voice User Interface using speech recognition, and that VUI is the same on any phone. This creates opportunities that some companies are already beginning to exploit for services to consumers and businesses. Call centers will soon find that free directory assistance and other trends increase the number of calls and change their nature. Company telephone systems are also evolving into Unified Communications systems, and speech technology makes the many features of such systems usable. Speech technology can also automate field forces using only cell phones.

Do speech recognition and other speech technologies work? Yes, despite a tendency to paint anything short of human abilities as a limitation. When implemented well and used in appropriate applications, the technology is extremely effective. And the many successful deployed applications prove it.

Bill’s right about how much a device can be limited compared to the offering of an in-network speech recognition application. Speech recognition applications can also enjoy a lot of the benefits of a web application, especially when built on a VoiceXML (and CCXML and other related web standards) architecture.

But what’s the hangup for the industry? Why the slowdown?

As usual, there seem to be multiple reasons. Many of them come down to timing, technology, finance and pure circumstance, all of which are uncontrollable. However, I believe there are a couple of reasons that are addressable by the industry as a whole.

First part, the pricing doesn’t work.

Lucky for Nuance these days, they’re not only the 800 lb gorilla – they’re really the only significant player left in the marketplace as far as real speech recognition and text-to-speech product offerings go. They’re the monopoly player in the speech space, and they’re really exploiting that advantage from a distribution and pricing standpoint. They’re going to extract as much from that position in terms of power and pricing as they can. Ultimately, I believe, to their detriment – or better said, to the reduction of their own marketplace.

Speech has always been a “nice to have”. That’s not because you can’t build compelling applications that reduce cost, and not because you can’t craft a compelling ROI. Mike Dickerson, my co-founder at Vocalocity (where neither of us works anymore), always used to say that speech suffers from the “I” part of ROI being “too expensive” to get to the “return on” part. In other words, if the investment is so large that you have to risk the return, it’s no longer compelling and just increases the risk that you were hoping to mitigate in the business case. Speech is just too expensive. Except for a few applications, it’s a luxury item: nice technology, but only for the chosen few. It really helps and people generally like it, if you are willing to spend more to get it. In every case I was involved in, once you got it deployed and working, they loved it. You just had to justify the cost first and build a business case that made sense. That always seemed to be the hardest part with a nice-to-have technology like speech recognition.

Unfortunately, as the web becomes more pervasive and ubiquitous, it continues to be just as fast, far less risky and much less expensive than voice. And Bill is right, “the web analogy can be taken too far”. People should stop trying to make the voice web fit the visual web. However, what I think we’re seeing is that the interactive web is making the voice web more difficult to justify (at the current pricing). Of course, I’m not trying to say the voice web is necessarily going away – that would be silly. However, with email, IM, SMS and web browsers built directly into these devices, and with wi-fi networks available everywhere, it’s almost impossible not to have access to the Web when needed. Also, as more and more software is deployed over the web as services using a modern browser, you almost don’t even need your own computer to gain access to necessary data and applications anymore (another early justification for the phone and VoiceXML that’s losing ground).

So how do you change this? Lowering pricing is certainly one way. I think that, over time, you’d start to see speech become more dominant in phone applications. However, you have to couple that with a healthy ecosystem and partner base of developers to get innovative applications conceived, developed and deployed. And that leads to the second issue…

Second part, the ecosystem isn’t healthy.

The voice ecosystem has been dying in the past 2-3 years. The evidence of this is that many small to mid-size companies (relative to the industry itself) have merged with larger companies. SpeechWorks was bought by Scansoft, and then Scansoft and Nuance merged into Nuance after many years of fierce competition. VoiceGenie was acquired by Genesys Labs, which was also the acquirer of Telera. Numerous smaller companies are either getting absorbed by larger companies or simply going away. Several smaller companies such as Voxeo and Angel.com do seem to be making serious headway in their businesses (I would say because they’ve embraced the web model directly) and are growing significantly. However, so much merger mania is really the fallout of the fact that the “speech industry” isn’t there. It’s not big enough, it’s not compelling enough, and it’s too hard to maintain on its own. Speech really is just (or really should be) another technology feature of an interactive application. And, to make matters worse, even the industry conference and magazine recently merged with a larger conference as John Kelly sold SpeechTek to Information Today. Mergers can be good. In this case, they’re a direct reaction to the need to survive. You must merge to stay alive, and in most of these cases that’s the underlying reason. Heck, even Intel sold off its telecommunications unit, Dialogic, to a smaller company, Eicon Networks – and Brooktrout (the Dialogic competitor) has merged with Excel Switching to become Cantata. While they’re not speech-only companies, everyone recognizes them as major players in the industry. (By the way, these are just a few of the mergers; there have been many, many more, such as Intervoice and Edify or the shocking IP Unity and Glenayre.)

If you’re into “reading the tea leaves”, another interesting note is how many people are “leaving” the industry. Or maybe not really leaving, but relocating to bigger opportunities, whether directly or indirectly related to the “speech industry”. One industry veteran, the ex-co-founder and CTO of Snowshore and CTO of Cantata, just left this past week to join BEA as Deputy CTO. Smart move in my opinion, given the state of everything that’s going on there and in the industry. Of course, numerous other people from the industry have joined Google in the past 2 years.

So, where is the industry headed?

Gosh, if I knew that, I’d still be in the industry and probably be an analyst, or better yet, a billionaire.

There are a lot of changing dynamics in the “speech industry”. The 800 lb gorilla is small in relative terms: they have a $2B market cap and did almost $400M in revenue in 2006. But they also have a fairly diverse portfolio of products, such as scanners and dictation, of which speech recognition is only one part.

For many years, we’ve thought IBM and Microsoft would get serious and get into the business. They’ve made some decent steps forward, but nothing serious yet.

Like we’ve said in the past, maybe 2007 will be the year… (But, don’t bet on it)


  • "Speech is too expensive"

    Amen. And one part of that expense that I would highlight is app development. Writing speech apps is not easy, and there are few people with the skills to do it. Even after deploying an app there are expensive tuning activities that need to be done.

    I've worked with both Nuance and Microsoft speech engines. Recognition quality is comparable between the two. Microsoft's whole SALT strategy failed; the idea of sprinkling a little "salt" on HTML to voice-enable it was flawed. However the 2007 version of MSS and Vista will be good products, which may bring down prices in the industry.
  • Thanks Chris. I'm such a speech nut and hope you're correct. The "next 5 year" cycle has just been happening for about 20 years now. :)
  • Chris Prophet
    You need to get back in the Speech business........it is growing and significantly...nearly everything that we sell these days uses Speech.
    Speech will become a major industry in the next 5 years