Smartphones are great devices, but not for everything. There are often times when having a device on you is the last thing you need, particularly if you are not using the screen. Amazon, Google and Apple are all betting that screen-less, intelligent voice interaction will play a large role in the future of the internet, at least in the home (if not the office, the car etc.). This article looks at some of the trends and recent announcements that suggest that voice control interfaces linked to artificial intelligence (AI) are the future of the internet.
There were two major events in the technology calendar this week. The first was Recode’s ‘Code Conference’, featuring Amazon’s Jeff Bezos and Tesla/SpaceX’s Elon Musk in fascinating interviews. Bezos spoke candidly about Amazon’s Alexa AI platform, the Echo and the future of retail. Also this week, Mary Meeker (from VC firm KPCB) released her yearly Internet Trends report, which covers everything from device usage and sales to the uptake of ride-sharing globally. If you add in the announcements from Google’s I/O developer conference last month about its voice assistant device ‘Google Home’ and its TensorFlow AI platform, there are a number of points supporting the growing role of voice control and AI in the future of the internet.
Smartphone growth and innovation cool off in 2016
One of the interesting stats in the Mary Meeker report was the slowdown in smartphone growth globally. Smartphones now have over 2.5BN users worldwide, but growth in both usage and shipments slowed in 2016: usage growth fell from 31% year on year (YOY) in 2015 to 21% in 2016, and shipment growth fell from 28% YOY to 10% YOY. This is partly related to reaching a critical mass (particularly in mature markets), but it is also related to a lack of innovation in the form factor. At the Mobile World Congress this year it was evident that smartphone hardware innovation has reached a peak. With the exception of LG’s modular G5 design, there were only tweaks to the speed, screens and cameras of phones. As the physical form factor peaks, we are set to see software, and in particular voice recognition, play a bigger role in device innovation.
Touch is ‘tapped out’ – voice as the interface
Voice control has a number of benefits over touch as a user interface. For most people it is faster and easier to speak than to type. Voice is also more likely to be context driven and ‘in the moment’, as opposed to having to type into a small screen or sit down at a PC to converse. There are also a number of unique qualities to voice control. Firstly, for queries it gives direct access to information: the user does not have to go to a website (e.g. Google, Yahoo) and then type a query, because the interface takes care of that. Secondly, it is lower cost for the user, as the device is screen-less and can be smaller (i.e. it only requires a microphone, connectivity and a processor). Finally, it depends on natural language processing, which already has applications in text queries but has a broader application in voice. It is advancements in voice recognition and natural language processing that are driving greater adoption of voice as the interface.
Advancements in voice recognition and voice assistant devices
The Internet Trends report has some interesting information on the growing accuracy of voice recognition as well as the adoption rate. The report suggests that voice automation technology is “still [in] early innings.” That said, voice-controlled assistants like Apple’s Siri and Microsoft’s Cortana saw big jumps in use, from just 30 percent of respondents in 2013 to 65 percent in 2015. Word accuracy is also improving, with 90% accuracy reported for the Google and Baidu assistants, which equates to ~10 million words recognised by machine.
The growing popularity of voice assistant devices like Amazon’s Echo or Google Home is likely to speed up improvements in voice recognition over time. Since launch in November 2014, Amazon has sold 4 million Echo units, with ~1 million shipments in Q1 2016 alone. This is a significant head start on Google and Apple, with Google Home set for launch in the US in autumn 2016. Amazon and arguably Apple are likely to have an advantage over the Google Home device in their ability to create a ‘command and control’ ecosystem around voice recognition. Google, on the other hand, is likely to be a very strong competitor to Amazon and Apple as an AI platform supporting voice. It has two major advantages: Google has the ability to scale globally (across languages), and Google has a huge advantage in natural language processing based on its data platform (e.g. years of search queries, OK Google etc.).
Improvements in AI platforms and developer kits supporting voice
The AI platform supporting voice is crucial to the success of voice as an interface. At the moment, the Amazon Echo has a lead over Google and Apple thanks to the surge in companies like Spotify, Uber and Domino’s Pizza integrating their services, or “skills”, with the Echo’s voice interface, which is named Alexa. Recent reports show that in the nine months from September 2015 to May 2016, the number of skills Alexa had learned jumped from 14 to over 1,000. Amazon’s Jeff Bezos made some interesting comments about Alexa at Recode’s Code Conference that show why the platform is so important:
We expose two different SDK’s for Alexa. One is the Alexa Voice Service which lets you through a set of APIs embed Alexa in your own device or app and do with it what you want. So you could make an alarm clock and you could embed Alexa Voice Service in it. And then we have the Alexa Skill Kit which lets you teach Alexa new skills, and those two things work together.
The point is… The Echo is the first, best customer of Alexa. Longer term, it appears the goal is to have Alexa everywhere. Like all of Amazon’s other businesses that get better with scale (e.g. AWS), Alexa will only get better the bigger and more widespread it is, as is the case for all of these machine-learning-based assistants. The question in the competition with Google is whether Google’s vast AI platform will catch Alexa. Time will tell.
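To make the idea of “teaching Alexa a skill” a little more concrete, here is a minimal sketch in Python of what the code behind a skill might look like. It assumes the general JSON shape that Alexa Skills Kit requests and responses use (a request with a type and an intent, and a response containing output speech); the intent name `OrderPizzaIntent`, the `Size` slot and both function names are hypothetical and invented purely for illustration. Treat it as an outline of the concept rather than a production skill, and check Amazon’s own documentation for the exact format.

```python
def build_speech_response(text, end_session=True):
    """Wrap plain text in the JSON envelope a skill returns to Alexa."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_request(event):
    """Route an incoming request to a spoken reply.

    `event` is the parsed JSON body sent to the skill. The intent name and
    slot below are hypothetical examples, not part of any real skill.
    """
    request = event.get("request", {})
    request_type = request.get("type")

    # The user opened the skill without asking for anything specific.
    if request_type == "LaunchRequest":
        return build_speech_response(
            "Welcome. Ask me to order a pizza.", end_session=False
        )

    # The user's speech was matched to one of the skill's intents.
    if request_type == "IntentRequest":
        intent = request.get("intent", {})
        if intent.get("name") == "OrderPizzaIntent":  # hypothetical intent
            slots = intent.get("slots", {})
            size = slots.get("Size", {}).get("value", "medium")
            return build_speech_response(f"Ordering one {size} pizza.")

    # Anything else falls through to a generic reply.
    return build_speech_response("Sorry, I did not understand that.")


# Example: a hand-written request of the kind the skill would receive.
sample_event = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "OrderPizzaIntent",
            "slots": {"Size": {"name": "Size", "value": "large"}},
        },
    }
}
reply = handle_request(sample_event)
print(reply["response"]["outputSpeech"]["text"])  # prints: Ordering one large pizza.
```

The notable thing is how little the developer has to do: the speech recognition and language understanding happen on the platform side, and the skill only receives a structured intent and returns text to be spoken. That division of labour is a large part of why the number of skills could grow so quickly.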
The future is voice
Touch as the primary interface for computing and the internet has gone through many generations: cards, keyboard, joystick, mouse and smartphone touch. Touch will remain relevant for many years even as Moore’s Law plays out. It is a very useful interface, but not for everything. Voice, on the other hand, offers huge potential as a widely used interface, particularly as word recognition, devices, device ecosystems and supporting platforms improve.
Picture Credit: Mary Meeker Internet Trends Report