Redefining the next generation of human machine interaction: a Q&A with Mobvoi

Image credit: TechRadar (Image credit: Future)

If you’ve heard of the company Mobvoi recently, it’s likely because of its TicWatch smartwatch line or its other accessories, which have gained tremendous popularity over the last few years. However, Mobvoi is actually a Chinese artificial intelligence (AI) giant that was started by a former Google employee.

To learn more about how the company was founded, its partnership with Google and its recent push into the enterprise market, TechRadar Pro spoke with its CEO and co-founder Zhifei Li.

Can you tell us a bit about the company’s history and its Chumenwenwen voice search app?

While working as a research scientist at Google on Google Translate, I realised that human machine interaction would soon evolve from keyboard and touch to voice. In 2011, Siri was introduced, but voice assistants were very much a novelty and there was nothing like it in China. I wanted to lead this voice-first revolution in China, so in 2012 I founded my own company, Mobvoi, combining mobile and voice.

Chumenwenwen introduced not only a standalone voice search app but also a WeChat corporate account so users could do voice searches directly in WeChat. It was the first voice-first consumer product and gathered a lot of user interest right away.

The product name Chumenwenwen means “go out and ask” in Chinese, which indicates its two salient features: mobile and voice search. Chumenwenwen voice search removes the need for typing on a keyboard or physically interacting with your phone. It allows for voice-activated commands that can be used to control every aspect of your phone use. It is an application that is available on multiple platforms including WeChat, the most popular mobile messaging app in China.

Our voice-activated AI assistant app, which is available on Android and iOS devices and allows searches in over 60 vertical domains of interest, gives users the ability to ask for directions, restaurant suggestions, news and weather information, among many other options. Overall, it behaves very similarly to Google Assistant or Alexa. In 2015, Chumenwenwen became the official voice service provider for Google’s Wear OS users in China.

Over the last seven years, Mobvoi has developed Chinese voice recognition, natural language processing, and vertical search technology in-house and made a line of award-winning voice-enabled consumer products such as TicWatch and TicPods. The company has since undergone six rounds of funding by the likes of Sequoia Capital, Zhenfund, Google, Volkswagen (VW) and SIG.

Image credit: TechRadar (Image credit: Future)

How has Mobvoi benefitted from its partnership with Google, and has being the official voice search provider for Wear OS users in China helped your company grow?

Our partnership with Google has been a huge success for us, bringing voice search to Android Wear (now called Wear OS) users in China. This also allowed us to incorporate Android Wear with TicWear, providing users with Google-powered services in their Chinese smartwatches.

This ultimately led to financing from Google in the tens of millions, which opened up new doors of opportunity for us. We have been able to hire talent from across the world, helping us constantly refine our software services for our customers.

Your company launched an AI enabled mirror for automobiles called TicMirror in 2016. Was it difficult adapting your technology to be used in automobiles?

TicMirror is a smart rearview mirror that provides navigation, point-of-interest search, instant messaging and on-board infotainment through voice command. We introduced TicMirror to the Chinese market in 2016 as a follow-up to the success of our TicWatch line, which was what initially attracted investment from Google.

We took TicMirror from concept to market in less than nine months, an amazing feat in any industry, which ultimately led to another round of investment led by Volkswagen Group China.

The difficulty doesn’t lie in ‘adapting’ our technology, because we analyse the same core elements in terms of speech recognition and natural language processing for all the projects we work on. However, VW has very high requirements for embedded functionality, meaning we have had to put a lot of resources into R&D. That has paid off, and alongside them we are very proud of the results.

Image credit: RawPixel / Unsplash

What prompted your decision to move into the enterprise market? Do you feel that voice search and virtual assistants are being underutilized in corporate environments?

I founded Mobvoi with the mission of re-defining next generation human machine interaction through AI, and I feel that we accomplish this through both our own voice-enabled smart consumer products as well as voice-enabling other businesses to improve customer experience and workplace efficiency.

The number of smart speaker users in the UK is set to grow by almost one-third in 2019, after doubling this year, according to eMarketer’s inaugural forecast on smart speaker usage in the UK. So there’s already consensus that voice is the future, and leading companies across industries are actively pursuing applications in customer service, anti-fraud and authentication, consumer engagement and more. But enterprise adoption will take longer, not only because the technology itself is still evolving, but because the solutions have to be customised to the specific needs of the industry vertical, be integrated into existing systems and processes, and provide business customers with complete data ownership and peace of mind in terms of data security. These needs cannot be completely met by tech giants building platform solutions.

With our mission to redefine the next generation of human machine interaction, the decision to integrate voice into consumer devices was to make voice AI useful. We have successfully created a voice interaction and IoT ecosystem over the last few years and see enterprise application as the next big opportunity.

What benefits does embedded voice AI have over cloud-based AI solutions?

Embedded voice AI enables companies to incorporate voice interaction into their existing products or operations in a more reliable and secure way. It runs on the local device without the need for an internet connection, and therefore the voice interaction is more responsive and reliable. Typically, embedded voice AI is much faster than cloud AI and its response speed is more consistent. Embedded voice AI also doesn’t need to transmit voice data to the cloud, so it is more suitable for tasks where privacy is a concern.

Image credit: Volkswagen (Image credit: Future)

Mobvoi and Volkswagen Group China have established a joint venture to add an in-car voice assistant to the automaker’s vehicles. How is your solution different from similar offerings from Google and Apple in the automotive space?

This partnership is a key milestone for Mobvoi as we endeavor to integrate our leading AI technologies into consumers’ daily lives through cross-industry partnerships. 

The in-vehicle experience of human machine interaction is one of my key focuses at Mobvoi. Through the partnership with Volkswagen Group China, many car owners will soon experience the difference these new technologies can make. Having voice AI technologies within the car brings a new dimension to existing smart cars and will prove to be a leading move in the automotive space.

The voice solution we have worked on at Mobvoi is deeply integrated with the car operating system, which differs from what Google or Apple offer in the automotive space. Our solution is custom-made to VW’s specifications, with a custom wake-word, and is able to provide many voice interaction functions without the need for an internet connection. When an internet connection is available, the solution adopts a hybrid system to combine the results from the local device and cloud, providing the best result in terms of accuracy and speed.
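
Editor's note: the interview does not say how the hybrid system chooses between local and cloud results. As a rough illustration only, the sketch below assumes each recognizer reports a confidence score and that the more confident result wins, with the on-device result as the fallback whenever the cloud is unavailable. The names `Result` and `hybrid_recognize`, and the confidence-based rule itself, are hypothetical and are not Mobvoi's or Volkswagen's actual design.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    text: str          # recognized command
    confidence: float  # 0.0 to 1.0, as reported by the recognizer (assumed)
    source: str        # "local" or "cloud"


def hybrid_recognize(
    audio: bytes,
    local: Callable[[bytes], Result],
    cloud: Optional[Callable[[bytes], Optional[Result]]] = None,
) -> Result:
    """Return the local result, or the cloud result if it is more confident."""
    local_result = local(audio)  # on-device recognizer, works offline
    if cloud is None:            # no internet connection available
        return local_result
    try:
        cloud_result = cloud(audio)
    except OSError:              # network failure or timeout: keep local result
        return local_result
    if cloud_result is not None and cloud_result.confidence > local_result.confidence:
        return cloud_result
    return local_result
```

In this sketch the embedded recognizer always runs, so a command still gets an answer with no connection, and the cloud result is used only when it arrives and scores higher.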

On top of that, the solution taps into the IoT nature of technology that goes beyond the vehicle itself. Full connectivity with the smart home, smartwatch and smartphone can be achieved even before entering the vehicle. For instance, users can request directions to be sent to their vehicles from their smart speakers, and they can operate the air conditioning in their homes straight from their car. This allows for a seamless experience regardless of where you go.

Do you think that embedded voice assistants will soon become the norm? Why or why not?

Although the majority of attention and discussion over the last few years has been around smart speakers with cloud-based general purpose voice assistants, I believe that embedded or on-premise voice assistant adoption will increase and co-exist to fit the variety and complexity of business and consumer needs. We believe that 30 per cent of human machine interaction will be through voice in five years through many different types of voice assistants.