Voice-based Applications for Consumers

Much has changed since the very first smartphone was introduced. While the first apps were provided only by the developers themselves, today's app stores offer millions of apps for nearly everything [Ref1].

Challenges when developing consumer applications

When developing an app, a developer has to consider many aspects. The most important and most demanding of these are the user interface and the user experience. A user interface should make the app intuitive to use without any instructions or manual. As the number of apps grows and apps become more complex, an intuitive user interface becomes crucial. Users who have to think about where to tap next will probably leave the app and delete it.

Complexity also brings errors: the more complex an app becomes, the more likely errors are to occur. Because manual testing is very time-consuming, developers write small automated tests that quickly show whether the most important parts of their app work as expected.
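As a rough illustration of such a small test, the sketch below checks one piece of app logic with Python's built-in unittest module. The function format_timer and its behavior are hypothetical and only stand in for "an important part of the app"; they are not taken from any real project.

```python
import unittest


def format_timer(seconds: int) -> str:
    """Hypothetical app function: render a countdown as MM:SS."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    minutes, rest = divmod(seconds, 60)
    return f"{minutes:02d}:{rest:02d}"


class FormatTimerTest(unittest.TestCase):
    def test_formats_minutes_and_seconds(self):
        # 125 seconds are 2 minutes and 5 seconds.
        self.assertEqual(format_timer(125), "02:05")

    def test_rejects_negative_values(self):
        # Invalid input should fail loudly instead of showing a broken timer.
        with self.assertRaises(ValueError):
            format_timer(-1)
```

Saved to a file, tests like these can be run with `python -m unittest`, which takes seconds compared with clicking through the app by hand.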

Nowadays, most apps synchronize saved documents and settings across a user's devices via a cloud service. This can be achieved by saving files to the user's own cloud storage, such as Dropbox or Google Drive. Alternatively, developers can operate a cloud service of their own. Sending data to a cloud service raises another matter developers have to pay attention to: data protection and privacy. At the latest since the revelations of Edward Snowden [Ref2] and the entry into force of the GDPR (see our previous blog post as well as our deliverable D5.1 – Data Protection and GDPR Requirements), privacy has become more important than ever before. Developers must keep this in mind when building their apps.

Before any release, the responsible developer has to make sure that the app is translated into the languages spoken by its users. Although many users understand English, an interface in their native language is much more convenient for them. For the developer, this means translating every piece of text output into every supported language. This process takes a lot of time and money, especially when the work is outsourced to agencies.
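To make the translation effort concrete, here is a minimal sketch of a message catalog with a fallback language. The catalog contents, the key timer_set and the function translate are illustrative assumptions; real projects usually rely on an established localization framework instead of a hand-written dictionary.

```python
# Hypothetical message catalog: keys and translations are illustrative only.
MESSAGES = {
    "en": {"timer_set": "Timer set for {minutes} minutes."},
    "de": {"timer_set": "Timer auf {minutes} Minuten gestellt."},
}
DEFAULT_LANGUAGE = "en"


def translate(key: str, language: str, **params) -> str:
    """Look up a message in the user's language, falling back to English."""
    catalog = MESSAGES.get(language, {})
    template = catalog.get(key) or MESSAGES[DEFAULT_LANGUAGE][key]
    return template.format(**params)


print(translate("timer_set", "de", minutes=10))
print(translate("timer_set", "fr", minutes=10))  # no French catalog: falls back to English
```

Every new piece of text adds one entry per supported language, which is where the time and cost described above come from.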

Talking to the device

Over the past years, the way users interact with computers has changed a lot. The first computers had to be operated by experts. Later, users were able to control computers with a mouse and a keyboard. Portable devices like smartphones and tablets have a touch screen, which lets users simply touch what they want to select. A few years ago, another way of interacting with computers became popular. Virtual personal assistants like Siri, Alexa and Google Assistant let users interact with computers in the most natural way: by voice. Apps that are used during other activities, like cooking or driving, benefit most from voice interaction support. Users can keep concentrating on cooking, for instance, while setting a timer for the spaghetti. For this reason, many users already have one of these voice assistants installed in their home [Ref3].

Developing intelligent apps is complicated, and this applies to voice recognition in particular. Speech must first be converted into text by a speech-to-text module and then interpreted by a natural language understanding module, which tries to predict the user's intent [Ref4]. All of this is typically done on the servers of the provider that developed the virtual assistant, because it requires considerable computing power. But how does the virtual assistant know that the user is actually talking to it? Most assistants wait for a predefined “wake-up” word or phrase. Because the wake-up word can be detected with the limited resources available on the device itself, this approach avoids sending all audio continuously to the provider.
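The flow described above can be sketched in a few lines. This is a deliberately simplified toy, not how any particular assistant is implemented: audio is represented as plain text, the wake word "computer" is a placeholder, and the speech-to-text and language understanding steps are stubs with hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

WAKE_WORD = "computer"  # assumption: placeholder wake-up word


@dataclass
class Intent:
    name: str
    slots: dict


def heard_wake_word(audio: str) -> bool:
    """On-device check; stands in for a lightweight keyword spotter."""
    return audio.lower().startswith(WAKE_WORD)


def speech_to_text(audio: str) -> str:
    """Stub for the provider's speech-to-text module (audio is already text here)."""
    return audio.lower().strip()


def understand(text: str) -> Optional[Intent]:
    """Stub for natural language understanding: one hand-written rule."""
    match = re.search(r"set a timer for (\d+) minutes?", text)
    if match:
        return Intent("set_timer", {"minutes": int(match.group(1))})
    return None


def handle_utterance(audio: str) -> Optional[Intent]:
    if not heard_wake_word(audio):
        return None  # without the wake word, nothing is sent to the provider
    text = speech_to_text(audio)
    command = text[len(WAKE_WORD):].strip(" ,")
    return understand(command)


print(handle_utterance("Computer, set a timer for 8 minutes"))
print(handle_utterance("set a timer for 8 minutes"))
```

The second call returns None because the wake word is missing, so in a real system that audio would never leave the device.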

For developers, these virtual assistants usually provide an interface that makes it easier to integrate apps into the virtual assistant. But as great as this possibility is, two big problems remain. First, multilingualism is much more important in voice-based applications than in classical screen-driven user interfaces. Users may translate written content into their own language subconsciously, but with a voice interface they also have to respond to the system in one of the languages it supports. This means that a developer has to put much more effort into adding languages than before.

The second issue is privacy. When using virtual assistants, users share large amounts of sensitive data [Ref5], and most of them are not even aware of it. Even without any specific context, a provider can find out a user's gender, how they feel, where they currently are or what they are probably doing right now. By adding the actual context of a user's message to all this data, providers could, for instance, infer medical or financial information about the user. Some providers already link this information to their marketing systems to improve their performance.

In conclusion, integrating voice interaction support into apps is easy with existing interfaces. Apps that are mainly used alongside other activities, like driving or cooking, benefit especially from such integration. However, taking users' privacy seriously makes it nearly impossible to implement voice interaction support in a reasonable way today.

References

[Ref1] “Number of apps available in leading app stores as of 4th quarter 2019”, Statista (January 15, 2020)

[Ref2] “Edward Snowden: Leaks that exposed US spy programme”, BBC (January 17, 2014)

[Ref3] “The smart audio report winter 2019 from NPR and Edison Research”, Edison Research (January 8, 2020)

[Ref4] “How do digital voice assistants (e.g. Alexa, Siri) work?”, University of Southern California (October 17, 2017)

[Ref5] “Amazon reportedly employs thousands of people to listen to your Alexa conversations”, CNN Business (April 11, 2019)
