In July 2019, the Flemish news outlet VRT published an article revealing that internet giant Google employs thousands of people around the globe to listen to recordings of its customers, mostly without their knowledge. This came as quite a shock, not least because an almost identical report by the American news outlet Bloomberg had already caused quite a stir in April 2019. Back then, it was another internet company employing the same practices: Amazon.
Both companies were recording their users’ voice commands through their respective “smart speaker” products, Google Home and Amazon Alexa. Both times, the stories were quickly picked up by other media and made international headlines within 24 hours. It is important to mention that the recording procedure complies with the law, and that company policies restrict what the people who listen to the recordings can do. Also, voice assistants are not meant to record conversations but simple “commands”, and they can do so only after hearing the “wake-up” word, “OK Google” or “Alexa”. So why is this an issue? The answer is quite simple: privacy is at stake when the user’s command contains private information, or when someone else’s speech overlaps with the command. It also happens that the voice assistant mistakenly detects the wake-up word even though nobody has said it, and then records a conversation between several people instead.
“For example, we use your requests to Alexa to train our speech recognition and natural language understanding systems. The more data we use to train these systems, the better Alexa works, and training Alexa with voice recordings from a diverse range of customers helps ensure Alexa works well for everyone.”
As the VRT article suggests, Google Home users were equally unaware of these practices.
So, why would Google and Amazon even have any interest in not only recording their customers but also having their utterances transcribed as text?
Why would Google and Amazon want to collect and transcribe your voice commands?
In a previous post, we sketched how voice assistants work and the key role that machine learning plays in modern voice-based systems. And there’s an old tongue-in-cheek saying in machine learning: “there’s no data like more data”. It makes sense because if your data collection is too small, it might give a skewed view of what the world is actually like, and so any model of this data would not necessarily be a good model of the real world.
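To make the "more data" intuition concrete, here is a minimal sketch in Python. It is only an illustration, not how Google or Amazon work: the 30% share of a hypothetical command, the sample sizes, and the helper names `estimate_share` and `mean_abs_error` are all assumptions chosen for the example. It simulates estimating how often a command occurs from samples of different sizes and measures how far the estimate tends to be from the truth.

```python
import random


def estimate_share(true_share, sample_size, rng):
    """Estimate how often a command occurs from a random sample of utterances."""
    hits = sum(1 for _ in range(sample_size) if rng.random() < true_share)
    return hits / sample_size


def mean_abs_error(true_share, sample_size, trials=200, seed=0):
    """Average distance between the estimate and the truth over many samples."""
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    errors = [
        abs(estimate_share(true_share, sample_size, rng) - true_share)
        for _ in range(trials)
    ]
    return sum(errors) / trials


# Assumption for illustration: 30% of all utterances are "play music".
TRUE_SHARE = 0.30

for n in (10, 100, 1000):
    print(f"sample size {n:>4}: mean error {mean_abs_error(TRUE_SHARE, n):.4f}")
```

Under these assumptions, the average error shrinks as the sample grows, which is the statistical core of the saying: a small collection can easily misrepresent what users actually say.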
With this in mind, we now have a pretty good idea about those companies’ incentive to not only record you when you talk to Home or Alexa, but even have employees transcribe your utterances: your commands are additional training data that will allow Google and Amazon to improve the components of their system that are based on machine learning. And as we’ve previously discussed, for supervised models, the raw data is not sufficient: the training data must also contain the actual targets that the model should learn. In the case of automatic speech recognition, these are the words contained in your speech signal. Hence, transcribing your utterances is simply the way to create the labels required to run supervised machine learning on the data collection.
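The role of transcription can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not either company's actual pipeline: the `Utterance` class, the `build_training_set` helper, and the file names are hypothetical. The point is that a recording without a transcript has no target, so a supervised learner cannot use it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Utterance:
    """A recorded voice command (hypothetical structure for illustration)."""
    audio_id: str                     # stands in for the raw speech signal
    transcript: Optional[str] = None  # None until a human has transcribed it


def build_training_set(utterances: List[Utterance]) -> List[Tuple[str, List[str]]]:
    """Keep only utterances that have a label and pair each input with its target words."""
    return [
        (u.audio_id, u.transcript.lower().split())
        for u in utterances
        if u.transcript  # recordings without a transcript are skipped
    ]


recordings = [
    Utterance("rec_001.wav", "Play some jazz"),
    Utterance("rec_002.wav"),  # recorded but never transcribed
    Utterance("rec_003.wav", "Set a timer for ten minutes"),
]

for audio_id, words in build_training_set(recordings):
    print(audio_id, words)
```

In this sketch only the two transcribed recordings end up in the training set; the untranscribed one only becomes usable for supervised training once someone writes down what was said.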
Why would you care?
Now, if in the end your voice-based assistant is going to work better thanks to the training data you provided, shouldn’t that be a reason for users of smart speakers to be happy? Then why the massive backlash?
Well, presumably no one would object to a system that performs better. The question is what price you have to pay for it, and here, “price” is not a monetary term. You are paying with your data, and this can be problematic.
Part of the reason is also that few users were aware that Google and Amazon record them in the first place, or that they can access this data and request that it be deleted. Rather than offering an opt-out, wouldn’t it be better to make this an opt-in for users?
- Ford, M., & Palmer, W. (2019). Alexa, are you listening to me? An analysis of Alexa voice service network traffic. Personal and Ubiquitous Computing, 23(1), 67-79. DOI: 10.1007/s00779-018-1174-x.