Data annotation for Speech-to-Text and Natural Language Understanding
What is Machine Learning?
Machine learning refers to “training AI algorithms to make decisions or predictions based on large amounts of data”.
There are several types of machine learning techniques, depending on the amount of human knowledge provided as input: supervised learning, weakly supervised learning, unsupervised learning, etc.
How does this relate to Speech-to-Text systems?
Speech-to-Text (STT) systems are trained by feeding them speech samples (from hundreds to tens of thousands of hours) together with the corresponding text transcriptions.
Note: Be careful! STT systems must be trained with diverse data (e.g., male and female speakers, native and non-native speakers, etc.) to prevent biases.
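To make the note above concrete, here is a minimal sketch of how a training set could be checked for balance before training. It assumes each speech sample comes with its transcription plus some speaker metadata; the Utterance fields, the function names and the min_share threshold are hypothetical choices for illustration, not part of any specific STT toolkit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    audio_path: str    # hypothetical path to the speech sample
    transcript: str    # corresponding text transcription
    duration_s: float  # length of the sample in seconds
    gender: str        # speaker metadata (assumed to be available)
    native: bool       # True for native speakers


def hours_by_group(utterances):
    """Sum the hours of speech for each (gender, nativeness) group."""
    totals = defaultdict(float)
    for u in utterances:
        key = (u.gender, "native" if u.native else "non-native")
        totals[key] += u.duration_s / 3600.0
    return dict(totals)


def underrepresented(totals, min_share=0.2):
    """Return the groups whose share of the total hours is below min_share."""
    total = sum(totals.values())
    if total == 0:
        return []
    return sorted(g for g, hours in totals.items() if hours / total < min_share)
```

Calling underrepresented(hours_by_group(data)) on a list of Utterance objects gives the speaker groups that may need more data. The right threshold depends on the application and is only a placeholder here.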
Is that all?
Voice-enabled applications rely on other technologies too.
For instance, they use Natural Language Understanding (NLU) to interpret the user’s intent and Dialog Management systems to provide an appropriate response.
Most companies rely on the manual annotation of all training utterances (hundreds to tens of thousands of hours of data), that is, manual transcription of speech into text and manual labelling of the user’s intent, to train STT and NLU models.
The costs associated with this manual annotation are very high and sometimes prohibitive, especially for SMEs.
Is there a solution?
Fortunately for all SMEs out there, Weakly Supervised Learning has emerged as a cost-effective alternative to human labelling.
It uses an automatic annotation module, together with side information when available, to obtain cheap, possibly inaccurate annotations, and a machine learning module to train AI systems on these annotations.
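The two modules described above can be illustrated with a toy sketch for intent labelling. Everything in it is an assumption made for illustration: the keyword rules stand in for a real automatic annotation module, the optional side_info value stands in for whatever side information an application may have, and the word-count model stands in for a real NLU model. It is not the method used by any particular system.

```python
from collections import Counter, defaultdict

# Hypothetical keyword rules standing in for an automatic annotation module.
KEYWORD_INTENTS = {
    "light": "lights",
    "lamp": "lights",
    "weather": "weather",
    "rain": "weather",
}


def auto_annotate(text, side_info=None):
    """Return a cheap, possibly wrong intent label, or None if there is no guess."""
    if side_info is not None:
        return side_info  # use side information when it is available
    for word in text.lower().split():
        if word in KEYWORD_INTENTS:
            return KEYWORD_INTENTS[word]
    return None


def train(utterances):
    """Train on the automatic labels by counting word/intent co-occurrences."""
    counts = defaultdict(Counter)
    for text, side_info in utterances:
        label = auto_annotate(text, side_info)
        if label is None:
            continue  # skip utterances the annotator cannot label
        for word in text.lower().split():
            counts[word][label] += 1
    return counts


def predict(counts, text):
    """Pick the intent that the words of the utterance vote for most often."""
    votes = Counter()
    for word in text.lower().split():
        votes.update(counts.get(word, {}))
    return votes.most_common(1)[0][0] if votes else None
```

No human label is used anywhere in this sketch: the model only sees the automatic annotations, yet it can pick up words (such as "dim") that the keyword rules did not contain.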
COMPRISE has developed innovative software modules for automatic data annotation and STT and NLU system training based on Weakly Supervised Learning.