Discover COMPRISE Weakly Supervised NLU, an innovative automatic data labelling and model training software for Natural Language Understanding (NLU).
Today’s Deep Learning technology is hungry for data – the more, the better. But for many tasks, raw data is not enough: every sample of data needs to be labelled first, which is a laborious and costly process. For example, NLU tasks such as Named Entity Recognition require every word in every sentence to be marked with a label, for tens of thousands of sentences:
Clark | Kent | works | for | the | Daily | Planet | . |
B-PERSON | I-PERSON | O | O | O | B-ORGANISATION | I-ORGANISATION | O |
An Automated Sequence Labelling module that processes unlabelled sentences and outputs a (noisy) label for each sentence or each token in a sentence;COMPRISE Weakly Supervised NLU tackles this challenge for you. It consists of two modules:
- A Machine Learning module that learns NLU models from these noisy labels.
In contrast with current NLU training procedures that are based on fully manual labelling, our Automated Sequence Labelling module allows for automatic or semi-automatic labelling. To do so, it relies on modern Deep Learning and Natural Language Processing technologies. The module’s labelling is not perfect but that’s not a problem.
Clark | Kent | works | for | the | Daily | Planet | . |
B-PERSON | I-PERSON | O | O | O | O | B-LOCATION | O |
The Machine Learning module is where the real magic happens. By combining manually annotated and automatically annotated labels, the model gradually learns what the differences are between the two. Therefore, even the noisy automatically generated information can be used to better predict the real labels for a production system.
COMPRISE Weakly Supervised NLU is available to everyone in open source and it can be licensed to any stakeholder in a proprietary form. It is especially useful for small and medium-sized companies, as it allows a significant reduction to the labelling cost and the time to market.
Developer survey: Since you are here and interested in our project, could you please spare a moment to share your concerns and answer 12 questions related to developing voice-enabled apps.