Software b



With the COMPRISE SDK, developers can create multilingual, voice-enabled applications faster, at lower cost, and with privacy built in. The SDK targets Smartphone applications developed with the Ionic framework on top of Angular. It consists of:

  • the COMPRISE Personal Server, which allows large, resource-intensive services to run outside the Smartphone while still preserving privacy;
  • the COMPRISE App Wizard, which helps developers complete all necessary configuration quickly and easily;
  • the COMPRISE Client Library, which can be deployed on any Android or iOS device and integrates all required voice functionalities (Speech-to-Text, Spoken Language Understanding, Dialog Management, Spoken Language Generation, Text-to-Speech) together with Machine Translation.

These three components should be installed and configured in the order listed above.



The COMPRISE Platform is a cloud-based platform designed to:

  • collect anonymized speech and text data from Smartphone apps operating the COMPRISE SDK,
  • curate and label this data,
  • train Speech-to-Text and Spoken Language Understanding models on this data,
  • provide access to these models.

All these functionalities can be accessed via a web service API and interfaces for Developers, Data Annotators, and Administrators.


COMPRISE Voice Transformer is part of COMPRISE SDK. It increases privacy by converting each person’s voice into another person’s voice, while preserving the spoken message. It:

  • ensures that any information extracted from the transformed voice can hardly be traced back to the original speaker, as validated through state-of-the-art biometric protocols;
  • preserves the utility of the transformed data for training Speech-to-Text models;
  • leverages cutting-edge deep learning and speech processing technology;
  • can be followed by the Voice Builder, which further discards sensitive words and expressions.


COMPRISE Text Transformer is part of COMPRISE SDK. It allows users in various application domains to mask out critical information in a text that would otherwise threaten the privacy of third parties, while preserving the sentence structure. It:

  • replaces words and expressions carrying personal information with random alternatives, focusing on persons’ names, organisations, locations, dates and times;
  • is applicable to all kinds of text documents in addition to spoken dialogues;
  • leverages cutting-edge deep learning and natural language processing technology;
  • provides formal differential privacy guarantees.
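
The replacement step described above can be illustrated with a minimal sketch. It is not the COMPRISE implementation: the entity spans are supplied by hand instead of being detected by a model, the `ALTERNATIVES` pools and the `mask_entities` function are hypothetical names, and uniform random replacement as shown here carries no differential privacy guarantee.

```python
import random

# Hypothetical replacement pools, one per entity category named in the text.
ALTERNATIVES = {
    "PERSON": ["Alex Morgan", "Sam Taylor", "Kim Lee"],
    "ORG": ["Acme Corp", "Globex", "Initech"],
    "LOCATION": ["Lisbon", "Oslo", "Vienna"],
    "DATE": ["3 March", "17 June", "9 October"],
    "TIME": ["8 am", "noon", "6 pm"],
}


def mask_entities(text, entities, rng):
    """Replace each (start, end, category) span with a random alternative.

    Spans are assumed not to overlap. They are processed from right to left
    so that the offsets of earlier spans stay valid after each replacement.
    """
    for start, end, category in sorted(entities, reverse=True):
        pool = [a for a in ALTERNATIVES[category] if a != text[start:end]]
        text = text[:start] + rng.choice(pool) + text[end:]
    return text


sentence = "Maria Silva will meet Siemens in Berlin on 12 May at 3 pm."

# In a real system these spans would come from a named-entity recogniser;
# here they are located by hand for illustration.
mentions = [
    ("Maria Silva", "PERSON"),
    ("Siemens", "ORG"),
    ("Berlin", "LOCATION"),
    ("12 May", "DATE"),
    ("3 pm", "TIME"),
]
spans = [(sentence.index(m), sentence.index(m) + len(m), c) for m, c in mentions]

masked = mask_entities(sentence, spans, random.Random(0))
print(masked)
```

The words around the replaced spans ("will meet", "in", "on", "at") are untouched, which is what preserving the sentence structure means in this toy setting.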


COMPRISE Weakly Supervised STT is part of COMPRISE Platform. It makes it possible to train Speech-to-Text models while reducing the need for time-consuming and expensive manual data transcription. It consists of two modules:

  • an Automated Labelling module that processes untranscribed speech utterances and, by exploiting specific information about the dialogue domain, outputs one or more text transcriptions for every utterance;
  • a Machine Learning module that takes these automatically transcribed sentences as input (possibly together with additional manually transcribed sentences) and outputs trained acoustic and language models to be used by a Speech-to-Text system.
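
One way to picture how domain information can guide automated labelling is to rank candidate transcriptions by how well they match a domain vocabulary. The sketch below is an assumption made for illustration, not the method used by the COMPRISE module: `domain_score`, `label_utterance`, the vocabulary, and the candidate list are all hypothetical.

```python
def domain_score(transcription, domain_vocab):
    """Fraction of words in the transcription that belong to the domain vocabulary."""
    words = transcription.lower().split()
    if not words:
        return 0.0
    return sum(word in domain_vocab for word in words) / len(words)


def label_utterance(candidates, domain_vocab, keep=1):
    """Return the `keep` candidate transcriptions that best match the domain."""
    ranked = sorted(
        candidates,
        key=lambda c: domain_score(c, domain_vocab),
        reverse=True,
    )
    return ranked[:keep]


# Hypothetical vocabulary for a pizza-ordering dialogue domain.
pizza_vocab = {"i", "want", "a", "large", "small", "pizza", "with", "mushrooms"}

# Hypothetical candidate transcriptions for one untranscribed utterance,
# e.g. the top hypotheses of an existing Speech-to-Text system.
candidates = [
    "i want a lard pete's a",
    "i want a large pizza",
]

print(label_utterance(candidates, pizza_vocab))
```

Setting `keep` above 1 returns several transcriptions per utterance, matching the "one or more" wording of the module description.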



COMPRISE Weakly Supervised NLU is part of COMPRISE Platform. It enables customers to reduce the need for time-consuming and expensive manual data labelling when training a Natural Language Understanding system. It consists of two modules:

  • an Automated Sequence Labelling module that processes unlabelled text sentences and outputs a (noisy) label for each sentence or each token in a sentence;
  • a Machine Learning module that learns Natural Language Understanding models from these noisy labels.
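
A common way to obtain noisy labels without manual annotation is to combine several simple heuristics and take a majority vote. The sketch below illustrates that general idea at sentence level only; the `RULES` table, the intent names, and the `noisy_label` function are hypothetical and do not describe the actual COMPRISE module.

```python
from collections import Counter

# Hypothetical keyword rules: each rule votes for one intent label when any
# of its keywords appears in the sentence.
RULES = [
    ({"hello", "hi", "hey"}, "greet"),
    ({"order", "want", "buy"}, "order"),
    ({"pizza", "burger"}, "order"),
    ({"bye", "goodbye"}, "goodbye"),
]


def noisy_label(sentence):
    """Return the majority-vote label for a sentence, or None if no rule fires."""
    cleaned = sentence.lower()
    for mark in ",.?!":
        cleaned = cleaned.replace(mark, " ")
    tokens = set(cleaned.split())
    votes = Counter(label for keywords, label in RULES if tokens & keywords)
    if not votes:
        return None  # every rule abstained, so the sentence stays unlabelled
    return votes.most_common(1)[0][0]


for text in ["Hi, I want a pizza", "Goodbye!", "The weather is nice"]:
    print(text, "->", noisy_label(text))
```

Labels produced this way are often wrong or missing, which is why the second module has to learn from noisy labels rather than treat them as ground truth.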



COMPRISE Speech-to-Text Translation combines Speech-to-Text and Machine Translation so that users can speak their own language when interacting with a dialogue system that internally uses a different language. Instead of chaining separately trained Speech-to-Text and Machine Translation models in a pipeline, the Machine Translation system is trained to handle Speech-to-Text errors and disfluencies, which reduces translation errors.
