Voice-based applications are creating massive growth opportunities for European SMEs by enabling natural interaction with an ever-increasing range of services. Thanks to its naturalness and efficiency, voice has become a common means of accessing services such as web search, car navigation, and smart home control. This trend has been reinforced by the release of a new generation of devices, including smart speakers with limited touch interfaces and visual displays. It is now spreading to new sectors, e.g., e-commerce and ambient assisted living, which widens these opportunities for European companies, and for SMEs in particular. However, the underlying speech-to-text, spoken language understanding and dialogue technologies require rare and expensive expertise, coupled with large amounts of speech and language data in every language, to reach state-of-the-art performance and meet end-users’ expectations.
The COMPRISE SDK is a toolkit including a developer UI and client libraries to help developers implement multilingual, privacy-aware, voice-enabled mobile applications. In this short article, we highlight how COMPRISE tackles four key challenges in order to help deliver first-class voice-based applications.
Data privacy, also called information privacy, is the aspect of information technology that deals with the ability of an organization or individual to determine what data in a system can be shared with third parties, how that information is used, and whether it is used to track users. Privacy concerns arise whenever personally identifiable information or other sensitive information is collected, stored, processed, and/or deleted. Such data is inherently valuable, which makes it a target for attackers, especially when it relates to personal information. The challenge of data privacy is to make use of data while respecting individuals’ privacy preferences and protecting their personally identifiable information.
Applications created with the COMPRISE SDK aim to remove privacy-related information from the user’s input before it leaves the user’s device. The SDK contains data transformation components that delete as much private information as possible from the users’ speech and text, while preserving enough information to label the resulting “neutral” data manually if needed and to train large-scale, user-independent speech-to-text, spoken language understanding, and dialog management systems on these “neutral” data in the cloud.
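To make the idea of on-device text transformation concrete, here is a minimal sketch of one possible approach: replacing obviously private tokens with neutral placeholders before any text is sent to the cloud. This is not the COMPRISE implementation; the function name `neutralize`, the placeholder labels, and the two regular expressions are assumptions made purely for illustration, and a real system would likely rely on trained models rather than hand-written patterns.

```python
import re

# Hypothetical patterns for illustration only; they are not part of the
# COMPRISE SDK and will miss many kinds of private information.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{6,}\d")),
]


def neutralize(text: str) -> str:
    """Replace each match of a known private pattern with a <LABEL> placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"<{label}>", text)
    return text


if __name__ == "__main__":
    print(neutralize("Call me at +33 6 12 34 56 78 or mail jane.doe@example.org"))
```

The placeholders keep the sentence structure intact, so the "neutral" text can still be labeled and used for training, while the private values themselves never leave the device.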
Multilingual & Inclusive
Existing SDKs for voice-enabled development are usually limited to one language and require significant overhead to support others. As a consequence, voice-enabled applications are often available in limited markets only. American software vendors can focus on the US market, whereas European software vendors have to deal with many different languages due to the federated nature of Europe, and this linguistic diversity limits their market reach. The demand for voice-enabled applications in any language is high, since users want to use voice interaction in their mother tongue, and meeting it opens large markets for developers. Given Europe’s language diversity, this is an especially big opportunity for European software companies. The COMPRISE SDK is natively designed for multilingual voice-enabled application development. Users of COMPRISE apps are able to speak in their own language to interact with a dialog system that operates in another language.
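One way to picture this cross-language interaction is as a chain of components: speech-to-text in the user's language, translation into the dialog system's language, the dialog system itself, and translation of the reply back. The sketch below is only an illustration of that flow under stated assumptions. The class `VoicePipeline`, its fields, and the toy stand-in functions are all hypothetical and do not reflect the actual COMPRISE client libraries.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical pipeline wiring; each component is injected as a callable."""
    speech_to_text: Callable[[bytes, str], str]   # (audio, language) -> text
    translate: Callable[[str, str, str], str]     # (text, source, target) -> text
    dialog: Callable[[str], str]                  # text in system language -> reply
    system_lang: str

    def respond(self, audio: bytes, user_lang: str) -> str:
        text = self.speech_to_text(audio, user_lang)
        if user_lang != self.system_lang:
            text = self.translate(text, user_lang, self.system_lang)
        reply = self.dialog(text)
        if user_lang != self.system_lang:
            reply = self.translate(reply, self.system_lang, user_lang)
        return reply


# Toy stand-ins so the sketch runs without any real speech or translation engine.
PHRASES = {
    ("fr", "en"): {"allume la lumière": "turn on the light"},
    ("en", "fr"): {"the light is on": "la lumière est allumée"},
}


def toy_stt(audio: bytes, lang: str) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are already text


def toy_translate(text: str, source: str, target: str) -> str:
    return PHRASES[(source, target)].get(text, text)


def toy_dialog(text: str) -> str:
    return "the light is on" if "light" in text else "sorry, I did not understand"


pipeline = VoicePipeline(toy_stt, toy_translate, toy_dialog, system_lang="en")

if __name__ == "__main__":
    print(pipeline.respond("allume la lumière".encode("utf-8"), "fr"))
```

Keeping the dialog system in a single language and translating at the edges means the dialog logic is written once, while each additional user language only requires speech-to-text and translation support.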
The “gold standard” in voice-based features is to store the raw in-domain voice commands uttered by the end-users themselves in the cloud and to hire human annotators to transcribe them. Because of the huge cost this incurs, SMEs cannot compete and often fall back on voice-based technologies trained on out-of-domain data, or on purely text-based chatbots with little benefit for end-users. The SDK bundles the functionalities needed for a voice-enabled application in a compact package, so that developers can easily include all of them. Developers no longer have to search on their own for multiple interfaces and libraries for Speech-To-Text, Machine Translation, Spoken Language Understanding, and more, or even implement such functionalities themselves. Hence, the SDK is suitable for rapid prototyping, and voice-enabled applications can be created quickly. This leads to cost savings in terms of human resources. Firstly, developers need much less time to complete their task, which reduces the cost for employers. Secondly, less specialized expertise is needed to build the product, which translates into lower salary costs and hence additional savings for employers.
The shared vision of the COMPRISE partners is to jointly build a strong, extensible and sustainable open-source project. This strategy is intended to foster innovation and accelerate adoption across the whole ecosystem, so that more features are added and the fields of application are extended. In addition, DevOps-inspired methods for deployment, testing and continuous integration are used to accelerate the incorporation of multilingualism in internet applications, enhancing the capability of end-users to enter the multilingual market.
By addressing the research and innovation challenges of privacy preservation, cost-effectiveness, and inclusiveness, COMPRISE enables companies of all sizes, including SMEs, to deploy voice interaction technologies for a wider range of languages and application domains and to develop multilingual voice-enabled applications accessible to any user, whatever their language. The COMPRISE SDK enables voice-based application developers to design, build and run privacy-aware mobile and web applications while relieving them of the burden of cost, technical complexity and lack of language resources.
For more details, please visit:
Dr. Youssef RIDENE
Developer survey: Since you are here and interested in our project, could you please spare a moment to share your concerns and answer 12 questions related to developing voice-enabled apps?