Papers – H2020 COMPRISE

HAL publications of the European project COMPRISE

2024

Journal articles

Title: Training RNN Language Models on Uncertain ASR Hypotheses in Limited Data Scenarios
Authors: Imran Ahamad Sheikh, Emmanuel Vincent, Irina Illina
Reference: Computer Speech and Language, 2024, 83, pp.101555. ⟨10.1016/j.csl.2023.101555⟩

2023

Journal articles

Title: Privacy in Speech and Language Technology
Authors: Simone Fischer-Hübner, Dietrich Klakow, Peggy Valcke, Emmanuel Vincent
Reference: Dagstuhl Reports, 2023, 12 (8), pp.60-102. ⟨10.4230/DagRep.12.8.60⟩

Title: Differentially private speaker anonymization
Authors: Ali Shahin Shamsabadi, Brij Mohan Lal Srivastava, Aurélien Bellet, Nathalie Vauquier, Emmanuel Vincent, Mohamed Maouche, Marc Tommasi, Nicolas Papernot
Reference: Proceedings on Privacy Enhancing Technologies, 2023, 2023 (1), ⟨10.48550/arXiv.2202.11823⟩

2022

Journal articles

Title: Privacy and utility of x-vector based speaker anonymization
Authors: Brij Mohan Lal Srivastava, Mohamed Maouche, Md Sahidullah, Emmanuel Vincent, Aurélien Bellet, Marc Tommasi, Natalia Tomashenko, Xin Wang, Junichi Yamagishi
Reference: IEEE/ACM Transactions on Audio, Speech and Language Processing, 2022, ⟨10.1109/TASLP.2022.3190741⟩

Title: The VoicePrivacy 2020 Challenge: Results and findings
Authors: Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Jose Patino, Brij Mohan Lal Srivastava, Paul-Gauthier Noé, Andreas Nautsch, Nicholas Evans, Junichi Yamagishi, Benjamin O’Brien, Anaïs Chanclu, Jean-François Bonastre, Massimiliano Todisco, Mohamed Maouche
Reference: Computer Speech and Language, 2022, 74, pp.101362. ⟨10.1016/j.csl.2022.101362⟩

Conference papers

Title: TOKEN is a MASK: Few-shot named entity recognition with pre-trained language models
Authors: Ali Davody, David Ifeoluwa Adelani, Thomas Kleinbauer, Dietrich Klakow
Reference: 25th International Conference on Text, Speech and Dialogue, Sep 2022, Brno, Czech Republic

Title: Enhancing speech privacy with slicing
Authors: Mohamed Maouche, Brij Mohan Lal Srivastava, Nathalie Vauquier, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent
Reference: Interspeech 2022 – Human and Humanizing Speech Technology, Sep 2022, Incheon, South Korea

Title: Transformer versus LSTM Language Models Trained on Uncertain ASR Hypotheses in Limited Data Scenarios
Authors: Imran Ahamad Sheikh, Emmanuel Vincent, Irina Illina
Reference: LREC 2022 – 13th Language Resources and Evaluation Conference, Jun 2022, Marseille, France

Title: Adapting Language Models When Training on Privacy-Transformed Data
Authors: Mehmet Ali Tuğtekin Turan, Dietrich Klakow, Emmanuel Vincent, Denis Jouvet
Reference: LREC 2022 – 13th Language Resources and Evaluation Conference, Jun 2022, Marseille, France

Preprints, Working Papers, …

Title: Supplementary material to the paper The VoicePrivacy 2020 Challenge: Results and findings
Authors: Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Jose Patino, Brij Mohan Lal Srivastava, Paul-Gauthier Noé, Andreas Nautsch, Nicholas Evans, Junichi Yamagishi, Benjamin O’Brien, Anaïs Chanclu, Jean-François Bonastre, Massimiliano Todisco, Mohamed Maouche
Reference: 2022

2021

Journal articles

Title: MasakhaNER: Named entity recognition for African languages
Authors: David Ifeoluwa Adelani, Jade Abbott, Graham Neubig, Daniel d’Souza, Julia Kreutzer, Constantine Lignos, Chester Palen-Michel, Happy Buzaaba, Shruti Rijhwani, Sebastian Ruder, Stephen Mayhew, Israel Abebe Azime, Shamsuddeen H Muhammad, Chris Chinenye Emezue, Joyce Nakatumba-Nabende, Perez Ogayo, Anuoluwapo Aremu, Catherine Gitau, Derguene Mbaye, Jesujoba Alabi, Seid Muhie Yimam, Tajuddeen Rabiu Gwadabe, Ignatius Ezeani, Rubungo Andre Niyongabo, Jonathan Mukiibi, Verrah Otiende, Iroro Orife, Davis David, Samba Ngom, Tosin Adewumi, Paul Rayson, Mofetoluwa Adeyemi, Gerald Muriuki, Emmanuel Anebi, Chiamaka Chukwuneke, Nkiruka Odu, Eric Peter Wairagala, Samuel Oyerinde, Clemencia Siro, Tobius Saul Bateesa, Temilola Oloyede, Yvonne Wambui, Victor Akinode, Deborah Nabagereka, Maurice Katusiime, Ayodele Awokoya, Mouhamadane Mboup, Dibora Gebreyohannes, Henok Tilaye, Kelechi Nwaike, Degaga Wolde, Abdoulaye Faye, Blessing Sibanda, Orevaoghene Ahia, Bonaventure F P Dossou, Kelechi Ogueji, Ibrahima Thierno, Abdoulaye Diallo, Adewale Akinfaderin, Tendai Marengereke, Salomey Osei
Reference: Transactions of the Association for Computational Linguistics, 2021, ⟨10.1162/tacl⟩

Title: Enabling voice-based apps with European values
Authors: Akira Campbell, Thomas Kleinbauer, Marc Tommasi, Emmanuel Vincent
Reference: ERCIM News, 2021, 126, pp.38-39

Title: Monolingual and cross-lingual intent detection without training data in target languages
Authors: Jurgita Kapočiūtė-Dzikienė, Askars Salimbajevs, Raivis Skadiņš
Reference: Electronics, 2021, 10, ⟨10.3390/electronics10121412⟩

Title: Anonymisation and re-identification risk for voice data
Authors: Álvaro Moretón, Ariadna Jaramillo
Reference: European Data Protection Law Review, 2021, 7, pp.274-284. ⟨10.21552/edpl/2021/2/20⟩

Conference papers

Title: Preventing author profiling through zero-shot multilingual back-translation
Authors: David Ifeoluwa Adelani, Miaoran Zhang, Xiaoyu Shen, Ali Davody, Thomas Kleinbauer, Dietrich Klakow
Reference: 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov 2021, Punta Cana, Dominican Republic

Title: The effect of domain and diacritics in Yorùbá-English neural machine translation
Authors: David Ifeoluwa Adelani, Dana Ruiter, Jesujoba O Alabi, Damilola Adebonojo, Adesina Ayeni, Mofetoluwa Adeyemi, Ayodele Awokoya, Cristina Espana-Bonet
Reference: 18th Biennial Machine Translation Summit, Aug 2021, Orlando, United States

Title: Benchmarking and challenges in security and privacy for voice biometrics
Authors: Jean-François Bonastre, Hector Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Paul-Gauthier Noé, Jose Patino, Md Sahidullah, Brij Mohan Lal Srivastava, Massimiliano Todisco, Natalia Tomashenko, Emmanuel Vincent, Xin Wang, Junichi Yamagishi
Reference: SPSC 2021, 1st ISCA Symposium on Security and Privacy in Speech Communication, ISCA, Nov 2021, Magdeburg, Germany. ⟨10.21437/SPSC.2021-11⟩

Preprints, Working Papers, …

Title: D-Cliques: Compensating for Data Heterogeneity with Topology in Decentralized Federated Learning
Authors: Aurélien Bellet, Anne-Marie Kermarrec, Erick Lavoie
Reference: 2021

Title: On the effect of normalization layers on Differentially Private training of deep Neural networks
Authors: Ali Davody, David Ifeoluwa Adelani, Thomas Kleinbauer, Dietrich Klakow
Reference: 2021

2020

Journal articles

Title: How can private information recorded by voice-enabled systems be identified?
Authors: Álvaro Moretón, Ariadna Jaramillo
Reference: European Data Protection Law Review, 2020, 6 (3), pp.464-469. ⟨10.21552/edpl/2020/3/17⟩

Title: Peut-on faire confiance aux IA ?
Author: Emmanuel Vincent
Reference: The Conversation France, 2020

Conference papers

Title: Privacy guarantees for de-identifying text transformations
Authors: David Ifeoluwa Adelani, Ali Davody, Thomas Kleinbauer, Dietrich Klakow
Reference: INTERSPEECH 2020, Oct 2020, Shanghai, China

Title: Distant supervision and noisy label learning for low resource named entity recognition: A study on Hausa and Yorùbá
Authors: David Ifeoluwa Adelani, Michael A Hedderich, Dawei Zhu, Esther van den Berg, Dietrich Klakow
Reference: ICLR Workshops (AfricaNLP & PML4DC 2020), Apr 2020, Addis Ababa, Ethiopia

Title: Data augmentation for pipeline-based speech translation
Authors: Diego Alves, Askars Salimbajevs, Mārcis Pinnis
Reference: 9th International Conference on Human Language Technologies – the Baltic Perspective (Baltic HLT 2020), Sep 2020, Kaunas, Lithuania

Title: Private Protocols for U-Statistics in the Local Model and Beyond
Authors: James Bell, Aurélien Bellet, Adrià Gascón, Tejas Kulkarni
Reference: AISTATS 2020 – 23rd International Conference on Artificial Intelligence and Statistics, Aug 2020, Palermo, Italy

Title: Who started this rumor? Quantifying the natural differential privacy guarantees of gossip protocols
Authors: Aurélien Bellet, Rachid Guerraoui, Hadrien Hendrikx
Reference: DISC 2020 – 34th International Symposium on Distributed Computing, Oct 2020, Freiburg / Virtual, Germany

Title: Transfer learning and distant supervision for multilingual Transformer models: A study on African languages
Authors: Michael A Hedderich, David I Adelani, Dawei Zhu, Jesujoba Alabi, Udia Markus, Dietrich Klakow
Reference: 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Nov 2020, Punta Cana, Dominican Republic

Title: Assessing Unintended Memorization in Neural Discriminative Sequence Models
Authors: Mossad Helali, Thomas Kleinbauer, Dietrich Klakow
Reference: 23rd International Conference on Text, Speech and Dialogue, Sep 2020, Brno, Czech Republic

Title: A comparative study of speech anonymization metrics
Authors: Mohamed Maouche, Brij Mohan Lal Srivastava, Nathalie Vauquier, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent
Reference: INTERSPEECH 2020, Oct 2020, Shanghai, China

Title: Using privacy-transformed speech in the automatic speech recognition acoustic model training
Author: Askars Salimbajevs
Reference: 9th International Conference on Human Language Technologies – the Baltic Perspective (Baltic HLT 2020), Sep 2020, Kaunas, Lithuania

Title: On semi-supervised LF-MMI training of acoustic models with limited data
Authors: Imran Sheikh, Emmanuel Vincent, Irina Illina
Reference: INTERSPEECH 2020, Oct 2020, Shanghai, China

Title: The COMPRISE Cloud Platform
Authors: Raivis Skadiņš, Askars Salimbajevs
Reference: 1st International Workshop on Language Technology Platforms, May 2020, Marseille, France

Title: Design Choices for X-vector Based Speaker Anonymization
Authors: Brij Mohan Lal Srivastava, Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi
Reference: INTERSPEECH 2020, International Speech Communication Association (ISCA), Oct 2020, Shanghai, China

Title: Evaluating Voice Conversion-based Privacy Protection against Informed Attackers
Authors: Brij Mohan Lal Srivastava, Nathalie Vauquier, Md Sahidullah, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent
Reference: ICASSP 2020 – 45th International Conference on Acoustics, Speech, and Signal Processing, IEEE Signal Processing Society, May 2020, Barcelona, Spain. pp.2802-2806

Title: Investigating the Impact of Pre-trained Word Embeddings on Memorization in Neural Networks
Authors: Aleena Thomas, David Ifeoluwa Adelani, Ali Davody, Aditya Mogadala, Dietrich Klakow
Reference: 23rd International Conference on Text, Speech and Dialogue, Sep 2020, Brno, Czech Republic

Title: Introducing the VoicePrivacy initiative
Authors: Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco
Reference: INTERSPEECH 2020, Oct 2020, Shanghai, China

Title: Achieving Multi-Accent ASR via Unsupervised Acoustic Model Adaptation
Authors: Mehmet Ali Tuğtekin Turan, Emmanuel Vincent, Denis Jouvet
Reference: INTERSPEECH 2020, Oct 2020, Shanghai, China

Title: Fully Decentralized Joint Learning of Personalized Models and Collaboration Graphs
Authors: Valentina Zantedeschi, Aurélien Bellet, Marc Tommasi
Reference: AISTATS 2020 – The 23rd International Conference on Artificial Intelligence and Statistics, Aug 2020, Palermo / Virtual, Italy

Poster communications

Title: Échange de bruit corrélé pour le calcul distribué de moyenne avec garanties de confidentialité différentielle
Authors: César Sabater, Aurélien Bellet, Jan Ramon
Reference: Conférence sur l’Apprentissage Automatique 2020, Jun 2020, Vannes (Virtual), France

Title: Distributed Differentially Private Averaging with Improved Utility and Robustness to Malicious Parties
Authors: César Sabater, Aurélien Bellet, Jan Ramon
Reference: NeurIPS 2020 workshop on Privacy Preserving Machine Learning – PriML and PPML Joint Edition, Dec 2020, Vancouver (Virtual Workshop), Canada

Preprints, Working Papers, …

Title: Privacy Amplification by Decentralization
Authors: Edwige Cyffers, Aurélien Bellet
Reference: 2020

Title: Distributed Differentially Private Averaging with Improved Utility and Robustness to Malicious Parties
Authors: César Sabater, Aurélien Bellet, Jan Ramon
Reference: 2020

2019

Conference papers

Title: Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion?
Authors: Brij Mohan Lal Srivastava, Aurélien Bellet, Marc Tommasi, Emmanuel Vincent
Reference: INTERSPEECH 2019 – 20th Annual Conference of the International Speech Communication Association, Sep 2019, Graz, Austria