Project Details
Neural Representations in multi-modal and multi-lingual modeling
Project Period: 1. 1. 2019 - 31. 12. 2023
Project Type: grant
Code: GX19-26934X
Agency: Czech Science Foundation
Program: Grantové projekty excelence v základním výzkumu (Grant Projects of Excellence in Basic Research) EXPRO - 2019
Keywords: deep learning; machine learning; neural networks; continuous representations; natural language processing; speech and text processing; machine translation; multi-modality; multi-linguality
The NEUREM3 project encompasses basic research in speech processing (SP) and natural language processing (NLP), with an emphasis on multi-linguality and multi-modality (speech and text processing supported by visual information). Current deep machine learning methods are based on continuous vector representations that the neural networks (NNs) create themselves during training. Although the empirical results of such NNs are often excellent, our knowledge and understanding of these representations is insufficient. NEUREM3 aims to fill this gap by studying neural representations for speech and text units of different scopes (from phonemes and letters to whole spoken and written documents), including representations acquired for isolated tasks as well as in multi-task setups. NEUREM3 will also improve NN architectures and training techniques so that they can be trained on incomplete or incoherent data.
Karafiát Martin, Ing., Ph.D. (UPGM FIT VUT), team leader
Veselý Karel, Ing., Ph.D. (UPGM FIT VUT), team leader
Baskar Murali K. (UPGM FIT VUT)
Beneš Karel, Ing. (UPGM FIT VUT)
2020
- MATĚJKA Pavel, PLCHOT Oldřich, GLEMBEK Ondřej, BURGET Lukáš, ROHDIN Johan A., ZEINALI Hossein, MOŠNER Ladislav, SILNOVA Anna, NOVOTNÝ Ondřej, DIEZ Sánchez Mireia and ČERNOCKÝ Jan. 13 years of speaker recognition research at BUT, with longitudinal analysis of NIST SRE. Computer Speech and Language, vol. 2020, no. 63, pp. 1-15. ISSN 0885-2308.
- ALAM Jahangir, BOULIANNE Gilles, BURGET Lukáš, DAHMANE Mohamed, DIEZ Sánchez Mireia, GLEMBEK Ondřej, LALONDE Marc, LOZANO Díez Alicia, MATĚJKA Pavel, MIZERA Petr, MOŠNER Ladislav, NOISEUX Cédric, MONTEIRO Joao, NOVOTNÝ Ondřej, PLCHOT Oldřich, ROHDIN Johan A., SILNOVA Anna, SLAVÍČEK Josef, STAFYLAKIS Themos, ST-CHARLES Pierre-Luc, WANG Shuai and ZEINALI Hossein. Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge. In: Proceedings of Odyssey 2020 The Speaker and Language Recognition Workshop. Tokyo: International Speech Communication Association, 2020, pp. 289-295. ISSN 2312-2846.
- ZULUAGA-GOMEZ Juan, MOTLÍČEK Petr, ZHAN Qingran, VESELÝ Karel and BRAUN Rudolf. Automatic Speech Recognition Benchmark for Air-Traffic Communications. In: Proceedings of Interspeech 2020. Shanghai: International Speech Communication Association, 2020, pp. 2297-2301. ISSN 1990-9772.
- LOZANO Díez Alicia, SILNOVA Anna, PULUGUNDLA Bhargav, ROHDIN Johan A., VESELÝ Karel, BURGET Lukáš, PLCHOT Oldřich, GLEMBEK Ondřej, NOVOTNÝ Ondřej and MATĚJKA Pavel. BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Shanghai: International Speech Communication Association, 2020, pp. 761-765. ISSN 1990-9772.
- WANG Shuai, ROHDIN Johan A., PLCHOT Oldřich, BURGET Lukáš, YU Kai and ČERNOCKÝ Jan. Investigation of Specaugment for Deep Speaker Embedding Learning. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Barcelona: IEEE Signal Processing Society, 2020, pp. 7139-7143. ISBN 978-1-5090-6631-5.
- SILNOVA Anna, BRUMMER Niko, ROHDIN Johan A., STAFYLAKIS Themos and BURGET Lukáš. Probabilistic embeddings for speaker diarization. In: Proceedings of Odyssey 2020 The Speaker and Language Recognition Workshop. Tokyo: International Speech Communication Association, 2020, pp. 24-31. ISSN 2312-2846.
2019
- ZEINALI Hossein, ČERNOCKÝ Jan and BURGET Lukáš. A multi purpose and large scale speech corpus in Persian and English for speaker and speech Recognition: the DeepMine database. In: Proceedings of ASRU 2019. Sentosa, Singapore: IEEE Signal Processing Society, 2019, pp. 397-402. ISBN 978-1-7281-0306-8.
- ALAM Jahangir, BOULIANNE Gilles, BURGET Lukáš, GLEMBEK Ondřej, LOZANO Díez Alicia, MATĚJKA Pavel, MIZERA Petr, MOŠNER Ladislav, NOVOTNÝ Ondřej, PLCHOT Oldřich, ROHDIN Johan A., SILNOVA Anna, SLAVÍČEK Josef, STAFYLAKIS Themos, WANG Shuai, ZEINALI Hossein, DAHMANE Mohamed, ST-CHARLES Pierre-Luc, LALONDE Marc, NOISEUX Cédric and MONTEIRO Joao. ABC System Description for NIST Multimedia Speaker Recognition Evaluation 2019. In: Proceedings of NIST 2019 SRE Workshop. Sentosa, Singapore: National Institute of Standards and Technology, 2019, pp. 1-7.
- MATĚJKA Pavel, PLCHOT Oldřich, ZEINALI Hossein, MOŠNER Ladislav, SILNOVA Anna, BURGET Lukáš, NOVOTNÝ Ondřej and GLEMBEK Ondřej. Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge. In: Proceedings of Interspeech. Graz: International Speech Communication Association, 2019, pp. 2448-2452. ISSN 1990-9772.
- NOVOTNÝ Ondřej, PLCHOT Oldřich, GLEMBEK Ondřej, ČERNOCKÝ Jan and BURGET Lukáš. Analysis of DNN Speech Signal Enhancement for Robust Speaker Recognition. Computer Speech and Language, vol. 2019, no. 58, pp. 403-421. ISSN 0885-2308.
- DIEZ Sánchez Mireia, BURGET Lukáš, LANDINI Federico Nicolás and ČERNOCKÝ Jan. Analysis of Speaker Diarization based on Bayesian HMM with Eigenvoice Priors. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 28, no. 1, pp. 355-368. ISSN 2329-9290.
- DIEZ Sánchez Mireia, BURGET Lukáš, WANG Shuai, ROHDIN Johan A. and ČERNOCKÝ Jan. Bayesian HMM based x-vector clustering for Speaker Diarization. In: Proceedings of Interspeech. Graz: International Speech Communication Association, 2019, pp. 346-350. ISSN 1990-9772.
- ONDEL Lucas, VYDANA Hari K., BURGET Lukáš and ČERNOCKÝ Jan. Bayesian Subspace Hidden Markov Model for Acoustic Unit Discovery. In: Proceedings of Interspeech 2019. Graz: International Speech Communication Association, 2019, pp. 261-265. ISSN 1990-9772.
- ZEINALI Hossein, WANG Shuai, SILNOVA Anna, MATĚJKA Pavel and PLCHOT Oldřich. BUT System Description to VoxCeleb Speaker Recognition Challenge 2019. In: Proceedings of The VoxCeleb Challenge Workshop 2019. Graz, 2019, pp. 1-4.
- ROHDIN Johan A., SILNOVA Anna, DIEZ Sánchez Mireia, PLCHOT Oldřich, MATĚJKA Pavel, BURGET Lukáš and GLEMBEK Ondřej. End-to-end DNN based text-independent speaker recognition for long and short utterances. Computer Speech and Language, vol. 2020, no. 59, pp. 22-35. ISSN 0885-2308.
- NOVOTNÝ Ondřej, PLCHOT Oldřich, GLEMBEK Ondřej and BURGET Lukáš. Factorization of Discriminatively Trained i-Vector Extractor for Speaker Recognition. In: Proceedings of Interspeech. Graz: International Speech Communication Association, 2019, pp. 4330-4334. ISSN 1990-9772.
- WANG Shuai, ROHDIN Johan A., BURGET Lukáš, PLCHOT Oldřich, QIAN Yanmin, YU Kai and ČERNOCKÝ Jan. On the Usage of Phonetic Information for Text-independent Speaker Embedding Extraction. In: Proceedings of Interspeech. Graz: International Speech Communication Association, 2019, pp. 1148-1152. ISSN 1990-9772.
- STAFYLAKIS Themos, ROHDIN Johan A., PLCHOT Oldřich, MIZERA Petr and BURGET Lukáš. Self-supervised speaker embeddings. In: Proceedings of Interspeech. Graz: International Speech Communication Association, 2019, pp. 2863-2867. ISSN 1990-9772.
- ŽMOLÍKOVÁ Kateřina, DELCROIX Marc, KINOSHITA Keisuke, OCHIAI Tsubasa, NAKATANI Tomohiro, BURGET Lukáš and ČERNOCKÝ Jan. SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures. IEEE Journal of Selected Topics in Signal Processing, vol. 13, no. 4, pp. 800-814. ISSN 1932-4553.
2020
- Bayesian HMM based x-vector clustering - VBx, software, 2020
Authors: Diez Sánchez Mireia, Landini Federico Nicolás, Burget Lukáš