Project Details
Multi-linguality in speech technologies
Project Period: 1. 1. 2020 - 31. 8. 2023
Project Type: grant
Code: LTAIN19087
Agency: Ministry of Education, Youth and Sports Czech Republic
Program: INTER-EXCELLENCE - Subprogram INTER-ACTION
Keywords: multi-linguality, speech recognition, machine learning, data, transfer learning
Speech data mining technologies and human-machine interfaces based on speech have witnessed significant advances in the past decade, and numerous applications have been successfully commercialized. However, they usually work correctly only in favorable scenarios: in languages with an abundance of training data and in relatively clean environments, such as an office or apartment. In fast-developing big markets such as the Indian one, severe problems make the exploitation of speech difficult: a multitude of languages (some of them with limited or missing resources), highly noisy conditions (much business is simply done on the streets of Indian cities), and highly variable numbers of speakers in a conversation (from the usual two up to whole families). These factors complicate the development of automatic speech recognition (ASR), speaker recognition (SR) and speaker diarization (determining who spoke when, SD). In the proposed project, two established research institutes with a significant track record in multi-lingual ASR, robust SR and SD, Brno University of Technology (BUT) and IIT Madras (IIT-M), have teamed up with an important player on the Indian and global personal electronics markets, Samsung R&D Institute India-Bangalore (SRI-B), and propose significant advances in several speech technologies, notably in multi-lingual low-resource ASR. While BUT and IIT-M will provide top speech research (based, among others, on the U.S. IARPA Babel and MATERIAL programs, on victories in the IARPA ASpIRE evaluation and in the Interspeech 2018 Low Resource Speech Recognition Challenge for Indian Languages, and on the Indian MANDI project), SRI-B will provide data and industrial guidelines and produce demonstrators of the technologies.
Žižka Josef, Ing. (UPGM FIT VUT), team leader
Egorova Ekaterina, Ing. (UPGM FIT VUT)
Skácel Miroslav, Ing. (UPGM FIT VUT)
2021
- YUSUF Bolaji, ONDEL Lucas Antoine Francois, BURGET Lukáš, ČERNOCKÝ Jan and SARAÇLAR Murat. A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021, pp. 3710-3714. ISBN 978-1-7281-7605-5.
- LANDINI Federico Nicolás, GLEMBEK Ondřej, MATĚJKA Pavel, ROHDIN Johan A., BURGET Lukáš, DIEZ Sánchez Mireia and SILNOVA Anna. Analysis of the BUT Diarization System for Voxconverse Challenge. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021, pp. 5819-5823. ISBN 978-1-7281-7605-5.
- ŽMOLÍKOVÁ Kateřina, DELCROIX Marc, RAJ Desh, WATANABE Shinji and ČERNOCKÝ Jan. Auxiliary Loss Function for Target Speech Extraction and Recognition with Weak Supervision Based on Speaker Characteristics. In: Proceedings of 2021 Interspeech. Brno: International Speech Communication Association, 2021, pp. 1464-1468. ISSN 1990-9772.
- KOCOUR Martin, CÁMBARA Guillermo, LUQUE Jordi, BONET David, FARRÚS Mireia, KARAFIÁT Martin, VESELÝ Karel and ČERNOCKÝ Jan. BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge. In: Proceedings of IberSPEECH 2021. Valladolid: International Speech Communication Association, 2021, pp. 113-117.
- BASKAR Murali K., BURGET Lukáš, WATANABE Shinji, ASTUDILLO Ramon and ČERNOCKÝ Jan. Eat: Enhanced ASR-TTS for Self-Supervised Speech Recognition. In: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Toronto, Ontario: IEEE Signal Processing Society, 2021, pp. 6753-6757. ISBN 978-1-7281-7605-5.
- PENG Junyi, QU Xiaoyang, GU Rongzhi, WANG Jianzong, XIAO Jing, BURGET Lukáš and ČERNOCKÝ Jan. Effective Phase Encoding for End-To-End Speaker Verification. In: Proceedings Interspeech 2021. Brno: International Speech Communication Association, 2021, pp. 2366-2370. ISSN 1990-9772.
- PENG Junyi, QU Xiaoyang, WANG Jianzong, GU Rongzhi, XIAO Jing, BURGET Lukáš and ČERNOCKÝ Jan. ICSpk: Interpretable Complex Speaker Embedding Extractor from Raw Waveform. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Brno: International Speech Communication Association, 2021, pp. 511-515. ISSN 1990-9772.
- ŽMOLÍKOVÁ Kateřina, DELCROIX Marc, BURGET Lukáš, NAKATANI Tomohiro and ČERNOCKÝ Jan. Integration of Variational Autoencoder and Spatial Clustering for Adaptive Multi-Channel Neural Speech Separation. In: 2021 IEEE Spoken Language Technology Workshop, SLT 2021 - Proceedings. Shenzhen - virtual: IEEE Signal Processing Society, 2021, pp. 889-896. ISBN 978-1-7281-7066-4.
2020
- LOZANO Díez Alicia, SILNOVA Anna, PULUGUNDLA Bhargav, ROHDIN Johan A., VESELÝ Karel, BURGET Lukáš, PLCHOT Oldřich, GLEMBEK Ondřej, NOVOTNÝ Ondřej and MATĚJKA Pavel. BUT Text-Dependent Speaker Verification System for SdSV Challenge 2020. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Shanghai: International Speech Communication Association, 2020, pp. 761-765. ISSN 1990-9772.