Project Details
Neural Representations in multi-modal and multi-lingual modeling
Project Period: 1. 1. 2019 – 31. 12. 2023
Project Type: grant
Code: GX19-26934X
Agency: Czech Science Foundation
Program: Grant Projects of Excellence in Basic Research EXPRO - 2019
Keywords: deep learning; machine learning; neural networks; continuous representations; natural language processing; speech and text processing; machine translation; multi-modality; multi-linguality
The NEUREM3 project encompasses basic research in speech processing (SP) and
natural language processing (NLP) with an emphasis on multi-linguality and
multi-modality (speech and text processing supported by visual information).
Current deep machine learning methods are based on continuous vector
representations that the neural networks (NNs) themselves create during
training. Although such NNs often achieve excellent empirical results, our
understanding of these representations is insufficient. NEUREM3 aims to fill
this gap by studying neural representations for speech and text units of
different scopes (from phonemes and letters to whole spoken and written
documents), acquired both for isolated tasks and in multi-task setups. NEUREM3
will also improve NN architectures and training techniques so that they can be
trained on incomplete or incoherent data.
Baskar Murali Karthick, Ing., Ph.D.
Beneš Karel, Ing., Ph.D. (DCGM)
Han Jiangyu (DCGM)
Karafiát Martin, Ing., Ph.D. (DCGM)
Kesiraju Santosh, Ph.D. (DCGM)
Peng Junyi (DCGM)
Plchot Oldřich, Ing., Ph.D. (DCGM)
Rohdin Johan Andréas, M.Sc., Ph.D. (DCGM)
Sarvaš Marek, Ing.
Veselý Karel, Ing., Ph.D. (DCGM)
2024
- BENEŠ, K.; KOCOUR, M.; BURGET, L. Hystoc: Obtaining Word Confidences for Fusion of End-To-End ASR Systems. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 11276-11280. ISBN: 979-8-3503-4485-1.
- HAN, J.; LANDINI, F.; ROHDIN, J.; DIEZ SÁNCHEZ, M.; BURGET, L.; CAO, Y.; LU, H.; ČERNOCKÝ, J. Diacorrect: Error Correction Back-End for Speaker Diarization. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Seoul: IEEE Signal Processing Society, 2024. p. 11181-11185. ISBN: 979-8-3503-4485-1.
- KLEMENT, D.; DIEZ SÁNCHEZ, M.; LANDINI, F.; BURGET, L.; SILNOVA, A.; DELCROIX, M.; TAWARA, N. Discriminative Training of VBx Diarization. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 11871-11875. ISBN: 979-8-3503-4485-1.
- LANDINI, F.; DIEZ SÁNCHEZ, M.; STAFYLAKIS, T.; BURGET, L. DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors. IEEE Transactions on Audio, Speech, and Language Processing, 2024, vol. 32, no. 7, p. 3450-3465. ISSN: 1558-7916.
- PENG, J.; DELCROIX, M.; OCHIAI, T.; ASHIHARA, T.; PLCHOT, O.; ARAKI, S.; ČERNOCKÝ, J. Probing Self-Supervised Learning Models With Target Speech Extraction. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 535-539. ISBN: 979-8-3503-7451-3.
- PENG, J.; DELCROIX, M.; OCHIAI, T.; PLCHOT, O.; ARAKI, S.; ČERNOCKÝ, J. Target Speech Extraction with Pre-Trained Self-Supervised Learning Models. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Seoul: IEEE Signal Processing Society, 2024. p. 10421-10425. ISBN: 979-8-3503-4485-1.
2023
- DELCROIX, M.; TAWARA, N.; DIEZ SÁNCHEZ, M.; LANDINI, F.; SILNOVA, A.; OGAWA, A.; NAKATANI, T.; BURGET, L.; ARAKI, S. Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 3477-3481. ISSN: 1990-9772.
- KAKOUROS, S.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L. Speech-Based Emotion Recognition with Self-Supervised Models Using Attentive Channel-Wise Correlations and Label Smoothing. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
- KESIRAJU, S.; BENEŠ, K.; TIKHONOV, M.; ČERNOCKÝ, J. BUT Systems for IWSLT 2023 Marathi - Hindi Low Resource Speech Translation Task. In 20th International Conference on Spoken Language Translation, IWSLT 2023 - Proceedings of the Conference. Toronto (in-person and online): Association for Computational Linguistics, 2023. p. 227-234. ISBN: 978-1-959429-84-5.
- KESIRAJU, S.; SARVAŠ, M.; PAVLÍČEK, T.; MACAIRE, C.; CIUBA, A. Strategies for Improving Low Resource Speech to Text Translation Relying on Pre-trained ASR Models. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 2148-2152. ISSN: 1990-9772.
- LANDINI, F.; DIEZ SÁNCHEZ, M.; LOZANO DÍEZ, A.; BURGET, L. Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
- MATĚJKA, P.; SILNOVA, A.; SLAVÍČEK, J.; MOŠNER, L.; PLCHOT, O.; KLČO, M.; PENG, J.; STAFYLAKIS, T.; BURGET, L. Description and Analysis of ABC Submission to NIST LRE 2022. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 511-515. ISSN: 1990-9772.
- MOŠNER, L.; PLCHOT, O.; PENG, J.; BURGET, L.; ČERNOCKÝ, J. Multi-Channel Speech Separation with Cross-Attention and Beamforming. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 1693-1697. ISSN: 1990-9772.
- PENG, J.; PLCHOT, O.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. An attention-based backend allowing efficient fine-tuning of transformer models for speaker verification. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Doha: IEEE Signal Processing Society, 2023. p. 555-562. ISBN: 978-1-6654-7189-3.
- PENG, J.; PLCHOT, O.; STAFYLAKIS, T.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. Improving Speaker Verification with Self-Pretrained Transformer Models. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Dublin: International Speech Communication Association, 2023. p. 5361-5365. ISSN: 1990-9772.
- PENG, J.; STAFYLAKIS, T.; GU, R.; PLCHOT, O.; MOŠNER, L.; BURGET, L.; ČERNOCKÝ, J. Parameter-Efficient Transfer Learning of Pre-Trained Transformer Models for Speaker Verification Using Adapters. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
- SILNOVA, A.; BRUMMER, J.; SWART, A.; BURGET, L. Toroidal Probabilistic Spherical Discriminant Analysis. In Proceedings of ICASSP 2023. Rhodes Island: IEEE Signal Processing Society, 2023. p. 1-5. ISBN: 978-1-7281-6327-7.
- SILNOVA, A.; SLAVÍČEK, J.; MOŠNER, L.; KLČO, M.; PLCHOT, O.; MATĚJKA, P.; PENG, J.; STAFYLAKIS, T.; BURGET, L. ABC System Description for NIST LRE 2022. Proceedings of NIST LRE 2022 Workshop. Washington DC: National Institute of Standards and Technology, 2023. p. 1-5.
- STAFYLAKIS, T.; MOŠNER, L.; KAKOUROS, S.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations. In 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings. Doha: IEEE Signal Processing Society, 2023. p. 1136-1143. ISBN: 978-1-6654-7189-3.
- YU, D.; GONG, Y.; PICHENY, A.; RAMABHADRAN, B.; HAKKANI-TÜR, D.; PRASAD, R.; ZEN, H.; SKOGLUND, J.; ČERNOCKÝ, J.; BURGET, L.; MOHAMED, A. Twenty-Five Years of Evolution in Speech and Language Processing. IEEE SIGNAL PROCESSING MAGAZINE, 2023, vol. 40, no. 5, p. 27-39. ISSN: 1558-0792.
- YUSUF, B.; ČERNOCKÝ, J.; SARAÇLAR, M. End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, 2023, vol. 31, no. 08, p. 3070-3080. ISSN: 2329-9290.
2022
- ALAM, J.; BURGET, L.; GLEMBEK, O.; MATĚJKA, P.; MOŠNER, L.; PLCHOT, O.; ROHDIN, J.; SILNOVA, A.; STAFYLAKIS, T. Development of ABC systems for the 2021 edition of NIST Speaker Recognition evaluation. Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022. p. 346-353.
- BASKAR, M.; HERZIG, T.; NGUYEN, D.; DIEZ SÁNCHEZ, M.; POLZEHL, T.; BURGET, L.; ČERNOCKÝ, J. Speaker adaptation for Wav2vec2 based dysarthric ASR. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022. p. 3403-3407. ISSN: 1990-9772.
- BRUMMER, J.; SWART, A.; MOŠNER, L.; SILNOVA, A.; PLCHOT, O.; STAFYLAKIS, T.; BURGET, L. Probabilistic Spherical Discriminant Analysis: An Alternative to PLDA for length-normalized embeddings. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022. p. 1446-1450. ISSN: 1990-9772.
- BURGET, L.; BOJAR, O. NEUREM3 Interim Research Report. Brno: Department of Computer Graphics and Multimedia FIT BUT, 2022. p. 1-78.
- KOCOUR, M.; UMESH, J.; KARAFIÁT, M.; ŠVEC, J.; LOPEZ, F.; BENEŠ, K.; DIEZ SÁNCHEZ, M.; SZŐKE, I.; LUQUE, J.; VESELÝ, K.; BURGET, L.; ČERNOCKÝ, J. BCN2BRNO: ASR System Fusion for Albayzin 2022 Speech to Text Challenge. Proceedings of IberSpeech 2022. Granada: International Speech Communication Association, 2022. p. 276-280.
- LANDINI, F.; LOZANO DÍEZ, A.; DIEZ SÁNCHEZ, M.; BURGET, L. From Simulated Mixtures to Simulated Conversations as Training Data for End-to-End Neural Diarization. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022. p. 5095-5099. ISSN: 1990-9772.
- PENG, J.; GU, R.; MOŠNER, L.; PLCHOT, O.; BURGET, L.; ČERNOCKÝ, J. Learnable Sparse Filterbank for Speaker Verification. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022. p. 5110-5114. ISSN: 1990-9772.
- PENG, J.; ZHANG, C.; ČERNOCKÝ, J.; YU, D. Progressive contrastive learning for self-supervised text-independent speaker verification. Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022. p. 17-24.
- SILNOVA, A.; STAFYLAKIS, T.; MOŠNER, L.; PLCHOT, O.; ROHDIN, J.; MATĚJKA, P.; BURGET, L.; GLEMBEK, O.; BRUMMER, J. Analyzing speaker verification embedding extractors and back-ends under language and channel mismatch. Proceedings of The Speaker and Language Recognition Workshop (Odyssey 2022). Beijing: International Speech Communication Association, 2022. p. 9-16.
- STAFYLAKIS, T.; MOŠNER, L.; PLCHOT, O.; ROHDIN, J.; SILNOVA, A.; BURGET, L.; ČERNOCKÝ, J. Training Speaker Embedding Extractors Using Multi-Speaker Audio with Unknown Speaker Boundaries. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Proceedings of Interspeech. Incheon: International Speech Communication Association, 2022. p. 605-609. ISSN: 1990-9772.
2021
- KOCOUR, M.; CÁMBARA, G.; LUQUE, J.; BONET, D.; FARRÚS, M.; KARAFIÁT, M.; VESELÝ, K.; ČERNOCKÝ, J. BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge. Proceedings of IberSPEECH 2021. Valladolid: International Speech Communication Association, 2021. p. 113-117.
- LANDINI, F.; LOZANO DÍEZ, A.; BURGET, L.; DIEZ SÁNCHEZ, M.; SILNOVA, A.; ŽMOLÍKOVÁ, K.; GLEMBEK, O.; MATĚJKA, P.; STAFYLAKIS, T.; BRUMMER, J. BUT System Description for The Third DIHARD Speech Diarization Challenge. Proceedings available at Dihard Challenge Github. on-line by LDC and University of Pennsylvania: 2021. p. 1-5.
2020
- BURGET, L.; GLEMBEK, O.; LOZANO DÍEZ, A.; MATĚJKA, P.; NOVOTNÝ, O.; PLCHOT, O.; PULUGUNDLA, B.; ROHDIN, J.; SILNOVA, A.; VESELÝ, K. BUT System Description to SdSV Challenge 2020. Proceedings of Short-duration Speaker Verification Challenge 2020 Workshop. Shanghai, on-line event of Interspeech 2020 Conference: 2020. p. 1-5.
2019
- ALAM, J.; BOULIANNE, G.; BURGET, L.; GLEMBEK, O.; LOZANO DÍEZ, A.; MATĚJKA, P.; MIZERA, P.; MOŠNER, L.; NOVOTNÝ, O.; PLCHOT, O.; ROHDIN, J.; SILNOVA, A.; SLAVÍČEK, J.; STAFYLAKIS, T.; WANG, S.; ZEINALI, H.; DAHMANE, M.; ST-CHARLES, P.; LALONDE, M.; NOISEUX, C.; MONTEIRO, J. ABC System Description for NIST Multimedia Speaker Recognition Evaluation 2019. Proceedings of NIST 2019 SRE Workshop. Sentosa, Singapore: National Institute of Standards and Technology, 2019. p. 1-7.
- ZEINALI, H.; WANG, S.; SILNOVA, A.; MATĚJKA, P.; PLCHOT, O. BUT System Description to VoxCeleb Speaker Recognition Challenge 2019. Proceedings of The VoxCeleb Challenge Workshop 2019. Graz: 2019. p. 1-4.
- BURGET, L.; BOJAR, O. NEUREM3 Final Research Report. Brno: p. 1-96.