Publication Details

Multi-Channel Speech Separation with Cross-Attention and Beamforming

MOŠNER Ladislav, PLCHOT Oldřich, PENG Junyi, BURGET Lukáš and ČERNOCKÝ Jan. Multi-Channel Speech Separation with Cross-Attention and Beamforming. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Dublin: International Speech Communication Association, 2023, pp. 1693-1697. ISSN 1990-9772. Available from: https://www.isca-speech.org/archive/interspeech_2023/mosner23_interspeech.html
Czech title
Vícekanálová separace řeči s cross-attention a beamformingem
Type
conference paper
Language
English
Authors
Ladislav Mošner, Oldřich Plchot, Junyi Peng, Lukáš Burget, Jan Černocký
URL
Keywords

multi-channel source separation, cross-channel attention, beamforming

Abstract

Single-channel source separation originally attracted more research interest, which led to immense progress. Multi-channel (MC) separation, however, comes with new challenges posed by adverse indoor conditions, which makes it an important field of study. We seek to combine promising ideas from both worlds. First, we build MC models by extending current single-channel time-domain separators, relying on their strength. Our approach allows pre-trained models to be reused by inserting a lightweight reference channel attention (RCA) combiner, which is the only trained module. It comprises two blocks: the first attends to different parts of the other channels with respect to the reference channel, and the second provides an attention-based combination of channels. Second, like many successful MC models, our system incorporates beamforming and allows the network and beamformer outputs to be fused. We compare our approach with state-of-the-art (SOTA) models on the SMS-WSJ dataset and show better or similar performance.
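The abstract describes the RCA combiner only at a high level. What follows is a minimal NumPy sketch of the two-block idea, not the authors' implementation. It assumes encoder features shaped (channels, frames, feature_dim), a single attention head, scaled dot-product scoring, and random untrained projection matrices. The function names (`cross_channel_attention`, `channel_combination`, `rca_combiner`) are hypothetical. For simplicity, the reference channel also attends to itself in the first block. The first block lets reference-channel frames attend over every frame of each channel. The second block applies a per-frame softmax over channels and merges them into a single stream that a single-channel separator could consume.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_channel_attention(feats, ref, w_q, w_k, w_v):
    """Block 1 (hypothetical sketch): queries come from the reference channel,
    keys/values from each channel, so every channel is re-expressed with
    respect to the reference frames (the reference attends to itself too).

    feats: (C, T, D) encoder features of C channels.
    Returns: (C, T, D) aligned features.
    """
    q = feats[ref] @ w_q                                    # (T, D) reference queries
    k = feats @ w_k                                         # (C, T, D) keys per channel
    v = feats @ w_v                                         # (C, T, D) values per channel
    scores = np.einsum("td,csd->cts", q, k) / np.sqrt(q.shape[-1])  # (C, T, T)
    weights = softmax(scores, axis=-1)                      # normalise over key frames
    return np.einsum("cts,csd->ctd", weights, v)


def channel_combination(aligned, ref, w_s):
    """Block 2 (hypothetical sketch): per frame, score each channel against the
    reference, softmax over channels, and take the weighted sum.

    aligned: (C, T, D). Returns (T, D) combined features and (T, C) weights.
    """
    ref_proj = aligned[ref] @ w_s                           # (T, D)
    scores = np.einsum("td,ctd->tc", ref_proj, aligned) / np.sqrt(aligned.shape[-1])
    weights = softmax(scores, axis=-1)                      # each row sums to 1
    combined = np.einsum("tc,ctd->td", weights, aligned)
    return combined, weights


def rca_combiner(feats, ref=0, seed=0):
    """End-to-end sketch with random (untrained) projections; in a real model
    these would be the only trained parameters."""
    rng = np.random.default_rng(seed)
    d = feats.shape[-1]
    w_q, w_k, w_v, w_s = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    aligned = cross_channel_attention(feats, ref, w_q, w_k, w_v)
    return channel_combination(aligned, ref, w_s)


if __name__ == "__main__":
    # Hypothetical input: 4 microphones, 50 frames, 16-dimensional features.
    demo = np.random.default_rng(1).standard_normal((4, 50, 16))
    combined, weights = rca_combiner(demo)
    print(combined.shape, weights.shape)
```

Because the combiner maps (C, T, D) features to a single (T, D) stream, a module of this shape could sit between a frozen single-channel encoder and separator. That placement is consistent with the abstract's point that the RCA combiner is the only trained module. The exact wiring, and how the beamformer output is fused, are not specified in this record.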

Published
2023
Pages
1693-1697
Journal
Proceedings of Interspeech - on-line, vol. 2023, no. 8, ISSN 1990-9772
Proceedings
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Conference
Interspeech Conference, Dublin, IE
Publisher
International Speech Communication Association
Place
Dublin, IE
DOI
10.21437/Interspeech.2023-2537
EID Scopus
BibTeX
@INPROCEEDINGS{FITPUB13108,
   author = "Ladislav Mo\v{s}ner and Old\v{r}ich Plchot and Junyi Peng and Luk\'{a}\v{s} Burget and Jan \v{C}ernock\'{y}",
   title = "Multi-Channel Speech Separation with Cross-Attention and Beamforming",
   pages = "1693--1697",
   booktitle = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
   journal = "Proceedings of Interspeech - on-line",
   volume = 2023,
   number = 08,
   year = 2023,
   location = "Dublin, IE",
   publisher = "International Speech Communication Association",
   ISSN = "1990-9772",
   doi = "10.21437/Interspeech.2023-2537",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/13108"
}