Thesis Details
Learning Speech Separation Using Spatial Cues
This thesis discusses the idea of using spatial cues in speech separation for estimating target masks, that is stated in article \textit{Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures}. This idea may make it possible to use real-world mixtures for the training of speech separation systems, which use neural networks. In the thesis two training methods, permutation invariant training and deep clustering method are mentioned and used for experiments with training neural networks using target masks estimated by spatial cues. The result of the work is a comparison of the results of these experiments with the results of the above-mentioned article. This comparison showed that the use of estimated masks with the help of spatial information can lead to a quality training of the speaker separation system.
Speech separation, deep clustering, spatial cues, machine learning, neural networks, long-short term memory
Češka Milan, doc. RNDr., Ph.D. (DITS FIT BUT), člen
Jaroš Jiří, doc. Ing., Ph.D. (DCSY FIT BUT), člen
Orság Filip, Ing., Ph.D. (DITS FIT BUT), člen
Rychlý Marek, RNDr., Ph.D. (DIFS FIT BUT), člen
@bachelorsthesis{FITBT23153, author = "J\'{a}n Pavlus", type = "Bachelor's thesis", title = "Learning Speech Separation Using Spatial Cues", school = "Brno University of Technology, Faculty of Information Technology", year = 2020, location = "Brno, CZ", language = "english", url = "https://www.fit.vut.cz/study/thesis/23153/" }