Thesis Details

Learning Speech Separation Using Spatial Cues

Bachelor's Thesis
Student: Pavlus Ján
Academic Year: 2019/2020
Supervisor: Žmolíková Kateřina, Ing., Ph.D.

Czech title

Učení separace řečníků pomocí prostorové informace

Language

English

Abstract

This thesis examines the use of spatial cues in speech separation to estimate target masks, an idea proposed in the article "Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures". This approach may make it possible to train neural-network-based speech separation systems on real-world mixtures. The thesis describes two training methods, permutation invariant training and deep clustering, and uses them in experiments that train neural networks on target masks estimated from spatial cues. The results of these experiments are compared with those reported in the above-mentioned article. The comparison shows that masks estimated from spatial information can be used to train a speaker separation system effectively.

Keywords

Speech separation, deep clustering, spatial cues, machine learning, neural networks, long short-term memory

Department

Department of Computer Graphics and Multimedia FIT BUT

Degree Programme

Information Technology

Status

defended, grade C

Date

10 July 2020

Reviewer

Mošner Ladislav, Ing.

Committee

Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT), chair
Češka Milan, doc. RNDr., Ph.D. (DITS FIT BUT), member
Jaroš Jiří, doc. Ing., Ph.D. (DCSY FIT BUT), member
Orság Filip, Ing., Ph.D. (DITS FIT BUT), member
Rychlý Marek, RNDr., Ph.D. (DIFS FIT BUT), member

Citation

PAVLUS, Ján. Learning Speech Separation Using Spatial Cues. Brno, 2020. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2020-07-10. Supervised by Žmolíková Kateřina. Available from: https://www.fit.vut.cz/study/thesis/23153/

BibTeX

@bachelorsthesis{FITBT23153,
    author = "J\'{a}n Pavlus",
    type = "Bachelor's thesis",
    title = "Learning Speech Separation Using Spatial Cues",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2020,
    location = "Brno, CZ",
    language = "english",
    url = "https://www.fit.vut.cz/study/thesis/23153/"
}
