Thesis Details

Odhad obličeje z řečového signálu

Bachelor's Thesis
Student: Krušina Josef
Academic Year: 2021/2022
Supervisor: Plchot Oldřich, Ing., Ph.D.

English title

Learning the Face Behind a Voice

Language

Czech

Abstract

This work addresses the problem of mapping fixed-length representations (embeddings) of a speech signal to face embeddings, and then generating a face from the mapped embedding using a generative adversarial network (GAN) trained for face generation. GANs are a type of neural network that can generate data similar to the data they were trained on. The proposed system consists of four components: a face embedding extractor, a voice embedding extractor, an algorithm built on top of a GAN that generates a face from a face embedding, and the author's mapping network, which maps a voice embedding to a face embedding. The pre-trained neural networks FaceNet and SpeechBrain are adopted as the embedding extractors. A model based on a pre-trained StyleGAN2 is adopted for generating a face back from its embedding. The contribution of this work is that it allows a face to be estimated from an audio signal alone.
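The mapping step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's mapping network, whose architecture is not given on this page: a single linear layer trained with a mean-squared-error loss stands in for it, the embeddings are small random toy vectors rather than real SpeechBrain or FaceNet outputs, and the GAN-based face generation stage is omitted. All function names and dimensions below are hypothetical.

```python
import random

def make_linear_map(in_dim, out_dim, seed=0):
    """Create small random weights and a zero bias for a linear map."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def apply_map(weights, bias, voice_emb):
    """Map a voice embedding to a predicted face embedding."""
    return [sum(w * x for w, x in zip(row, voice_emb)) + b
            for row, b in zip(weights, bias)]

def mse(pred, target):
    """Mean squared error between two vectors of equal length."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def train_step(weights, bias, voice_emb, face_emb, lr):
    """One gradient-descent step on a single (voice, face) pair.

    Returns the loss measured before the update.
    """
    pred = apply_map(weights, bias, voice_emb)
    n = len(face_emb)
    for i in range(n):
        grad = 2.0 * (pred[i] - face_emb[i]) / n
        for j, x in enumerate(voice_emb):
            weights[i][j] -= lr * grad * x
        bias[i] -= lr * grad
    return mse(pred, face_emb)

def train(pairs, in_dim, out_dim, epochs=200, lr=0.05):
    """Fit the linear map on paired embeddings; return it with the loss history."""
    weights, bias = make_linear_map(in_dim, out_dim)
    history = []
    for _ in range(epochs):
        total = sum(train_step(weights, bias, v, f, lr) for v, f in pairs)
        history.append(total / len(pairs))
    return weights, bias, history

# Toy stand-ins for paired embeddings (assumed dimensions, far smaller
# than real extractor outputs). The "face" vectors are produced by a
# hidden linear map so the problem is learnable.
IN_DIM, OUT_DIM = 6, 4
rng = random.Random(1)
hidden_w, hidden_b = make_linear_map(IN_DIM, OUT_DIM, seed=42)
pairs = []
for _ in range(20):
    voice = [rng.uniform(-1.0, 1.0) for _ in range(IN_DIM)]
    pairs.append((voice, apply_map(hidden_w, hidden_b, voice)))

weights, bias, history = train(pairs, IN_DIM, OUT_DIM)
predicted_face_emb = apply_map(weights, bias, pairs[0][0])
```

In a full pipeline of the kind the abstract describes, `predicted_face_emb` would be handed to the StyleGAN2-based generation stage; here it only shows that the mapped vector has the face-embedding dimension and that the training loss decreases.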

Keywords

Feature extraction, Mapping, Embedding, FaceNet, SpeechBrain, StyleGAN2

Department

Department of Computer Graphics and Multimedia FIT BUT

Degree Programme

Information Technology

Status

defended, grade C

Date

15 June 2022

Reviewer

Matějka Pavel, Ing., Ph.D.

Committee

Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT), chair
Bartík Vladimír, Ing., Ph.D. (DIFS FIT BUT), member
Češka Milan, doc. RNDr., Ph.D. (DITS FIT BUT), member
Jaroš Jiří, prof. Ing., Ph.D. (DCSY FIT BUT), member
Orság Filip, Ing., Ph.D. (DITS FIT BUT), member

Citation

KRUŠINA, Josef. Odhad obličeje z řečového signálu. Brno, 2022. Bachelor's Thesis. Brno University of Technology, Faculty of Information Technology. 2022-06-15. Supervised by Plchot Oldřich. Available from: https://www.fit.vut.cz/study/thesis/24895/

BibTeX

@bachelorsthesis{FITBT24895,
    author = "Josef Kru\v{s}ina",
    type = "Bachelor's thesis",
    title = "Odhad obli\v{c}eje z \v{r}e\v{c}ov\'{e}ho sign\'{a}lu",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2022,
    location = "Brno, CZ",
    language = "czech",
    url = "https://www.fit.vut.cz/study/thesis/24895/"
}
