Thesis Details

Odhad obličeje z řečového signálu

Bachelor's Thesis
Student: Krušina Josef
Academic Year: 2021/2022
Supervisor: Plchot Oldřich, Ing., Ph.D.
English title
Learning the Face Behind a Voice

This work addresses the problem of mapping fixed representations (embeddings) of a speech signal to face embeddings, and then generating a face from the mapped embedding using a generative adversarial network (GAN) trained for face generation. A GAN is a type of neural network that can generate data similar to the data it was trained on. The proposed system consists of four components: a face embedding extractor, a voice embedding extractor, an algorithm on top of a GAN that generates a face from a face embedding, and my mapping network, which maps a voice embedding to a face embedding. The pre-trained neural networks FaceNet and SpeechBrain are adopted as the embedding extractors. A model that uses a pre-trained StyleGAN2 is adopted for generating a face back from a face embedding. The contribution of this work is that it makes it possible to extrapolate a face from an audio signal alone.
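The mapping step described above can be illustrated with a minimal sketch. It is not the network from the thesis: the layer sizes, the embedding dimensions (192 for the voice embedding, 512 for the face embedding), the L2 normalization of the output, and all function names are assumptions chosen for illustration, and the weights are random rather than trained. The sketch only shows the shape of the idea, a small feed-forward network that takes a voice embedding and returns a vector in the face-embedding space, which a GAN-based generator could then consume.

```python
import math
import random

VOICE_DIM = 192   # assumption: size of the voice (speaker) embedding
FACE_DIM = 512    # assumption: size of the face embedding
HIDDEN_DIM = 64   # assumption: hidden width, chosen arbitrarily


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Apply one dense layer: y = W x + b."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def l2_normalize(x):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else x


def map_voice_to_face(voice_embedding, layers):
    """Hypothetical mapping network: voice embedding -> face embedding."""
    hidden = relu(dense(voice_embedding, layers[0]))
    return l2_normalize(dense(hidden, layers[1]))


rng = random.Random(0)
layers = [init_layer(VOICE_DIM, HIDDEN_DIM, rng),
          init_layer(HIDDEN_DIM, FACE_DIM, rng)]

# Stand-in for the output of a voice embedding extractor (random values here).
voice_embedding = [rng.gauss(0.0, 1.0) for _ in range(VOICE_DIM)]
face_embedding = map_voice_to_face(voice_embedding, layers)
print(len(face_embedding))  # 512
```

In a real system the weights would be learned from paired voice and face embeddings of the same speakers, and the resulting vector would be passed to the face generator instead of being printed.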


Feature extraction, Mapping, Embedding, FaceNet, SpeechBrain, StyleGAN2

Degree Programme
defended, grade C
15 June 2022
Černocký Jan, prof. Dr. Ing. (DCGM FIT BUT), chair
Bartík Vladimír, Ing., Ph.D. (DIFS FIT BUT), member
Češka Milan, doc. RNDr., Ph.D. (DITS FIT BUT), member
Jaroš Jiří, doc. Ing., Ph.D. (DCSY FIT BUT), member
Orság Filip, Ing., Ph.D. (DITS FIT BUT), member
