Thesis Details
Odhad obličeje z řečového signálu
This work addresses the problem of mapping fixed representations (embeddings) of a speech signal to face embeddings and then generating a face from the mapped embedding using a generative adversarial network (GAN) that was trained for face generation. GANs are a type of neural networks that can generate data similar to the data they were trained on. The architecture of the proposed system is based on four components: a face embedding extractor, a voice embedding extractor, an algorithm on top of a GAN that can generate a face from a face embedding, and my mapping network used to map a voice embedding to a face embedding. The pre-trained neural networks FaceNet and SpeechBrain are adopted as embedding extractors. A model that uses a pre-trained StyleGAN2 is adopted for backward face generation. The contribution of this work is that it allows the extrapolation of a face from audio signal only.
Feature extraction, Mapping, Embedding, FaceNet, SpeechBrain, StyleGAN2
Bartík Vladimír, Ing., Ph.D. (DIFS FIT BUT), člen
Češka Milan, doc. RNDr., Ph.D. (DITS FIT BUT), člen
Jaroš Jiří, doc. Ing., Ph.D. (DCSY FIT BUT), člen
Orság Filip, Ing., Ph.D. (DITS FIT BUT), člen
@bachelorsthesis{FITBT24895, author = "Josef Kru\v{s}ina", type = "Bachelor's thesis", title = "Odhad obli\v{c}eje z \v{r}e\v{c}ov\'{e}ho sign\'{a}lu", school = "Brno University of Technology, Faculty of Information Technology", year = 2022, location = "Brno, CZ", language = "czech", url = "https://www.fit.vut.cz/study/thesis/24895/" }