Jan Chorowski: Representation learning for speech and handwriting

Place

FIT VUT, Božetěchova 2, 612 00 Brno, CZ

Organiser

Department of Computer Graphics and Multimedia FIT BUT

Type

seminar

Access

free

URL

http://bit.ly/2tLHiVf

Description

VGS Invited Talks @ FIT
Jan Chorowski: Representation learning for speech and handwriting
The talk takes place on Friday, January 10, 2020 at 13:00 in room A112.

Jan Chorowski is an Associate Professor at the Faculty of Mathematics and Computer Science at the University of Wrocław and Head of AI at NavAlgo. He received his M.Sc. degree in electrical engineering from the Wrocław University of Technology, Poland, and his PhD in electrical engineering from the University of Louisville, Kentucky, in 2012. He has worked with several research teams, including Google Brain, Microsoft Research, and Yoshua Bengio's lab at the University of Montreal. He led a research topic at the JSALT 2019 workshop. His research interests are applications of neural networks to problems that are intuitive and easy for humans but difficult for machines, such as speech and natural language processing.

Representation learning for speech and handwriting

Learning representations of data in an unsupervised way is still an open problem in machine learning. We consider representations of speech and handwriting learned using autoencoders equipped with autoregressive decoders such as WaveNets or PixelCNNs. In those autoencoders, the encoder only needs to provide the little information needed to supplement all that can be inferred by the autoregressive decoder. This allows learning a representation able to capture high-level semantic content from the signal, e.g. phoneme or character identities, while being invariant to confounding low-level details in the signal such as the underlying pitch contour or background noise. I will show how the design choices of the autoencoder, such as the kind of bottleneck and its hyperparameters, impact the induced latent representation. I will also show applications to unsupervised acoustic unit discovery on the ZeroSpeech task. Finally, I'll show how knowledge about the average unit duration can be enforced during training, as well as during inference on new data.