Result Details

Deep Auto-encoder Based Multi-task Learning Using Probabilistic Transcriptions

DAS, A.; HASEGAWA-JOHNSON, M.; VESELÝ, K. Deep Auto-encoder Based Multi-task Learning Using Probabilistic Transcriptions. In Proceedings of Interspeech 2017. Proceedings of Interspeech. Stockholm: International Speech Communication Association, 2017. no. 08, p. 2073-2077. ISSN: 1990-9772.

Type

conference paper

Language

English

Authors

Das Amit
Hasegawa-Johnson Mark
Veselý Karel, Ing., Ph.D., FIT (FIT), DCGM (FIT)

Abstract

This paper presents deep auto-encoder based multi-task learning using probabilistic transcriptions.

Keywords

cross-lingual speech recognition, probabilistic transcription, deep neural networks, multi-task learning

URL

http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0582.PDF

Annotation

We examine a scenario where we have no access to native transcribers in the target language. This is typical of language communities that are under-resourced. However, turkers (online crowd workers) available in online marketplaces can serve as valuable alternative resources for providing transcripts in the target language. We assume that the turkers neither speak nor have any familiarity with the target language. Thus, they are unable to distinguish all phone pairs in the target language; their transcripts therefore specify, at best, a probability distribution called a probabilistic transcript (PT). Standard deep neural network (DNN) training using PTs does not necessarily improve error rates. Previously reported results have demonstrated some success by adopting the multi-task learning (MTL) approach. In this study, we report further improvements by introducing a deep auto-encoder based MTL. This method leverages large amounts of untranscribed data in the target language in addition to the PTs obtained from turkers. Furthermore, to encourage transfer learning in the feature space, we also examine the effect of using monophones from transcripts in well-resourced languages. We report consistent improvement in phone error rates (PER) for Swahili, Amharic, Dinka, and Mandarin.

Published

2017

Pages

2073–2077

Journal

Proceedings of Interspeech, vol. 2017, no. 08, ISSN 1990-9772

Proceedings

Proceedings of Interspeech 2017

Conference

Interspeech Conference

Publisher

International Speech Communication Association

Place

Stockholm

DOI

10.21437/Interspeech.2017-582

UT WoS

000457505000434

EID Scopus

2-s2.0-85039159851

BibTeX

@inproceedings{BUT144494,
  author="Amit {Das} and Mark {Hasegawa-Johnson} and Karel {Veselý}",
  title="Deep Auto-encoder Based Multi-task Learning Using Probabilistic Transcriptions",
  booktitle="Proceedings of Interspeech 2017",
  year="2017",
  journal="Proceedings of Interspeech",
  volume="2017",
  number="08",
  pages="2073--2077",
  publisher="International Speech Communication Association",
  address="Stockholm",
  doi="10.21437/Interspeech.2017-582",
  issn="1990-9772",
  url="http://www.isca-speech.org/archive/Interspeech_2017/pdfs/0582.PDF"
}

Projects

Processing, Visualization and Analysis of Multimedia and 3D Data, BUT, BUT Internal Projects, FIT-S-17-3984, start: 2017-03-01, end: 2020-02-29, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)