Result Details

Semi-Supervised Training of Language Model on Spanish Conversational Telephone Speech Data

EGOROVA, E.; SERRANO, J. Semi-Supervised Training of Language Model on Spanish Conversational Telephone Speech Data. In Procedia Computer Science. Yogyakarta: Elsevier Science, 2016. no. 81, p. 114-120. ISSN: 1877-0509.

Type

conference paper

Language

English

Authors

Egorova Ekaterina, Ing., Ph.D., DCGM (FIT)
Serrano Jordi Luque

Abstract

This article addresses semi-supervised training of a language model on Spanish conversational telephone speech data.

Keywords

Speech recognition, language modeling, semi-supervised learning

Annotation

This work addresses a common issue that arises when building a speech recognition system in a low-resource scenario: adapting the language model on unlabeled audio data. The proposed methodology makes use of such data by means of semi-supervised learning. While adding system-generated labeled data for acoustic modeling has been shown to yield good results, the benefits of adding system-generated sentence hypotheses to the language model are less clearly established in the literature. This investigation focuses on the latter by exploring different criteria for selecting valuable, well-transcribed sentences. These criteria range from confidence measures at the word and sentence level to sentence duration metrics and grammatical structure frequencies. The processing pipeline starts with training a seed speech recognizer using only twenty hours of the Fisher Spanish corpus of telephone conversations. The proposed procedure attempts to augment this initial system by supplementing it with transcriptions generated automatically from unlabeled data using the seed system. After these transcriptions are generated, each one is scored for how likely it is to be correct, and only those with high scores are added to the training data. Experimental results show improvements gained by the use of an augmented language model. Although these improvements are still smaller than those obtained from a system with only acoustic model augmentation, we consider the proposed system (with its low cost in terms of computational resources and its ability for task adaptation) an attractive technique worthy of further exploration.
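
The selection step described in the annotation can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` structure, the function names, the averaging of word confidences, and both threshold values are assumptions made here for illustration. The sketch covers only two of the criteria mentioned above (sentence-level confidence derived from word-level confidences, and sentence duration).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hypothesis:
    """One automatically transcribed sentence (hypothetical structure)."""
    words: List[str]
    word_confidences: List[float]  # one score in [0, 1] per word
    duration_sec: float


def sentence_confidence(hyp: Hypothesis) -> float:
    """Average the word-level confidences; an empty hypothesis scores 0."""
    if not hyp.word_confidences:
        return 0.0
    return sum(hyp.word_confidences) / len(hyp.word_confidences)


def select_hypotheses(
    hyps: Iterable[Hypothesis],
    min_confidence: float = 0.9,    # assumed threshold, not from the paper
    min_duration_sec: float = 1.0,  # assumed threshold, not from the paper
) -> List[Hypothesis]:
    """Keep only hypotheses that pass both thresholds."""
    return [
        h for h in hyps
        if sentence_confidence(h) >= min_confidence
        and h.duration_sec >= min_duration_sec
    ]


def augment_lm_corpus(
    seed_sentences: Iterable[str],
    hyps: Iterable[Hypothesis],
    **thresholds: float,
) -> List[str]:
    """Append the selected hypotheses to the seed language model text."""
    selected = select_hypotheses(hyps, **thresholds)
    return list(seed_sentences) + [" ".join(h.words) for h in selected]


# Usage with made-up data: only the first hypothesis passes both filters.
hyps = [
    Hypothesis(["hola", "buenos", "días"], [0.95, 0.90, 0.97], 1.8),
    Hypothesis(["eh", "pues"], [0.40, 0.60], 0.7),
    Hypothesis(["sí"], [0.99], 0.3),
]
corpus = augment_lm_corpus(["buenas tardes"], hyps)
```

The resulting text would then be used to retrain the language model with whatever toolkit the system already relies on. In practice the thresholds would need tuning on held-out data, since stricter filters keep cleaner text but add less of it.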

Published

2016

Pages

114–120

Journal

Procedia Computer Science, vol. 2016, no. 81, ISSN 1877-0509

Proceedings

Procedia Computer Science

Conference

The 5th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU'16)

Publisher

Elsevier Science

Place

Yogyakarta

DOI

10.1016/j.procs.2016.04.038

UT WoS

000387446500016

EID Scopus

2-s2.0-84976431005

BibTeX

@inproceedings{BUT131005,
  author="Ekaterina {Egorova} and Jordi Luque {Serrano}",
  title="Semi-Supervised Training of Language Model on Spanish Conversational Telephone Speech Data",
  booktitle="Procedia Computer Science",
  year="2016",
  journal="Procedia Computer Science",
  volume="2016",
  number="81",
  pages="114--120",
  publisher="Elsevier Science",
  address="Yogyakarta",
  doi="10.1016/j.procs.2016.04.038",
  issn="1877-0509",
  url="http://www.sciencedirect.com/science/article/pii/S1877050916300527"
}

Files

pdf egorova_sltu2016_22-8042.pdf 172 kB

Projects

Big speech data analytics for contact centers, EU, Horizon 2020, start: 2015-01-01, end: 2017-12-31, completed
Meeting Assistant (MINT), TAČR, ALFA Programme of Applied Research and Experimental Development, TA04011311, start: 2014-10-01, end: 2017-12-31, completed
Processing, Recognition and Imaging of Multimedia and 3D Data, BUT, BUT Internal Projects, FIT-S-14-2506, start: 2014-01-01, end: 2016-12-31, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)