Result Details

PCA-based Feature Extraction for Phonotactic Language Recognition

MIKOLOV, T.; PLCHOT, O.; GLEMBEK, O.; MATĚJKA, P.; BURGET, L.; ČERNOCKÝ, J. PCA-based Feature Extraction for Phonotactic Language Recognition. In Proc. Odyssey 2010 - The Speaker and Language Recognition Workshop. Brno: International Speech Communication Association, 2010. p. 251-255. ISBN: 978-80-214-4114-9.

Type

conference paper

Language

English

Authors

Mikolov Tomáš, Ing., Ph.D., DCGM (FIT)
Plchot Oldřich, Ing., Ph.D., DCGM (FIT)
Glembek Ondřej, Ing., Ph.D., DCGM (FIT)
Matějka Pavel, Ing., Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)

Abstract

This paper presents a PCA-based feature extraction technique for phonotactic language recognition. The technique speeds up training, in some cases by a factor of more than 1000.

Keywords

speech, language recognition, automatic recognition, large amounts of data.

URL

https://www.fit.vut.cz/research/group/speech/public/publi/2010/mikolov_odys2010…

Annotation

Phonotactic language recognition is one of the major techniques used for automatic recognition of spoken languages. We propose a PCA-based feature extraction technique to be used with SVM-based systems. This technique speeds up training, in some cases by a factor of more than 1000, allowing systems to be trained effectively on much larger data sets. The speed-up of the test phase can be even greater, which makes the resulting systems much more useful for processing large amounts of data. We report our results on the NIST LRE 2009 task.
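
As a rough illustration of the idea in the annotation, the sketch below builds phone trigram frequency vectors from phone strings, fits PCA, and projects each vector onto a few coordinates on which an SVM could then be trained. It is not the authors' implementation. The helper names (`ngram_features`, `fit_pca`, `project`), the four-phone inventory, the synthetic two-language data, and the use of hard counts from 1-best phone strings (rather than, for example, expected counts from phone lattices) are all assumptions made for illustration. PCA is computed by power iteration instead of a library SVD only to keep the example dependency-free, and the SVM step itself is omitted.

```python
import math
import random
from collections import Counter
from itertools import product


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ngram_features(phones, n, inventory):
    """Hypothetical helper: relative frequency of every possible phone n-gram, in a fixed order."""
    grams = Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))
    total = sum(grams.values()) or 1
    return [grams[g] / total for g in product(inventory, repeat=n)]


def fit_pca(vectors, k, iters=200, seed=0):
    """Hypothetical helper: top-k principal directions via power iteration with deflation."""
    dim = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
    X = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(iters):
            # Multiply by the scatter matrix X^T X without forming it: X^T (X w).
            xw = [dot(row, w) for row in X]
            w = [sum(xw[i] * X[i][j] for i in range(len(X))) for j in range(dim)]
            # Project out components already found, then renormalize.
            for c in components:
                p = dot(w, c)
                w = [wj - p * cj for wj, cj in zip(w, c)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                raise ValueError("k exceeds the rank of the centered data")
            w = [wj / norm for wj in w]
        components.append(w)
    return mean, components


def project(vec, mean, components):
    """Hypothetical helper: map a high-dimensional n-gram vector to k PCA coordinates."""
    centered = [v - m for v, m in zip(vec, mean)]
    return [dot(centered, c) for c in components]


# Synthetic data: two hypothetical "languages" with different phone preferences.
inventory = ["a", "b", "c", "d"]
weights = {"A": [0.4, 0.3, 0.2, 0.1], "B": [0.1, 0.2, 0.3, 0.4]}
rng = random.Random(1)
labels, feats = [], []
for lang in ("A", "B"):
    for _ in range(10):
        phones = rng.choices(inventory, weights=weights[lang], k=800)
        labels.append(lang)
        feats.append(ngram_features(phones, 3, inventory))

mean, comps = fit_pca(feats, k=3)
reduced = [project(f, mean, comps) for f in feats]
print(len(feats[0]), len(reduced[0]))  # 64 3
```

The number of possible n-grams grows as N^n for an inventory of N phones; a hypothetical 50-phone set already gives 50^3 = 125,000 trigram dimensions. Training and scoring on k-dimensional projections instead of full n-gram vectors is one plausible way to read the speed-ups reported above, though the paper itself should be consulted for the actual setup.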

Published

2010

Pages

251–255

Proceedings

Proc. Odyssey 2010 - The Speaker and Language Recognition Workshop

Conference

The Speaker and Language Recognition Workshop

ISBN

978-80-214-4114-9

Publisher

International Speech Communication Association

Place

Brno

EID Scopus

2-s2.0-85073108716

BibTeX

@inproceedings{BUT34853,
  author="Tomáš {Mikolov} and Oldřich {Plchot} and Ondřej {Glembek} and Pavel {Matějka} and Lukáš {Burget} and Jan {Černocký}",
  title="PCA-based Feature Extraction for Phonotactic Language Recognition",
  booktitle="Proc. Odyssey 2010 - The Speaker and Language Recognition Workshop",
  year="2010",
  pages="251--255",
  publisher="International Speech Communication Association",
  address="Brno",
  isbn="978-80-214-4114-9",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_odys2010.pdf"
}

Projects

EOARD - Improving the capacity of language recognition systems to handle rare languages using radio broadcast data, start: 2008-10-15, end: 2010-12-14, completed
Mobile Biometry, MŠMT, Support for projects of the Seventh Framework Programme of the European Community for research, technological development and demonstration (2007 to 2013) under Act No. 171/2007 Coll., 7E08042, start: 2008-01-01, end: 2010-12-31, completed
Security-Oriented Research in Information Technology, MŠMT, Institutional funding from the state budget of the Czech Republic (e.g. VZ, VC), MSM0021630528, start: 2007-01-01, end: 2013-12-31, running
Speech Recognition under Real-World Conditions, GACR, Standard projects, GA102/08/0707, start: 2008-01-01, end: 2011-12-31, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)