Thesis Details

The Best Possible Speech Recognizer on Your Own Data

Master's Thesis

Student: Sýkora Tomáš
Academic Year: 2019/2020
Supervisor: Szőke Igor, Ing., Ph.D.

Czech title

Co nejlepší rozpoznávač řeči na vlastních datech

Language

English

Abstract

State-of-the-art results in different areas of machine learning are presented on a daily basis. Adapting these systems to perform as well as possible on a specific subset of the general data can greatly improve their accuracy. Domain adaptation in automatic speech recognition can yield production-level models that transcribe difficult and noisy customer conversations far more accurately than general models trained on all kinds of language and speech data. In this work I present a 17% word error rate improvement on our speech recognition task over the general-domain speech recognizer from Google. The improvement was achieved both by very precise annotation and preparation of the domain data and by combining state-of-the-art techniques and algorithms. The described system was successfully integrated into the production environment of the Parrot transcription company, where I am a member of the initial team, and its deployment drastically increased the performance of the human transcribers.

Keywords

automatic speech recognition, domain data, kaldi, dataset, speech data cleaning

Department

Department of Computer Graphics and Multimedia FIT BUT

Degree Programme

Information Technology, Field of Study Intelligent Systems

Status

defended, grade A

Date

14 July 2020

Reviewer

Veselý Karel, Ing., Ph.D.

Committee

Rogalewicz Adam, doc. Mgr., Ph.D. (DITS FIT BUT), chair
Bidlo Michal, doc. Ing., Ph.D. (DCSY FIT BUT), member
Burget Lukáš, doc. Ing., Ph.D. (DCGM FIT BUT), member
Hradiš Michal, Ing., Ph.D. (DCGM FIT BUT), member
Hrubý Martin, Ing., Ph.D. (DITS FIT BUT), member
Rozman Jaroslav, Ing., Ph.D. (DITS FIT BUT), member

Citation

SÝKORA, Tomáš. The Best Possible Speech Recognizer on Your Own Data. Brno, 2020. Master's Thesis. Brno University of Technology, Faculty of Information Technology. 2020-07-14. Supervised by Szőke Igor. Available from: https://www.fit.vut.cz/study/thesis/18056/

BibTeX

@mastersthesis{FITMT18056,
    author = "Tom\'{a}\v{s} S\'{y}kora",
    type = "Master's thesis",
    title = "The Best Possible Speech Recognizer on Your Own Data",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2020,
    location = "Brno, CZ",
    language = "english",
    url = "https://www.fit.vut.cz/study/thesis/18056/"
}
