Finite-state based recognition networks for forward-backward speech decoding

Czech title

Rozpoznávácí sítě založené na konečných stavových převodnících pro dopředné a zpětné dekódování v rozpoznávání řeči

Language

English

Abstract

Many tasks can be formulated in the mathematical framework of weighted finite state transducers (WFST). This is also the case for automatic speech recognition (ASR). Nowadays, ASR makes extensive use of composed probabilistic models -- called decoding graphs or recognition networks. They are constructed from the individual components via WFST operations like composition. Each component is a probabilistic knowledge source that constrains the search for the best path through the composed graph -- called decoding. The usage of a coherent framework guarantees, that the resulting automata will be optimal in a well-defined sense. WFSTs can be optimized with the help of determinization and minimization in a given semi-ring. The application of these algorithms results in the optimal structure for search and the optimal distribution of weights is achieved by applying a weight pushing algorithm. The goal of this thesis is to further develop the recipes and algorithms for the construction of optimal recognition networks. We introduce an alternative weight pushing algorithm, that is suitable for an important class of models -- language model transducers, or more generally cyclic WFSTs and WFSTs with failure (back-off) transitions. We also present a recipe to construct recognition networks, which are suitable for decoding backwards in time, and which, at the same time, are guaranteed to give exactly the same probabilities as the forward recognition network. For that purpose, we develop an algorithm for exact reversal of back-off language models and their corresponding language model transducers. We apply these backward recognition networks in an optimization technique: In a static network decoder, we use it for a two-pass decoding setup (forward search and backward search). This approach is called tracked decoding and allows to incorporate the first pass decoding into the second pass decoding by tracking hypotheses from the first pass lattice. This technique results in significant speed-ups, since it allows to decode with a variable beam width, which is most of the time much smaller than the baseline beam. We also show that it is possible to apply the algorithms in a dynamic network decoder by using the incrementally refining recognition setup. This additionally leads to a partial parallelization of the decoding.

Keywords

Automatic speech recognition, LVCSR decoding, recognition networks, weighted finite state transducers, N-gram language models, weight pushing

Department

Department of Computer Graphics and Multimedia FIT BUT

Degree Programme

Computer Science and Engineering, Field of Study Computer Science and Engineering

Files

Status

defended

Date

26 September 2016

Citation

HANNEMANN, Mirko. Finite-state based recognition networks for forward-backward speech decoding. Brno, 2016. Ph.D. Thesis. Brno University of Technology, Faculty of Information Technology. 2016-09-26. Supervised by Burget Lukáš. Available from: https://www.fit.vut.cz/study/phd-thesis/517/

BibTeX

@phdthesis{FITPT517,
    author = "Mirko Hannemann",
    type = "Ph.D. thesis",
    title = "Finite-state based recognition networks for forward-backward speech decoding",
    school = "Brno University of Technology, Faculty of Information Technology",
    year = 2016,
    location = "Brno, CZ",
    language = "english",
    url = "https://www.fit.vut.cz/study/phd-thesis/517/"
}