Result detail

Residual Memory Networks: Feed-forward approach to learn long-term temporal dependencies

BASKAR, M.; KARAFIÁT, M.; BURGET, L.; VESELÝ, K.; GRÉZL, F.; ČERNOCKÝ, J. Residual Memory Networks: Feed-forward approach to learn long-term temporal dependencies. In Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017. p. 4810-4814. ISBN: 978-1-5090-4117-6.

Type

conference proceedings paper

Language

English

Authors

Baskar Murali Karthick, Ing., Ph.D., UPGM (FIT)
Karafiát Martin, Ing., Ph.D., UPGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Veselý Karel, Ing., Ph.D., UPGM (FIT)
Grézl František, Ing., Ph.D., UPGM (FIT)
Černocký Jan, prof. Dr. Ing., UPGM (FIT)

Abstract

Training deep recurrent neural network (RNN) architectures is complicated due to the increased network complexity. This disrupts the learning of higher order abstracts using deep RNN. In case of feed-forward networks training deep structures is simple and faster while learning long-term temporal information is not possible. In this paper we propose a residual memory neural network (RMN) architecture to model short-time dependencies using deep feed-forward layers having residual and time delayed connections. The residual connection paves way to construct deeper networks by enabling unhindered flow of gradients and the time delay units capture temporal information with shared weights. The number of layers in RMN signifies both the hierarchical processing depth and temporal depth. The computational complexity in training RMN is significantly less when compared to deep recurrent networks. RMN is further extended as bi-directional RMN (BRMN) to capture both past and future information. Experimental analysis is done on AMI corpus to substantiate the capability of RMN in learning long-term information and hierarchical information. Recognition performance of RMN trained with 300 hours of Switchboard corpus is compared with various state-of-the-art LVCSR systems. The results indicate that RMN and BRMN gains 6 % and 3.8 % relative improvement over LSTM and BLSTM networks.
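The abstract describes the architecture only in words, so the following is a minimal sketch of one possible reading, not the authors' formulation. It assumes square weight matrices, a ReLU non-linearity, a delay that grows by one frame per layer, and a residual connection added every two layers; none of these details are stated in the abstract. The names `matvec`, `delayed`, `rmn_forward`, and `residual_every` are hypothetical.

```python
def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def delayed(seq, d):
    """Shift a sequence of frames d steps into the past, zero-padding the start."""
    dim = len(seq[0])
    pad = [[0.0] * dim for _ in range(min(d, len(seq)))]
    kept = seq[:max(len(seq) - d, 0)]
    return pad + [list(frame) for frame in kept]


def rmn_forward(frames, layer_weights, shared_delay_weight, residual_every=2):
    """Illustrative forward pass over a list of feature frames.

    Each layer combines a feed-forward term with a time-delayed copy of its
    own input. The delayed copy is transformed by ONE matrix shared across
    all layers (the "shared weights" of the abstract, as assumed here).
    """
    h = [list(f) for f in frames]
    skip = h
    for depth, W in enumerate(layer_weights, start=1):
        # Assumption: layer `depth` looks `depth` frames into the past,
        # so temporal context grows together with network depth.
        past = delayed(h, depth)
        out = []
        for cur, old in zip(h, past):
            ff = matvec(W, cur)
            mem = matvec(shared_delay_weight, old)
            out.append([max(a + b, 0.0) for a, b in zip(ff, mem)])  # ReLU
        h = out
        # Assumption: identity shortcut every `residual_every` layers.
        if depth % residual_every == 0:
            h = [[a + b for a, b in zip(f, s)] for f, s in zip(h, skip)]
            skip = h
    return h
```

In this sketch the output at frame t depends only on frames at or before t, and stacking more layers widens the span of past frames that can influence it, which is one way to read the statement that the number of layers signifies both processing depth and temporal depth. A bi-directional variant would presumably add a mirrored path using future frames, but that is not shown here.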

Keywords

Automatic speech recognition, LSTM, RNN, Residual memory networks

URL

https://www.fit.vut.cz/research/group/speech/public/publi/2017/baskar… PDF

Year

2017

Pages

4810–4814

Proceedings

Proceedings of ICASSP 2017

Conference

2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)

ISBN

978-1-5090-4117-6

Publisher

IEEE Signal Processing Society

Place

New Orleans

DOI

10.1109/ICASSP.2017.7953070

UT WoS

000414286204194

EID Scopus

2-s2.0-85023739371

BibTeX

@inproceedings{BUT144448,
  author="Murali Karthick {Baskar} and Martin {Karafiát} and Lukáš {Burget} and Karel {Veselý} and František {Grézl} and Jan {Černocký}",
  title="Residual Memory Networks: Feed-forward approach to learn long-term temporal dependencies",
  booktitle="Proceedings of ICASSP 2017",
  year="2017",
  pages="4810--4814",
  publisher="IEEE Signal Processing Society",
  address="New Orleans",
  doi="10.1109/ICASSP.2017.7953070",
  isbn="978-1-5090-4117-6",
  url="https://www.fit.vut.cz/research/publication/11467/"
}

Files

pdf baskar_icassp2017_0004810.pdf 245 kB

Projects

Big speech data analytics for contact centers, EU, Horizon 2020, start: 2015-01-01, end: 2017-12-31, completed
IT4Innovations excellence in science, MŠMT, National Sustainability Programme II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed

Research groups

Speech data mining research group BUT Speech@FIT (VZ SPEECH)

Department

Department of Computer Graphics and Multimedia (UPGM)