Result detail

Learning document representations using subspace multinomial model

KESIRAJU, S.; BURGET, L.; SZŐKE, I.; ČERNOCKÝ, J. Learning document representations using subspace multinomial model. In Proceedings of Interspeech 2016. San Francisco: International Speech Communication Association, 2016. p. 700-704. ISBN: 978-1-5108-3313-5.

Type

conference proceedings paper

Language

English

Authors

Kesiraju Santosh, Ph.D., UPGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Szőke Igor, Ing., Ph.D., UPGM (FIT)
Černocký Jan, prof. Dr. Ing., UPGM (FIT)

Abstract

The subspace multinomial model (SMM) is a log-linear model that can be used to learn low-dimensional continuous representations of discrete data. SMM and its variants have been used for speaker verification based on prosodic features and for phonotactic language recognition. In this paper, we propose a new variant of SMM that introduces sparsity and call the resulting model ℓ1 SMM. We show that ℓ1 SMM can be used to learn document representations that are helpful in topic identification or classification and clustering tasks. Our experiments in document classification show that SMM achieves results comparable to models such as latent Dirichlet allocation and sparse topical coding, while having the useful property that the resulting document vectors are Gaussian distributed.
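
The abstract does not spell out the model's parameterization, so the following is only a minimal sketch under assumptions. It assumes a common SMM formulation in which the word distribution of a document is softmax(m + T w), where m is a bias vector over the vocabulary, T is a low-rank subspace matrix, and w is the low-dimensional document vector. The names m, T, w, lam and the toy values are illustrative, and placing the ℓ1 penalty on T is an assumption, not a detail confirmed by this record.

```python
import math

def smm_word_probs(m, T, w):
    """Word probabilities softmax(m + T w) for one document (assumed form)."""
    logits = [m_i + sum(t_ik * w_k for t_ik, w_k in zip(row, w))
              for m_i, row in zip(m, T)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def smm_objective(counts, m, T, w, lam=0.0):
    """Multinomial log-likelihood of the counts minus an l1 penalty on T.

    The placement of the penalty on T is an assumption made for illustration.
    """
    probs = smm_word_probs(m, T, w)
    loglik = sum(c * math.log(p) for c, p in zip(counts, probs))
    l1 = sum(abs(t) for row in T for t in row)
    return loglik - lam * l1

# Toy example: vocabulary of 3 words, 2-dimensional document vector.
m = [0.0, 0.0, 0.0]
T = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
w = [0.5, -0.5]
counts = [3, 1, 0]

probs = smm_word_probs(m, T, w)
objective = smm_objective(counts, m, T, w, lam=0.1)
```

In a sketch like this, training would maximize the objective over m, T, and each document's w. The learned w vectors would then serve as the document representations passed to a classifier or clustering method.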

Keywords

Document representation, subspace modelling, topic identification, latent topic discovery

URL

Year

2016

Pages

700–704

Proceedings

Proceedings of Interspeech 2016

Conference

Interspeech Conference

ISBN

978-1-5108-3313-5

Publisher

International Speech Communication Association

Place

San Francisco

DOI

10.21437/Interspeech.2016-1634

UT WoS

000409394400145

EID Scopus

2-s2.0-84994212710

BibTeX

@inproceedings{BUT132598,
  author="Santosh {Kesiraju} and Lukáš {Burget} and Igor {Szőke} and Jan {Černocký}",
  title="Learning document representations using subspace multinomial model",
  booktitle="Proceedings of Interspeech 2016",
  year="2016",
  pages="700--704",
  publisher="International Speech Communication Association",
  address="San Francisco",
  doi="10.21437/Interspeech.2016-1634",
  isbn="978-1-5108-3313-5",
  url="https://www.researchgate.net/publication/307889473_Learning_Document_Representations_Using_Subspace_Multinomial_Model"
}

Files

pdf kesiraju_interspeech2016_IS161634.pdf 281 kB

Projects

DARPA Low Resource Languages for Emergent Incidents (LORELEI) - Exploiting Language Information for Situational Awareness (ELISA), University of Southern California, start: 2015-09-01, end: 2020-03-31, completed
IT4Innovations excellence in science, MŠMT, National Sustainability Programme II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (VZ SPEECH)

Departments

Department of Computer Graphics and Multimedia (UPGM)