Publication Details

Nalezení slovních kořenů v češtině

CHMELAŘ Petr, HELLEBRAND David, HRUŠECKÝ Michal and BARTÍK Vladimír. Nalezení slovních kořenů v češtině. CEUR Workshop Proceedings, vol. 2011, no. 802, p. 12. ISSN 1613-0073. Available from: http://www.ceur-ws.org/Vol-802
English title
Czech Stemming Algorithm
Type
journal article
Language
Czech
Authors
Chmelař Petr, Ing. (DIFS FIT BUT)
Hellebrand David, Ing. (FIT BUT)
Hrušecký Michal (MFF CUNI)
Bartík Vladimír, Ing., Ph.D. (DIFS FIT BUT)
Keywords

Lemmatization, stemming, Snowball, Czech, grammar.

Abstract

The goal was to create a stemming algorithm for Czech based on grammatical
rules, as a complement to dictionary-based methods, for use in retrieval and
mining of Czech texts. The article covers the basics of Czech word formation
for the individual word classes, describes the problems involved, and reviews
several stemming and lemmatization algorithms. The main contribution of this
work is an implementation of the Snowball stemming algorithm for Czech, based
on complete sets of all prefixes and suffixes that may occur in Czech words.

Published
2011
Pages
12
Journal
CEUR Workshop Proceedings, vol. 2011, no. 802, ISSN 1613-0073
Book
Selected papers from the 10th annual Czech and Slovak knowledge technology conference (Znalosti 2011)
Publisher
Aachen University of Technology
Place
Aachen, DE
BibTeX
@ARTICLE{FITPUB9952,
   author = "Petr Chmela\v{r} and David Hellebrand and Michal Hru\v{s}eck\'{y} and Vladim\'{i}r Bart\'{i}k",
   title = "Nalezen\'{i} slovn\'{i}ch ko\v{r}en\r{u} v \v{c}e\v{s}tin\v{e}",
   pages = 12,
   booktitle = "Selected papers from the 10th annual Czech and Slovak knowledge technology conference (Znalosti 2011)",
   journal = "CEUR Workshop Proceedings",
   volume = 2011,
   number = 802,
   year = 2011,
   location = "Aachen, DE",
   publisher = "Aachen University of Technology",
   ISSN = "1613-0073",
   language = "czech",
   url = "https://www.fit.vut.cz/research/publication/9952"
}