An intelligent extension of the training set for the Persian n-gram language model: an enrichment algorithm

Motavallian, Rezvan; Komeily, Masoud

dc.creator	Motavallian, Rezvan
dc.creator	Komeily, Masoud
dc.date	2023-11-06
dc.date.accessioned	2024-11-19T15:17:33Z
dc.date.available	2024-11-19T15:17:33Z
dc.identifier	https://onomazein.letras.uc.cl/index.php/onom/article/view/69745
dc.identifier	10.7764/onomazein.61.09
dc.identifier.uri	https://revistaschilenas.uchile.cl/handle/2250/246317
dc.description	This article introduces an automatic mechanism that intelligently extends the training set to improve the n-gram language model of Persian. Exploiting the free word-order property of Persian, our enrichment algorithm diversifies the n-gram combinations in the baseline training data through dependency reordering, adding permissible sentences and filtering out ungrammatical ones using a hybrid empirical (heuristic) and linguistic approach. Experiments on the baseline training set (taken from a standard Persian corpus) and the resulting enriched training set indicate a decline in average relative perplexity (between 34% and 73%) for informal/spoken vs. formal/written Persian test data.	en-US
dc.format	application/pdf
dc.language	eng
dc.publisher	Facultad de Letras de la Pontificia Universidad Católica de Chile	es-ES
dc.relation	https://onomazein.letras.uc.cl/index.php/onom/article/view/69745/54195
dc.rights	https://creativecommons.org/licenses/by/4.0	es-ES
dc.source	Onomázein ; No. 61 (2023): September; 191-211	en-US
dc.source	Onomázein ; Núm. 61 (2023): Septiembre; 191-211	es-ES
dc.source	0718-5758
dc.subject	training corpus	en-US
dc.subject	n-gram language model	en-US
dc.subject	dependency parsing	en-US
dc.subject	enrichment algorithm	en-US
dc.subject	free word-order	en-US
dc.title	An intelligent extension of the training set for the Persian n-gram language model: an enrichment algorithm	en-US
dc.title	An intelligent extension of the training set for the Persian n-gram language model: an enrichment algorithm	es-ES
dc.type	info:eu-repo/semantics/article
dc.type	info:eu-repo/semantics/publishedVersion
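
The description reports its results as a relative decline in perplexity between a baseline and an enriched training set. The following is a minimal sketch of how such a comparison could be computed; it is not the authors' implementation. It assumes a bigram model with add-one smoothing, a relative reduction defined as (baseline - enriched) / baseline, and a hand-written reordered sentence standing in for the output of dependency reordering. The transliterated toy sentences and all function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count unigram contexts and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])            # every token that serves as a context
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, vocab


def perplexity(model, sentences):
    """Perplexity under add-one (Laplace) smoothing; an assumed smoothing choice."""
    unigrams, bigrams, vocab = model
    v = len(vocab) + 1                          # one extra slot for unseen words
    log_prob, n = 0.0, 0
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + v)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)


def relative_reduction(pp_base, pp_enriched):
    """Assumed definition: fraction by which perplexity drops after enrichment."""
    return (pp_base - pp_enriched) / pp_base


# Toy data (hypothetical): one baseline sentence plus a reordered variant
# written by hand, not generated by any dependency-reordering step.
baseline = ["ali ketab ra kharid"]
enriched = baseline + ["ketab ra ali kharid"]
test_set = ["ketab ra ali kharid"]

pp_base = perplexity(train_bigram(baseline), test_set)
pp_enriched = perplexity(train_bigram(enriched), test_set)
reduction = relative_reduction(pp_base, pp_enriched)
print(f"baseline={pp_base:.2f} enriched={pp_enriched:.2f} reduction={reduction:.0%}")
```

Because the test sentence uses a word order absent from the baseline data, the enriched model assigns it higher bigram probabilities and therefore lower perplexity, which is the effect the description attributes to enrichment.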

This item appears in the following Collection(s)

Onomázein: Revista de Lingüística, Filología y Traducción