An intelligent extension of the training set for the Persian n-gram language model: an enrichment algorithm
dc.creator | Motavallian, Rezvan | |
dc.creator | Komeily, Masoud | |
dc.date | 2023-11-06 | |
dc.date.accessioned | 2024-11-19T15:17:33Z | |
dc.date.available | 2024-11-19T15:17:33Z | |
dc.identifier | https://onomazein.letras.uc.cl/index.php/onom/article/view/69745 | |
dc.identifier | 10.7764/onomazein.61.09 | |
dc.identifier.uri | https://revistaschilenas.uchile.cl/handle/2250/246317 | |
dc.description | In this article, we introduce an automatic mechanism that intelligently extends the training set to improve the n-gram language model of Persian. Given the free word-order property of Persian, our enrichment algorithm diversifies the n-gram combinations in the baseline training data through dependency reordering, adding permissible sentences and filtering out ungrammatical ones using a hybrid empirical (heuristic) and linguistic approach. Experiments performed on the baseline training set (taken from a standard Persian corpus) and on the resulting enriched training set show a decline in average relative perplexity (between 34% and 73%) for informal/spoken vs. formal/written Persian test data. | en-US |
dc.format | application/pdf | |
dc.language | eng | |
dc.publisher | Facultad de Letras de la Pontificia Universidad Católica de Chile | es-ES |
dc.relation | https://onomazein.letras.uc.cl/index.php/onom/article/view/69745/54195 | |
dc.rights | https://creativecommons.org/licenses/by/4.0 | es-ES |
dc.source | Onomázein ; No. 61 (2023): September; 191-211 | en-US |
dc.source | Onomázein ; Núm. 61 (2023): Septiembre; 191-211 | es-ES |
dc.source | 0718-5758 | |
dc.subject | training corpus | en-US |
dc.subject | n-gram language model | en-US |
dc.subject | dependency parsing | en-US |
dc.subject | enrichment algorithm | en-US |
dc.subject | free word-order | en-US |
dc.title | An intelligent extension of the training set for the Persian n-gram language model: an enrichment algorithm | en-US |
dc.title | An intelligent extension of the training set for the Persian n-gram language model: an enrichment algorithm | es-ES |
dc.type | info:eu-repo/semantics/article | |
dc.type | info:eu-repo/semantics/publishedVersion | |
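The abstract combines two ideas: enlarging the training set by reordering dependency constituents, and measuring the effect as a relative drop in perplexity. The sketch below is a toy illustration under stated assumptions, not the authors' implementation. It assumes a verb-final Persian clause whose root-verb dependent subtrees are already given as token chunks (a hypothetical input format), uses a hypothetical `is_acceptable` placeholder in place of the paper's hybrid heuristic/linguistic filter, swaps in a simple add-one smoothed bigram model for whatever n-gram toolkit and smoothing the authors used, and assumes relative perplexity reduction is defined as (PP_base − PP_enriched) / PP_base. The example sentence is transliterated Persian, "man ketab ra be u dadam" ("I gave the book to him/her").

```python
import math
from collections import Counter
from itertools import permutations


def reorder_variants(constituents, verb, is_acceptable=lambda s: True):
    """Yield verb-final reorderings of the root verb's dependent constituents.

    `constituents` is a list of token lists, one per dependent subtree
    (hypothetical format). `is_acceptable` is a hypothetical stand-in for the
    paper's heuristic/linguistic filter; by default it accepts everything.
    """
    seen = set()
    for order in permutations(constituents):
        # Flatten the chunks in this order and keep the verb in final position.
        sentence = tuple(tok for chunk in order for tok in chunk) + tuple(verb)
        if sentence not in seen and is_acceptable(sentence):
            seen.add(sentence)
            yield list(sentence)


class BigramLM:
    """Add-one (Laplace) smoothed bigram model, for illustration only."""

    def __init__(self, sentences):
        self.histories = Counter()
        self.bigrams = Counter()
        vocab = {"</s>"}
        for s in sentences:
            toks = ["<s>"] + s + ["</s>"]
            vocab.update(s)
            self.histories.update(toks[:-1])  # counts of each token as a history
            self.bigrams.update(zip(toks[:-1], toks[1:]))
        self.vocab_size = len(vocab) + 1  # +1 reserves mass for unseen words

    def prob(self, prev, word):
        return (self.bigrams[(prev, word)] + 1) / (self.histories[prev] + self.vocab_size)

    def perplexity(self, sentences):
        log_sum, n = 0.0, 0
        for s in sentences:
            toks = ["<s>"] + s + ["</s>"]
            for prev, word in zip(toks[:-1], toks[1:]):
                log_sum += math.log(self.prob(prev, word))
                n += 1
        return math.exp(-log_sum / n)


def relative_reduction(pp_base, pp_new):
    """Relative perplexity reduction in percent (assumed definition)."""
    return 100.0 * (pp_base - pp_new) / pp_base


# Toy data: one baseline sentence and one test sentence with a different order.
baseline = [["man", "ketab", "ra", "be", "u", "dadam"]]
enriched = list(reorder_variants([["man"], ["ketab", "ra"], ["be", "u"]], ["dadam"]))
test = [["ketab", "ra", "man", "be", "u", "dadam"]]

pp_base = BigramLM(baseline).perplexity(test)
pp_enriched = BigramLM(enriched).perplexity(test)
print(f"relative perplexity reduction: {relative_reduction(pp_base, pp_enriched):.1f}%")
```

Because the permutations include the original order, the enriched set is a superset of the baseline, so any drop in perplexity here comes only from the added orderings. In a real setting the filter would discard variants that are not acceptable in Persian rather than keeping all of them.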