Show simple item record

dc.creator: Cobo, Ángel
dc.creator: Rocha, Rocío
dc.date: 2011-12-01
dc.date.accessioned: 2019-04-24T21:28:16Z
dc.date.available: 2019-04-24T21:28:16Z
dc.identifier: https://scielo.conicyt.cl/scielo.php?script=sci_arttext&pid=S0718-33052011000300005
dc.identifier.uri: http://revistaschilenas.uchile.cl/handle/2250/58743
dc.description: This paper presents a document representation strategy and a bio-inspired algorithm for clustering multilingual collections of documents in the field of economics and business. The proposed approach allows the user to identify groups of related economics documents written in Spanish and English, using techniques inspired by the clustering and sorting behaviours observed in some types of ants. To obtain a language-independent vector representation of each document, two multilingual resources are used: an economic glossary and a thesaurus. Each document is represented by four feature vectors: words, proper names, economic terms in the glossary, and thesaurus descriptors. Proper name identification, word extraction and lemmatization are performed using specific tools. The tf-idf scheme is used to measure the importance of each feature in the document, and a convex linear combination of angular separations between feature vectors is used as the similarity measure between documents. The paper shows experimental results from applying the proposed algorithm to a Spanish-English corpus of research papers in the areas of economics and management. The results demonstrate the usefulness and effectiveness of the ant clustering algorithm and the proposed representation scheme.
dc.format: text/html
dc.language: en
dc.publisher: Universidad de Tarapacá.
dc.relation: 10.4067/S0718-33052011000300005
dc.rights: info:eu-repo/semantics/openAccess
dc.source: Ingeniare. Revista chilena de ingeniería v.19 n.3 2011
dc.subject: Clustering
dc.subject: ant-based algorithms
dc.subject: multilingual documents
dc.subject: text mining
dc.subject: document management
dc.title: Identification of related multilingual documents using ant clustering algorithms
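
The similarity measure mentioned in the description (a convex linear combination of angular separations between the four feature vectors) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "angular separation" means the cosine of the angle between two tf-idf vectors, that each vector is stored as a sparse dict of already-computed tf-idf weights, and that the combination weights are free parameters. The names `FEATURES`, `cosine` and `similarity` are hypothetical.

```python
import math

# Hypothetical keys for the four feature vectors named in the description.
FEATURES = ("words", "proper_names", "glossary_terms", "thesaurus_descriptors")


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts.

    Assumption: this is what the description calls "angular separation".
    """
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is treated as sharing nothing
    return dot / (norm_u * norm_v)


def similarity(doc_a, doc_b, weights):
    """Convex combination of the per-feature cosines of two documents.

    `weights` must be non-negative and sum to 1 (the convexity condition).
    """
    if len(weights) != len(FEATURES):
        raise ValueError("one weight per feature vector is required")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(
        w * cosine(doc_a[name], doc_b[name])
        for w, name in zip(weights, FEATURES)
    )


# Toy documents with made-up tf-idf weights, for illustration only.
doc_es = {
    "words": {"mercado": 1.0},
    "proper_names": {"Keynes": 1.0},
    "glossary_terms": {"inflation": 2.0},
    "thesaurus_descriptors": {"D1": 1.0},
}
doc_en = {
    "words": {"market": 1.0},
    "proper_names": {"Keynes": 1.0},
    "glossary_terms": {"inflation": 2.0},
    "thesaurus_descriptors": {"D1": 1.0},
}
score = similarity(doc_es, doc_en, (0.4, 0.2, 0.2, 0.2))
```

In this toy case the two documents share no surface words, yet they still score as similar through the three language-independent vectors, which is the motivation the description gives for using the glossary and the thesaurus.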

