IMPROVING THE PERFORMANCE OF ANTI-SPAM FILTERS USING OUT-OF-VOCABULARY STATISTICS
Author
Agüero,Pablo Daniel
Castiñeira Moreira,Jorge
Liberatori,Monica
Bonadero,Juan Carlos
Tulli,Juan Carlos
Abstract
This paper presents a feature based on out-of-vocabulary word statistics that complements the information sources used in the decision by state-of-the-art spam filters. The experiments included freely available spam filters as reference, SpamAssassin, Bogofilter, SpamBayes and SpamProbe, as well as a Naive Bayes classifier. The results show that the decision based on the proposed feature improves the performance of all spam filters under study.