This paper presents a feature based on out-of-vocabulary (OOV) word statistics that complements the information sources state-of-the-art spam filters use to reach their decisions. The experiments used four freely available spam filters as references (SpamAssassin, Bogofilter, SpamBayes, and SpamProbe) as well as a Naive Bayes classifier. The results show that a decision based on the proposed feature improves the performance of all the spam filters studied.
Universidad de Tarapacá.
Ingeniare. Revista chilena de ingeniería v.17 n.3 2009
IMPROVING THE PERFORMANCE OF ANTI-SPAM FILTERS USING OUT-OF-VOCABULARY STATISTICS