This paper presents a spam detection system based on machine learning content analysis. The system comprises six units: Tokenization and Word Cleaning, Lemmatization, Stop Word Removal and Synonym Replacement, Term Selection, Bag-of-Words Representer, and Classifier. Experiments on two datasets, SpamAssassin and TREC 2007, show satisfactory results, comparable with the state of the art.
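The six units listed in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which lemmatizer, term selection criterion, or classifier is used, so the plural-stripping lemmatizer, the document-frequency term selection, and the multinomial naive Bayes classifier below are stand-ins chosen for brevity. The word lists, training messages, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny hypothetical word lists, not the resources used in the paper.
STOP_WORDS = {"the", "a", "an", "to", "is", "at", "for", "and", "you", "your", "now"}
SYNONYMS = {"prize": "award", "complimentary": "free"}


def tokenize_and_clean(text):
    """Unit 1: lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def lemmatize(token):
    """Unit 2 stand-in: crude plural stripping, not a real lemmatizer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def preprocess(text):
    """Units 1-3: tokenize, lemmatize, drop stop words, map synonyms."""
    tokens = [lemmatize(t) for t in tokenize_and_clean(text)]
    return [SYNONYMS.get(t, t) for t in tokens if t not in STOP_WORDS]


def select_terms(docs, max_terms):
    """Unit 4 stand-in: keep the terms with the highest document frequency."""
    df = Counter(term for doc in docs for term in set(doc))
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:max_terms]]


def bag_of_words(doc, vocabulary):
    """Unit 5: count vector over the selected vocabulary."""
    counts = Counter(doc)
    return [counts[term] for term in vocabulary]


class NaiveBayes:
    """Unit 6 stand-in: multinomial naive Bayes with Laplace smoothing."""

    def fit(self, vectors, labels):
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            rows = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(vectors))
            # Per-term counts for this class, with add-one smoothing.
            totals = [sum(col) + 1 for col in zip(*rows)]
            norm = sum(totals)
            self.log_likelihood[c] = [math.log(t / norm) for t in totals]
        return self

    def predict(self, vector):
        def score(c):
            return self.log_prior[c] + sum(
                x * w for x, w in zip(vector, self.log_likelihood[c])
            )
        return max(self.classes, key=score)


# Hypothetical toy training set; the paper uses SpamAssassin and TREC 2007.
train = [
    ("Win a free prize now", "spam"),
    ("Free offers, claim your award", "spam"),
    ("Meeting agenda for Monday", "ham"),
    ("Please review the project report", "ham"),
]
docs = [preprocess(text) for text, _ in train]
labels = [label for _, label in train]
vocabulary = select_terms(docs, max_terms=20)
model = NaiveBayes().fit([bag_of_words(d, vocabulary) for d in docs], labels)


def classify(text):
    """Run a new message through all six units."""
    return model.predict(bag_of_words(preprocess(text), vocabulary))
```

Calling `classify("Claim your free award")` sends a new message through the same preprocessing and vocabulary as the training data. A real system would replace each stand-in with a proper component, for example a dictionary-based lemmatizer and a statistically motivated term selection measure.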
|Title:||Spam Detection by Machine Learning-Based Content Analysis|
CAMASTRA, Francesco (Corresponding)
|Publication date:||2020|
|Appears in types:||2.1 Contribution in volume (Chapter or Essay)|