Spam Detection by Machine Learning-Based Content Analysis

Camastra F.; Ciaramella A.; Staiano A.
2020-01-01

Abstract

The paper presents a spam detection system based on machine-learning content analysis. The system comprises six units: Tokenization and Word Cleaning, Lemmatization, Stop Word Removal and Synonym Replacement, Term Selection, Bag-of-Words Representer, and Classifier. Experiments on two datasets, SpamAssassin and TREC 2007, show satisfactory results, comparable with the state of the art.
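
To make the six-unit pipeline concrete, here is a minimal Python sketch of how such stages might be chained. It is not the authors' implementation: the abstract does not specify the lemmatizer, the synonym resource, the term-selection criterion, or the classifier. The sketch therefore assumes crude plural stripping, tiny hand-written stop-word and synonym tables, selection by document frequency, and a multinomial naive Bayes classifier, and every function and variable name in it is hypothetical.

```python
import math
import re
from collections import Counter

# Toy lexical resources (assumptions, not taken from the paper).
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is", "you", "your", "for"}
SYNONYMS = {"cash": "money", "prize": "money"}

def tokenize_and_clean(text):
    """Unit 1: lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def lemmatize(token):
    """Unit 2: crude plural stripping, a stand-in for a real lemmatizer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def normalize(tokens):
    """Unit 3: drop stop words and map synonyms to one representative term."""
    return [SYNONYMS.get(t, t) for t in tokens if t not in STOP_WORDS]

def preprocess(text):
    return normalize([lemmatize(t) for t in tokenize_and_clean(text)])

def select_terms(docs, k):
    """Unit 4: keep the k terms with the highest document frequency (assumed criterion)."""
    df = Counter(term for doc in docs for term in set(doc))
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

def bag_of_words(doc, vocab):
    """Unit 5: represent a document as term counts over the selected vocabulary."""
    counts = Counter(doc)
    return [counts[t] for t in vocab]

class NaiveBayes:
    """Unit 6: multinomial naive Bayes with add-one smoothing (assumed classifier)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            totals = [sum(col) + 1 for col in zip(*rows)]
            denom = sum(totals)
            self.log_lik[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, x):
        def score(c):
            return self.log_prior[c] + sum(n * w for n, w in zip(x, self.log_lik[c]))
        return max(self.classes, key=score)

# Tiny made-up training set, for illustration only.
train = [
    ("Win free money now, claim your prize", "spam"),
    ("Free cash offers, win prizes today", "spam"),
    ("Meeting agenda for the project review", "ham"),
    ("Please review the project report before the meeting", "ham"),
]
docs = [preprocess(text) for text, _ in train]
labels = [label for _, label in train]
vocab = select_terms(docs, k=6)
model = NaiveBayes().fit([bag_of_words(d, vocab) for d in docs], labels)

def classify(text):
    return model.predict(bag_of_words(preprocess(text), vocab))
```

Calling `classify("Claim your free prize money")` runs a new message through the same six stages used on the training messages. A realistic system would replace the toy tables with a proper lemmatizer and lexical database and would be trained on a corpus such as SpamAssassin.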
2020
978-981-15-5092-8
978-981-15-5093-5

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11367/87250
Citations
  • Scopus 5