Scalable Network Traffic Classification Using Distributed Support Vector Machines

D'Alessandro, Valerio; Romano, Luigi
2015-01-01

Abstract

Internet traffic has increased dramatically in recent years, driven by the popularization of the Internet and the spread of wireless mobile devices such as smartphones and tablets. This explosive growth makes Internet traffic a practical example of Big Data. Accurate identification and classification of large volumes of network traffic plays an important role in network management, including capacity planning, network forensics, QoS, and intrusion detection. However, state-of-the-art solutions, which rely on a dedicated server, do not scale to high-volume network traffic data. In this paper, we implement a distributed Support Vector Machine (SVM) framework for classifying network traffic using Hadoop, an open-source distributed computing framework for Big Data processing. We design a global parameter store that maintains the parameters shared among the SVM training nodes. The distributed SVMs were deployed on a 20-node cluster to analyze a real network traffic trace. The results show that, with 19 Mapper nodes, the system is around 30% faster than the Cloud SVM solution and outperforms the standalone SVM, running nearly 9 times faster in training and 15 times faster in classification. In addition, the distributed SVM architecture is designed to analyze large-scale datasets, so it can be used to process not only network traffic but also other large-scale datasets such as Web data.
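The abstract does not give the training algorithm, so the following is only a minimal single-process sketch of the general idea: several mapper-style workers each refine a linear SVM on their own data partition, and a shared parameter store holds the merged model between rounds. Everything here is an assumption made for illustration, including the `ParameterStore` class, the hinge-loss subgradient update, the averaging merge step, and the synthetic two-cluster data. It is not the paper's implementation and does not use Hadoop.

```python
import random


class ParameterStore:
    """Hypothetical in-memory stand-in for a global parameter store."""

    def __init__(self, dim):
        self.weights = [0.0] * dim
        self.bias = 0.0

    def read(self):
        return list(self.weights), self.bias

    def write(self, weights, bias):
        self.weights, self.bias = list(weights), bias


def train_partition(partition, weights, bias, lr=0.01, reg=0.01, epochs=5):
    """Mapper-style step: hinge-loss subgradient descent on one partition."""
    w, b = list(weights), bias
    for _ in range(epochs):
        for x, y in partition:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point violates the margin: regularization plus hinge term.
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: regularization shrinkage only.
                w = [wi - lr * reg * wi for wi in w]
    return w, b


def merge(results):
    """Reducer-style step (assumed): average the per-partition models."""
    n = len(results)
    dim = len(results[0][0])
    w = [sum(r[0][i] for r in results) / n for i in range(dim)]
    b = sum(r[1] for r in results) / n
    return w, b


def train_distributed(data, num_mappers=4, rounds=10):
    """Run several rounds; workers run sequentially here, not in parallel."""
    store = ParameterStore(dim=len(data[0][0]))
    partitions = [data[i::num_mappers] for i in range(num_mappers)]
    for _ in range(rounds):
        w, b = store.read()
        results = [train_partition(p, w, b) for p in partitions]
        store.write(*merge(results))
    return store


def predict(store, x):
    w, b = store.read()
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


def make_toy_data(n=400, seed=0):
    """Synthetic two-class data (not network traffic), labels in {-1, +1}."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice([-1, 1])
        x = [label * 2.0 + rng.gauss(0, 0.5), label * 2.0 + rng.gauss(0, 0.5)]
        data.append((x, label))
    return data


if __name__ == "__main__":
    data = make_toy_data()
    store = train_distributed(data)
    correct = sum(predict(store, x) == y for x, y in data)
    print(f"training accuracy: {correct / len(data):.3f}")
```

In a real cluster each call to `train_partition` would run on a separate node, and the store would be a networked service rather than a Python object. Averaging is only one of several possible ways to combine partial models.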
ISBN: 9781467372879

Use this identifier to cite or link to this document: https://hdl.handle.net/11367/52578