The aim of this project is to analyze IP network traffic flows and predict the application-layer protocol (the specific application, such as Facebook, YouTube, or Instagram) behind each flow.
The dataset can be found here; it contains 87 features. Each instance holds the information of an IP flow generated by a network device, i.e., source and destination IP addresses, ports, interarrival times, and the layer 7 protocol (application) used on the flow, which is the class we want to predict.
For more details, see the project blog post.
Most network traffic classification datasets aim only at identifying the type of application an IP flow holds (WWW, DNS, FTP, P2P, Telnet, etc.). This dataset goes a step further by enabling machine learning models capable of detecting specific applications such as Facebook, YouTube, and Instagram from IP flow statistics (currently 75 applications).
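To make the record layout described above concrete, here is a minimal sketch of how one flow row splits into features and the application label, and how labels can be counted. The column names (`Source.IP`, `ProtocolName`, and so on) and the three sample rows are assumptions made for illustration; check the header of the downloaded CSV for the real names.

```python
import csv
import io
from collections import Counter

# Tiny in-memory stand-in for the real CSV. Column names and rows are
# hypothetical; the actual dataset has 87 columns.
SAMPLE_CSV = """Source.IP,Source.Port,Destination.IP,Destination.Port,ProtocolName
10.0.0.5,51234,8.8.8.8,443,YOUTUBE
10.0.0.5,51240,1.1.1.1,443,FACEBOOK
10.0.0.7,40022,1.1.1.1,443,FACEBOOK
"""

LABEL_COLUMN = "ProtocolName"  # assumed name of the application label column


def split_features_and_label(row, label_column=LABEL_COLUMN):
    """Return (features, label) for a single flow record."""
    features = {key: value for key, value in row.items() if key != label_column}
    return features, row[label_column]


def label_distribution(csv_text, label_column=LABEL_COLUMN):
    """Count how many flows belong to each application label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_column] for row in reader)


counts = label_distribution(SAMPLE_CSV)
first_row = next(csv.DictReader(io.StringIO(SAMPLE_CSV)))
features, label = split_features_and_label(first_row)
```

On the full dataset the same split is usually done with pandas inside the notebook; the standard-library version here only illustrates the idea.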
- keras 2.2.4+
- sklearn 0.21.2+
- numpy 1.16.4+
- seaborn 0.9.0+
- pandas 0.25.0+
- matplotlib 3.1.0+
- The /docs folder contains the project blog document and images
- ip-flow-analysis.ipynb is the notebook where the analysis happens
- model.h5 is the deep learning model, which can be generated from the notebook
- Dataset-Unicauca-Version2-87Atts.csv is the dataset, which should be downloaded from here
The conclusion of our analysis is that we can identify the application of an IP flow with 66% accuracy. For more details, see the project blog post.
We can improve the model by:
- using more of the features that we dropped
- extracting new features (for example, whether the flow is incoming or outgoing, and whether the port is privileged)
- aggregating flows by connection
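The feature ideas listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the notebook: the field names (`src_ip`, `dst_port`, `bytes`, ...) are hypothetical, "incoming" and "outgoing" are decided by assuming that private addresses sit inside the monitored network, and a connection is assumed to be identified by its two endpoints plus the transport protocol number.

```python
import ipaddress
from collections import defaultdict

PRIVILEGED_PORT_LIMIT = 1024  # ports 0-1023 are the well-known (privileged) range


def is_privileged_port(port):
    """True if the port falls in the privileged range."""
    return 0 <= port < PRIVILEGED_PORT_LIMIT


def flow_direction(src_ip, dst_ip):
    """Classify a flow, assuming private addresses belong to the local network."""
    src_internal = ipaddress.ip_address(src_ip).is_private
    dst_internal = ipaddress.ip_address(dst_ip).is_private
    if src_internal and not dst_internal:
        return "outgoing"
    if dst_internal and not src_internal:
        return "incoming"
    return "internal" if src_internal else "external"


def connection_key(flow):
    """Direction-agnostic key so both halves of a conversation are grouped."""
    a = (flow["src_ip"], flow["src_port"])
    b = (flow["dst_ip"], flow["dst_port"])
    return (min(a, b), max(a, b), flow["protocol"])


def aggregate_by_connection(flows):
    """Sum flow count and byte count per connection."""
    totals = defaultdict(lambda: {"flows": 0, "bytes": 0})
    for flow in flows:
        entry = totals[connection_key(flow)]
        entry["flows"] += 1
        entry["bytes"] += flow["bytes"]
    return dict(totals)


# Hypothetical flow records, not taken from the dataset.
flows = [
    {"src_ip": "192.168.1.10", "src_port": 51000, "dst_ip": "8.8.8.8",
     "dst_port": 443, "protocol": 6, "bytes": 1200},
    {"src_ip": "8.8.8.8", "src_port": 443, "dst_ip": "192.168.1.10",
     "dst_port": 51000, "protocol": 6, "bytes": 5400},
    {"src_ip": "192.168.1.10", "src_port": 51001, "dst_ip": "1.1.1.1",
     "dst_port": 53, "protocol": 17, "bytes": 90},
]
aggregated = aggregate_by_connection(flows)
```

Whether the private-address heuristic is appropriate depends on where the traffic was captured, so it should be checked against the dataset before being used as a feature.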
I would like to thank Juan Sebastián Rojas and Universidad Del Cauca for providing this dataset.