C++ Assignment | COMP SCI 4094/4194/7094 – Distributed Databases and Data Mining Assignment 3


The assignment

In this assignment you are required to code a traffic packet clustering engine that clusters raw
network packets by application, such as HTTP or SMTP. To accomplish this, you should implement
a data preprocessing module and a clustering module.

You will have two input files, and you should print two (undergraduate) or three (postgraduate)
output files.

0.1 Input File:

Input file1 contains a distance threshold and the raw network packet information, that is,
seven attributes per packet: source address, source port, destination address, destination port,
protocol, arrival time, and packet length.

1. Input file1.txt holds sample traffic flow information, which looks like:

src addr  src port  dst addr  dst port  protocol  arrival time  packet length
49880 80 6 115258 52
49880 80 6 115307 52
55256 443 6 115310 46
50592 80 6 115314 40
49880 80 6 115341 52
50592 80 6 115350 40
50592 80 6 115363 40

2. Input file2.txt has a number K, and the next line contains K integers that represent
the initial set of K medoids, which looks like:

1 (k=1)
0 (Start from index 0, as the initial start medoid)
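
Reading this file can be sketched as below. This is only a minimal illustration: the function name `readInitialMedoids` is hypothetical, and it assumes the real file holds bare integers, with the parenthesised notes in the sample being commentary rather than file content.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper: reads K, then up to K medoid indices, from a stream.
// Assumes the stream contains plain whitespace-separated integers only.
std::vector<int> readInitialMedoids(std::istream& in) {
    std::vector<int> medoids;
    int k = 0;
    if (!(in >> k) || k <= 0) return medoids;  // missing or invalid K
    for (int i = 0; i < k; ++i) {
        int index = 0;
        if (!(in >> index)) break;             // fewer indices than promised
        medoids.push_back(index);
    }
    return medoids;
}
```

The function takes any `std::istream`, so it can be fed a `std::ifstream` opened on file2.txt or a `std::istringstream` when testing.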

0.2 Output File:

You should print out:

for undergraduate students:
1. Flow.txt (for data preprocessing result, 1 mark per test)
2. KMedoids.txt (for clustering result by Manhattan distance, 2 marks for absolute value, 1
mark for details).
for postgraduate students:
1. Flow.txt (for data preprocessing result, 1 mark per test)
2. KMedoids.txt (for clustering result by Manhattan distance, 2 marks).
3. KMedoidsE.txt (for clustering result by Euclidean distance, 1 mark).

What you need to do:

In the data preprocessing module, your program should prepare the flow data for clustering
from the raw packet data. Two steps are involved. First, merge the packets into flows
by the rule: a network flow includes at least TWO packets with the same source address, source
port, destination address, destination port, and protocol. Then calculate two clustering features
for each flow: the average transferring time and the average packet length.
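
The two preprocessing steps can be sketched as follows. The names `Packet`, `Flow` and `buildFlows` are hypothetical, and two points are assumptions rather than part of the specification: "average transferring time" is taken to mean the mean gap between consecutive arrivals, (last arrival - first arrival) / (packet count - 1), and flows are emitted in the order their first packet appears.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <vector>

struct Packet {                 // hypothetical record for one input row
    std::string srcAddr; int srcPort;
    std::string dstAddr; int dstPort;
    int protocol;
    long arrivalTime;
    int length;
};

struct Flow {
    double avgTransferTime;     // assumed: mean gap between consecutive arrivals
    double avgLength;           // mean packet length
};

std::vector<Flow> buildFlows(const std::vector<Packet>& packets) {
    using Key = std::tuple<std::string, int, std::string, int, int>;
    struct Acc { long first = 0, last = 0, totalLen = 0; int count = 0; };

    std::map<Key, std::size_t> slot;   // 5-tuple -> position in `accs`
    std::vector<Acc> accs;             // kept in first-appearance order
    for (const Packet& p : packets) {
        Key key{p.srcAddr, p.srcPort, p.dstAddr, p.dstPort, p.protocol};
        auto it = slot.find(key);
        if (it == slot.end()) {
            it = slot.emplace(key, accs.size()).first;
            accs.push_back(Acc{});
            accs.back().first = p.arrivalTime;
        }
        Acc& a = accs[it->second];
        a.last = p.arrivalTime;        // assumes packets are sorted by arrival time
        a.totalLen += p.length;
        ++a.count;
    }

    std::vector<Flow> flows;
    for (const Acc& a : accs) {
        if (a.count < 2) continue;     // a flow needs at least two packets
        flows.push_back({double(a.last - a.first) / (a.count - 1),
                         double(a.totalLen) / a.count});
    }
    return flows;
}
```

Under these assumptions, three packets arriving at 115258, 115307 and 115341 with length 52 each give one flow with features (41.5, 52), while a 5-tuple seen only once produces no flow.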

In the clustering module, you need to apply the k-medoids algorithm (course slides Chapter
10, not the book’s random method) to find the minimum number of clusters for which the sum of the
distances from each flow to its cluster's medoid is less than the given threshold. Note: the clustering
features come from the data preprocessing module, and the distance measure is Manhattan distance.
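
As an illustration of the quantities involved, the sketch below computes the Manhattan distance between two flows, the sum of distances from each flow to its nearest medoid, and a greedy swap loop in the spirit of PAM. It is not the course-slide framework: `Point`, `manhattan`, `totalCost` and `pamSwap` are hypothetical names, the swap rule (apply the single best improving swap per pass) is one common variant, and the outer loop that increases K until the cost drops below the threshold is omitted because the rule for choosing additional medoids is not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Point { double x, y; };   // (average transfer time, average packet length)

double manhattan(const Point& a, const Point& b) {
    return std::fabs(a.x - b.x) + std::fabs(a.y - b.y);
}

// Sum over all points of the distance to the nearest medoid.
// `medoids` holds indices into `pts`.
double totalCost(const std::vector<Point>& pts, const std::vector<int>& medoids) {
    double sum = 0.0;
    for (const Point& p : pts) {
        double best = std::numeric_limits<double>::max();
        for (int m : medoids) best = std::min(best, manhattan(p, pts[m]));
        sum += best;
    }
    return sum;
}

// Greedy swap phase: evaluate every (medoid, non-medoid) swap, apply the one
// that lowers the total cost the most, and repeat until no swap improves it.
double pamSwap(const std::vector<Point>& pts, std::vector<int>& medoids) {
    double cost = totalCost(pts, medoids);
    bool improved = true;
    while (improved) {
        improved = false;
        double bestCost = cost;
        std::size_t bestSlot = 0;
        int bestCandidate = -1;
        for (std::size_t s = 0; s < medoids.size(); ++s) {
            for (int c = 0; c < static_cast<int>(pts.size()); ++c) {
                if (std::find(medoids.begin(), medoids.end(), c) != medoids.end())
                    continue;                       // c is already a medoid
                std::vector<int> trial = medoids;
                trial[s] = c;
                double t = totalCost(pts, trial);
                if (t < bestCost) { bestCost = t; bestSlot = s; bestCandidate = c; }
            }
        }
        if (bestCandidate >= 0) {
            medoids[bestSlot] = bestCandidate;
            cost = bestCost;
            improved = true;
        }
    }
    return cost;
}
```

Because each accepted swap strictly lowers the cost, the loop terminates. For the Euclidean variant, only `manhattan` would need replacing, for example with `std::hypot(a.x - b.x, a.y - b.y)`.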

For your convenience, below is the framework of the k-medoids algorithm which you should

We will use the PAM algorithm from ClusBasic.pdf, page 20:

