COMP SCI 4094/4194/7094 – Distributed Databases and Data Mining Assignment 3 (C++)

The assignment

In this assignment you are required to code a traffic packet clustering engine that clusters raw
network packets by application, such as HTTP and SMTP. To accomplish this, you should implement
a data preprocessing module and a clustering module.

You will be given two input files, and you should produce two (undergraduate) or three (postgraduate)
output files.

0.1 Input Files:

Input file1 contains a distance threshold and the raw network packet information, that is,
seven attributes per packet: source address, source port, destination address, destination port,
protocol, arrival time, and packet length.

1. Input file1.txt contains sample traffic flow information, which looks like:

src addr         src port  dst addr       dst port  protocol  arrival time  packet length
202.234.224.254  49880     31.65.181.210  80        6         115258        52
202.234.224.254  49880     31.65.181.210  80        6         115307        52
202.234.35.144   55256     74.39.124.220  443       6         115310        46
119.188.179.82   50592     150.79.7.129   80        6         115314        40
202.234.224.254  49880     31.65.181.210  80        6         115341        52
119.188.179.82   50592     150.79.7.129   80        6         115350        40
119.188.179.82   50592     150.79.7.129   80        6         115363        40
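Each data line holds the seven packet attributes separated by whitespace. As a minimal sketch, assuming one packet per line and that non-numeric lines (such as the header) should simply be skipped, a line can be parsed into a record as shown here. The `Packet` struct and `parsePacket` function are hypothetical names, and because the sample does not show where the distance threshold appears in file1.txt, threshold parsing is left out.

```cpp
#include <optional>
#include <sstream>
#include <string>

// One raw packet record: the seven attributes listed in the assignment.
struct Packet {
    std::string srcAddr;
    int srcPort = 0;
    std::string dstAddr;
    int dstPort = 0;
    int protocol = 0;
    long long arrivalTime = 0;
    int length = 0;
};

// Parses one whitespace-separated data line. Returns std::nullopt when a
// field cannot be read (e.g. the header line); this skip-on-failure policy
// is an assumption, not something the spec states.
std::optional<Packet> parsePacket(const std::string& line) {
    std::istringstream in(line);
    Packet p;
    if (!(in >> p.srcAddr >> p.srcPort >> p.dstAddr >> p.dstPort
             >> p.protocol >> p.arrivalTime >> p.length)) {
        return std::nullopt;
    }
    return p;
}
```

For the header line, reading `src` as the address succeeds but reading `addr` as a port fails, so the line is rejected.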

2. Input file2.txt contains a number K on its first line and, on the next line, K integers giving
the indices of the initial set of K medoids, which looks like:

1 (K = 1)
0 (index 0 is the initial medoid; indices start from 0)
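A sketch of reading file2.txt, assuming K and the medoid indices are whitespace-separated integers and that the parenthesised notes in the example are explanations rather than part of the file. `readInitialMedoids` is a hypothetical helper, and the indices are assumed to refer to positions in the list of flows produced by preprocessing.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Reads K, then K medoid indices, from file2-style input.
// Throws if K is not positive or fewer than K valid indices follow.
std::vector<int> readInitialMedoids(std::istream& in) {
    int k = 0;
    if (!(in >> k) || k <= 0) {
        throw std::runtime_error("file2: expected a positive K");
    }
    std::vector<int> medoids(k);
    for (int& idx : medoids) {
        if (!(in >> idx) || idx < 0) {
            throw std::runtime_error("file2: expected K non-negative medoid indices");
        }
    }
    return medoids;
}
```

With a real file, the same function can be called as `std::ifstream f("file2.txt"); auto medoids = readInitialMedoids(f);` (this needs `<fstream>`).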

0.2 Output Files:

You should produce the following files:

For undergraduate students:
1. Flow.txt (data preprocessing result, 1 mark per test)
2. KMedoids.txt (clustering result using Manhattan distance; 2 marks for the absolute value, 1
mark for details).
For postgraduate students:
1. Flow.txt (data preprocessing result, 1 mark per test)
2. KMedoids.txt (clustering result using Manhattan distance, 2 marks).
3. KMedoidsE.txt (clustering result using Euclidean distance, 1 mark).
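KMedoids.txt and KMedoidsE.txt differ only in the distance measure. For two flows with features (t1, l1) and (t2, l2), Manhattan distance is |t1 - t2| + |l1 - l2| and Euclidean distance is sqrt((t1 - t2)^2 + (l1 - l2)^2). A sketch of both, assuming a hypothetical two-field `Feature` struct holding the flow's average transferring time and average packet length:

```cpp
#include <cmath>

// Clustering features of one flow (the two features named in the spec).
struct Feature {
    double avgTime = 0.0;    // average transferring time
    double avgLength = 0.0;  // average packet length
};

// Manhattan (L1) distance: sum of absolute coordinate differences.
double manhattan(const Feature& a, const Feature& b) {
    return std::fabs(a.avgTime - b.avgTime) + std::fabs(a.avgLength - b.avgLength);
}

// Euclidean (L2) distance: square root of the sum of squared differences.
// std::hypot avoids intermediate overflow when squaring large values.
double euclidean(const Feature& a, const Feature& b) {
    return std::hypot(a.avgTime - b.avgTime, a.avgLength - b.avgLength);
}
```

Because the two features have different units, whichever feature has the larger numeric range will dominate either distance unless the features are scaled; the spec does not mention scaling, so none is applied here.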

What you need to do:

In the data preprocessing module, your program should prepare the flow data for clustering from
the raw packet data. Two steps are involved. First, merge the packets into flows using this rule:
a network flow consists of at least TWO packets with the same source address, source port,
destination address, destination port, and protocol. Second, calculate two clustering features for
each flow: the average transferring time and the average packet length.
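A sketch of the two preprocessing steps. The spec does not define "average transferring time" precisely; this sketch assumes it is the mean gap between consecutive packet arrivals in a flow, which telescopes to (last arrival - first arrival) / (packets - 1). It also assumes flows are emitted in the order their first packet appears, since the medoid indices in file2.txt presumably refer to some fixed flow order. `Packet`, `Flow`, and `buildFlows` are hypothetical names.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Raw packet record (hypothetical layout mirroring the seven input columns).
struct Packet {
    std::string srcAddr;
    int srcPort;
    std::string dstAddr;
    int dstPort;
    int protocol;
    long long arrivalTime;
    int length;
};

// Clustering features of one flow.
struct Flow {
    double avgTime;    // assumed: mean gap between consecutive arrivals
    double avgLength;  // mean packet length
};

// The five attributes that must match for packets to share a flow.
using FlowKey = std::tuple<std::string, int, std::string, int, int>;

std::vector<Flow> buildFlows(const std::vector<Packet>& packets) {
    std::map<FlowKey, std::vector<const Packet*>> groups;
    std::vector<FlowKey> order;  // keys in order of first appearance
    for (const Packet& p : packets) {
        FlowKey key{p.srcAddr, p.srcPort, p.dstAddr, p.dstPort, p.protocol};
        auto [it, inserted] = groups.try_emplace(key);
        if (inserted) order.push_back(key);
        it->second.push_back(&p);
    }

    std::vector<Flow> flows;
    for (const FlowKey& key : order) {
        const std::vector<const Packet*>& g = groups.at(key);
        if (g.size() < 2) continue;  // a flow needs at least TWO packets
        long long first = g.front()->arrivalTime, last = first;
        double totalLength = 0.0;
        for (const Packet* p : g) {
            first = std::min(first, p->arrivalTime);
            last = std::max(last, p->arrivalTime);
            totalLength += p->length;
        }
        // The consecutive gaps sum to (last - first), so no sorting is needed.
        double n = static_cast<double>(g.size());
        flows.push_back({static_cast<double>(last - first) / (n - 1.0), totalLength / n});
    }
    return flows;
}
```

Under these assumptions, the seven sample packets yield two flows: 202.234.224.254:49880 to 31.65.181.210:80 with features (41.5, 52) and 119.188.179.82:50592 to 150.79.7.129:80 with features (24.5, 40). The single packet from 202.234.35.144 has no partner and is dropped.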

In the clustering module, you need to apply the k-medoids algorithm (course slides, Chapter
10, not the book's random method) to find the minimum number of clusters for which the sum of the
distances from each flow to its cluster medoid is less than the given threshold. Note: the clustering
features come from the data preprocessing module, and the distance measure is Manhattan distance.
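The stopping criterion compares a total clustering cost with the threshold from file1.txt. A sketch of that cost, assuming each flow is assigned to its nearest medoid and using Manhattan distance over the two features; `Point` and `totalCost` are hypothetical names, and the optional `assignment` output records which medoid (by position in `medoids`) each flow belongs to.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Two clustering features of a flow: (avg transferring time, avg packet length).
struct Point { double x, y; };

double manhattan(const Point& a, const Point& b) {
    return std::fabs(a.x - b.x) + std::fabs(a.y - b.y);
}

// Assigns every point to its nearest medoid and returns the summed
// Manhattan distance. If assignment is non-null, it receives, per point,
// the position in `medoids` of the chosen medoid.
double totalCost(const std::vector<Point>& pts, const std::vector<std::size_t>& medoids,
                 std::vector<std::size_t>* assignment = nullptr) {
    if (assignment) assignment->assign(pts.size(), 0);
    double cost = 0.0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        double best = std::numeric_limits<double>::infinity();
        for (std::size_t m = 0; m < medoids.size(); ++m) {
            double d = manhattan(pts[i], pts[medoids[m]]);
            if (d < best) {
                best = d;
                if (assignment) (*assignment)[i] = m;
            }
        }
        cost += best;
    }
    return cost;
}
```

A given set of medoids then satisfies the criterion when `totalCost(pts, medoids) < threshold`.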

For your convenience, the framework of the k-medoids algorithm that you should follow is the PAM
algorithm on page 20 of ClusBasic.pdf: https://myuni.adelaide.edu.au/courses/64886/discussion_topics/602515
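The PAM swap phase can be sketched as follows, assuming the initial medoids are given (for example from file2.txt). Each iteration evaluates every (medoid, non-medoid) swap, applies the one that lowers the total cost most, and stops when no swap helps. `minClusters` adds a hypothetical search for the smallest k whose optimised cost is below the threshold; seeding each k with flows 0..k-1 is an assumption for illustration only, and the procedure in ClusBasic.pdf and the Chapter 10 slides should take precedence where it differs. All names here are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct Point { double x, y; };  // (avg transferring time, avg packet length)

double manhattan(const Point& a, const Point& b) {
    return std::fabs(a.x - b.x) + std::fabs(a.y - b.y);
}

// Sum of each point's Manhattan distance to its nearest medoid.
double totalCost(const std::vector<Point>& pts, const std::vector<std::size_t>& medoids) {
    double cost = 0.0;
    for (const Point& p : pts) {
        double best = std::numeric_limits<double>::infinity();
        for (std::size_t m : medoids) best = std::min(best, manhattan(p, pts[m]));
        cost += best;
    }
    return cost;
}

// PAM swap phase: apply the best cost-reducing (medoid, non-medoid) swap
// until no swap lowers the total cost.
std::vector<std::size_t> pam(const std::vector<Point>& pts, std::vector<std::size_t> medoids) {
    double cost = totalCost(pts, medoids);
    while (true) {
        double bestCost = cost;
        std::size_t bestSlot = 0, bestCandidate = 0;
        bool improved = false;
        for (std::size_t slot = 0; slot < medoids.size(); ++slot) {
            for (std::size_t o = 0; o < pts.size(); ++o) {
                if (std::find(medoids.begin(), medoids.end(), o) != medoids.end()) continue;
                std::vector<std::size_t> trial = medoids;
                trial[slot] = o;
                double c = totalCost(pts, trial);
                if (c < bestCost) {
                    bestCost = c;
                    bestSlot = slot;
                    bestCandidate = o;
                    improved = true;
                }
            }
        }
        if (!improved) return medoids;  // local optimum reached
        medoids[bestSlot] = bestCandidate;
        cost = bestCost;
    }
}

// Hypothetical search for the smallest k whose PAM cost is below threshold,
// seeding each k with points 0..k-1 (an illustrative assumption).
std::size_t minClusters(const std::vector<Point>& pts, double threshold) {
    for (std::size_t k = 1; k <= pts.size(); ++k) {
        std::vector<std::size_t> seed(k);
        for (std::size_t i = 0; i < k; ++i) seed[i] = i;
        if (totalCost(pts, pam(pts, seed)) < threshold) return k;
    }
    return pts.size();  // only reached when threshold <= 0
}
```

Recomputing the full cost for every candidate swap makes each iteration O(k(n - k)n) distance evaluations, which is simple but slow for large inputs; computing each swap's cost change incrementally avoids most of that work.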
