CS 506 Introduction to Python HW0


1 Parse and Preprocess Data

For this part you will write Python code to parse data. Specifically, you
will use the arrhythmia dataset (you can find a detailed
description of the dataset here).

Each line in this dataset corresponds to a patient and contains 280
comma-separated values. The first 279 are the attributes, whereas the last
element is the class of this patient (an integer ranging from 1 to 16).

Your goal is to write a function that reads the dataset and returns two arrays
(X and y), where X contains the attributes for every patient and y the
corresponding class.

Be careful! The dataset also contains missing values, denoted with a question
mark '?'. You will need to take care of them and store them as NaN entries in
your X array.

def import_data(filename):
    # Write your code here
    return X, y
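
One possible way to approach the parsing step is sketched below. It is only an illustration under stated assumptions, not the reference solution: it assumes NumPy is available, that the file is plain comma-separated text with no header row, and that the class label in the last column is never missing.

```python
import numpy as np


def import_data(filename):
    """Read a comma-separated file and return (X, y) as NumPy arrays."""
    X, y = [], []
    with open(filename) as f:
        for line in f:
            fields = line.strip().split(",")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            # A '?' marks a missing value; store it as NaN in X.
            X.append([np.nan if v.strip() == "?" else float(v)
                      for v in fields[:-1]])
            # Assumption: the last field (the class) is always an integer.
            y.append(int(fields[-1]))
    return np.array(X, dtype=float), np.array(y, dtype=int)
```

For the arrhythmia file described above, X would be expected to have 279 columns and y one entry per patient.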

2 Impute or delete missing entries

(a) [2pts.] As described above, the matrix X will contain missing entries,
denoted as NaN. Write a function that imputes these missing entries
with the median of the corresponding feature (column) in X. Note that
you should filter out the missing entries before computing the median.

def impute_missing(X):
    # Write your code here
    return X
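
The median imputation described in (a) could be sketched as follows. This is a minimal illustration rather than the expected answer: it assumes X is a 2-D float NumPy array and relies on `np.nanmedian`, which ignores NaN entries when computing each column median. A column that is entirely NaN would stay NaN under this approach.

```python
import numpy as np


def impute_missing(X):
    """Replace each NaN in X with the median of its column."""
    X = np.array(X, dtype=float)  # work on a copy, leave the input untouched
    # Column medians computed over the non-missing entries only.
    medians = np.nanmedian(X, axis=0)
    # Locate the missing entries and fill each with its column's median.
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X
```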

(b) [1pt.] Explain why it is sometimes better to use the median instead
of the mean of an attribute to fill in missing values.

(c) [1pt.] Another way to deal with missing entries is to completely discard
the samples that do not have an entry for every attribute. Write a Python
function that discards those samples from the dataset.

def discard_missing(X, y):
    # Write your code here
    return X, y
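
A sketch of the row-discarding approach from (c) is shown below, again as one possible illustration and not the reference solution. It assumes X is a 2-D float NumPy array with NaN marking missing entries and that y has one label per row of X.

```python
import numpy as np


def discard_missing(X, y):
    """Drop every sample (row) of X that contains a NaN, with its label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # True for rows in which no attribute is missing.
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]
```

Applying the same boolean mask to both arrays keeps X and y aligned, so each remaining patient retains the correct class.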