1 Parse and Preprocess Data
For this part you will write Python code to parse data. Specifically, you
will use the arrhythmia dataset "arrhythmia.data" (you can find a detailed
description of the dataset here).
Each line in this dataset corresponds to a patient and contains 280
comma-separated values. The first 279 are the attributes, whereas the last
element is the class of this patient (an integer ranging from 1 to 16).
Your goal is to write a function that reads the dataset and returns two arrays
(X and y), where X contains the attributes for every patient and y the
corresponding class.
Be careful! The dataset also contains missing values, denoted with a question
mark '?'. You will need to handle them and store them as NaN entries in
your X array.
def import_data(filename):
    """
    Write your code here
    """
    return X, y
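One possible way to approach the parsing step is sketched below. It is a minimal illustration rather than a reference solution, and it rests on a few assumptions that the assignment does not state: NumPy is available, every attribute other than '?' parses as a float, and the last field on each line parses as an integer class label.

```python
import numpy as np

def import_data(filename):
    """Read a comma-separated file and return (X, y), storing '?' as NaN."""
    X_rows, y_vals = [], []
    with open(filename) as f:
        for line in f:
            fields = line.strip().split(",")
            if fields == [""]:
                continue  # skip blank lines
            # All fields but the last are attributes; '?' becomes NaN.
            X_rows.append([np.nan if v == "?" else float(v) for v in fields[:-1]])
            # The last field is the class label (assumed to be an integer).
            y_vals.append(int(fields[-1]))
    return np.array(X_rows, dtype=float), np.array(y_vals, dtype=int)
```

On the arrhythmia file described above, X would be expected to have 279 columns and y one entry per patient.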
2 Impute or Delete Missing Entries
(a) [2pts.] As described above, the matrix X will contain missing entries,
denoted as NaN. Write a function that imputes these missing entries
with the median of the corresponding feature (column) in X. Note that
you should filter out the missing entries before computing the median.
def impute_missing(X):
    """
    Write your code here
    """
    return X
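A minimal sketch of column-wise median imputation follows, assuming X is a 2-D NumPy float array with NaN marking the missing entries. It uses `np.nanmedian`, which ignores NaN values when computing each column's median. Returning a copy rather than modifying the input in place is a design choice of this sketch, not something the assignment requires.

```python
import numpy as np

def impute_missing(X):
    """Replace each NaN in X with the median of its column, ignoring NaNs."""
    X = np.array(X, dtype=float)            # copy, so the caller's array is untouched
    col_medians = np.nanmedian(X, axis=0)   # per-column median with NaNs skipped
    rows, cols = np.where(np.isnan(X))      # positions of the missing entries
    X[rows, cols] = col_medians[cols]       # fill each one with its column's median
    return X
```

A column with no observed values at all has no median to fall back on, so its entries stay NaN. Such columns would need separate handling.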
(b) [1pt.] Explain why it is sometimes better to use the median instead
of the mean of an attribute to fill in missing values.
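A small numeric illustration may help when thinking about this question. The values below are made up for the example and are not taken from the dataset: four typical measurements and one extreme outlier.

```python
import numpy as np

# Hypothetical attribute values: four typical readings and one outlier.
values = np.array([60.0, 62.0, 65.0, 63.0, 600.0])

mean_value = np.mean(values)      # pulled far upward by the single outlier
median_value = np.median(values)  # stays close to the typical readings

print(mean_value)    # 170.0
print(median_value)  # 63.0
```

The mean lands far from every typical reading, while the median stays among them.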
(c) [1pt.] Another way to deal with missing entries is to completely discard
the samples that do not have an entry for every attribute. Write a Python
function that discards those samples from the dataset.
def discard_missing(X, y):
    """
    Write your code here
    """
    return X, y
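Discarding incomplete samples can be sketched with a boolean row mask, again assuming X is a 2-D NumPy float array with NaN for missing entries and y is a 1-D array of the same length. This is one illustrative approach, not the only valid one.

```python
import numpy as np

def discard_missing(X, y):
    """Drop every sample (row of X and matching entry of y) that contains a NaN."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = ~np.isnan(X).any(axis=1)  # True for rows with no missing attribute
    return X[keep], y[keep]
```

Applying the same mask to both X and y keeps attributes and class labels aligned after rows are removed.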