SQL Database Assignment Help | BUSA8001 Applied Predictive Analytics - Programming Task 2

This is an SQL database development coding assignment from Australia.

Put all your work into a file titled BUSA8001_programming_task2_MQ_ID.ipynb where MQ_ID is your Macquarie University student ID number (e.g. if MQ_ID == 12345678 then you need to submit BUSA8001_programming_task2_12345678.ipynb).

• Failure to submit a correctly named file will result in a loss of 30 points.

• Failure to supply solutions in the cells provided below each question will result in a loss of 30 points.

• Follow all instructions closely and do not print your variables to screen unless explicitly asked to do so. Failure to do so will result in additional point deductions.

Problem 1 – (30 points)

Perform the following tasks in Python, writing your code in the cells provided underneath each question.

Q1. Import the credit card data from https://archive.ics.uci.edu/ml/machine-learning-databases/00350/default of credit card clients.xls directly into a pandas DataFrame named `df`, making sure you skip the top row when reading the dataset. Delete the ‘ID’ column after importing the data. (5 points)
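A minimal sketch of the import step. Against the real file the call would be something like `pd.read_excel(url, header=1)`, which needs network access and an Excel reader package; to stay runnable offline, the same `header=1` idea is shown here on a small made-up CSV string whose first row is an extra label row. The sample values and the `raw` variable are hypothetical.

```python
import io

import pandas as pd

# Hypothetical miniature of the file layout: an extra top row, then the real header.
raw = (
    "Unnamed,X1,X5,Y\n"
    "ID,LIMIT_BAL,AGE,default payment next month\n"
    "1,20000,24,1\n"
    "2,120000,26,1\n"
    "3,90000,34,0\n"
)

# header=1 uses the second row as column names, so the top row is skipped.
# For the real dataset (assumption): df = pd.read_excel(url, header=1)
df = pd.read_csv(io.StringIO(raw), header=1)

# Drop the identifier column after importing.
df = df.drop(columns="ID")
```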

Q2. Rename the column ‘PAY_0’ to ‘PAY_1’ and the column ‘default payment next month’ to ‘payment_default’ (5 points)
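One way to do the renaming is `DataFrame.rename` with a mapping from old to new names. The sketch below assumes a tiny made-up frame that only carries the relevant column names; the values are hypothetical.

```python
import pandas as pd

# Made-up two-row frame using the original column names.
df = pd.DataFrame(
    {"PAY_0": [2, -1], "PAY_2": [2, 2], "default payment next month": [1, 0]}
)

# Map old names to new names; columns not listed are left untouched.
df = df.rename(
    columns={"PAY_0": "PAY_1", "default payment next month": "payment_default"}
)
```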

Q3. Create a one-dimensional NumPy array named `y` by exporting the first 12,500 observations of the ‘payment_default’ column from `df` (hint: see the `ravel` NumPy method). Similarly, create a two-dimensional NumPy array named `X` by exporting the first 12,500 observations of the ‘PAY_1’, ‘PAY_2’, ‘AGE’, ‘SEX’, ‘MARRIAGE’, ‘EDUCATION’ and ‘BILL_AMT1’ columns. (10 points)
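A sketch of the export step under stated assumptions: the frame below is a random stand-in with 20 rows rather than the real data, and `n_obs` and `features` are hypothetical helper names. With the real DataFrame, `n_obs` would be 12500.

```python
import numpy as np
import pandas as pd

features = ["PAY_1", "PAY_2", "AGE", "SEX", "MARRIAGE", "EDUCATION", "BILL_AMT1"]

# Hypothetical stand-in for the credit card DataFrame (random integers, 20 rows).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 5, size=(20, len(features))), columns=features)
df["payment_default"] = rng.integers(0, 2, size=20)

n_obs = 10  # the assignment asks for 12,500

# .iloc[:n_obs] keeps the first n_obs rows; ravel() returns a one-dimensional array.
y = df["payment_default"].iloc[:n_obs].to_numpy().ravel()

# Selecting a list of columns keeps two dimensions: (n_obs, number of features).
X = df[features].iloc[:n_obs].to_numpy()
```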

Q4. Use an appropriate `scikit-learn` library we learned in class to create the following NumPy arrays: `y_train`, `y_test`, `X_train` and `X_test` by splitting the data into 68% train and 32% test datasets. Set `random_state` to 3 and stratify subsamples so that train and test datasets have roughly equal proportions of the target’s class labels. (5 points)
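The split described here matches what `train_test_split` from `sklearn.model_selection` offers, though the source does not name the function, so treat that choice as an assumption. The sketch uses made-up data (100 observations, 25% positive class) in place of the real `X` and `y`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 observations, 7 features, 25 positives and 75 negatives.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 7))
y = np.array([1] * 25 + [0] * 75)

# test_size=0.32 leaves 68% for training; stratify=y preserves class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.32, random_state=3, stratify=y
)
```

Comparing `y_train.mean()` with `y_test.mean()` is a quick way to confirm that stratification kept the class balance similar in both subsets.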

Q5. Use an appropriate `scikit-learn` library we learned in class to standardize features from train and test datasets to mean zero and variance one, as discussed in class. (5 points)
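A possible sketch using `StandardScaler`. It assumes the approach discussed in class is the common one of fitting the scaler on the training data only and reusing those statistics for the test data; the source does not spell this out. The arrays are random stand-ins, and `sc`, `X_train_std` and `X_test_std` are hypothetical names.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for the train and test feature arrays.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(68, 7))
X_test = rng.normal(loc=50.0, scale=10.0, size=(32, 7))

sc = StandardScaler()

# Estimate per-feature mean and standard deviation from the training data only.
X_train_std = sc.fit_transform(X_train)

# Apply the same training-set statistics to the test data.
X_test_std = sc.transform(X_test)
```

Under this approach the standardized test features will be close to, but not exactly, mean zero and variance one, because they are scaled with training-set statistics.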

Problem 2 – (30 points)

Q6. Use appropriate `scikit-learn` libraries we learned in class to fit the following classifiers to the training dataset constructed in Problem 1.

Logistic Regression – name your instance `lr` and set `random_state=11`
Support Vector Machine with Linear Kernel – name your instance `svm_linear` and set `C=6.0` and `random_state=11`
Support Vector Machine with RBF Kernel – name your instance `svm_rbf` and set `gamma=21`, `C=5.6`, `random_state=11`
Decision Tree – name your instance `tree` and set `criterion='entropy'`, `max_depth=4`, `random_state=11`
Random Forest – name your instance `forest` and set `criterion='entropy'`, `n_estimators=21`, `random_state=11`
KNN – name your instance `knn` and set `n_neighbors=6`, `p=3`, `metric='minkowski'`

When initializing instances of the above classifiers, set only the parameters provided above and leave all other parameters at their `scikit-learn` default values. (30 points)
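The six classifiers listed above could be set up roughly as follows. This is a sketch under assumptions: the source does not name the classes, so `SVC(kernel="linear")` for the linear SVM is one reasonable reading, and the training data here is synthetic (from `make_classification`) rather than the standardized credit card data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the training data built in Problem 1.
X_train, y_train = make_classification(n_samples=200, n_features=7, random_state=0)

# Only the parameters named in the task are set; everything else stays at its default.
lr = LogisticRegression(random_state=11)
svm_linear = SVC(kernel="linear", C=6.0, random_state=11)
svm_rbf = SVC(kernel="rbf", gamma=21, C=5.6, random_state=11)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=11)
forest = RandomForestClassifier(criterion="entropy", n_estimators=21, random_state=11)
knn = KNeighborsClassifier(n_neighbors=6, p=3, metric="minkowski")

# Fit every classifier on the same training data.
for clf in (lr, svm_linear, svm_rbf, tree, forest, knn):
    clf.fit(X_train, y_train)
```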

Problem 3 – (40 points)

Q7. Using a method built into each of the above classifiers, compute prediction accuracy on training data for each classifier and store it into variables named according to the following pattern: `classifier_name_accuracy_train`, for instance you should have `lr_accuracy_train`. (10 points)

Q8. Using a method built into each of the above classifiers, compute prediction accuracy on test data for each classifier and store it into variables named according to the following pattern: `classifier_name_accuracy_test`, for instance you should have `lr_accuracy_test`. (10 points)
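For Q7 and Q8, the built-in method is presumably `score`, which for scikit-learn classifiers returns mean accuracy on the data passed in. The sketch below shows the naming pattern for two of the six classifiers only, on synthetic data rather than the credit card data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the train and test arrays from Problem 1.
X, y = make_classification(n_samples=300, n_features=7, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.32, random_state=3, stratify=y
)

lr = LogisticRegression(random_state=11).fit(X_train, y_train)
tree = DecisionTreeClassifier(
    criterion="entropy", max_depth=4, random_state=11
).fit(X_train, y_train)

# score(X, y) returns the fraction of correctly classified observations.
lr_accuracy_train = lr.score(X_train, y_train)
lr_accuracy_test = lr.score(X_test, y_test)
tree_accuracy_train = tree.score(X_train, y_train)
tree_accuracy_test = tree.score(X_test, y_test)
```

The remaining classifiers follow the same pattern, for example `knn_accuracy_train = knn.score(X_train, y_train)`.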

Q9. Explain which methods rank in the first two places according to their ability to accurately classify the training data, and which two methods perform worst on the training dataset. (10 points)

Q10.

Explain which methods rank in the first two places according to their ability to accurately classify the test data, and which two methods perform worst on the test dataset. (3 marks)
How do these accuracies compare with the ones reported in Q9? Is this expected, and why (or why not)? (7 marks)
