BUSA8001 Applied Predictive Analytics – Programming Task 2

Put all your work into a file titled BUSA8001_programming_task2_MQ_ID.ipynb, where MQ_ID is your Macquarie University student ID number (e.g. if MQ_ID == 12345678, then you need to submit BUSA8001_programming_task2_12345678.ipynb).

Failure to submit a correctly named file will result in a loss of 30 points.

Failure to supply solutions in the cells provided below each question will result in a loss of 30 points.

Follow all instructions closely and do not print your variables to the screen unless explicitly asked to do so. Failure to do so will result in additional point deductions.

Problem 1 – (30 points)

Perform the following tasks in Python, writing your code in the cells provided underneath each question.

Q1. Import the credit card data from https://archive.ics.uci.edu/ml/machine-learning-databases/00350/default%20of%20credit%20card%20clients.xls directly into a pandas DataFrame named `df`, making sure you skip the top row when reading the dataset. Delete the ‘ID’ column after importing the data. (5 points)
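The following is a minimal sketch of the skip-the-top-row idea. It assumes, as the instructions imply, that the spreadsheet's real column names sit on its second row. To stay runnable offline, it reads a tiny hypothetical in-memory CSV with `pd.read_csv` instead of the real file. For the actual `.xls` file the analogous call would presumably be `pd.read_excel(url, header=1)`, which also needs an engine that reads legacy `.xls` files, such as `xlrd`.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for the spreadsheet: the first line holds
# generic labels and the real column names are on the second line.
raw = io.StringIO(
    ",X1,X2,Y\n"
    "ID,LIMIT_BAL,AGE,default payment next month\n"
    "1,20000,24,1\n"
    "2,120000,26,1\n"
    "3,90000,34,0\n"
)

# header=1 uses the second line (index 1) as the column names and
# discards everything above it, i.e. the top row is skipped.
df = pd.read_csv(raw, header=1)

# Remove the identifier column after import.
df = df.drop(columns="ID")
```

Passing `skiprows=1` (and keeping the default `header=0`) would have the same effect on this layout.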

Q2. Rename the column ‘PAY_0’ to ‘PAY_1’ and the column ‘default payment next month’ to ‘payment_default’. (5 points)
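`DataFrame.rename` accepts a mapping from old to new column names, and columns absent from the mapping are left untouched. This sketch uses a small hypothetical frame that has only three of the dataset's columns.

```python
import pandas as pd

# Tiny hypothetical frame using the original column names.
df = pd.DataFrame({
    "PAY_0": [0, -1, 2],
    "PAY_2": [0, 0, 2],
    "default payment next month": [0, 0, 1],
})

# Map old names to new names; PAY_2 is not listed, so it keeps its name.
df = df.rename(columns={
    "PAY_0": "PAY_1",
    "default payment next month": "payment_default",
})
```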

Q3. Create a one-dimensional NumPy array named `y` by exporting the first 12,500 observations of the ‘payment_default’ column from `df` (hint: see the NumPy `ravel` method). Similarly, create a two-dimensional NumPy array named `X` by exporting the first 12,500 observations of the ‘PAY_1’, ‘PAY_2’, ‘AGE’, ‘SEX’, ‘MARRIAGE’, ‘EDUCATION’ and ‘BILL_AMT1’ columns. (10 points)
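The sketch below illustrates why `ravel` helps. Selecting a column with a one-element list of names yields an `(n, 1)` array, and `ravel()` flattens it to shape `(n,)`. The random frame, its column subset and the row count `n = 12` are all hypothetical stand-ins for the real data and the 12,500-row cut.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 20  # hypothetical stand-in for the full dataset size
df = pd.DataFrame({
    "PAY_1": rng.integers(-2, 9, n_rows),
    "AGE": rng.integers(21, 70, n_rows),
    "payment_default": rng.integers(0, 2, n_rows),
})

n = 12  # hypothetical; the task uses the first 12,500 rows

# Double brackets give an (n, 1) array; ravel() flattens it to (n,).
y = df[["payment_default"]].iloc[:n].to_numpy().ravel()

# A list of several column names gives a 2-D array of shape (n, n_columns).
X = df[["PAY_1", "AGE"]].iloc[:n].to_numpy()
```

Selecting with single brackets (`df["payment_default"]`) already returns a one-dimensional Series, so `ravel()` is harmless there too.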

Q4. Use an appropriate `scikit-learn` tool we learned in class to create the NumPy arrays `y_train`, `y_test`, `X_train` and `X_test` by splitting the data into 68% training and 32% test datasets. Set `random_state` to 3 and stratify the subsamples so that the training and test datasets have roughly equal proportions of the target's class labels. (5 points)
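A common way to do a stratified split is `sklearn.model_selection.train_test_split` with `stratify=y`. This sketch assumes that is the tool meant. It runs on synthetic, imbalanced data from `make_classification` rather than the credit card dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (roughly 78% class 0, 22% class 1).
X, y = make_classification(n_samples=1000, n_features=7, weights=[0.78], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.32,   # 32% test, 68% train
    random_state=3,   # reproducible shuffling
    stratify=y,       # keep class proportions similar in both splits
)
```

With 1,000 samples this yields 680 training rows and 320 test rows, and the share of class 1 is nearly identical in `y_train` and `y_test`.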

Q5. Use an appropriate `scikit-learn` tool we learned in class to standardize the features in the training and test datasets to mean zero and variance one, as discussed in class. (5 points)
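This is a sketch using `sklearn.preprocessing.StandardScaler`, assuming the usual convention of learning the mean and standard deviation from the training data only and reusing them for the test data. The variable names `sc`, `X_train_std` and `X_test_std` are hypothetical, since the task does not name the outputs. The random arrays stand in for the real splits.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X_train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
X_test = rng.normal(loc=50.0, scale=10.0, size=(80, 3))

sc = StandardScaler()
# Learn each column's mean and standard deviation from the training data.
sc.fit(X_train)
X_train_std = sc.transform(X_train)
# Apply the training statistics to the test set so that no information
# from the test data leaks into preprocessing.
X_test_std = sc.transform(X_test)
```

After this, each training column has mean 0 and (population) standard deviation 1. The test columns are only approximately standardized, because they reuse the training statistics.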

Problem 2 – (30 points)

Q6. Using appropriate `scikit-learn` libraries we learned in class, fit the following classifiers to the training dataset constructed in Problem 1.

  • Logistic Regression – name your instance `lr`; set `random_state=11`
  • Support Vector Machine with Linear Kernel – name your instance `svm_linear`; set `C=6.0` and `random_state=11`
  • Support Vector Machine with RBF Kernel – name your instance `svm_rbf`; set `gamma=21`, `C=5.6` and `random_state=11`
  • Decision Tree – name your instance `tree`; set `criterion='entropy'`, `max_depth=4` and `random_state=11`
  • Random Forest – name your instance `forest`; set `criterion='entropy'`, `n_estimators=21` and `random_state=11`
  • KNN – name your instance `knn`; set `n_neighbors=6`, `p=3` and `metric='minkowski'`

When initializing instances of the above classifiers, set only the parameters provided above and leave all other parameters at their `scikit-learn` default values. (30 points)
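The following sketch shows how the listed settings map onto `scikit-learn` constructors. It runs on synthetic data that is split and standardized as in Problem 1, not on the credit card data. The linear-kernel SVM uses `SVC(kernel="linear")`; `LinearSVC` is a different implementation and could give different results. `kernel="rbf"` is already the `SVC` default and is written out only for clarity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data, split and standardized as in Problem 1.
X, y = make_classification(n_samples=400, n_features=7, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.32, random_state=3, stratify=y
)
sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# Only the parameters named in the task are set; all others keep their defaults.
lr = LogisticRegression(random_state=11)
svm_linear = SVC(kernel="linear", C=6.0, random_state=11)
svm_rbf = SVC(kernel="rbf", gamma=21, C=5.6, random_state=11)
tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=11)
forest = RandomForestClassifier(criterion="entropy", n_estimators=21, random_state=11)
knn = KNeighborsClassifier(n_neighbors=6, p=3, metric="minkowski")

# fit() trains each estimator in place on the standardized training data.
for model in (lr, svm_linear, svm_rbf, tree, forest, knn):
    model.fit(X_train_std, y_train)
```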

Problem 3 – (40 points)

Q7. Using a method built into each of the above classifiers, compute the prediction accuracy on the training data for each classifier and store it in a variable named according to the pattern `classifier_name_accuracy_train`; for instance, you should have `lr_accuracy_train`. (10 points)

Q8. Using a method built into each of the above classifiers, compute the prediction accuracy on the test data for each classifier and store it in a variable named according to the pattern `classifier_name_accuracy_test`; for instance, you should have `lr_accuracy_test`. (10 points)
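For `scikit-learn` classifiers, the built-in `score(X, y)` method returns the mean accuracy of `predict(X)` against `y`. The sketch applies it to two of the classifiers, following the naming pattern above. It uses synthetic, unstandardized data to stay short, so the numbers say nothing about the credit card task.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data and a stratified 68/32 split.
X, y = make_classification(n_samples=400, n_features=7, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.32, random_state=3, stratify=y
)

lr = LogisticRegression(random_state=11).fit(X_train, y_train)
tree = DecisionTreeClassifier(
    criterion="entropy", max_depth=4, random_state=11
).fit(X_train, y_train)

# score() = fraction of correctly classified samples.
lr_accuracy_train = lr.score(X_train, y_train)
lr_accuracy_test = lr.score(X_test, y_test)
tree_accuracy_train = tree.score(X_train, y_train)
tree_accuracy_test = tree.score(X_test, y_test)
```

The remaining classifiers follow the same pattern (`svm_linear_accuracy_train`, `knn_accuracy_test`, and so on).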

Q9. Which two methods rank highest in their ability to accurately classify the training data, and which two methods perform worst on the training dataset? Explain. (10 points)

Q10.

  • Which two methods rank highest in their ability to accurately classify the test data, and which two methods perform worst on the test dataset? Explain. (3 marks)
  • How do these accuracies compare with the ones reported in Q9? Is this expected, and why (or why not)? (7 marks)