Task sheet for Michaelmas Mock Practical Exam
Instructions

The Ultra Mock test will open on Thursday 8th December at 1pm. All submissions must be made before 5pm on 9th December (note that the actual practical exam will last only 2 hours). You can save, close and reopen the test if necessary, but, unless you have been granted dispensation, you must stop all work on the test by 5pm on 9th December. I will be available online via email (hailiang.du@durham.ac.uk) during the exam, in case you have any queries. You may want to keep the module Ultra page open during the exam, in case I need to make an announcement. Please note that while this practical is open book, and you may therefore consult any text or any online resource, you are NOT permitted to speak to or contact any person during this practical exam except myself, or CIS if you are experiencing technical difficulties.

Please open the examination page on Ultra, then under the "Assignments" section, click on Online Mock Practical Exam 2022. You will initially find a list of six multiple choice questions (Q1 to Q6) which test generic lecture material. After completing these (you can, of course, return to them later and edit your answers), please continue with the tasks described on the task sheet below, and answer questions Q7 to Q18.

Data Modelling I: Carseats data

We now explore classification trees by analyzing the Carseats data set.

library(ISLR)

library(tree)

attach(Carseats)

You may need to install the "tree" and "ISLR" packages first, using install.packages("tree") and install.packages("ISLR").

"Carseats" is a data set in the "ISLR" package; after calling attach(Carseats), the variables in Carseats can be accessed by simply giving their names.

Please try using ?Carseats in the console first to read the help notes for the data.
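
For example (a quick optional check, not part of the required tasks), after attach(Carseats) you can refer to a column directly by name:

summary(Sales) # equivalent to summary(Carseats$Sales) once the data frame is attached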

In these data, we are interested in how Sales is influenced by the remaining variables. Note that Sales is a continuous variable, so to build a classification tree we first recode it as a binary variable. We use the ifelse() function to create a variable, called High, which takes on a value of Yes if the Sales variable exceeds 8, and takes on a value of No otherwise. We then include it in the same data frame via the data.frame() function to merge High with the rest of the Carseats data.

High=ifelse(Sales<=8,"No","Yes")

Carseats=data.frame(Carseats, High)

Carseats$High <- as.factor(Carseats$High)

as.factor() encodes the vector “High” as a factor (categorical variable).
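
As a quick optional check that the recoding worked (using the Carseats data frame updated above):

levels(Carseats$High) # "No" "Yes"

table(Carseats$High) # class counts: 236 No and 164 Yes, matching the root-node proportions below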

Now please fit a classification tree to these data, and summarize and plot it. Notice that you have to exclude Sales from the predictors (Hint: use -Sales in the model formula), because the response is derived from it. Please report the size of the fitted classification tree (Q7) and the training classification error rate (Q8).

Solution:

tree.carseats=tree(High~.-Sales,data=Carseats)

summary(tree.carseats)

##

## Classification tree:

## tree(formula = High ~ . - Sales, data = Carseats)

## Variables actually used in tree construction:

## [1] "ShelveLoc" "Price" "Income" "CompPrice" "Population"

## [6] "Advertising" "Age" "US"

## Number of terminal nodes: 27

## Residual mean deviance: 0.4575 = 170.7 / 373

## Misclassification error rate: 0.09 = 36 / 400

#For classification trees, the deviance is calculated using cross-entropy (see lecture slides).
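
#In more detail (standard definition, consistent with the numbers above): the deviance of a classification tree is -2 * sum over terminal nodes m and classes k of n_mk * log(p_mk), where n_mk is the number of training observations of class k in node m and p_mk is the corresponding node proportion. The residual mean deviance divides this by n minus the number of terminal nodes: 170.7 / (400 - 27) = 170.7 / 373 ≈ 0.4575, as reported.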

plot(tree.carseats)

text(tree.carseats,pretty=0)

#pretty = 0 ensures that the level names of factor split attributes are used unchanged.

#For a detailed summary of the tree, print it:

tree.carseats

## node), split, n, deviance, yval, (yprob)

## * denotes terminal node

##

## 1) root 400 541.500 No ( 0.59000 0.41000 )

## 2) ShelveLoc: Bad,Medium 315 390.600 No ( 0.68889 0.31111 )

## 4) Price < 92.5 46 56.530 Yes ( 0.30435 0.69565 )

## 8) Income < 57 10 12.220 No ( 0.70000 0.30000 )

## 16) CompPrice < 110.5 5 0.000 No ( 1.00000 0.00000 ) *

## 17) CompPrice > 110.5 5 6.730 Yes ( 0.40000 0.60000 ) *

## 9) Income > 57 36 35.470 Yes ( 0.19444 0.80556 )

## 18) Population < 207.5 16 21.170 Yes ( 0.37500 0.62500 ) *

## 19) Population > 207.5 20 7.941 Yes ( 0.05000 0.95000 ) *

## 5) Price > 92.5 269 299.800 No ( 0.75465 0.24535 )

## 10) Advertising < 13.5 224 213.200 No ( 0.81696 0.18304 )

## 20) CompPrice < 124.5 96 44.890 No ( 0.93750 0.06250 )

## 40) Price < 106.5 38 33.150 No ( 0.84211 0.15789 )

## 80) Population < 177 12 16.300 No ( 0.58333 0.41667 )

## 160) Income < 60.5 6 0.000 No ( 1.00000 0.00000 ) *

## 161) Income > 60.5 6 5.407 Yes ( 0.16667 0.83333 ) *

## 81) Population > 177 26 8.477 No ( 0.96154 0.03846 ) *

## 41) Price > 106.5 58 0.000 No ( 1.00000 0.00000 ) *

## 21) CompPrice > 124.5 128 150.200 No ( 0.72656 0.27344 )

## 42) Price < 122.5 51 70.680 Yes ( 0.49020 0.50980 )

## 84) ShelveLoc: Bad 11 6.702 No ( 0.90909 0.09091 ) *

## 85) ShelveLoc: Medium 40 52.930 Yes ( 0.37500 0.62500 )

## 170) Price < 109.5 16 7.481 Yes ( 0.06250 0.93750 ) *

## 171) Price > 109.5 24 32.600 No ( 0.58333 0.41667 )

## 342) Age < 49.5 13 16.050 Yes ( 0.30769 0.69231 ) *

## 343) Age > 49.5 11 6.702 No ( 0.90909 0.09091 ) *

## 43) Price > 122.5 77 55.540 No ( 0.88312 0.11688 )

## 86) CompPrice < 147.5 58 17.400 No ( 0.96552 0.03448 ) *

## 87) CompPrice > 147.5 19 25.010 No ( 0.63158 0.36842 )

## 174) Price < 147 12 16.300 Yes ( 0.41667 0.58333 )

## 348) CompPrice < 152.5 7 5.742 Yes ( 0.14286 0.85714 ) *

## 349) CompPrice > 152.5 5 5.004 No ( 0.80000 0.20000 ) *

## 175) Price > 147 7 0.000 No ( 1.00000 0.00000 ) *

## 11) Advertising > 13.5 45 61.830 Yes ( 0.44444 0.55556 )

## 22) Age < 54.5 25 25.020 Yes ( 0.20000 0.80000 )

## 44) CompPrice < 130.5 14 18.250 Yes ( 0.35714 0.64286 )

## 88) Income < 100 9 12.370 No ( 0.55556 0.44444 ) *

## 89) Income > 100 5 0.000 Yes ( 0.00000 1.00000 ) *

## 45) CompPrice > 130.5 11 0.000 Yes ( 0.00000 1.00000 ) *

## 23) Age > 54.5 20 22.490 No ( 0.75000 0.25000 )

## 46) CompPrice < 122.5 10 0.000 No ( 1.00000 0.00000 ) *

## 47) CompPrice > 122.5 10 13.860 No ( 0.50000 0.50000 )

## 94) Price < 125 5 0.000 Yes ( 0.00000 1.00000 ) *

## 95) Price > 125 5 0.000 No ( 1.00000 0.00000 ) *

## 3) ShelveLoc: Good 85 90.330 Yes ( 0.22353 0.77647 )

## 6) Price < 135 68 49.260 Yes ( 0.11765 0.88235 )

## 12) US: No 17 22.070 Yes ( 0.35294 0.64706 )

## 24) Price < 109 8 0.000 Yes ( 0.00000 1.00000 ) *

## 25) Price > 109 9 11.460 No ( 0.66667 0.33333 ) *

## 13) US: Yes 51 16.880 Yes ( 0.03922 0.96078 ) *

## 7) Price > 135 17 22.070 No ( 0.64706 0.35294 )

## 14) Income < 46 6 0.000 No ( 1.00000 0.00000 ) *

## 15) Income > 46 11 15.160 Yes ( 0.45455 0.54545 ) *

The size of the fitted tree equals the number of terminal nodes, 27. We see that the training classification error rate is 9%.

In order to properly evaluate the performance of a classification tree on these data, we must estimate the test error rather than simply computing the training error. To do so, we split the observations into a training set (250 observations) and a test set (150 observations). Please do not change the seed number!

set.seed(743)

train_index=sample(1:nrow(Carseats),250)

data_train=Carseats[train_index,]

data_test=Carseats[-train_index,]
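
A quick optional check that the split has the intended sizes:

nrow(data_train) # 250

nrow(data_test) # 150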

Now please build a classification tree using the training set, and evaluate its performance on the test set. What is the percentage of correct predictions? (Q9)
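
One possible approach (a minimal sketch using the objects above; tree.train and tree.pred are illustrative names, and the percentage itself is left for you to compute):

tree.train=tree(High~.-Sales,data=data_train)

tree.pred=predict(tree.train,data_test,type="class")

table(tree.pred,data_test$High) # confusion matrix on the test set

mean(tree.pred==data_test$High) # proportion of correct test predictions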
