Learning objectives being assessed
- Select and develop appropriate models
- Estimate and simulate, and use resampling to measure uncertainty
- Explain and interpret the analyses
- This is an individual assignment. You are expected to complete it on your own, without discussing it with other people (including students in this unit) or posting questions to help sites. Any violations will be reported according to the rules in the Monash Student Academic Integrity Procedure. You can use the textbook, lecture notes, and any material openly available on the web (appropriately acknowledged) as needed. You are encouraged to seek help from your tutor or lecturer; be aware that they can help you get past hurdles and think through the problems, but will not provide specific answers to the assignment questions.
- The assignment needs to be turned in as Rmarkdown (.Rmd) and as html, zipped into a single .zip file uploaded to Moodle. No other formats will be accepted. Knitting the Rmarkdown file is expected to produce the submitted html file. If the Rmarkdown file does not knit, then the score for 1 and 2 will be reduced by 50%.
- R code should be hidden in the final report, unless it is specifically requested.
- Original work is expected. Any material used from external sources needs to be acknowledged.
- A skeleton Rmd file is provided for you to complete and turn in.
- Total mark will be out of 30.
- 2 points will be reserved for readability and appropriate citing of external sources.
- 2 points will be reserved for reproducibility: the report can be regenerated from the submitted Rmarkdown.
- 2 points will be reserved for clean and readable code.
- Accuracy and completeness of answers, and clarity of explanations will be the basis for the remaining 24 points.
1. (4pts) About principal components analysis and regression residuals
Equation 12.11 (textbook edition 2) breaks the total variance into the variance explained by the first $M$ principal components and the MSE of the $M$-dimensional approximation.
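For reference, the decomposition can be written out as below. This is a sketch under two assumptions that the brief does not state: that the textbook is An Introduction to Statistical Learning (second edition), and that every column of the data has been centred to mean zero. Here $x_{ij}$ are the data values, $z_{im}$ the principal component scores, $\phi_{jm}$ the loadings, and $M$ the number of components retained.

```latex
\underbrace{\sum_{j=1}^{p} \frac{1}{n} \sum_{i=1}^{n} x_{ij}^{2}}_{\text{total variance}}
=
\underbrace{\sum_{m=1}^{M} \frac{1}{n} \sum_{i=1}^{n} z_{im}^{2}}_{\text{variance of first } M \text{ PCs}}
+
\underbrace{\frac{1}{n} \sum_{j=1}^{p} \sum_{i=1}^{n}
  \Bigl( x_{ij} - \sum_{m=1}^{M} z_{im}\,\phi_{jm} \Bigr)^{2}}_{\text{MSE of } M\text{-dimensional approximation}}
```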
- Explain what this second part is: what is the MSE of the $M$-dimensional approximation? (in 30 words or less)
- What is the quantity measuring (in 20 words or less)?
- Show how equation 12.11 can be rearranged to get , and explain how the second part relates to RSS (residual sum of squares)? (in 30 words or less)
- Explain how residuals from PCA are different from residuals from a regression model (in 50 words or less).
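The split of total variance into an explained part and an approximation error can be checked numerically before you write up your explanations. The sketch below is illustrative only: it uses simulated data, and the names `X`, `Z`, `Phi` and the choice of two retained components are assumptions, not part of the assignment.

```r
# Numerical check of the variance decomposition on simulated data.
# All data and names here are hypothetical and used for illustration only.
set.seed(1)
n <- 100   # observations
p <- 4     # variables
M <- 2     # number of principal components kept (assumed)

# Simulate correlated data, then centre each column
X <- matrix(rnorm(n * p), n, p) %*% matrix(runif(p * p), p, p)
X <- scale(X, center = TRUE, scale = FALSE)

# PCA on the already centred data
pca <- prcomp(X, center = FALSE)
Z   <- pca$x[, 1:M, drop = FALSE]          # scores on the first M components
Phi <- pca$rotation[, 1:M, drop = FALSE]   # loadings of the first M components

total_var   <- sum(X^2) / n                    # total variance
var_first_M <- sum(Z^2) / n                    # variance of the first M PCs
mse_approx  <- sum((X - Z %*% t(Phi))^2) / n   # error of the rank-M approximation

# The two parts should add up to the total (up to floating point error)
all.equal(total_var, var_first_M + mse_approx)
```

Changing `M` shifts variance between the two parts while their sum stays fixed.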
2. (4pts) Discriminant analysis
For Assume that the prior probability is equal for the two groups.
- What is ?
- Compute the pooled variance-covariance matrix.
- Using the formula from lecture notes (lecture 4a, slide 11) show that the discriminant space is .
- Is one variable more important than another? If so, which one? (using 20 words or less)
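As a way to check hand calculations, the sketch below computes a pooled variance-covariance matrix and a two-group discriminant direction in R. Every number in it is hypothetical and is not the set of values given in the question. The direction used is the standard two-group form, the inverse pooled covariance times the difference in means, which may differ in sign or scaling from the formula on the lecture slide.

```r
# Hypothetical inputs: these are NOT the values from the question.
mu1 <- c(1, 2)                           # mean of group 1 (assumed)
mu2 <- c(3, 1)                           # mean of group 2 (assumed)
S1  <- matrix(c(2, 0.5, 0.5, 1), 2, 2)   # covariance of group 1 (assumed)
S2  <- matrix(c(1.5, 0.3, 0.3, 1.2), 2, 2)  # covariance of group 2 (assumed)
n1  <- 50                                # group sizes (assumed)
n2  <- 50

# Pooled covariance: group covariances weighted by their degrees of freedom
S_pooled <- ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

# Two-group discriminant direction: solve S_pooled %*% a = mu1 - mu2
a <- solve(S_pooled, mu1 - mu2)

# Scale to unit length so the coefficients are easier to compare
a_unit <- a / sqrt(sum(a^2))
a_unit
```

When the variables are on comparable scales, the relative sizes of the coefficients in `a_unit` indicate how much each variable contributes to separating the groups.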
3. (8pts) This question examines models for a categorical variable
palmerpenguins is a new R data package, with interesting measurements on penguins of three different species. Subset the data to contain just the Adelie and Gentoo species, and only the variables species and the four physical size measurement variables. Use standardised variables for answering all of the questions.
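The data preparation described above might look like the following minimal sketch in base R. Dropping rows with missing measurements is an assumption not stated in the question, and the object names `p_sub` and `p_std` are illustrative.

```r
# Minimal sketch of the data preparation; assumes palmerpenguins is installed.
library(palmerpenguins)

# The four physical size measurements in the penguins data
size_vars <- c("bill_length_mm", "bill_depth_mm",
               "flipper_length_mm", "body_mass_g")

# Keep only Adelie and Gentoo, and only species plus the size variables
keep  <- palmerpenguins::penguins$species %in% c("Adelie", "Gentoo")
p_sub <- as.data.frame(palmerpenguins::penguins[keep, c("species", size_vars)])

# Assumption: drop rows with missing measurements
p_sub <- p_sub[complete.cases(p_sub), ]

# Remove the unused Chinstrap level from the factor
p_sub$species <- droplevels(p_sub$species)

# Standardise each size variable to mean 0 and standard deviation 1
p_std <- p_sub
p_std[size_vars] <- lapply(p_sub[size_vars], function(x) as.numeric(scale(x)))

summary(p_std)
```

Standardising after subsetting means the means and standard deviations describe the two retained species only.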