Marks achieved in this assignment will contribute towards 50% of the final module mark. You should attempt all questions on this sheet. Note that the questions are organised in the order in which we covered the topics, not in order of difficulty. You are therefore advised to read through the questions first and start working on those you feel more comfortable with.
Deadline: Noon (12pm), on 3rd March 2023
You should submit one PDF via eBART containing your solutions. It should be written up using word processing software (e.g. LaTeX, R Markdown, or Word). Solutions are expected to be concise, well structured and well presented. Commented R code (e.g. ‘model <- glm(…)’) and the outcomes/plots should form part of your solutions. Do not display too much raw R output (e.g. don’t display the full output of ‘summary(model)’); edit it down to the essentials. Be sure to justify each step of your analyses, providing comments alongside your R code to explain what you are doing, and add appropriate titles and labelled axes to your plots. Handwritten solutions will be accepted where mathematical descriptions are required, but a professional word-processed submission is preferred.
You are expected to work independently – strict disciplinary action will be taken for any plagiarism. Late submissions will also be penalised according to the University’s late submission policy.
The data required for this assignment, datasets_exercises.RData, can be downloaded from the ELE page and loaded into R using the load() function.
The data frame nlmodel contains data on a response variable y and a single explanatory variable x. A scatter plot of y versus x suggests a strong non-linear relationship:
(a) [1 mark] Why can’t this model be fit using a linear (regression) model?
(b) [2 marks] Write down the likelihood L(θ1, θ2, σ²; y, x) and the log-likelihood ℓ(θ1, θ2, σ²; y, x).
(c) [1 mark] Write an R function mylike() which evaluates the negative log-likelihood (i.e. −ℓ(θ1, θ2, σ²; y, x)) for any values of the three parameters.
(d) [3 marks] Use the R function nlm() in association with your function mylike() to numerically minimise the negative log-likelihood and report the maximum likelihood estimates for the model parameters. Provide some evidence of how you chose sensible starting values.
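As a reminder of the general calling pattern for nlm(), here is a minimal sketch on a deliberately different problem: estimating the mean and standard deviation of a normal sample. The data z, the function negll() and the log-sigma parameterisation are all hypothetical choices made for illustration; they are not the model or data of this question.

```r
# Hypothetical stand-in data: NOT the nlmodel data frame from the assignment.
set.seed(1)
z <- rnorm(200, mean = 5, sd = 2)

# Negative log-likelihood of a N(mu, sigma^2) sample.
# par[1] = mu, par[2] = log(sigma); the log scale keeps sigma positive.
negll <- function(par, z) {
  -sum(dnorm(z, mean = par[1], sd = exp(par[2]), log = TRUE))
}

# Starting values taken from simple summaries of the data.
start <- c(mean(z), log(sd(z)))

# Extra arguments (here z) are passed through nlm() to negll().
fit <- nlm(negll, p = start, z = z)

fit$estimate  # minimiser: (mu, log sigma)
fit$minimum   # negative log-likelihood at the minimum
fit$code      # termination code; see ?nlm for its interpretation
```

nlm() minimises its first argument, which is why the function supplied returns the negative of the log-likelihood.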
(e) [2 marks] Estimate the standard errors and construct 99% confidence intervals for θ1 and θ2.
(f) [2 marks] Test the hypothesis that θ2 = 0.08 at the 10% significance level (not using the confidence interval).
(g) [4 marks] Produce a plot of the associated mean relationship and the associated 95% prediction intervals on a scatter plot of y versus x. Comment on the appropriateness of the model.
The data frame aids contains data on the number of quarterly AIDS cases in the UK, yi, from January 1983 to March 1994. The variable cases is yi and date is time, denoted here by xi. A scatter plot of yi versus xi shows an increasing trend in cases:
In this question we consider two competing models to describe the trend in the number of cases. Model 1 is
Yi ∼ Pois(λi)
log(λi) = β0 + β1xi
and Model 2 is
Yi ∼ N(µi, σ²)
log(µi) = γ0 + γ1xi
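For orientation, the two specifications above map onto glm() family objects roughly as follows. This is a sketch on simulated data only: the data frame sim, its coefficients and the date scale are hypothetical and are not taken from the aids data.

```r
# Simulated stand-in data: hypothetical, NOT the aids data frame.
set.seed(42)
sim <- data.frame(date = seq(83, 94, by = 0.25))
sim$cases <- rpois(nrow(sim), lambda = exp(-10 + 0.17 * sim$date))

# Model 1: Poisson response with log link (the canonical link).
m1 <- glm(cases ~ date, family = poisson(link = "log"), data = sim)

# Model 2: Normal response with log link on the mean.
# A non-canonical link can need explicit starting values; here they
# are borrowed from a linear fit to log(cases) (all counts are positive).
init <- coef(lm(log(cases) ~ date, data = sim))
m2 <- glm(cases ~ date, family = gaussian(link = "log"),
          data = sim, start = init)

coef(m1)  # estimates of (beta0, beta1)
coef(m2)  # estimates of (gamma0, gamma1)
```

Both models share the same linear predictor and link; they differ only in the assumed distribution of Yi, and hence in the mean-variance relationship.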
(a) [2 marks] Comment on whether the proposed models are sensible in terms of the distribution and the relationship of x with the mean.
(b) [3 marks] Fit the two models in R and plot the estimated trends from each model (λ̂i and µ̂i) on top of the data with approximate 95% confidence intervals around the mean. Comment on the validity of each model (based on the plot).
(c) [2 marks] Use an appropriate criterion to comment on which model is preferable.
(d) [2 marks] Produce the deviance residuals vs fitted values (λ̂i and µ̂i) plot for each model, comment appropriately and thus propose a way that the two models might be extended to improve the fit.
(e) [4 marks] Implement the proposed extensions to each model, to arrive at a final version for each of them (justified by appropriate hypothesis tests).
(f) [8 marks] On the basis of your answers to (a)-(d), but also on arguments of model fit based on the deviance, comment on which (if any) of the two final models in (e) you would choose as the best. Mention at least one reason why either model is not ideal.
(g) [4 marks] Further extend your final Poisson model to a Negative Binomial model and comment on whether this model is preferable to the other two, on the basis of all the criteria used for comparison so far.