Education Expenditures Data
The file education.csv contains data collected for estimating the per capita expenditure on public education in 1970 in the United States (US). The data come from Table 5.13 in Chatterjee and Hadi (2012). The following table lists all variables in this file.
Variable  Description
STATE     Postal abbreviation for the state.
Y         Per capita expenditure on public education.
X1        Per capita personal income.
X2        Number of residents per thousand under 18 years of age.
X3        Number of people per thousand residing in urban areas.
Region    1 = Northeast, 2 = North Central, 3 = South, 4 = West.
- Read the data file education.csv into R, calling it education. Use a scatterplot matrix to look at the data on all numerical variables. Add a loess curve to each sub-plot to help reveal possible relationships. Decide what transformations, if any, are appropriate. On the basis of the scatterplots, which variables do you think will best predict the per capita expenditure on public education (Y)?
Any R output produced to answer this question along with the R commands should be placed in Appendix 1 which should be the first appendix at the end of all of your solutions.
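One possible way to set up the scatterplot matrix is sketched below in base R. It assumes education.csv sits in the working directory with the column names listed in the variable table; the helper names panel_loess and plot_scatter_matrix are made up for this illustration and are not part of the assignment.

```r
# Sketch only: assumes education.csv is in the working directory and has the
# columns STATE, Y, X1, X2, X3 and Region.

# Panel function (hypothetical name): draws the points, then a loess curve.
panel_loess <- function(x, y, ...) {
  points(x, y, ...)
  fit <- loess.smooth(x, y)            # stats::loess.smooth with default span
  lines(fit$x, fit$y, col = "red")
}

# Scatterplot matrix of the numerical variables with a loess curve per panel.
plot_scatter_matrix <- function(dat, vars = c("Y", "X1", "X2", "X3")) {
  pairs(dat[, vars], panel = panel_loess)
  invisible(vars)
}

# Only runs when the data file is actually available.
if (file.exists("education.csv")) {
  education <- read.csv("education.csv")
  str(education)                       # check variable types before plotting
  plot_scatter_matrix(education)
}
```

Region is left out of the matrix here because it is a coded category rather than a measurement; whether to include it is a judgment call.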
- Obtain a frequency table of Region and decide whether any levels should be grouped. Decide also what the reference category should be.
Any R output produced to answer this question along with the R commands should be placed in Appendix 2 which should be the second appendix at the end of all of your solutions.
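A minimal sketch of the frequency table follows. It assumes Region is stored as the integer codes 1 to 4 given in the variable table. The helper name make_region_factor is hypothetical, and South is used as the default reference only because a later question asks for region 3; the choice in this question is still yours to justify.

```r
# Sketch only: Region is assumed to be coded 1..4 as in the variable table.
region_labels <- c("Northeast", "North Central", "South", "West")

# Convert the numeric codes to a labelled factor with a chosen reference level.
make_region_factor <- function(region, ref = "South") {
  f <- factor(region, levels = 1:4, labels = region_labels)
  relevel(f, ref = ref)
}

# Only runs when the data file is actually available.
if (file.exists("education.csv")) {
  education <- read.csv("education.csv")
  education$Region <- make_region_factor(education$Region)
  print(table(education$Region))       # counts per region, reference level first
}
```

Small counts in a level would be the usual argument for grouping it with a neighbouring level.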
- Obtain a graph of Y vs X1 using different symbols for the four regions. Add fitted lines for the four regions and comment on the graph. Are X1 and Region promising predictors?
Any R output produced to answer this question along with the R commands should be placed in Appendix 3 which should be the third appendix at the end of all of your solutions.
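The graph could be drawn along the following lines in base R. This is a sketch under the assumption that the data frame has columns Y, X1 and Region; plot_by_region is a hypothetical helper name, and the fitted lines are separate simple regressions of Y on X1 within each region.

```r
# Sketch only: expects a data frame with columns Y, X1 and Region.
plot_by_region <- function(dat) {
  region <- factor(dat$Region)
  codes <- as.integer(region)
  k <- seq_along(levels(region))

  # One plotting symbol and colour per region.
  plot(dat$X1, dat$Y, pch = codes, col = codes,
       xlab = "X1 (per capita personal income)",
       ylab = "Y (per capita expenditure on public education)")

  # Separate least-squares line of Y on X1 within each region.
  for (i in k) {
    sub <- dat[region == levels(region)[i], ]
    if (nrow(sub) >= 2) abline(lm(Y ~ X1, data = sub), col = i, lty = i)
  }

  legend("topleft", legend = levels(region), pch = k, col = k, lty = k)
  invisible(levels(region))
}

# Only runs when the data file is actually available.
if (file.exists("education.csv")) {
  education <- read.csv("education.csv")
  plot_by_region(education)
}
```

Lines with clearly different intercepts suggest a Region effect, and different slopes hint at an interaction between X1 and Region.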
- Now perform the regression of Y on Region. Check the model assumptions. Comment on the regression results. (Please select region 3 as the reference category and save the model you fit in R as m1.)
Any R output produced to answer this question along with the R commands should be placed in Appendix 4 which should be the fourth appendix at the end of all of your solutions.
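A hedged sketch of fitting m1 with region 3 as the reference category is shown below. It assumes Region is still stored as the codes 1 to 4, so the reference level is the string "3"; fit_m1 is a hypothetical helper name.

```r
# Sketch only: Region is assumed to hold the codes 1..4.
fit_m1 <- function(dat) {
  # Treat Region as categorical and make region 3 (South) the baseline.
  dat$Region <- relevel(factor(dat$Region), ref = "3")
  lm(Y ~ Region, data = dat)
}

# Only runs when the data file is actually available.
if (file.exists("education.csv")) {
  education <- read.csv("education.csv")
  m1 <- fit_m1(education)
  print(summary(m1))

  # Standard diagnostic plots: residuals vs fitted, normal Q-Q,
  # scale-location and residuals vs leverage.
  op <- par(mfrow = c(2, 2))
  plot(m1)
  par(op)
}
```

With this coding the intercept is the mean of Y in region 3, and each remaining coefficient is the difference between that region's mean and the region 3 mean.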
- Obtain the regression of Y on all of the predictor variables. Obtain 95% confidence intervals for the regression coefficients. Is the overall regression significant, and what does this say about the situation? Are any of the predictors significant in the model? Are all model assumptions met? (Please select region 3 as the reference category for Region and save the regression model you fit in R as m2.)
Any R output produced to answer this question along with the R commands should be placed in Appendix 5 which should be the fifth appendix at the end of all of your solutions.
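The full model and its confidence intervals might be obtained as sketched here. The sketch assumes that "all of the predictor variables" means X1, X2, X3 and Region, with STATE treated as an identifier rather than a predictor. fit_m2 is a hypothetical helper name.

```r
# Sketch only: predictors are assumed to be X1, X2, X3 and Region (codes 1..4).
fit_m2 <- function(dat) {
  # Region 3 (South) is the reference category, as the question requests.
  dat$Region <- relevel(factor(dat$Region), ref = "3")
  lm(Y ~ X1 + X2 + X3 + Region, data = dat)
}

# Only runs when the data file is actually available.
if (file.exists("education.csv")) {
  education <- read.csv("education.csv")
  m2 <- fit_m2(education)
  print(summary(m2))                   # overall F-test and coefficient t-tests
  print(confint(m2, level = 0.95))     # 95% confidence intervals

  op <- par(mfrow = c(2, 2))
  plot(m2)                             # diagnostic plots for the assumptions
  par(op)
}
```

The overall F-statistic in the summary output addresses whether the regression as a whole is significant, while the individual t-tests and intervals address each coefficient given the others.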
- Following the last question, which observation has the largest standardised residual, the highest leverage, and the largest value of Cook’s distance? Do these values seem to be high enough to regard this observation as an outlier?
Any R output produced to answer this question along with the R commands should be placed in Appendix 6 which should be the sixth appendix at the end of all of your solutions.
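The three diagnostics can be pulled from a fitted lm object with rstandard(), hatvalues() and cooks.distance(). The sketch below wraps them in a hypothetical helper, influence_summary, and refits the full model under the same assumptions as before (predictors X1, X2, X3 and Region, no rows dropped for missing values, so STATE lines up with the residuals).

```r
# Sketch only: reports the observation with the largest absolute standardised
# residual, the highest leverage and the largest Cook's distance.
influence_summary <- function(model, labels = NULL) {
  r <- rstandard(model)
  h <- hatvalues(model)
  d <- cooks.distance(model)
  idx <- unname(c(which.max(abs(r)), which.max(h), which.max(d)))
  if (is.null(labels)) labels <- names(r)
  data.frame(
    measure = c("standardised residual", "leverage", "Cook's distance"),
    observation = as.character(labels[idx]),
    value = unname(c(r[idx[1]], h[idx[2]], d[idx[3]]))
  )
}

# Only runs when the data file is actually available.
if (file.exists("education.csv")) {
  education <- read.csv("education.csv")
  education$Region <- relevel(factor(education$Region), ref = "3")
  m2 <- lm(Y ~ X1 + X2 + X3 + Region, data = education)
  print(influence_summary(m2, labels = education$STATE))

  # A common rule-of-thumb leverage cut-off (a convention, not a formal test).
  p <- length(coef(m2))
  n <- nobs(m2)
  cat("2p/n leverage cut-off:", 2 * p / n, "\n")
}
```

Commonly quoted guides are an absolute standardised residual above 2, leverage above 2p/n and Cook's distance near or above 1; these are conventions rather than rules set by the assignment.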
Reference
Chatterjee, S. and Hadi, A.S. (2012) Regression Analysis by Example. 5th Edition, John Wiley & Sons.