Statistics 7110
Instructor: Athanasios C. Micheas, Ph.D.
Final examination (in class)
MDLBH 7, 12:30-2:30 p.m., Tuesday, December 11, 2012

Directions: Create a doc file named “’your name’ Final exam.docx” and enter there all your output, comments, plots etc. Email the file to the instructor at the end of class. Clearly mark your answers, and be sure to answer all questions. Make sure you include all your input (SAS/R code) and output (output window and graphics windows). Answer all posed questions using SAS procedures or R functions only, NOT proc insight or the ASSIST module. The datasets needed for the problems can be found on the class website at http://www.stat.missouri.edu/~amicheas/stat7110/datasets. Work on the problems alone; send the file to: [email protected].

Problem 1. (25 pts) Use the fitness.dat data for this problem. The variables are: age, wt (weight), oxy (oxygen measurement), runtime, rstpulse (pulse while resting), runpulse, maxpulse.
a) (5 pts) Write a SAS program to read the data using CARDS. In addition the program should do the following.
b) (5 pts) We wish to identify the records with runtime above 11.0 or below 9.0. Create a new variable, call it RANGE, taking the value 1 if runtime is below 9.0, 2 if runtime is between 9.0 and 11.0, and 3 otherwise.
c) (5 pts) Print variables runtime and RANGE ONLY, with appropriate title and labels.
d) (5 pts) Use PROC FREQ to compute the frequencies of the various runtimes. What percent of runtimes are above 11.0?
e) (5 pts) Compute the average maxpulse for running times above 11.0.

Problem 2. (25 pts) Use the mammals.dat data for this problem. The variables are mammal, body weight (in kilograms) and corresponding brain weight (in grams).
a) (5 pts) Write a SAS program to read the data using CARDS, and create two additional variables to hold the logarithms of body and brain weight. In addition the program should do the following.
b) (5 pts) Obtain the correlation between the two weight variables in the original and log scale and interpret the results.
c) (5 pts) We wish to predict brain weight using body weight. Fit the regression model and check all its assumptions. Do you see any obvious problems? Comment on your findings.
d) (5 pts) We now predict brain weight using body weight on the log scale. Fit the regression model and check all its assumptions. Do you still see problems with the model? Comment on your findings.
e) (5 pts) Produce an overlay plot to show 95% confidence bounds and make sure you join the points (recall lecture 15).

Problem 3. (25 pts) The following data are from a study examining the influence of a specific hormone on eating behavior. Three different drug doses were used, including a control condition (no drug), and the study measured eating behavior for males and females. The dependent variable was the amount of food consumed over a 48-hour period.

                Males           Females
No drug         1 6 1 1 1       0 3 7 5 5
Small dose      7 7 11 4 6      0 0 0 5 0
Large dose      3 1 1 6 4       0 2 0 0 3

a) (5 pts) Write a SAS program to enter these data using nested DO statements.
b) (5 pts) Produce the interaction plot before we fit the ANOVA model, showing two lines, one for each gender, with drugdose on the x-axis and amount on the y-axis. Should there be an interaction effect in the model? Justify your answers.
c) (10 pts) Run a two-way analysis of variance model with interactions, on factors gender (male, female) and drugdose (none, small, large). Is there a significant interaction between the gender and drugdose factors? Are there significant main effects? Make sure you check the assumptions of the model you fit (normality, homogeneity of variance, independence of the normal errors).
d) (5 pts) Conduct multiple comparisons of the means for each factor (LSD, Scheffe, Tukey) using alpha = 0.1 and comment on your findings.
e) (5 pts) Now conduct multiple comparisons between the cell means, for each gender by drugdose combination. Comment on your findings.

Problem 4. (25 pts) Write an R function (call it whatever you want) that takes as argument an object x and accomplishes the following (and in the order requested here):
a) (5 pts) Checks to see if the argument is a matrix or data frame, and looks for at least two and at most five columns in x, otherwise exits the routine with the warning: “The data passed to the function is not valid”.
Functions: is.matrix, is.data.frame, stop or return (to exit the function), nrow, ncol, if, as.matrix.
b) (5 pts) Assigns the first column to a variable y and the remaining columns to a matrix z, and then fits and prints a summary of the regression of y (response) onto z (predictors).
Functions: matrix, lm, print, summary.
c) (10 pts) Using the regression object, the routine does the following:
I. Produces scatter plots between all the columns of x (in one single graph device). Use appropriate x-y labels for the plots and put the response on the first row. The routine also produces side-by-side boxplots of the predictors. (Hint: use appropriate par and a double for loop.)
II. Produces diagnostic plots for the regression model (in one single graph). It should include residual vs predicted, residual vs order, qqplot of the residuals and histogram of the residuals. Use appropriate titles for the plots.
Functions: qqnorm, hist, plot, residuals, fitted.values, par.
d) (5 pts) Test the routine by defining x to be a matrix of three N(0,1) vectors, where each vector should have a size of 100.
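
The steps requested in Problem 4 could be sketched roughly as follows. This is one possible outline, not the instructor's solution. The function name regression_report, the default column names V1, V2, ..., the plot titles, and the call to set.seed are all assumptions introduced for illustration, and the sketch assumes every column of x is numeric. The scatter plots, the boxplots and the diagnostics are drawn as three successive pages, so in an interactive session a multi-page device such as pdf() may be needed to keep all of them.

```r
# Hypothetical function name; the exam leaves the name to the student.
regression_report <- function(x) {
  # (a) Accept only a matrix or data frame with 2 to 5 columns.
  if (!(is.matrix(x) || is.data.frame(x)) || ncol(x) < 2 || ncol(x) > 5) {
    stop("The data passed to the function is not valid")
  }
  x <- as.matrix(x)  # assumes all columns are numeric
  if (is.null(colnames(x))) {
    colnames(x) <- paste0("V", seq_len(ncol(x)))  # assumed default labels
  }

  # (b) First column is the response, the rest are predictors.
  y <- x[, 1]
  z <- x[, -1, drop = FALSE]
  fit <- lm(y ~ z)
  print(summary(fit))

  # Restore the caller's graphics settings when the function exits.
  old_par <- par(no.readonly = TRUE)
  on.exit(par(old_par))

  # (c) I. Scatter plots between all columns; row i uses column i on the
  # y-axis, so the response (column 1) appears on the first row.
  p <- ncol(x)
  par(mfrow = c(p, p), mar = c(4, 4, 1, 1))
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      if (i == j) {
        # Diagonal cell: show the variable name instead of a plot.
        plot.new()
        text(0.5, 0.5, colnames(x)[i])
      } else {
        plot(x[, j], x[, i], xlab = colnames(x)[j], ylab = colnames(x)[i])
      }
    }
  }

  # Side-by-side boxplots of the predictors (one box per column of z).
  par(mfrow = c(1, 1), mar = c(5, 4, 4, 2) + 0.1)
  boxplot(z, main = "Boxplots of the predictors")

  # (c) II. Four diagnostic plots on a single page.
  res <- residuals(fit)
  par(mfrow = c(2, 2))
  plot(fitted.values(fit), res, xlab = "Predicted", ylab = "Residual",
       main = "Residuals vs predicted")
  abline(h = 0, lty = 2)
  plot(seq_along(res), res, type = "b", xlab = "Order", ylab = "Residual",
       main = "Residuals vs order")
  qqnorm(res, main = "Normal Q-Q plot of residuals")
  qqline(res)
  hist(res, main = "Histogram of residuals", xlab = "Residual")

  invisible(fit)
}

# (d) Three N(0,1) vectors of size 100 as the columns of x.
set.seed(1)  # assumption: fixed seed only to make the run reproducible
x <- matrix(rnorm(300), ncol = 3)
fit <- regression_report(x)
```

Returning the fitted object invisibly is a design choice of this sketch, so the caller can inspect the model further; the exam itself only asks for the printed summary and the plots.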