Statistics 7110
Instructor: Athanasios C. Micheas, Ph.D.
Final examination (in class)
MDLBH 7, 12:30-2:30 p.m., Tuesday, December 11, 2012
Directions: Create a doc file named “’your name’ Final exam.docx” and enter
there all your output, comments, plots etc. Email the file to the instructor at
the end of class. Clearly mark your answers, and be sure to answer all questions.
Make sure you include all your input (sas/R code) and output (output window and
graphics windows). Answer all posed questions using SAS procedures or R functions only, NOT proc insight or the ASSIST module. The datasets needed for
the problems can be found on the class website at
http://www.stat.missouri.edu/~amicheas/stat7110/datasets. Work on the problems
alone; send the file to: [email protected].
Problem 1. (25 pts) Use the fitness.dat data for this problem. The variables are:
age, wt (weight), oxy (oxygen measurement), runtime, rstpulse (pulse while
resting), runpulse, maxpulse.
a) (5 pts) Write a SAS program to read the data using CARDS. In addition
the program should do the following.
b) (5 pts) We wish to identify the records with runtime above 11.0 or below
9.0. Create a new variable, call it RANGE, taking the value 1 if runtime is
below 9.0, 2 if runtime is between 9.0 and 11.0, and 3 otherwise.
c) (5 pts) Print variables runtime and RANGE ONLY, with appropriate title
and labels.
d) (5 pts) Use PROC FREQ to compute the frequencies of the various
runtimes. What percent of runtimes are above 11.0?
e) (5 pts) Compute the average maxpulse for running times above 11.0.
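The recoding rule in part (b) and the summaries in parts (d) and (e) can be sketched in R as follows. This only illustrates the logic and is not the SAS program the problem asks for. The six runtime and maxpulse values are made up (they are not taken from fitness.dat), and placing 9.0 and 11.0 themselves in the middle category is an assumption, since the problem does not say how the boundaries are handled.

```r
# Hypothetical values for illustration only; NOT taken from fitness.dat.
fitness <- data.frame(
  runtime  = c(8.2, 9.0, 10.5, 11.0, 11.6, 12.9),
  maxpulse = c(170, 168, 172, 180, 176, 184)
)

# RANGE = 1 below 9.0, 2 from 9.0 to 11.0 (boundaries assumed inclusive),
# and 3 otherwise (above 11.0).
fitness$RANGE <- ifelse(fitness$runtime < 9.0, 1,
                 ifelse(fitness$runtime <= 11.0, 2, 3))

# Frequencies and percentages per category (the PROC FREQ analogue).
range_counts <- table(fitness$RANGE)
range_pct    <- 100 * prop.table(range_counts)

# Average maxpulse among the records with runtime above 11.0.
mean_maxpulse_slow <- mean(fitness$maxpulse[fitness$runtime > 11.0])

print(fitness[, c("runtime", "RANGE")])
print(range_pct)
print(mean_maxpulse_slow)
```

The percentage for RANGE = 3 answers part (d) directly, because under this coding category 3 contains exactly the runtimes above 11.0.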
Problem 2. (25 pts) Use the mammals.dat data for this problem. The variables are
mammal, body weight (in kilograms) and corresponding brain weight (in grams).
a) (5 pts) Write a SAS program to read the data using CARDS, and create two
additional variables to hold the logarithms of body and brain weight. In
addition the program should do the following.
b) (5 pts) Obtain the correlation between the two weight variables in the
original and log scale and interpret the results.
c) (5 pts) We wish to predict brain weight using body weight. Fit the regression
model and check all its assumptions. Do you see any obvious problems?
Comment on your findings.
d) (5 pts) We now predict brain weight using body weight on the log scale. Fit
the regression model and check all its assumptions. Do you still see
problems with the model? Comment on your findings.
e) (5 pts) Produce an overlay plot to show 95% confidence bounds and make
sure you join the points (recall lecture 15).
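As a rough R counterpart to parts (b), (d) and (e), the sketch below computes the correlation on both scales, fits the log-scale regression and overlays joined 95% confidence bounds. The data are simulated stand-ins, not the contents of mammals.dat, and the names `body_wt`, `brain_wt`, `lbody` and `lbrain` are chosen here for illustration.

```r
# Simulated stand-in for mammals.dat; these are NOT the real measurements.
set.seed(1)
body_wt  <- exp(runif(40, min = -3, max = 8))                    # kilograms
brain_wt <- exp(2 + 0.75 * log(body_wt) + rnorm(40, sd = 0.5))   # grams

mammals <- data.frame(body_wt, brain_wt,
                      lbody = log(body_wt), lbrain = log(brain_wt))

# Correlation on the original scale and on the log scale.
cor_raw <- cor(mammals$body_wt, mammals$brain_wt)
cor_log <- cor(mammals$lbody, mammals$lbrain)

# Simple linear regression on the log scale.
fit_log <- lm(lbrain ~ lbody, data = mammals)

# 95% confidence bounds for the mean response, evaluated on the sorted
# predictor values so that joining the points gives curves, not a zigzag.
grid   <- data.frame(lbody = sort(mammals$lbody))
bounds <- predict(fit_log, newdata = grid,
                  interval = "confidence", level = 0.95)

# Overlay: observed points, fitted line and the two joined bounds.
plot(lbrain ~ lbody, data = mammals,
     xlab = "log(body weight)", ylab = "log(brain weight)",
     main = "Log-log fit with 95% confidence bounds")
matlines(grid$lbody, bounds, lty = c(1, 2, 2),
         col = c("black", "blue", "blue"))
```

Sorting the predictor before calling `predict` is what makes "joining the points" work; the same idea applies in SAS, where the data are sorted by the x variable before the joined overlay is drawn.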
Problem 3. (25 pts) The following data are from a study examining the influence
of a specific hormone on eating behavior. Three different drug doses were used,
including a control condition (no drug), and the study measured eating behavior for
males and females. The dependent variable was the amount of food consumed over
a 48-hour period.
Gender    No drug          Small dose        Large dose
Males     1, 6, 1, 1, 1    7, 7, 11, 4, 6    3, 1, 1, 6, 4
Females   0, 3, 7, 5, 5    0, 0, 0, 5, 0     0, 2, 0, 0, 3
a) (5 pts) Write a SAS program to enter these data using nested DO statements.
b) (5 pts) Produce the interaction plot before we fit the ANOVA model, showing
two lines, one for each gender, with drugdose on the x-axis and amount on the y-axis. Should there be an interaction effect in the model? Justify your
answer.
c) (10 pts) Run a two-way analysis of variance model with interactions, on
factors gender (male, female) and drugdose (none, small, large). Is there a
significant interaction between the gender and drugdose factors? Are there
significant main effects? Make sure you check the assumptions of the model
you fit (normality, homogeneity of variance, independence of the normal
errors).
d) (5 pts) Conduct multiple comparisons of the means for each factor (LSD,
Scheffe, Tukey) using α = 0.1 and comment on your findings.
e) (5 pts) Now conduct multiple comparisons between the cell means, for each
gender by drugdose combination. Comment on your findings.
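The nested-loop data entry of part (a) and the analyses of parts (b) through (d) have a close R analogue, sketched below. It assumes the thirty values are read gender by gender and dose by dose, five per cell (males first, then females); the names `eating`, `amount`, `gender` and `drugdose` are chosen here for illustration. It is not the SAS program the problem asks for.

```r
# The 30 observations, five per cell; the cell layout is an assumption.
amounts <- c(1, 6, 1, 1, 1,   7, 7, 11, 4, 6,   3, 1, 1, 6, 4,   # males
             0, 3, 7, 5, 5,   0, 0, 0, 5, 0,    0, 2, 0, 0, 3)   # females
genders <- c("male", "female")
doses   <- c("none", "small", "large")

# Nested loops mirror nested DO statements: gender, dose, then replicate.
rows <- vector("list", length(amounts))
k <- 0
for (g in genders) {
  for (d in doses) {
    for (r in 1:5) {
      k <- k + 1
      rows[[k]] <- data.frame(gender = g, drugdose = d, amount = amounts[k])
    }
  }
}
eating <- do.call(rbind, rows)
eating$gender   <- factor(eating$gender, levels = genders)
eating$drugdose <- factor(eating$drugdose, levels = doses)

# Cell means and the interaction plot: one line per gender.
cell_means <- tapply(eating$amount,
                     list(eating$gender, eating$drugdose), mean)
print(cell_means)
interaction.plot(eating$drugdose, eating$gender, eating$amount,
                 xlab = "drugdose", ylab = "mean amount",
                 trace.label = "gender")

# Two-way ANOVA with interaction, then Tukey comparisons at alpha = 0.1.
fit <- aov(amount ~ gender * drugdose, data = eating)
print(summary(fit))
print(TukeyHSD(fit, which = "drugdose", conf.level = 0.90))
```

Calling `TukeyHSD(fit, which = "gender:drugdose", conf.level = 0.90)` instead compares the six cell means, which corresponds to part (e).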
Problem 4. (25 pts)
Write an R function (call it whatever you want) that takes as argument an object x
and accomplishes the following (and in the order requested here):
a) (5 pts) Checks to see if the argument is a matrix or data frame with
at least two and at most five columns; otherwise it exits the routine with the
warning: “The data passed to the function is not valid”. Functions: is.matrix,
is.data.frame, stop or return (to exit the function), nrow, ncol, if, as.matrix.
b) (5 pts) Assigns the first column to a variable y and the remaining columns to
a matrix z, and then fits and prints a summary of the regression of y
(response) onto z (predictors). Functions: matrix, lm, print, summary.
c) (10 pts) Using the regression object, the routine does the following:
I. Produces scatter plots between all the columns of x (in one single graph
device). Use appropriate x-y labels for the plots and put the response on
the first row. The routine also produces side-by-side boxplots of the
predictors. (Hint: use appropriate par and a double for loop)
II. Produces diagnostic plots for the regression model (in one single
graph). It should include residuals vs predicted, residuals vs order, a qqplot
of the residuals and a histogram of the residuals. Use appropriate titles for
the plots. Functions: qqnorm, hist, plot, residuals, fitted.values, par.
d) (5 pts) Test the routine by defining x to be a matrix of three N(0,1) vectors,
where each vector should have a size of 100.
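One possible shape for such a routine is sketched below. The function name `reg_report`, the default column names `V1`, `V2`, ... and the plot titles are choices made here, not requirements of the problem. The sketch uses `stop`, so the message is raised as an error rather than a warning; replacing it with `warning` followed by `return` is an equally plausible reading of part (a).

```r
reg_report <- function(x) {
  # (a) Accept only a matrix or data frame with 2 to 5 columns.
  if (!(is.matrix(x) || is.data.frame(x)) || ncol(x) < 2 || ncol(x) > 5) {
    stop("The data passed to the function is not valid")
  }
  x <- as.matrix(x)
  p <- ncol(x)
  if (is.null(colnames(x))) colnames(x) <- paste0("V", seq_len(p))

  # (b) First column is the response; the remaining columns are predictors.
  y <- x[, 1]
  z <- x[, -1, drop = FALSE]
  fit <- lm(y ~ z)
  print(summary(fit))

  # (c)(I) All pairwise scatter plots in one device; row i has column i on
  # the y-axis, so the response (column 1) occupies the first row.
  old_par <- par(mfrow = c(p, p), mar = c(4, 4, 1, 1))
  on.exit(par(old_par))
  for (i in seq_len(p)) {
    for (j in seq_len(p)) {
      plot(x[, j], x[, i], xlab = colnames(x)[j], ylab = colnames(x)[i])
    }
  }
  # Side-by-side boxplots of the predictors.
  par(mfrow = c(1, 1), mar = c(5, 4, 4, 2) + 0.1)
  boxplot(as.data.frame(z), main = "Boxplots of the predictors")

  # (c)(II) Four diagnostic plots in one graph.
  par(mfrow = c(2, 2))
  res <- residuals(fit)
  plot(fitted.values(fit), res, xlab = "Predicted", ylab = "Residual",
       main = "Residuals vs predicted")
  plot(seq_along(res), res, xlab = "Order", ylab = "Residual",
       main = "Residuals vs order")
  qqnorm(res, main = "Normal Q-Q plot of residuals")
  hist(res, xlab = "Residual", main = "Histogram of residuals")

  invisible(fit)
}

# (d) Test with a matrix of three N(0,1) vectors of length 100.
set.seed(7110)
x <- matrix(rnorm(300), nrow = 100, ncol = 3)
fit <- reg_report(x)
```

Returning the fitted object invisibly is a convenience added here so the caller can inspect the model after the plots are drawn.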