Homework #1
APPM 4590/5590, Statistical Modeling, Spring 2016

Instructions: Answer the following questions and write your answers in a word processor (e.g., LaTeX, Word). Appropriate graphics should be included. Working in small groups is allowed, but it is important that you make an effort to master the material and hand in your own work.

Due in class on Friday, January 22, 2016

1. For this question, submit your code (with comments) and histograms, but not a print-out of the samples.
   (a) Generate 1000 samples, each of size n = 50, from your favorite (non-normal) distribution.
   (b) Calculate the mean of each sample.
   (c) Construct a histogram of the means. What do you notice?
   (d) Conduct a normality diagnostic (e.g., Q-Q plot, Shapiro-Wilk test) on the means. What can you conclude?
   (e) Can you give an intuitive explanation of the result?

2. stat500.csv is a data file containing student grades for a statistical modeling course at CU Boulder. Suppose that the 55 students included in the file were randomly chosen from the set of all students who took Statistical Modeling at CU Boulder between 2009 and 2015.
   (a) What is the population for this study?
   (b) Is this an observational study or an experiment?
   (c) Do you think it is likely that the sample is a good representation of the population? Why?
   (d) Based on the sampling mechanism, should we be able to generalize the results to the population? Why?
   (e) Create a histogram of the final variable. Comment on the distribution (e.g., is the distribution symmetric, skewed, etc.?).
   (f) Assess whether the final data are normally distributed by...
       i. adding a normal curve over the histogram. Interpret the results.
       ii. conducting the Shapiro-Wilk normality test. State $H_0$ and $H_1$ and explain why you did or did not reject $H_0$.
       iii. constructing a Q-Q plot of the data against the quantiles of a normal distribution. Interpret this plot.
   (g) Standardize the midterm and final variables (so that the mean of each is 0 and the standard deviation is 1). Then, create a scatterplot of the standardized midterm vs. final variables. Does anything stand out to you?

3. Assume that the variables X and Y are related linearly in a population and that a sample of n data pairs, $(x_i, y_i)$, $i \in \{1, \ldots, n\}$, has been measured. Prove the following results:
   (a) $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
   (b) $\sum_{i=1}^{n} (\bar{x}^2 - x_i \bar{x}) = 0$.
   (c) $\sum_{i=1}^{n} (\bar{x}\bar{y} - y_i \bar{x}) = 0$.

4. Explain whether you agree or disagree with the following statements:
   (a) Cov(Y, X) and Cor(Y, X) can take values between $-\infty$ and $\infty$.
   (b) If Cov(Y, X) = 0 or Cor(Y, X) = 0, one can conclude that there is no relationship between Y and X.
   (c) The least squares line fitted to the points in the scatter plot of Y versus $\hat{Y}$ has zero intercept and a unit slope.

5. Suppose that there is no good reason to believe that Y and X are correlated, and instead of fitting a simple linear regression model to your data, you fit $Y = \beta_0 + \varepsilon$.
   (a) Show that the ordinary least squares estimate of $\beta_0$ is $\hat{\beta}_0 = \bar{y}$.
   (b) Show that the least absolute value estimate of $\beta_0$, found by minimizing $\sum_{i=1}^{n} |y_i - \beta_0|$, is the sample median, $\tilde{y}$.
   (c) What is one advantage and one disadvantage of the mean as a measure of center?
   (d) What is one advantage and one disadvantage of the median as a measure of center?
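As a starting point for the simulation workflow in Question 1, here is a minimal sketch in Python using only the standard library. It is not the required approach, and several choices are assumptions made for illustration: the exponential distribution with rate 1 stands in for "your favorite (non-normal) distribution", the function names (`sample_means`, `text_histogram`, `qq_correlation`) are hypothetical, and the correlation between ordered values and normal quantiles is used as a simple numeric stand-in for reading a Q-Q plot. It does not perform the Shapiro-Wilk test, which would need a statistics package.

```python
import random
import statistics
from statistics import NormalDist


def sample_means(num_samples=1000, n=50, rate=1.0, seed=0):
    """Draw num_samples samples of size n from Exponential(rate); return their means."""
    rng = random.Random(seed)  # fixed seed (an assumption) so results are reproducible
    return [
        statistics.fmean([rng.expovariate(rate) for _ in range(n)])
        for _ in range(num_samples)
    ]


def text_histogram(values, bins=15, width=50):
    """Return (counts, lines): bin counts and a plain-text bar for each bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would land in bin index `bins`; clamp it into the last bin.
        idx = min(int((v - lo) / step), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = [
        f"{lo + k * step:7.3f} | " + "#" * round(width * c / peak)
        for k, c in enumerate(counts)
    ]
    return counts, lines


def qq_correlation(values):
    """Pearson correlation between sorted values and standard normal quantiles.

    Values near 1 mean the points of a normal Q-Q plot fall close to a line.
    """
    ordered = sorted(values)
    m = len(ordered)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / m) for i in range(m)]
    mean_t = statistics.fmean(theoretical)
    mean_o = statistics.fmean(ordered)
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(theoretical, ordered))
    var_t = sum((t - mean_t) ** 2 for t in theoretical)
    var_o = sum((o - mean_o) ** 2 for o in ordered)
    return cov / (var_t * var_o) ** 0.5


if __name__ == "__main__":
    means = sample_means()
    _, lines = text_histogram(means)
    print("\n".join(lines))
    print("Q-Q correlation of the sample means:", round(qq_correlation(means), 4))
```

Running the script prints a rough text histogram of the 1000 sample means and a single number summarizing how straight the normal Q-Q plot would be. Comparing that number with the same statistic computed on raw exponential draws is one way to see the effect that parts (c) through (e) ask about.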
