Homework #1
APPM 4590/5590, Statistical Modeling, Spring 2016
Instructions: Answer the following questions and write your answers in a word processor
(e.g., LaTeX, Word). Appropriate graphics should be included. Working in small groups
is allowed, but it is important that you make an effort to master the material and hand
in your own work. Due in class on Friday, January 22, 2016.
1. For this question, submit your code (with comments) and histograms, but not a
print-out of the samples.
(a) Generate 1000 samples, each of size n = 50, from your favorite (non-normal)
distribution.
(b) Calculate the mean of each sample.
(c) Construct a histogram of the means. What do you notice?
(d) Conduct a normality diagnostic (e.g., Q-Q plot, Shapiro-Wilk test) on the
means. What can you conclude?
(e) Can you give an intuitive explanation of the result?
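The sampling steps in parts (a) and (b) can be sketched as follows. This is only an illustration under stated assumptions: the assignment does not name a language or a distribution, so Python's standard library and an Exponential(1) distribution are choices made here, and the function name `sample_means` and the fixed seed are hypothetical. Plotting the histogram and running the normality diagnostic are left out.

```python
import random
import statistics


def sample_means(num_samples=1000, n=50, rate=1.0, seed=0):
    """Draw num_samples samples of size n from Exponential(rate) and
    return the list of sample means (one mean per sample)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean(rng.expovariate(rate) for _ in range(n))
        for _ in range(num_samples)
    ]


means = sample_means()

# Summary of the simulated sampling distribution of the mean.
# For Exponential(1), the population mean is 1 and the population
# standard deviation is 1, so the means should be centered near 1
# with a spread near 1 / sqrt(50).
print("number of means:", len(means))
print("average of means:", statistics.fmean(means))
print("std. dev. of means:", statistics.stdev(means))
```

The list `means` is what would be passed to a histogram or Q-Q plot routine in whatever plotting tool you use.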
2. stat500.csv is a data file containing student grades for a statistical modeling course
at CU Boulder. Suppose that the 55 students included in the file were randomly
chosen from the set of all students that took Statistical Modeling at CU Boulder
between 2009 and 2015.
(a) What is the population for this study?
(b) Is this an observational study or experiment?
(c) Do you think it is likely that the sample is a good representation of the population? Why?
(d) Based on the sampling mechanism, should we be able to generalize the results
to the population? Why?
(e) Create a histogram of the final variable. Comment on the distribution (e.g.,
is the distribution symmetric, skewed, etc?).
(f) Assess whether the final data is normal by...
i. adding a normal curve over the histogram. Interpret the results.
ii. conducting the Shapiro-Wilk normality test. State H0 and H1 and explain why you did or did not reject H0.
iii. constructing a Q-Q plot of the data against the quantiles of a normal
distribution. Interpret this plot.
(g) Standardize the midterm and final variables (so that the mean of each is 0 and
the standard deviation is 1). Then, create a scatterplot of the standardized
midterm vs. final variables. Does anything stand out to you?
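The standardization in part (g) subtracts the mean and divides by the standard deviation. A minimal sketch, assuming Python and the sample (n − 1) standard deviation; the scores below are made-up illustrative values, not data from stat500.csv, and the helper name `standardize` is hypothetical.

```python
import statistics


def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    return [(v - mean) / sd for v in values]


# Hypothetical scores, for illustration only (not from stat500.csv).
midterm = [24.5, 22.5, 23.5, 18.0, 27.0, 16.0]
z_midterm = standardize(midterm)

# After standardizing, the mean is 0 and the standard deviation is 1
# (up to floating-point rounding).
print(statistics.fmean(z_midterm), statistics.stdev(z_midterm))
```

The same function would be applied to the final variable before plotting one standardized variable against the other.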
3. Assume that the variables X and Y are related linearly in a population and that
a sample of n data pairs, (x_i, y_i), i ∈ {1, ..., n}, have been measured. Prove the
following results:
(a) $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
(b) $\sum_{i=1}^{n} (\bar{x}^2 - x_i \bar{x}) = 0$.
(c) $\sum_{i=1}^{n} (\bar{x}\bar{y} - y_i \bar{x}) = 0$.
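Before writing the proofs in question 3, it can help to confirm the three identities numerically. The sketch below does that in Python on hypothetical data; the values of `x` and `y` are invented for illustration, and a numerical check is a sanity test, not a substitute for the proof.

```python
import statistics

# Hypothetical data pairs, for illustration only.
x = [1.0, 2.0, 4.0, 7.0, 11.0]
y = [2.0, 3.0, 5.0, 4.0, 9.0]

x_bar = statistics.fmean(x)
y_bar = statistics.fmean(y)

# (a) deviations from the sample mean sum to zero
sum_a = sum(xi - x_bar for xi in x)
# (b) sum of (x_bar^2 - x_i * x_bar)
sum_b = sum(x_bar ** 2 - xi * x_bar for xi in x)
# (c) sum of (x_bar * y_bar - y_i * x_bar)
sum_c = sum(x_bar * y_bar - yi * x_bar for yi in y)

# Each sum should be zero up to floating-point rounding.
print(sum_a, sum_b, sum_c)
```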
4. Explain whether you agree or disagree with the following statements:
(a) Cov(Y, X) and Cor(Y, X) can take values between −∞ and ∞.
(b) If Cov(Y, X) = 0 or Cor(Y, X) = 0, one can conclude that there is no
relationship between Y and X.
(c) The least squares line fitted to the points in the scatter plot of Y versus Ŷ
has zero intercept and a unit slope.
5. Suppose that there is no good reason to believe that Y and X are correlated, and
instead of fitting a simple linear regression model to your data, you fit Y = β0 + ε.
(a) Show that the ordinary least squares estimate of β0 is β̂0 = ȳ.
(b) Show that the least absolute value estimate of β0, found by minimizing
$\sum_{i=1}^{n} |y_i - \beta_0|$, is the sample median, ỹ.
(c) What is one advantage and one disadvantage of the mean as a measure of
center?
(d) What is one advantage and one disadvantage of the median as a measure of
center?
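The two criteria in parts (a) and (b) of question 5 can be compared numerically with a simple grid search. This is a rough sketch in Python under stated assumptions: the data are hypothetical (five values with one outlier), the grid spacing of 0.5 is arbitrary, and the function names are invented here. It illustrates the claims to be proved rather than proving them.

```python
import statistics

# Hypothetical data with one large outlier, for illustration only.
y = [1.0, 2.0, 3.0, 4.0, 100.0]


def sum_sq(b0):
    """Least squares criterion: sum of squared deviations from b0."""
    return sum((yi - b0) ** 2 for yi in y)


def sum_abs(b0):
    """Least absolute value criterion: sum of absolute deviations from b0."""
    return sum(abs(yi - b0) for yi in y)


# Candidate values for beta_0: 0.0, 0.5, 1.0, ..., 100.0
grid = [k * 0.5 for k in range(201)]

best_sq = min(grid, key=sum_sq)    # grid minimizer of the squared criterion
best_abs = min(grid, key=sum_abs)  # grid minimizer of the absolute criterion

print("least squares minimizer:", best_sq, "| sample mean:", statistics.fmean(y))
print("least absolute minimizer:", best_abs, "| sample median:", statistics.median(y))
```

On this data the squared criterion is minimized at the mean, 22.0, and the absolute criterion at the median, 3.0, which also shows how differently the two measures respond to the single outlier.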