Statistical Inference: Estimation
Jamie Monogan
University of Georgia
Introduction to Data Analysis (POLS 7012)

Objectives

By the end of this meeting, participants should be able to:
- Distinguish between point and interval estimates of population parameters.
- Define Type I and Type II errors and explain why they should be avoided.
- Explain the use of a t-distribution for performing inference on a mean.
- Calculate and interpret a confidence interval.
- Deduce appropriate sample sizes for proportions and means.

Goal: Drawing Inferences About Population Parameters

Two techniques:
- Point estimation
  - What are some estimators we know?
  - Desirable properties: unbiased, efficient, consistent (MLE relies heavily on consistency).
- Interval estimation
  - Confidence interval = point estimate ± margin of error

Types of Error in Inference

Analogy: a jury's verdict versus the truth.

Typical form of a statistical hypothesis:
$$H_0: \beta = 0 \qquad H_A: \beta > 0$$

- Type I error: incorrectly rejecting a true null hypothesis. The null hypothesis (that no relationship exists) is true, but our analysis leads us to conclude that a relationship exists.
- Type II error: incorrectly failing to reject a false null hypothesis. The null hypothesis (that no relationship exists) is false, but our analysis leads us to conclude that no relationship exists.
- Researchers need to report the Type I error rate (α) they use.

Defining the Confidence Interval

The Bernoulli PMF has mean $\pi$ and standard deviation $\sqrt{\pi(1-\pi)}$. Assume the sample size is large enough that we can use the normal approximation to the sampling distribution of $\hat p$. We want to build a $1-\alpha$ confidence interval for the unknown $\pi$. Start with $\hat p = \bar Y$ and define

$$SE(\hat p) = \sqrt{\hat p(1-\hat p)/n}.$$
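The form of the standard error follows from the Bernoulli moments. A short derivation, assuming the n observations $Y_1, \ldots, Y_n$ are independent Bernoulli($\pi$) draws (so $Y_i^2 = Y_i$):

$$
\begin{aligned}
E(Y_i) &= 1\cdot\pi + 0\cdot(1-\pi) = \pi,\\
\operatorname{Var}(Y_i) &= E(Y_i^2) - [E(Y_i)]^2 = \pi - \pi^2 = \pi(1-\pi),\\
\operatorname{Var}(\hat p) &= \operatorname{Var}(\bar Y) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(Y_i) = \frac{\pi(1-\pi)}{n},
\qquad SD(\hat p) = \sqrt{\frac{\pi(1-\pi)}{n}}.
\end{aligned}
$$

Because $\pi$ is unknown, substituting $\hat p$ for $\pi$ in $SD(\hat p)$ gives $SE(\hat p)$.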
We can standardize using the z-score for $\hat p$:
$$z = \frac{\hat p - \pi}{SE(\hat p)}$$

Defining the Confidence Interval, Cont.

- Now the confidence interval is defined by:
$$
\begin{aligned}
1-\alpha &= P(-z^* \le z \le z^*)\\
&= P\left(-z^* \le \frac{\hat p - \pi}{SE(\hat p)} \le z^*\right)\\
&= P\left(-z^* SE(\hat p) \le \hat p - \pi \le z^* SE(\hat p)\right)\\
&= P\left(\hat p - z^* SE(\hat p) \le \pi \le \hat p + z^* SE(\hat p)\right)
\end{aligned}
$$
- And denoted: $[\hat p - z^* SE(\hat p),\ \hat p + z^* SE(\hat p)]$.

Using R to Get the z* Values

    set.seed(7012)                       # make the simulated draws reproducible
    m <- 50; n <- 10                     # m binomial draws, each with n trials
    Y <- rbinom(m, n, 0.8)               # true pi = 0.8
    p.hat <- mean(Y)/n                   # pooled proportion over all m*n Bernoulli trials
    SE <- sqrt(p.hat*(1-p.hat)/(m*n))    # SE uses the total number of trials, m*n
    alpha <- 0.05
    z.star <- -qnorm(alpha/2)            # 1.959964 for a 95% interval
    c(p.hat - SE*z.star, p.hat + SE*z.star)

Interpreting Confidence Intervals

- Which of these is the correct interpretation of a $(1-\alpha)$ confidence interval?
  - An interval that has a $(1-\alpha)\times 100\%$ chance of containing the true value of the parameter.
  - An interval that, across repeated samples, contains the true value of the parameter in $(1-\alpha)\times 100\%$ of replications, on average.

[Figure: confidence interval coverage]

- Note: If you use Bayesian methods, you can make different kinds of statements, such as direct probability statements about the parameter.

Confidence Intervals for the Population Mean, µ

Start with a sample $X_1, X_2, \ldots, X_n$, where n is large enough that we can rely on the CLT. Then the z statistic has an approximately standard normal distribution:
$$z = \frac{\bar X - \mu}{\sigma/\sqrt{n}} = \frac{\bar X - \mu}{SD(\bar X)}$$

Confidence Intervals for the Population Mean, µ, Cont.

But we don't know $\sigma^2$, so we use the sample variance as a substitute:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2$$
Now we have a "t" statistic instead of a "z" statistic:
$$t = \frac{\bar X - \mu}{s/\sqrt{n}} = \frac{\bar X - \mu}{SE(\bar X)},$$
which has Student's t distribution with n − 1 degrees of freedom (exactly so when the population is normal; the statistic is fairly robust otherwise). These are called pivotal quantities because their distributions do not depend on unknown parameters.
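The repeated-sampling interpretation of a confidence interval, and the reason for switching from z* to t* once s replaces σ, can be checked with a small simulation. This is a sketch under assumed settings chosen only for illustration (a normal population with µ = 10 and σ = 3, samples of size n = 15, 10,000 replications): it draws many samples, builds a t-based and a z-based interval from each, and records how often each interval contains µ.

```r
# Monte Carlo sketch of confidence-interval coverage.
# Assumed, illustrative settings: normal population, mu = 10, sigma = 3,
# samples of size n = 15, 10,000 replications.
set.seed(7012)
mu <- 10; sigma <- 3; n <- 15; alpha <- 0.05; reps <- 10000

t.star <- qt(1 - alpha / 2, df = n - 1)  # Student's t critical value, n - 1 df
z.star <- qnorm(1 - alpha / 2)           # normal critical value, about 1.96

covers_t <- logical(reps)
covers_z <- logical(reps)
for (r in seq_len(reps)) {
  x <- rnorm(n, mean = mu, sd = sigma)
  se <- sd(x) / sqrt(n)                  # SE(xbar) uses s (n - 1 divisor), not sigma
  # xbar +/- crit * se contains mu exactly when |xbar - mu| <= crit * se
  covers_t[r] <- abs(mean(x) - mu) <= t.star * se
  covers_z[r] <- abs(mean(x) - mu) <= z.star * se
}

mean(covers_t)  # share of t intervals containing mu: close to 0.95
mean(covers_z)  # lower, because s replaced sigma but z* was kept
```

With n this small, the t-based intervals should cover µ in roughly 95% of replications, while intervals that reuse z* fall short; the gap shrinks as n grows.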
Comparing Normal and Student's-t Distributions

[Figure: Normal density in green, along with t distributions with 1, 3, and 10 degrees of freedom; x-axis from −4 to 4, y-axis density.]

Calculating the Confidence Interval for µ

- The CI is just:
$$CI_\mu = \bar X \pm t^* SE(\bar X)$$
where $t^*$ is the quantile (inverse-CDF value) of Student's t with n − 1 degrees of freedom that leaves α/2 in each tail.
- Find these quantiles in R with qt():

      qt(0.025, df=3)
      [1] -3.182446
      qt(0.025, df=25)
      [1] -2.059539

  These are lower-tail quantiles, so they are negative; t* is their absolute value.

Sample Size Considerations

- If you are designing your own study, you will likely want to consider how large a sample you need before you start collecting data.
- If you have an exact margin of error in mind and a good guess of the true population variance, you can deduce the necessary sample size.
- All you have to do is solve for n in the margin-of-error formula.
- Related: power analysis.

Power Analysis

Focus: significance testing.

The seminal book: Cohen, Jacob. 1988. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Erlbaum Associates.

Notion: Before designing a study, choose a sample size that will shrink your Type II error rate below an acceptable level.

We define:
- Probability of a Type I error: $\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true})$
- Probability of a Type II error: $\beta = P(\text{fail to reject } H_0 \mid H_0 \text{ is false})$
- Power $= 1 - \beta = 1 - P(\text{fail to reject } H_0 \mid H_0 \text{ is false})$

Given three of the following, we can solve for the fourth:
- Sample size
- Effect size
- Type I error rate
- Power (1 − Type II error rate)

R: the "pwr" package.

For Next Time

- Review the objectives of prior classes and prepare for the midterm exam (to be held March 7).
- Come with questions about exam-related objectives (first priority), software usage, research projects, and anything else you're concerned about in the class. If your questions do not fill up class time, we will have a bonus lecture on graphing.
- Answer questions 5.14, 5.20, and 5.42.
- In the software of your choice:
  - Open the 2004 National Election Study, http://monogan.myweb.uga.edu/teaching/us/nes2004.dta
  - You want to draw inferences about the population mean of years of education (educ).
  - What is the sample mean for years of education?
  - What is the standard error of your estimate of the mean?
  - What is the 90% confidence interval for the mean?
  - Interpret your point and interval estimates of the population mean.
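The interval formula $\bar X \pm t^* SE(\bar X)$ and the sample-size reasoning (solve the margin-of-error formula for n) can be sketched in R as below. The helper names ci_mean_t, n_for_proportion, and n_for_mean are hypothetical, written only for this illustration; the data vector is made up (it is not the NES), and loading the .dta file is not shown. For power analysis the sketch uses base R's stats::power.t.test rather than the pwr package named above.

```r
# Hypothetical helper: t-based confidence interval for a population mean,
# CI = xbar +/- t* SE(xbar), with n - 1 degrees of freedom.
ci_mean_t <- function(x, conf = 0.95) {
  x <- x[!is.na(x)]                        # drop missing values
  n <- length(x)
  xbar <- mean(x)
  se <- sd(x) / sqrt(n)                    # SE(xbar) = s / sqrt(n); sd() divides by n - 1
  t.star <- qt(1 - (1 - conf) / 2, df = n - 1)
  c(mean = xbar, se = se, lower = xbar - t.star * se, upper = xbar + t.star * se)
}

# Hypothetical helper, proportion: E = z* sqrt(p(1 - p)/n)  =>  n = p(1 - p) (z*/E)^2.
# p = 0.5 is the conservative (largest-variance) guess.
n_for_proportion <- function(E, p = 0.5, conf = 0.95) {
  z.star <- qnorm(1 - (1 - conf) / 2)
  ceiling(p * (1 - p) * (z.star / E)^2)    # round up to a whole observation
}

# Hypothetical helper, mean: E = z* sigma / sqrt(n)  =>  n = (z* sigma / E)^2.
# sigma is a guess of the population standard deviation.
n_for_mean <- function(E, sigma, conf = 0.95) {
  z.star <- qnorm(1 - (1 - conf) / 2)
  ceiling((z.star * sigma / E)^2)
}

# Usage with made-up numbers (not the NES data):
x <- c(12, 16, 14, 12, 18, 13, 12, 16, 20, 11)
ci <- ci_mean_t(x, conf = 0.90)
ci                                   # mean, SE, and 90% interval
n_for_proportion(E = 0.03)           # 1068
n_for_mean(E = 0.5, sigma = 3)       # 139

# Power analysis in base R: leave n out to solve for it (two-sample t test by default).
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)$n  # about 64 per group
```

The interval should match the one reported by t.test(x, conf.level = 0.90), which is a convenient check on the by-hand formula.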