Survey

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Survey

Document related concepts

Transcript

Mathematics 241 Confidence intervals using the t-distribution October 11 Goals for the day: 1. Words: 2. R: t-distribution, degrees of freedom t.test 3. Big idea: confidence intervals for the mean are produced using the t-distribution, not the normal distribution Criminals The dataset that Student used in his 1908 paper to study the sampling distribution of X̄ came from measurements of 3,000 criminals. You can find that data at /home/stob/criminals.csv. The variable height gives the height in inches of the criminals. Student took the criminals as his population and chose samples of size 4 (by writing the heights on bits of cardboard and choosing them at random out of a pile). What is the population mean for this population? µ Describe the shape of the distribution of heights in this population. Choose a random sample of 4 criminals to study in this exercise, save their heights in a vector, and record the heights here. > four=sample(criminals$height,4) > four [1] 64 64 70 59 √ Write a 95% confidence interval for µ using the method we used last week (x̄ ± 1.96s/ n) and your own sample: Student’s confidence interval can be computed using t.test(): > t.test(four) One Sample t-test data: four t = 28.556, df = 3, p-value = 9.429e-05 alternative hypothesis: true mean is not equal to 0 95 percent confidence interval: 57.089 71.411 sample estimates: mean of x 64.25 (There’s a lot of stuff there but for now note that there is a confidence interval in the middle and x̄ is at the end.) Compute a confidence interval for µ based on your sample. Compare the two confidence intervals that you have constructed. Page 2 Student repairs last Thursday’s work Last week we used the normal distribution to write confidence intervals for µ. An approximate confidence interval for µ was given by s x̄ ± z ∗ √ n where z ∗ is the “critical value” determined from the normal distribution and the confidence level. (z ∗ = 1.96 for a 95% confidence level.) These confidence intervals relied on two approximations: 1. The central limit √ theorem says that X̄ has a distribution that is approximately normal with mean µ and standard deviation σ/ n. 2. For large n, the sample standard deviation, s is close to σ. In 1908, Student worried about the second assumption and showed us what to do to avoid it. The Central Limit Theorem (assumption 1 above) says that X̄ − µ √ σ/ n is approximately normal with mean 0 and standard deviation 1. (In fact it is true that if the population is normal, X̄ is exactly normal!) Student found the distribution of the following: T = X̄ − µ √ s/ n (Student assumed that the population was normal to do this.) The t-distribution The t-distribution has one parameter k called the degrees of freedom. The density function is the incredibly messy (but we don’t need to know it since R does.) Γ((k + 1)/2) √ kπΓ(k/2) 1+ x2 n (k+1)/2 −∞<x<∞ dt(x,k) computes the density curve for the t-distribution with k degrees of freedom. Page 3 0.00 0.10 0.20 0.30 dt(x, 3) > curve(dt(x,3),-4,4) −4 −2 0 2 4 To compare the t-distribution with the normal distribution, we compute some critical values: > qnorm(.975,0,1); qt(.975,29); qt(.975,9); qt(.975,3) [1] 1.96 [1] 2.0452 [1] 2.2622 [1] 3.1824 Why did we use .975 here? Here we graph a normal density and a t density (with 3 degrees of freedom) 0.3 0.2 0.1 0.0 dnorm(x, 0, 1) 0.4 > curve(dnorm(x,0,1),-4,4) > x=seq(-4,4,.01) > lines(x,dt(x,3),lty='dotted') −4 −2 0 2 4 Page 4 A confidence interval for µ is s x̄ ± t∗ √ n where t∗ is the appropriate critical value from the t-distribution with n − 1 degrees of freedom. The function t.test is all that we need to compute confidence intervals. Temperatures of Calvin students Compute a 95% confidence interval for the average temperature of Calvin students based on the sample that is in the variable normtemp$Temp. Speed of light Michelson and Morley did a series of experiments to measure the speed of light. The built-in R dataset morley has the 100 measurements (morley$Speed). The optional argument conf.level to t.test allows one to change the level of confidence from 95% to 99% by t.test( , conf.level=.99). Compute a 99% confidence level for the speed of light based on the Michelson-Morley data. The only thing that is different from the confidence interval constructed by the t.test from that we constructed last Thursday is the critical value. Write down the two different critical values used in constructing 99% confidence intervals for these data. z∗ t∗ Assumptions The t-distribution is exactly right if the population is exactly normal (as no populations ever are). If the sample size is large, one can safely use the t-distribution, even if the population is not normal. If the population is small, you should use the t-distribution only if the population is likely to be reasonably symmetric, unimodal, and without outliers. How small is too small depends on how badly these characterstics are violated.