Survey

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia, lookup

Degrees of freedom (statistics) wikipedia, lookup

Taylor's law wikipedia, lookup

Bootstrapping (statistics) wikipedia, lookup

Resampling (statistics) wikipedia, lookup

Misuse of statistics wikipedia, lookup

Student's t-test wikipedia, lookup

Transcript
```Mathematics 241
Confidence intervals using the t-distribution
October 11
Goals for the day:
1. Words:
2. R:
t-distribution, degrees of freedom
t.test
3. Big idea:
confidence intervals for the mean are produced using the t-distribution, not the normal distribution
Criminals
The dataset that Student used in his 1908 paper to study the sampling distribution of X̄ came from measurements
of 3,000 criminals. You can find that data at /home/stob/criminals.csv. The variable height gives the height
in inches of the criminals. Student took the criminals as his population and chose samples of size 4 (by writing the
heights on bits of cardboard and choosing them at random out of a pile).
What is the population mean for this population? µ
Describe the shape of the distribution of heights in this population.
Choose a random sample of 4 criminals to study in this exercise, save their heights in a vector, and record the heights
here.
> four=sample(criminals\$height,4)
> four
 64 64 70 59
√
Write a 95% confidence interval for µ using the method we used last week (x̄ ± 1.96s/ n) and your own sample:
Student’s confidence interval can be computed using t.test():
> t.test(four)
One Sample t-test
data: four
t = 28.556, df = 3, p-value = 9.429e-05
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
57.089 71.411
sample estimates:
mean of x
64.25
(There’s a lot of stuff there but for now note that there is a confidence interval in the middle and x̄ is at the end.)
Compute a confidence interval for µ based on your sample.
Compare the two confidence intervals that you have constructed.
Page 2
Student repairs last Thursday’s work
Last week we used the normal distribution to write confidence intervals for µ. An approximate confidence interval
for µ was given by
s
x̄ ± z ∗ √
n
where z ∗ is the “critical value” determined from the normal distribution and the confidence level. (z ∗ = 1.96 for a
95% confidence level.)
These confidence intervals relied on two approximations:
1. The central limit
√ theorem says that X̄ has a distribution that is approximately normal with mean µ and standard
deviation σ/ n.
2. For large n, the sample standard deviation, s is close to σ.
In 1908, Student worried about the second assumption and showed us what to do to avoid it. The Central Limit
Theorem (assumption 1 above) says that
X̄ − µ
√
σ/ n
is approximately normal with mean 0 and standard deviation 1. (In fact it is true that if the population is normal,
X̄ is exactly normal!)
Student found the distribution of the following:
T =
X̄ − µ
√
s/ n
(Student assumed that the population was normal to do this.)
The t-distribution
The t-distribution has one parameter k called the degrees of freedom. The density function is the incredibly messy
(but we don’t need to know it since R does.)
Γ((k + 1)/2)
√
kπΓ(k/2)
1+
x2
n
(k+1)/2
−∞<x<∞
dt(x,k) computes the density curve for the t-distribution with k degrees of freedom.
Page 3
0.00 0.10 0.20 0.30
dt(x, 3)
> curve(dt(x,3),-4,4)
−4
−2
0
2
4
To compare the t-distribution with the normal distribution, we compute some critical values:
> qnorm(.975,0,1); qt(.975,29); qt(.975,9); qt(.975,3)
 1.96
 2.0452
 2.2622
 3.1824
Why did we use .975 here?
Here we graph a normal density and a t density (with 3 degrees of freedom)
0.3
0.2
0.1
0.0
dnorm(x, 0, 1)
0.4
> curve(dnorm(x,0,1),-4,4)
> x=seq(-4,4,.01)
> lines(x,dt(x,3),lty='dotted')
−4
−2
0
2
4
Page 4
A confidence interval for µ is
s
x̄ ± t∗ √
n
where t∗ is the appropriate critical value from the t-distribution with n − 1 degrees of freedom.
The function t.test is all that we need to compute confidence intervals.
Temperatures of Calvin students
Compute a 95% confidence interval for the average temperature of Calvin students based on the sample that is in
the variable normtemp\$Temp.
Speed of light
Michelson and Morley did a series of experiments to measure the speed of light. The built-in R dataset morley has
the 100 measurements (morley\$Speed). The optional argument conf.level to t.test allows one to change the level
of confidence from 95% to 99% by t.test( , conf.level=.99). Compute a 99% confidence level for the speed of
light based on the Michelson-Morley data.
The only thing that is different from the confidence interval constructed by the t.test from that we constructed
last Thursday is the critical value.
Write down the two different critical values used in constructing 99% confidence intervals for these data.
z∗
t∗
Assumptions
The t-distribution is exactly right if the population is exactly normal (as no populations ever are).
If the sample size is large, one can safely use the t-distribution, even if the population is not normal.
If the population is small, you should use the t-distribution only if the population is likely to be reasonably symmetric,
unimodal, and without outliers. How small is too small depends on how badly these characterstics are violated.
```
Related documents