```Math 160 - Cooley
Intro to Statistics
OCC
Section 8.4 – Confidence Intervals for One Population Mean When σ Is Unknown
In Section 8.2, we learned how to determine a confidence interval for a population mean, μ, when the population
standard deviation, σ, is known. Recall:
Standardized Version of the Sample Mean
Suppose that a variable x of a population is normally distributed with mean μ and standard deviation σ. Then, for
samples of size n, the variable $\bar{x}$ is also normally distributed and has mean μ and standard deviation $\sigma/\sqrt{n}$.
Equivalently,
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
has the standard normal distribution.
What if, as is usual in practice, the population standard deviation is unknown? Then we cannot base our
confidence-interval procedure on the standardized version of $\bar{x}$. The best we can do is estimate the population
standard deviation, σ, by the sample standard deviation, s. So, by replacing σ with s, we now have the:
Studentized Version of the Sample Mean
Suppose that a variable x of a population is normally distributed with mean μ. Then, for samples of size n, the
variable
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
has the t-distribution (or Student’s t-distribution) with n – 1 degrees of freedom.
William Sealy Gosset introduced the t-statistic in 1908. Gosset was a statistician as well as a chemist for the
Guinness brewery in Dublin, Ireland. The brewery had a policy of recruiting the best graduates from
Oxford and Cambridge, selecting those who could apply biochemistry and statistics to the
company’s established industrial processes. Gosset was one such graduate, and in the course of this work he devised the t-test.
It was originally envisioned as a cost-effective way to monitor the quality of the stout (the dark beer the brewery
produces). Gosset published the test in Biometrika in 1908 under the pen name ‘Student’. He used a pen name at
Guinness’s insistence, because the company wanted to keep its use of statistics a trade secret.
Basic Properties Of t-Curves
Property 1: The total area under any t-curve equals 1.
Property 2: The t-curve extends indefinitely in both directions, approaching, but never touching, the
horizontal axis as it does so.
Property 3: A t-curve is symmetric about 0.
Property 4: As the number of degrees of freedom becomes larger, t-curves look increasingly like the standard
normal curve.
Note:
For df ≥ 1000, the t-density curves and the standard Normal curve are virtually indistinguishable. This
happens because s estimates σ more accurately as the sample size increases. So using s in place of σ
causes little extra variation when the sample is large.
[Figure: t-curves with different degrees of freedom, each shown at a 95% confidence level.]
INTERVAL PROCEDURE #2 – The One-Mean t-Interval Procedure
Purpose: To find a confidence interval for a population mean, μ.
Assumptions
1) Simple Random Sample
2) Normal population or large sample (n ≥ 30)
3) σ unknown
Step 1 – For a confidence level of 1 – α, use Table IV to find $t_{\alpha/2}$ with df = n – 1, where n is the sample size.
Step 2 – The confidence interval for μ is from
$$\bar{x} - t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \quad \text{to} \quad \bar{x} + t_{\alpha/2} \cdot \frac{s}{\sqrt{n}},$$
where $t_{\alpha/2}$ is found in Step 1, and $\bar{x}$ and s are computed from the sample data.
Step 3 – Interpret the confidence interval.
The confidence interval is exact for normal populations and is approximately correct for large samples from
non-normal populations.
Using the t Procedures in Practice
1. Except in the case of small samples, the assumption that the data are an SRS from the population of
   interest is more important than the assumption that the population distribution is Normal.
2. Sample size less than 15: Use t procedures if the data appear close to Normal (symmetric, single peak,
   no outliers). If the data are skewed or if outliers are present, do not use t.
3. Sample size at least 15: The t procedures can be used except in the presence of outliers or strong
   skewness.
4. Large samples: The t procedures can be used even for clearly skewed distributions when the sample
   size is large, roughly n ≥ 30.
Table IV - Critical Values of Student’s t-Distribution
Amount of α in one tail
  df    0.10    0.05   0.025    0.01   0.005
   1   3.078   6.314  12.706  31.821  63.657
   2   1.886   2.920   4.303   6.965   9.925
   3   1.638   2.353   3.182   4.541   5.841
   4   1.533   2.132   2.776   3.747   4.604
   5   1.476   2.015   2.571   3.365   4.032
   6   1.440   1.943   2.447   3.143   3.707
   7   1.415   1.895   2.365   2.998   3.499
   8   1.397   1.860   2.306   2.896   3.355
   9   1.383   1.833   2.262   2.821   3.250
  10   1.372   1.812   2.228   2.764   3.169
  11   1.363   1.796   2.201   2.718   3.106
  12   1.356   1.782   2.179   2.681   3.055
  13   1.350   1.771   2.160   2.650   3.012
  14   1.345   1.761   2.145   2.624   2.977
  15   1.341   1.753   2.131   2.602   2.947
  16   1.337   1.746   2.120   2.583   2.921
  17   1.333   1.740   2.110   2.567   2.898
  18   1.330   1.734   2.101   2.552   2.878
  19   1.328   1.729   2.093   2.539   2.861
  20   1.325   1.725   2.086   2.528   2.845
  21   1.323   1.721   2.080   2.518   2.831
  22   1.321   1.717   2.074   2.508   2.819
  23   1.319   1.714   2.069   2.500   2.807
  24   1.318   1.711   2.064   2.492   2.797
  25   1.316   1.708   2.060   2.485   2.787
  26   1.315   1.706   2.056   2.479   2.779
  27   1.314   1.703   2.052   2.473   2.771
  28   1.313   1.701   2.048   2.467   2.763
  29   1.311   1.699   2.045   2.462   2.756
  30   1.310   1.697   2.042   2.457   2.750
  31   1.309   1.696   2.040   2.453   2.744
  32   1.309   1.694   2.037   2.449   2.738
  33   1.308   1.692   2.035   2.445   2.733
  34   1.307   1.691   2.032   2.441   2.728
  35   1.306   1.690   2.030   2.438   2.724
  36   1.306   1.688   2.028   2.434   2.719
  37   1.305   1.687   2.026   2.431   2.715
  38   1.304   1.686   2.024   2.429   2.712
  39   1.304   1.685   2.023   2.426   2.708
  40   1.303   1.684   2.021   2.423   2.704
  41   1.303   1.683   2.020   2.421   2.701
  42   1.302   1.682   2.018   2.418   2.698
  43   1.302   1.681   2.017   2.416   2.695
  44   1.301   1.680   2.015   2.414   2.692
  45   1.301   1.679   2.014   2.412   2.690
  46   1.300   1.679   2.013   2.410   2.687
  47   1.300   1.678   2.012   2.408   2.685
  48   1.299   1.677   2.011   2.407   2.682
  49   1.299   1.677   2.010   2.405   2.680
  50   1.299   1.676   2.009   2.403   2.678
  51   1.298   1.675   2.008   2.402   2.676
  52   1.298   1.675   2.007   2.400   2.674
  53   1.298   1.674   2.006   2.399   2.672
  54   1.297   1.674   2.005   2.397   2.670
  55   1.297   1.673   2.004   2.396   2.668
  56   1.297   1.673   2.003   2.395   2.667
  57   1.297   1.672   2.002   2.394   2.665
  58   1.296   1.672   2.002   2.392   2.663
  59   1.296   1.671   2.001   2.391   2.662
  60   1.296   1.671   2.000   2.390   2.660
  61   1.296   1.670   2.000   2.389   2.659
  62   1.295   1.670   1.999   2.388   2.657
  63   1.295   1.669   1.998   2.387   2.656
  64   1.295   1.669   1.998   2.386   2.655
  65   1.295   1.669   1.997   2.385   2.654
  66   1.295   1.668   1.997   2.384   2.652
  67   1.294   1.668   1.996   2.383   2.651
  68   1.294   1.668   1.995   2.382   2.650
  69   1.294   1.667   1.995   2.382   2.649
  70   1.294   1.667   1.994   2.381   2.648
  71   1.294   1.667   1.994   2.380   2.647
  72   1.293   1.666   1.993   2.379   2.646
  73   1.293   1.666   1.993   2.379   2.645
  74   1.293   1.666   1.993   2.378   2.644
  75   1.293   1.665   1.992   2.377   2.643
  80   1.292   1.664   1.990   2.374   2.639
  85   1.292   1.663   1.988   2.371   2.635
  90   1.291   1.662   1.987   2.368   2.632
  95   1.291   1.661   1.985   2.366   2.629
 100   1.290   1.660   1.984   2.364   2.626
 200   1.286   1.653   1.972   2.345   2.601
 300   1.284   1.650   1.968   2.339   2.592
 400   1.284   1.649   1.966   2.336   2.588
 500   1.283   1.648   1.965   2.334   2.586
 600   1.283   1.647   1.964   2.333   2.584
 700   1.283   1.647   1.963   2.332   2.583
 800   1.283   1.647   1.963   2.331   2.582
 900   1.282   1.647   1.963   2.330   2.581
1000   1.282   1.646   1.962   2.330   2.581
2000   1.282   1.646   1.961   2.328   2.578
  Z*   1.282   1.645   1.960   2.326   2.576
         80%     90%     95%     98%     99%
Confidence Level C
Exercises:
1) According to Communications Industry Forecast, published by Veronis Suhler Stevenson of New York,
NY, the average person watched 4.47 hours of television per day in 2000. A random sample of 40
people gave the following number of hours of television watched per day for last year. Find a 98%
confidence interval for the amount of television watched per day last year by the average person.
(Note: $\bar{x}$ = 4.615 hr and s = 2.277 hr.)
2) The paper “Correlations between the Intrauterine Metabolic Environment and Blood Pressure in
Adolescent offspring of diabetic mothers” (Journal of Pediatrics, Vol. 136, Issue 5, pp. 587-592) by
N. Cho et al. presented findings of research on children of diabetic mothers. Past studies showed that
maternal diabetes results in obesity, blood pressure, and glucose tolerance complications in the
offspring. Following are the arterial blood pressures, in millimeters of mercury (mm Hg), for a random
sample of 16 children of diabetic mothers. Assume that the arterial blood pressures are normally
distributed.
  81.6    84.6    84.1   101.9
  87.6    90.8    82.8    94.0
  82.0    69.4    88.9    78.9
  86.7    75.2    96.4    91.0
The mean and standard deviation from the above data are 85.99 and 8.08 mm Hg, respectively.
a) If you were to find a 95% confidence interval for the mean arterial blood pressure of all children
   of diabetic mothers, would you use the z-interval procedure or the t-interval procedure?
b) Now, find that 95% confidence interval.
Sample Test Multiple Choice Questions:
3) What generally happens to the sampling error as the sample size is decreased? It gets
A) smaller
B) larger
C) more predictable
D) less predictable
4) Find the value of α that corresponds to a confidence level of 93%.
A) 0.1762
B) 1.48
C) ‒1.48
D) 0.93
E) 0.07
AB) 0.8238
5) For a t-curve with df = 24, find $t_{0.05}$.
A) 1.711
B) 2.797
C) 2.064
D) 2.069
E) 1.714
AB) 2.807
6) For a t-curve with a sample size of 10, find $t_{0.01}$.
A) 1.372
B) 2.821
C) 1.383
D) 3.169
E) 2.764
AB) 3.250
7) For a t-curve with df = 15, find the two t-values that divide the area under the curve into a middle 0.95
area and two outside areas of 0.025.
A) 0 & 2.131
B) –2.145 & 2.145
C) 0 & 2.145
D) –2.131 & 2.131
8) Which of the following statements regarding t-curves is/are false?
I. The total area under a t-curve with 20 degrees of freedom is greater than the area under the
standard normal curve.
II. The t-curve with 1000 degrees of freedom is distinctively flatter and wider than the
standard normal curve.
III. The t-curve with 20 degrees of freedom more closely resembles the standard normal curve
than the t-curve with 25 degrees of freedom.
A) I and II
B) I and III
C) II and III
D) I, II, and III
9) Suppose that you wish to obtain a confidence interval for a population mean. Under the conditions
described below, should you use the z-interval procedure, the t-interval procedure, or neither?
- The population standard deviation is known.
- The population is normally distributed.
- The sample size is small.
- The sample is a simple random sample.
A) t-interval procedure
B) neither
C) z-interval procedure
10) Suppose that you wish to obtain a confidence interval for a population mean. Under the conditions
described below, should you use the z-interval procedure, the t-interval procedure, or neither?
- The sample size is small.
- The population standard deviation is unknown.
- The population comes from a Binomial distribution.
- The sample is a simple random sample.
A) t-interval procedure
B) neither
C) z-interval procedure
11) Suppose that you wish to obtain a confidence interval for a population mean. Under the conditions
described below, should you use the z-interval procedure, the t-interval procedure, or neither?
- The population standard deviation is unknown.
- The population comes from an unknown distribution.
- The sample is a simple random sample.
- The sample size is large.
A) t-interval procedure
B) neither
C) z-interval procedure
```
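The one-mean t-interval procedure in the handout can be checked numerically. The following is a minimal Python sketch and is not part of the original handout: the function name `t_interval` and the dictionary `T_CRIT_025` are illustrative, and the critical values are a few rows copied from the 0.025 column of Table IV. It applies Steps 1 and 2 to the blood-pressure data listed in Exercise 2.

```python
import math
import statistics

# Excerpt of Table IV: one-tail area 0.025 (95% confidence), keyed by df.
# Only a few rows are copied here; the dictionary name is illustrative.
T_CRIT_025 = {4: 2.776, 9: 2.262, 15: 2.131, 24: 2.064, 39: 2.023}


def t_interval(data, t_crit_table):
    """Return (lower, upper) for the population mean, using df = n - 1."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation (divides by n - 1)
    t_crit = t_crit_table[n - 1]        # Step 1: critical value from the table
    margin = t_crit * s / math.sqrt(n)  # Step 2: margin of error
    return x_bar - margin, x_bar + margin


# Arterial blood pressures from Exercise 2 (mm Hg); n = 16, so df = 15.
pressures = [81.6, 84.6, 84.1, 101.9, 87.6, 90.8, 82.8, 94.0,
             82.0, 69.4, 88.9, 78.9, 86.7, 75.2, 96.4, 91.0]

lower, upper = t_interval(pressures, T_CRIT_025)
print(f"95% CI: {lower:.2f} to {upper:.2f} mm Hg")
```

Because the critical value is looked up by df, this sketch only handles sample sizes whose df appears in the dictionary; further rows of Table IV can be added as needed.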
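Property 4 says that t-curves approach the standard normal curve as df grows, which is why the Z* row closes Table IV. As a rough check, the sketch below converts each confidence level C = 1 – α into its one-tail area α/2 and computes the matching standard normal critical value with Python's standard-library `statistics.NormalDist`. The helper names `one_tail_area` and `z_star` are made up for this illustration.

```python
from statistics import NormalDist


def one_tail_area(confidence_level):
    """Convert a confidence level C = 1 - alpha into the one-tail area alpha / 2."""
    alpha = 1 - confidence_level
    return alpha / 2


def z_star(confidence_level):
    """Standard normal critical value with area alpha / 2 to its right."""
    return NormalDist().inv_cdf(1 - one_tail_area(confidence_level))


# The five confidence levels listed at the bottom of Table IV.
for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"C = {level:.0%}  tail = {one_tail_area(level):.3f}  z* = {z_star(level):.3f}")
```

Rounded to three decimals, these values should agree with the Z* row of the table, and the t critical values for df = 2000 differ from them only in the third decimal place.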
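The reason for switching from z to t when σ is replaced by s can be illustrated with a small simulation. This is a sketch under stated assumptions, not something from the handout: the population parameters (μ = 50, σ = 10), the sample size n = 5, the number of trials, and the seed are all arbitrary choices for illustration. It builds intervals of the form x̄ ± (critical value) · s/√n and counts how often they contain μ, once with the t critical value for df = 4 and once with the z critical value, both taken from Table IV.

```python
import math
import random
import statistics

T_CRIT = 2.776   # Table IV: df = 4, one-tail area 0.025
Z_CRIT = 1.960   # Table IV: Z* row, one-tail area 0.025


def coverage(crit, n=5, mu=50.0, sigma=10.0, trials=4000, seed=1):
    """Fraction of simulated intervals x_bar +/- crit * s / sqrt(n) that contain mu."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.mean(sample)
        s = statistics.stdev(sample)     # s stands in for the unknown sigma
        margin = crit * s / math.sqrt(n)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


t_cov = coverage(T_CRIT)
z_cov = coverage(Z_CRIT)
print(f"t critical value: {t_cov:.3f}   z critical value: {z_cov:.3f}")
```

With a normal population, the t-based intervals should cover μ in roughly 95% of trials, while the z-based intervals built from s fall noticeably short for a sample this small.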
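The assumptions listed for the one-mean t-interval procedure can be read as a simple decision rule. The sketch below is one possible encoding, with a hypothetical function name `choose_procedure`. It assumes that the z-interval procedure of Section 8.2 (not reproduced in this handout) shares the simple-random-sample and normal-or-large-sample conditions and differs only in requiring σ to be known. It uses the handout's rough cutoff of n ≥ 30 for a large sample.

```python
def choose_procedure(sigma_known, population_normal, n, is_srs=True):
    """Return 'z-interval', 't-interval', or 'neither' for a one-mean interval."""
    large_sample = n >= 30                      # handout's rough cutoff
    if not is_srs:
        return "neither"                        # both procedures need an SRS
    if not (population_normal or large_sample):
        return "neither"                        # small sample, non-normal population
    return "z-interval" if sigma_known else "t-interval"


# Setting of Exercise 2: n = 16, normal population, sigma unknown.
print(choose_procedure(sigma_known=False, population_normal=True, n=16))
```

This ignores the finer practical guidance about outliers and skewness for samples between 15 and 30, so it should be treated as a first screen and not a substitute for looking at the data.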