Class Notes
Mar 31
12.4 General Confidence Interval Procedure for One Mean
As we recall, the general format of a confidence interval is:
sample estimate ± multiplier × standard error.
When the parameter is the mean of a population, the sample estimate is
$\bar{x}$ (the sample mean) and the standard error is
$\text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation and $n$ is the sample size. Moreover, the
multiplier is denoted as t*. The value of t* is determined using a probability
distribution called Student's t-distribution, or just the t-distribution.
Note: When we studied the confidence interval for a population proportion, the
value of the multiplier z* was determined using the normal distribution. The
difference here is which probability distribution supplies the multiplier: the
standard normal distribution for a proportion, the t-distribution for a mean.
A parameter called the degrees of freedom, abbreviated df, is associated with
every t-distribution. For problems involving inference about a single mean,
df = n - 1, where n is the sample size.
Features of the t-distribution:
1. The t-distribution has a bell shape, centered at 0.
2. The t-distribution looks like the standard normal distribution except that it
is more spread out (its tails are thicker).
3. The t-distribution is very close to the standard normal distribution
when its df is large enough.
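Feature 3 can be checked numerically. Python's standard library has no t-distribution, so the minimal sketch below (hypothetical helper names `t_pdf`, `t_cdf`, `t_star`) integrates the t density with Simpson's rule and finds the 95% multiplier t* by bisection, then prints it next to the normal multiplier z* for several df. It is an illustration under those assumptions; Table 12.1 or statistical software gives the same multipliers to the precision shown.

```python
import math
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps                      # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area under the density from 0 to |x|
    return 0.5 + area if x > 0 else 0.5 - area


def t_star(conf: float, df: int) -> float:
    """Multiplier t* leaving central area `conf` between -t* and t*."""
    target = 0.5 + conf / 2            # e.g. 0.975 for a 95% interval
    lo, hi = 0.0, 1000.0
    for _ in range(50):                # bisection: t_cdf increases with x
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_star = NormalDist().inv_cdf(0.975)   # normal multiplier, about 1.96
for df in (1, 5, 10, 30, 100, 1000):
    print(f"df = {df:4d}: t* = {t_star(0.95, df):.3f}   (z* = {z_star:.3f})")
```

As df grows, the printed t* values fall from about 12.7 (df = 1) toward z* ≈ 1.96, which is why the t-distribution is close to the standard normal for large df.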
Conditions to satisfy in order to use a “t” confidence interval:
1. The population is bell-shaped and the sample is a random sample.
That is, for a small sample, the data should show no extreme skewness
and should not contain any outliers.
2. If the population is not bell-shaped, a large random sample (n ≥ 30)
will do. But if there are extreme outliers, it is better to have an even larger
sample.
How do we determine the t* multiplier? Learn to use Table 12.1 on p. 451.
How do we calculate a confidence interval for a population mean? We use six
steps:
1. Make sure the appropriate conditions apply;
2. Determine the sample mean and standard deviation ($\bar{x}$ and $s$);
3. Calculate the standard error of the mean, $\text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}}$;
4. Calculate df = n - 1 and choose a confidence level;
5. Use Table 12.1 (or statistical software) to find t*;
6. The interval is $\bar{x} \pm t^* \times \text{s.e.}(\bar{x})$, which is $\bar{x} \pm t^* \times \frac{s}{\sqrt{n}}$.
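The six steps translate directly into code. In this sketch (function names are hypothetical), `one_mean_summary` covers steps 2 to 4, and `t_interval` covers step 6 once t* has been looked up in step 5; step 1 stays a judgment call about how the data were collected and what they look like.

```python
import math
import statistics
from typing import Sequence


def one_mean_summary(data: Sequence[float]) -> tuple[float, float, int]:
    """Steps 2-4: sample mean, standard error of the mean, and df = n - 1.

    Step 1 (checking the conditions) is not automated here.
    """
    n = len(data)
    xbar = statistics.mean(data)        # step 2: sample mean
    s = statistics.stdev(data)          # step 2: sample SD (divisor n - 1)
    se = s / math.sqrt(n)               # step 3: s.e.(xbar) = s / sqrt(n)
    return xbar, se, n - 1              # step 4: df = n - 1


def t_interval(xbar: float, se: float, t_star: float) -> tuple[float, float]:
    """Step 6: xbar +/- t* x s.e.(xbar); t* comes from step 5 (table or software)."""
    margin = t_star * se
    return xbar - margin, xbar + margin
```

Typical use: `xbar, se, df = one_mean_summary(data)`, look up t* for that df and the chosen confidence level, then call `t_interval(xbar, se, t_star)`.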
Question 1: Suppose we want to know if the average number of CDs that
PSU students own is smaller than 24. We draw a random sample of 250
students. The following is the Minitab output:
Descriptive Statistics: C1

Variable    N    Mean  Median  TrMean   StDev  SE Mean
C1        250  24.867  25.459  24.885  10.195    0.645

Variable  Minimum  Maximum      Q1      Q3
C1         -6.503   54.147  18.998  30.838
Use the six steps above to find a 95% confidence interval
for the average number of CDs PSU students own. Is 24 included in this
interval? How do you interpret the fact that 24 is (or is not) included in this
confidence interval?
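One way to check a hand calculation for Question 1 is to run the steps on the summary statistics in the Minitab output. The multiplier below is an assumption: t* ≈ 1.97 is an approximate 95% value for df = 249 taken from software, so substitute the Table 12.1 value used in class.

```python
import math

# Summary statistics copied from the Minitab output for Question 1.
n, xbar, s = 250, 24.867, 10.195

se = s / math.sqrt(n)      # step 3: about 0.645, matching Minitab's "SE Mean"
df = n - 1                 # step 4: 249
t_star = 1.97              # step 5 (assumed): approximate 95% t* for df = 249
lower, upper = xbar - t_star * se, xbar + t_star * se   # step 6

print(f"s.e. = {se:.3f}, df = {df}, 95% CI = ({lower:.2f}, {upper:.2f})")
print("Is 24 inside the interval?", lower <= 24 <= upper)
```

If 24 falls inside the interval, 24 is a plausible value for the population mean, so the sample does not give convincing evidence that the mean is smaller than 24.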
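Interpretation of a confidence interval for a population mean: each interval
gives a range of values that probably covers the true population mean. More
precisely, the 95% describes the method: over many repeated random samples,
about 95% of the intervals computed this way would cover the true mean.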
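The phrase "probably covers" can be made concrete with a simulation. The sketch below assumes a hypothetical normal population with a known mean (50) and standard deviation (10), draws many random samples of size 30, builds a 95% t interval from each (t* ≈ 2.045 for df = 29), and reports the fraction of intervals that cover the true mean; that fraction should come out close to 0.95.

```python
import math
import random
import statistics


def coverage_rate(mu=50.0, sigma=10.0, n=30, t_star=2.045, reps=2000, seed=1):
    """Fraction of simulated t intervals that cover the true mean `mu`.

    Hypothetical setup: normal population with mean `mu` and SD `sigma`;
    t_star = 2.045 is the 95% multiplier for df = n - 1 = 29.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if xbar - t_star * se <= mu <= xbar + t_star * se:
            hits += 1                  # this interval covers the true mean
    return hits / reps


print(f"coverage: {coverage_rate():.3f}")
```

Each individual interval either covers 50 or does not; the 95% is a property of the procedure across repeated samples.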
12.5 General Confidence Interval for the Difference Between Two Means
Suppose we want to know if there is any difference between the average
number of CDs that male students own and the average number of CDs
that female students own. One way to express our question is to write
the null hypothesis as:
$H_0: \mu_{\text{female}} - \mu_{\text{male}} = 0$
(because $\mu_{\text{female}} - \mu_{\text{male}}$ represents the difference between the average
numbers of CDs in the two populations, females and males)
The general format of a confidence interval for the difference in two means
is:
difference in sample means ± t* × standard error, where
the "difference in sample means" is $\bar{x}_1 - \bar{x}_2$;
the standard error is $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ ($s_1$, $s_2$ are the sample standard deviations for these 2
samples; $n_1$, $n_2$ are the sample sizes).
In the example we have, in order to calculate the confidence interval for the
difference between the average number of CDs that the male students own
and the average number of CDs that the female students own, we
1. Check conditions on males and females separately to see if we can use
a t confidence interval;
2. Calculate the sample means, $\bar{x}_1$ and $\bar{x}_2$, then compute the
difference in sample means, $\bar{x}_1 - \bar{x}_2$;
3. Identify the sample sizes and standard deviations for the male and
female samples (usually from Minitab output);
4. Find t* (we won't do this by hand with Table 12.1 because the df formula
is too complicated; Minitab handles it);
5. Calculate the confidence interval.
Note: We can use this method only for 2 independent samples.
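The notes skip the df formula in step 4. Statistical software commonly uses the Welch-Satterthwaite approximation for this unpooled interval; the sketch below computes it alongside the standard error, assuming that is the formula behind Minitab's output. The function name `welch_se_df` is hypothetical. With the GPA numbers from Question 2 it gives df ≈ 301.1, consistent with the DF = 301 that Minitab reports.

```python
import math


def welch_se_df(s1: float, n1: int, s2: float, n2: int) -> tuple[float, float]:
    """Unpooled s.e.(xbar1 - xbar2) and the Welch-Satterthwaite approximate df."""
    v1 = s1 ** 2 / n1                   # s1^2 / n1
    v2 = s2 ** 2 / n2                   # s2^2 / n2
    se = math.sqrt(v1 + v2)             # standard error from the notes
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df


# GPA summary statistics from Question 2 (female sample first, then male).
se, df = welch_se_df(0.510, 257, 0.566, 156)
print(f"s.e. = {se:.4f}, df = {df:.1f}")
```

The approximate df is usually not a whole number, which is one reason software, rather than a table, is the practical route here.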
Question 2: Suppose we want to compare the average GPA of male and
female Stat 200 students. Using the data we used on Thursday, try to apply
the steps above to this problem (the Minitab output is shown below).
Two-Sample T-Test and CI: GPA, Gender

Two-sample T for GPA

Gender    N   Mean  StDev  SE Mean
female  257  3.094  0.510    0.032
male    156  2.950  0.566    0.045

Difference = mu (female) - mu (male)
Estimate for difference:  0.1438
95% CI for difference:  (0.0348, 0.2527)
T-Test of difference = 0 (vs not =): T-Value = 2.60  P-Value = 0.010  DF = 301
We see that the 95% confidence interval for the difference between the mean
GPA of females and the mean GPA of males (female minus male) is
(0.0348, 0.2527). Since 0 is not included in this interval, we can say there is
a difference between the mean GPA of males and the mean GPA of females.
Because the whole interval is positive, the females' mean appears to be higher.
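The interval in the Minitab output can be reproduced from the summary statistics with the unpooled standard error. The multiplier t* ≈ 1.968 for df = 301 is an assumed value from software, and the sample means in the output are rounded, so the endpoints agree with Minitab's only to about three decimals.

```python
import math

# Summary statistics from the Minitab output for Question 2.
n_f, mean_f, sd_f = 257, 3.094, 0.510    # females
n_m, mean_m, sd_m = 156, 2.950, 0.566    # males

diff = mean_f - mean_m                           # 0.144 (Minitab's 0.1438 uses unrounded means)
se = math.sqrt(sd_f ** 2 / n_f + sd_m ** 2 / n_m)  # unpooled standard error
t_star = 1.968                                   # assumed: 95% t* for df = 301
lower, upper = diff - t_star * se, diff + t_star * se
print(f"95% CI for mu(female) - mu(male): ({lower:.4f}, {upper:.4f})")
```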
Interpretation of our 95% confidence interval in this case: we are 95%
confident that the interval (0.0348, 0.2527) covers the true difference between
the mean GPA of female and male Stat 200 students. The 95% describes the
method: if we repeated the sampling a large number of times and computed an
interval each time, about 95% of those intervals would cover the true difference.
It is sometimes reasonable to assume equal variances; in that case we use
the pooled standard deviation. The equal-variance assumption is appropriate
when we have reason to believe the 2 populations that we are interested in
have the same variability. In practice, if the standard deviations of both
samples are about the same, we have good reason to use the pooled standard
deviation.
Using the previous example, we see the standard deviations for both males
and females are about the same (how do you make this judgment?). Hence,
we try to use the pooled standard deviation. The output is the following:
Two-Sample T-Test and CI: GPA, Gender

Two-sample T for GPA

Gender    N   Mean  StDev  SE Mean
female  257  3.094  0.510    0.032
male    156  2.950  0.566    0.045

Difference = mu (female) - mu (male)
Estimate for difference:  0.1438
95% CI for difference:  (0.0377, 0.2499)
T-Test of difference = 0 (vs not =): T-Value = 2.66  P-Value = 0.008  DF = 411
The result is very similar to the one we obtained using the unpooled
standard deviation, except that the df is now larger: with pooling,
df = $n_1 + n_2 - 2 = 257 + 156 - 2 = 411$.
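A sketch of the pooled calculation, assuming the standard pooled formula: the pooled variance is the average of the two sample variances weighted by $n_i - 1$, the standard error is $s_p\sqrt{1/n_1 + 1/n_2}$, and df = $n_1 + n_2 - 2$. The multiplier t* ≈ 1.966 for df = 411 is an assumed value from software; the estimate 0.1438 is taken from the Minitab output.

```python
import math

n_f, sd_f = 257, 0.510      # females
n_m, sd_m = 156, 0.566      # males
diff = 0.1438               # Minitab's "Estimate for difference"

# Pooled variance: sample variances weighted by their n - 1.
sp2 = ((n_f - 1) * sd_f ** 2 + (n_m - 1) * sd_m ** 2) / (n_f + n_m - 2)
sp = math.sqrt(sp2)
se = sp * math.sqrt(1 / n_f + 1 / n_m)   # pooled standard error
df = n_f + n_m - 2                       # 411
t_star = 1.966                           # assumed: 95% t* for df = 411
lower, upper = diff - t_star * se, diff + t_star * se
print(f"sp = {sp:.3f}, df = {df}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

Because the two sample standard deviations are close, the pooled standard error is only slightly smaller than the unpooled one, so the interval barely changes.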
12.6 The Difference Between Two Proportions (Independent)
Again, we have the general format:
sample estimate ± multiplier × standard error, where
the sample estimate is $\hat{p}_1 - \hat{p}_2$;
the standard error is $\text{s.e.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$;
the multiplier is denoted as z* and is determined using the standard normal
distribution (notice that t* is used for the difference between two means).
Conditions for a confidence interval for the difference in two proportions:
1. Sample proportions are available based on independent samples from
the two populations.
2. All of the quantities $n_1\hat{p}_1$, $n_2\hat{p}_2$, $n_1(1 - \hat{p}_1)$, and $n_2(1 - \hat{p}_2)$ are at least 10.
For example, if we want to see whether there is any difference between the
proportions of right-handed students among male and female Stat 200 students,
we can use a 95% confidence interval to answer this question. The Minitab
output is the following:
Test and CI for Two Proportions: Handed, Gender

Success = right-handed

Gender    X    N  Sample p
female  236  258  0.914729
male    129  156  0.826923

Estimate for p(female) - p(male):  0.0878056
95% CI for p(female) - p(male):  (0.0193535, 0.156258)
Test for p(female) - p(male) = 0 (vs not = 0):  Z = 2.51  P-Value = 0.012
The 95% confidence interval for the difference between the proportions of
right-handed females and males (female minus male) is (0.019, 0.156). Since 0
is not included in this interval, we say there is a difference between the
proportions of right-handed males and females.
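The interval above can be reproduced from the counts in the Minitab output. The minimal sketch below (hypothetical function name `two_prop_ci`) checks condition 2, computes the standard error from the formula in this section, and takes z* from the standard normal distribution via Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def two_prop_ci(x1: int, n1: int, x2: int, n2: int, conf: float = 0.95) -> tuple[float, float]:
    """Confidence interval for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    # Condition 2: n1*p1, n2*p2, n1*(1 - p1), n2*(1 - p2) must all be at least 10.
    # n1*p1 is the success count x1, and n1*(1 - p1) is the failure count n1 - x1.
    if min(x1, n1 - x1, x2, n2 - x2) < 10:
        raise ValueError("a success or failure count is below 10")
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # s.e.(p1 - p2)
    z_star = NormalDist().inv_cdf(0.5 + conf / 2)             # about 1.96 for 95%
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Right-handed counts from the Minitab output: females 236 of 258, males 129 of 156.
lower, upper = two_prop_ci(236, 258, 129, 156)
print(f"95% CI for p(female) - p(male): ({lower:.4f}, {upper:.4f})")
```

Here all four counts (236, 22, 129, 27) are at least 10, so the conditions hold, and the result, approximately (0.0194, 0.1563), matches Minitab's interval.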
Interpretation of our 95% confidence interval: we are 95% confident that the
interval (0.019, 0.156) covers the true difference between the proportions of
right-handed female and male students. If we repeated the sampling a large
number of times and computed an interval each time, about 95% of those
intervals would cover the true difference.
Question 3: Decide whether each of the following cases calls for a CI for one proportion,
the difference of 2 proportions, one mean, or the difference of 2 means.
1. In order to compare the proportions of male and female students
who have at least one tattoo, we draw a sample of 200 students of
each gender.
2. State A claims that its average income tax is higher than the
average income tax of State B.
3. To obtain the average time one needs to drive from State College to
NYC, we ask 200 students to drive from State College to NYC on the
same day.
4. To see which gender is more likely to smoke, we use a sample of
males and a sample of females. We then compare the ratios of
smokers in each sample.
5. We want to see if a new drug has a cure rate of 65%.