Basic Practice of
Statistics
7th Edition
Lecture PowerPoint Slides
In Chapter 21, We Cover …
 Two-sample problems
 Comparing two population means
 Two-sample t procedures
 Robustness again
 Avoid the pooled two-sample t procedures*
 Avoid inference about standard deviations*
Two-Sample Problems
Suppose we want to compare the mean of some quantitative variable
for the individuals in two populations―Population 1 and Population 2.
Our parameters of interest are the population means µ1 and µ2.
The best approach is to take separate random samples from each
population and to compare the sample means.
We use the mean response in the two groups to make the
comparison. Here’s a table that summarizes these two situations:
Comparing Two Population Means
CONDITIONS FOR INFERENCE COMPARING TWO MEANS
 We have two SRSs from two distinct populations. The samples are
independent. That is, one sample has no influence on the other. We measure
the same response variable for both samples.
 Both populations are Normally distributed. The means and standard
deviations of the populations are unknown. In practice, it is enough that the
distributions have similar shapes and that the data have no strong outliers.
 Call the variable 𝑥1 in the first population and 𝑥2 in the second because the
variable may have different distributions in the two populations.
 Here is how we describe the two populations:
Population   Variable   Mean   Standard deviation
1            𝑥1         𝜇1     𝜎1
2            𝑥2         𝜇2     𝜎2
Comparing Two Population Means
 Here is how we describe the two samples:
Population   Sample size   Sample mean   Sample standard deviation
1            𝑛1            x̄1            𝑠1
2            𝑛2            x̄2            𝑠2
 To do inference about the difference 𝜇1 − 𝜇2 between
the means of two populations, we start from the
difference x̄1 − x̄2 between the means of the two
samples.
Two-Sample t Procedures
 To take variation into account, we would like to standardize the
observed difference x̄1 − x̄2 by subtracting its mean, 𝜇1 − 𝜇2, and
dividing the result by its standard deviation.
When the Independent condition is met, the standard deviation of the statistic
x̄1 − x̄2 is:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
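The formula follows from two standard facts, stated here as a sketch rather than quoted from the slides: the mean of an SRS of size n has variance σ²/n, and the variance of a difference of independent quantities is the sum of their variances.

```latex
\begin{aligned}
\operatorname{Var}(\bar{x}_1 - \bar{x}_2)
  &= \operatorname{Var}(\bar{x}_1) + \operatorname{Var}(\bar{x}_2)
   = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}, \\
\sigma_{\bar{x}_1 - \bar{x}_2}
  &= \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.
\end{aligned}
```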
 Because we don't know the population standard deviations, we estimate
them by the sample standard deviations from our two samples. The
result is the standard error, or estimated standard deviation, of the
difference in sample means:
$$SE_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Two-Sample t Procedures
 When we standardize the estimate by subtracting its mean, 𝜇1 − 𝜇2,
and dividing the result by its standard error, the result is the two-sample t statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
 The two-sample t statistic has approximately a t distribution. It does not
have exactly a t distribution, even if the populations are both exactly
Normal. In practice, however, the approximation is very accurate.
There are two practical options for using the two-sample t procedures:
 Option 1: Use technology to determine the degrees of freedom.
 Option 2: Use the smaller of 𝑛1 − 1 and 𝑛2 − 1 for the degrees of freedom.
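The two degrees-of-freedom options can be sketched in Python. This is a minimal illustration, not code from the textbook: the function names are hypothetical, and Option 1 is assumed to be the Welch-Satterthwaite approximation that statistical software commonly uses for the unequal-variance t procedure. The inputs are the summary statistics of the lean/obese example in these slides.

```python
import math


def standard_error(s1, n1, s2, n2):
    """Standard error of the difference in sample means: sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)


def df_software(s1, n1, s2, n2):
    """Option 1 (assumed formula): Welch-Satterthwaite approximate degrees of freedom."""
    a = s1**2 / n1
    b = s2**2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


def df_conservative(n1, n2):
    """Option 2: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


# Sample standard deviations and sizes from the lean/obese example.
se = standard_error(107.121, 10, 67.498, 10)
df1 = df_software(107.121, 10, 67.498, 10)
df2 = df_conservative(10, 10)
print(f"SE = {se:.3f}, Option 1 df = {df1:.3f}, Option 2 df = {df2}")
```

Under this assumption, the Option 1 value agrees with the software degrees of freedom reported in the example (15.174), and Option 2 is never larger than Option 1, which is why it is the conservative choice.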
Two-Sample t Procedures:
Confidence Interval for µ1 - µ2
THE TWO-SAMPLE t PROCEDURES
 Draw an SRS of size 𝑛1 from a large Normal population with unknown
mean 𝜇1 , and draw an independent SRS of size 𝑛2 from another large
Normal population with unknown mean 𝜇2 . A level C confidence
interval for 𝝁𝟏 − 𝝁𝟐 is given by
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
 Here, 𝑡 ∗ is the critical value for confidence level C for the t distribution
with degrees of freedom from either Option 1 (software) or Option 2
(the smaller of 𝑛1 − 1 and 𝑛2 − 1).
Example
 STATE: People gain weight when they take in more energy from food
than they expend. James Levine and his collaborators at the Mayo
Clinic investigated the link between obesity and energy spent on daily
activity with data from a study of 20 healthy volunteers (𝑛1 = 𝑛2 = 10): 10
who were lean and 10 who were mildly obese but still healthy. They
wanted to address the question: Do lean and obese people differ in
the average time they spend standing and walking?
 PLAN: Give a 90% confidence interval for 𝜇1 − 𝜇2 , the difference in
average daily minutes spent standing and walking between lean and
mildly obese adults.
 SOLVE: Examination of the data reveals all conditions for inference
can be (at least reasonably) assumed; the distributions are a bit
irregular, but with only 10 observations this is to be expected.
Example
 SOLVE: (cont’d) The descriptive statistics:
Group       𝑛    Mean, x̄    Std. Dev., s
1 (lean)    10   525.751    107.121
2 (obese)   10   373.269    67.498
 Using Option 2 (conservative degrees of freedom in the absence of
technology), 𝑛1 − 1 = 𝑛2 − 1 = 9 and t* = 1.833, giving:
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (525.751 - 373.269) \pm 1.833 \sqrt{\frac{107.121^2}{10} + \frac{67.498^2}{10}}$$
$$= 152.482 \pm 73.390 = 79.09 \text{ to } 225.87 \text{ minutes}$$
 Software using Option 1 gives df = 15.174 and t* = 1.752, for a confidence
interval of 82.35 to 222.62 minutes—narrower because Option 2 is
conservative.
 CONCLUDE: Whichever interval we report, we are (at least) 90% confident
that the mean difference in average daily minutes spent standing and
walking between lean and mildly obese adults lies in this interval.
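The arithmetic in this example can be checked with a few lines of Python. This is a sketch only: `two_sample_ci` is a hypothetical helper name, and the critical values t* = 1.833 and t* = 1.752 are copied from the slides rather than computed.

```python
import math


def two_sample_ci(xbar1, s1, n1, xbar2, s2, n2, t_star):
    """Confidence interval for mu1 - mu2: (xbar1 - xbar2) +/- t* times the standard error."""
    diff = xbar1 - xbar2
    margin = t_star * math.sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - margin, diff + margin


# Lean (group 1) versus mildly obese (group 2), 90% confidence.
lean = (525.751, 107.121, 10)
obese = (373.269, 67.498, 10)

lo2, hi2 = two_sample_ci(*lean, *obese, t_star=1.833)  # Option 2, df = 9
lo1, hi1 = two_sample_ci(*lean, *obese, t_star=1.752)  # Option 1, df = 15.174
print(f"Option 2: {lo2:.2f} to {hi2:.2f} minutes")
print(f"Option 1: {lo1:.2f} to {hi1:.2f} minutes")
```

Because t* = 1.752 is a rounded value, the Option 1 endpoints computed this way can differ from the software interval quoted above in the second decimal place.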
Two-Sample t Procedures:
Two-Sample t Test
THE TWO-SAMPLE t PROCEDURES
 To test the hypothesis 𝑯𝟎 : 𝝁𝟏 = 𝝁𝟐 , calculate the two-sample t
statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
 Find P-values from the t distribution with degrees of freedom from
either Option 1 (software) or Option 2 (the smaller of 𝑛1 − 1 and
𝑛2 − 1).
Two-Sample t Test
Two-Sample t Test for the Difference Between Two Means
Suppose the Random, Normal, and Independent conditions are met. To
test the hypothesis H0: 𝜇1 − 𝜇2 = hypothesized value, compute the t statistic
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \text{hypothesized value}}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Find the P-value by calculating the probability of getting a t statistic this large
or larger in the direction specified by the alternative hypothesis Ha. Use the
t distribution with degrees of freedom approximated by technology or the
smaller of 𝑛1 − 1 and 𝑛2 − 1.
Example
COMMUNITY SERVICE AND ATTACHMENT TO FRIENDS
 STATE: Do college students who have volunteered for community
service work differ from those who have not? A study obtained data from
57 students who had done service work and 17 who had not. One of the
response variables was a measure of attachment to friends. Here are
the results:
Group   Condition    𝑛    x̄        s
1       Service      57   105.32   14.68
2       No service   17   96.82    14.26
 PLAN: The investigator had no specific direction for the difference in
mind before looking at the data, so the alternative is two-sided. We will
test the following hypotheses:
𝐻0 : 𝜇1 = 𝜇2
𝐻𝑎 : 𝜇1 ≠ 𝜇2
Example
 SOLVE: The two-sample t statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{105.32 - 96.82}{\sqrt{\frac{14.68^2}{57} + \frac{14.26^2}{17}}} = \frac{8.5}{3.9677} = 2.142$$
 Software (Option 1) says that the two-sided P-value is 0.0414.
 Using Option 2, 𝑛1 − 1 = 56 and 𝑛2 − 1 = 16, so we compare
our test statistic of 2.142 with two-sided critical values of the t(16)
distribution. Table C shows that the P-value is between 0.04 and 0.05.
 CONCLUDE: The data give moderately strong evidence (P < 0.05) that
students who have engaged in community service are, on the average,
more attached to their friends.
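The test in this example can be reproduced with the Python standard library alone. The sketch below makes two assumptions that are not in the slides: Option 1 is taken to be the Welch-Satterthwaite degrees of freedom, and the P-value is obtained by integrating the t density numerically with Simpson's rule instead of calling a statistics package. The helper names `t_pdf` and `two_sided_p` are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided P-value: 1 - 2 * (area under the density from 0 to |t|), by Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


# Summary statistics from the community service example.
x1, s1, n1 = 105.32, 14.68, 57  # service
x2, s2, n2 = 96.82, 14.26, 17   # no service

a, b = s1**2 / n1, s2**2 / n2
t = (x1 - x2) / math.sqrt(a + b)
df_welch = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))  # Option 1 (assumed formula)
df_cons = min(n1 - 1, n2 - 1)                                  # Option 2

p_welch = two_sided_p(t, df_welch)
p_cons = two_sided_p(t, df_cons)
print(f"t = {t:.3f}")
print(f"Option 1: df = {df_welch:.2f}, P = {p_welch:.4f}")
print(f"Option 2: df = {df_cons}, P = {p_cons:.4f}")
```

The Option 2 P-value is the larger of the two, which is what "conservative" means here: with fewer degrees of freedom the same t statistic counts as weaker evidence.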
Robustness Again
 The two-sample t procedures are more robust than the one-sample t
methods, particularly when the distributions are not symmetric.
 When the sizes of the two samples are equal and the two populations
being compared have distributions with similar shapes, probability
values from the t table are quite accurate for a broad range of
distributions when the sample sizes are as small as 𝑛1 = 𝑛2 = 5.
When the two population distributions have different shapes, larger
samples are needed.
 As a guide to practice, adapt the guidelines for one-sample t
procedures to two-sample procedures by replacing “sample size” with
the “sum of the sample sizes,” 𝑛1 + 𝑛2 .
 Caution: In planning a two-sample study, choose equal sample sizes
whenever possible. The two-sample t procedures are most robust
against non-Normality in this case, and the conservative Option 2
probability values are most accurate.
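The robustness claim for small, equal sample sizes can be explored with a small simulation. This is an illustration under stated assumptions, not a proof: both samples are drawn from the same right-skewed exponential population (a choice made here, not taken from the slides), 𝑛1 = 𝑛2 = 5, and the interval uses Option 2 with t* = 2.132, the 90% critical value for 4 degrees of freedom.

```python
import math
import random
import statistics


def covers_zero(n, t_star, rng):
    """Draw two samples of size n from one skewed population and check the interval."""
    x1 = [rng.expovariate(1.0) for _ in range(n)]
    x2 = [rng.expovariate(1.0) for _ in range(n)]
    diff = statistics.mean(x1) - statistics.mean(x2)
    se = math.sqrt(statistics.variance(x1) / n + statistics.variance(x2) / n)
    # Both populations are identical, so the true difference in means is 0.
    return abs(diff) <= t_star * se


rng = random.Random(21)  # fixed seed so the run is repeatable
n, t_star, reps = 5, 2.132, 5000
coverage = sum(covers_zero(n, t_star, rng) for _ in range(reps)) / reps
print(f"Estimated coverage of the nominal 90% interval: {coverage:.3f}")
```

If the procedure is robust in this setting, the estimated coverage should be at or above the nominal 90%, since Option 2 is conservative. Changing the two populations to have different shapes is one way to see why larger samples are then needed.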
Avoid the Pooled Two-Sample t Procedures*
 Many calculators and software packages offer a
choice of two-sample t statistics. One is often labeled
for “unequal” variances; the other for “equal”
variances.
 The “unequal” variance procedure is our two-sample t.
 Never use the pooled t procedures if you have
software or technology that will implement the
“unequal” variance procedure.
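To make the two choices concrete, the sketch below computes both statistics from the summary statistics of the community service example. The pooled formula shown is the standard textbook one (pooled variance with 𝑛1 + 𝑛2 − 2 degrees of freedom); it does not appear in these slides, and the function names are hypothetical.

```python
import math


def unpooled_t(x1, s1, n1, x2, s2, n2):
    """Two-sample t statistic (the "unequal variances" procedure)."""
    return (x1 - x2) / math.sqrt(s1**2 / n1 + s2**2 / n2)


def pooled_t(x1, s1, n1, x2, s2, n2):
    """Pooled t statistic (the "equal variances" procedure), df = n1 + n2 - 2."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
    return (x1 - x2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


# Community service example: service (n = 57) versus no service (n = 17).
args = (105.32, 14.68, 57, 96.82, 14.26, 17)
t_unpooled = unpooled_t(*args)
t_pooled = pooled_t(*args)
print(f"unequal-variance t = {t_unpooled:.3f}")
print(f"pooled t           = {t_pooled:.3f} (df = {57 + 17 - 2})")
```

In this example the two sample standard deviations are nearly equal, so the statistics are close. When the standard deviations and the sample sizes both differ, the pooled statistic and its degrees of freedom can be noticeably off, which is the reason for preferring the "unequal" variance procedure.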
Avoid Inference About Standard Deviations*
 There are methods for inference about the standard
deviations of Normal populations. The most common
such method is the “F test” for comparing the standard
deviations of two Normal populations.
 Unlike the t procedures for means, the F test for
standard deviations is extremely sensitive to non-Normal distributions.
 We do not recommend trying to do inference about
population standard deviations in basic statistical
practice.