Two Sample Inferential Statistics

Basic Statistics
Inferences About Two
Population Means
STRUCTURE OF STATISTICS
Descriptive statistics: tabular, graphical, numerical
Inferential statistics: estimation, tests of hypothesis
Research situation for independent
two-samples t-Test
A social psychologist wanted to determine if the
development of “generosity” was related to the
gender of children. As a pilot study, the
psychologist obtained a random sample of 4-year-old
boys and girls.
In a group setting, each child was given 16 small
pieces of candy and asked to “put some in a sack
for your very best friend.” The numbers of pieces
of candy set aside for the friend by 10 girls and 10
boys are summarized below:
The Research Design
Gender    n     Mean ($\bar{X}$)
Boy       10    10
Girl      10    12
DV: Generosity (number of pieces of candy set aside for the friend)
Step by Step: The Two-Sample Test of Hypothesis Using the t-test
1. State Research Problem or Question
2. Establish the Hypotheses
3. Establish Level of Significance
4. Collect Data
5. Calculate Statistical Test
6. Interpret the Results
State the Research Hypothesis
There is a difference between 4-year-old boys and
girls in their levels of generosity.
Independent Variable: Gender (Boy vs. Girl)
Dependent Variable: Generosity
[Diagram: the hypothesis concerns the difference in generosity between boys and girls.]
Research or Alternative Hypothesis
The research question is “Is there a difference in
the development of generosity between boys and girls?”
Therefore, the research hypothesis is:
$H_a: \mu_{boy} \neq \mu_{girl}$
Setting the Null Hypothesis
The null hypothesis is set by
“nullifying” the research hypothesis.
$H_0: \mu_{boy} = \mu_{girl}$
Since μboy = μgirl can be written as μboy – μgirl = 0,
the null hypothesis can be written:
$H_0: \mu_{boy} - \mu_{girl} = 0$
[Diagram: Two populations, 4-year-old BOYS and 4-year-old GIRLS, with the research hypothesis asking whether they differ. A random sample of n = 10 is drawn from each population, generosity (the DV) is measured, and the sample means $\bar{X}_{Boy}$ and $\bar{X}_{Girl}$ are calculated and compared.]
Identify the Test Statistic
We will be using Confidence Intervals to test
the hypotheses about the differences in two
population means.
$CI = (\bar{X}_1 - \bar{X}_2) \pm t\sqrt{\frac{S_p^2}{n_1} + \frac{S_p^2}{n_2}}$
We have already seen that we can estimate the
individual population means with sample means.
We have also seen that the null hypothesis can be
written as follows:
$H_0: \mu_1 = \mu_2$ can be written $\mu_1 - \mu_2 = 0$
If we consider μ1 – μ2 the parameter we are
estimating, we can estimate it with: $\bar{X}_1 - \bar{X}_2$
While we do not know the sampling distribution
of the difference, we do know it for both of the
sample means individually. We must find out
how to combine them. We will illustrate how it
might be done with a simple example.
Consider the following example:
We have the following two distributions of X1 and X2:
X1    f          X2    f
1     1          1     1
2     1          2     1
$\bar{X}_1 = 1.5$          $\bar{X}_2 = 1.5$
We are going to combine these two
distributions into one distribution of
(X1 - X2):
All possible differences X1 - X2:
1 - 2 = -1
1 - 1 = 0
2 - 2 = 0
2 - 1 = 1

X1 - X2    f
-1         1
 0         2
 1         1
Mean of X1- X2 = 0
Range of X1 = 2 – 1 = 1
Range of X2 = 2 – 1 = 1
Range of (X1 – X2) = 1 – (-1) = 2
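The combination above can be checked by brute force. The following is a minimal Python sketch (the variable names are illustrative and not from the slides) that pairs every value of X1 with every value of X2, tallies the differences, and compares means and variances. It assumes each value is equally likely and that the two distributions are independent.

```python
from collections import Counter
from itertools import product
from statistics import mean, pvariance

# The two toy distributions from the slide: values 1 and 2, each with frequency 1.
x1 = [1, 2]
x2 = [1, 2]

# Every possible pairing of an X1 value with an X2 value.
diffs = [a - b for a, b in product(x1, x2)]

# Frequency table of X1 - X2.
freq = Counter(diffs)

mean_diff = mean(diffs)                    # mean of the differences
var_diff = pvariance(diffs)                # population variance of the differences
var_sum = pvariance(x1) + pvariance(x2)    # sum of the individual variances
```

In this example the mean of the differences equals the difference of the means (1.5 - 1.5 = 0), and the variance of the differences equals the sum of the two variances (0.25 + 0.25 = 0.5). The variance is the measure of variability that carries over to the standard error of the difference.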
What do we know for the problem at hand?
1. From the CLT, we know that the sample means from the
population of boys and the population of girls (sampling
distributions) are distributed approximately normally.
2. We know that the sampling distributions of the sample means
have the same means as the original distributions of boys and
girls (μboys and μgirls).
3. We also know that the standard deviations of the sampling
distributions (the standard errors) are those of the original
distributions of boys and girls divided by the square roots of
the sample sizes.
4. Finally, we know from the demonstration on the previous
slide that the mean of the difference is the difference in the
means, and the variability of the difference is the sum of the
variability of the individual distributions (for independent
samples, the variances add).
Deriving a Sampling Distribution of
Mean Difference
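[Figure: sampling distribution of $\bar{X}_A - \bar{X}_B$, centered at $\mu_A - \mu_B = 0$.]
Standard error of the difference for independent samples:
$S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$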
Calculating the “Pooled” Variance
$s_{pooled}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
This variance is referred to as the
“pooled” variance since it contains the
appropriate (weighted by the sample
sizes) amount of information from each of
the two samples.
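As a rough illustration, the pooled variance can be written as a small Python function. The function name `pooled_variance` is made up for this sketch; the inputs are the sample sizes and sample standard deviations from the generosity example (10 boys with S = 2.5, 10 girls with S = 3.0).

```python
def pooled_variance(n1, s1, n2, s2):
    """Pooled variance of two independent samples.

    n1, n2 are the sample sizes; s1, s2 are the sample standard deviations.
    Each sample variance is weighted by its degrees of freedom (n - 1).
    """
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


# Generosity example: boys (n = 10, S = 2.5) and girls (n = 10, S = 3.0).
sp2 = pooled_variance(10, 2.5, 10, 3.0)
```

With equal sample sizes the pooled variance is simply the average of the two sample variances, (6.25 + 9) / 2 = 7.625.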
Testing with Confidence
Intervals and t-Test
• The formula for the confidence interval for
two independent samples is:
$CI = (\bar{X}_1 - \bar{X}_2) \pm t\sqrt{\frac{S_p^2}{n_1} + \frac{S_p^2}{n_2}}$
• The formula for the two-sample t-test is
$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_p^2}{n_1} + \frac{S_p^2}{n_2}}}$
Note that $\mu_1 - \mu_2$ is
hypothesized to be 0!
Conducting the Statistical Test: We will
use the 95% Confidence Interval
From our problem:
nboys = 10, ngirls = 10
$\bar{X}_1$ = 10 (boys), $\bar{X}_2$ = 12 (girls)
Sboys = 2.5, Sgirls = 3.0
$s_p^2 = \frac{6.25(9) + 9(9)}{10 + 10 - 2} = \frac{137.25}{18} = 7.625$
$S_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = \sqrt{\frac{7.625}{10} + \frac{7.625}{10}} = \sqrt{.7625 + .7625} = 1.23$
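The same quantities also give the two-sample t statistic directly. The sketch below is illustrative only: the function name `two_sample_t` is hypothetical, and the critical value 2.101 is an assumption taken from a standard t table for a two-tailed test at the .05 level with df = n1 + n2 - 2 = 18.

```python
import math


def two_sample_t(mean1, mean2, sp2, n1, n2, hypothesized_diff=0.0):
    """Return (t, standard error) for two independent samples with pooled variance sp2."""
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    t = ((mean1 - mean2) - hypothesized_diff) / se
    return t, se


# Generosity example: boys' mean 10, girls' mean 12, pooled variance 7.625.
t_stat, se = two_sample_t(10, 12, 7.625, 10, 10)

# Assumed table value: two-tailed .05 critical t for df = 18.
T_CRIT_DF18 = 2.101
reject_null = abs(t_stat) > T_CRIT_DF18
```

Here the standard error is about 1.23 and t is about -1.62, which is smaller in absolute value than the critical value, so the null hypothesis is not rejected.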
The 95% Confidence Interval
$95\%\,CI = (\bar{X}_1 - \bar{X}_2) \pm t\sqrt{\frac{S_p^2}{n_1} + \frac{S_p^2}{n_2}}$
With df = n1 + n2 – 2 = 18, the critical value is t = 2.101. Taking the
difference as girls minus boys and the standard error as 1.235 before rounding:
= 12 – 10 + 2.101(1.235) = 2 + 2.59 = 4.59
and
= 2 – 2.59 = –0.59
We are 95% confident that the mean difference
between boys' and girls' generosity is between
–0.59 and 4.59. Since 0 is in the interval, we fail to reject
the Null Hypothesis of no difference in generosity.
A Graphical Representation of Results
[Figure: sampling distribution of mean differences, centered at 0, with the 95% Confidence Interval marked from –0.59 to +4.59.]
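The interval for the independent-samples case can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function name `ci_independent` is hypothetical, the difference is taken as girls minus boys, and the critical value 2.101 is the standard t-table entry for df = n1 + n2 - 2 = 18 at 95% confidence.

```python
import math


def ci_independent(mean_a, mean_b, sp2, n_a, n_b, t_crit):
    """Confidence interval for mean_a - mean_b using a pooled variance sp2."""
    diff = mean_a - mean_b
    margin = t_crit * math.sqrt(sp2 / n_a + sp2 / n_b)
    return diff - margin, diff + margin


# Girls' mean 12, boys' mean 10, pooled variance 7.625, n = 10 per group.
# t_crit = 2.101 is an assumed table value for df = 18.
low, high = ci_independent(12, 10, 7.625, 10, 10, t_crit=2.101)

# The null hypothesis of no difference is retained when 0 lies inside the interval.
contains_zero = low <= 0 <= high
```

With these inputs the interval runs from about -0.59 to 4.59 and contains 0, so the data do not support a difference in generosity.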
The “Dependent” Samples
t-test
The previous example assumed independent
random sampling. What if the two samples
are dependent on each other?
An Example
• Assume that the government plans to
evaluate its campaign to conserve
gasoline. Twelve families are randomly
selected and their gasoline consumption is
measured before and after the campaign.
The data are presented on the next slide.
• This problem is on page 322 of your text
using the t-statistic. Compare the
answers!
The Data
Family   Before   After   Difference   Difference²
A        55       48        7          49
B        43       38        5          25
C        51       53       -2           4
D        62       58        4          16
E        35       36       -1           1
F        48       42        6          36
G        58       55        3           9
H        45       40        5          25
I        48       49       -1           1
J        54       50        4          16
K        56       58       -2           4
L        32       25        7          49
Total                      Σd = 35     Σd² = 235
Confidence Interval for
Dependent Samples
$CI = \bar{d} \pm t\left(\frac{S_d}{\sqrt{n}}\right)$
In essence, we will treat the differences in
the two samples as if we were calculating a
one-sample confidence interval. We must
calculate the mean difference $\bar{d}$ and the
standard deviation of the differences $S_d$.
Calculating the 95% Confidence
Interval
$\bar{d} = \frac{\sum d}{n} = \frac{35}{12} = 2.92$
$S_d^2 = \frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1} = \frac{235 - \frac{35^2}{12}}{11} = 12.08$, so $S_d = \sqrt{12.08} = 3.48$
$95\%\,CI = \bar{d} \pm t\left(\frac{S_d}{\sqrt{n}}\right) = 2.92 \pm 2.201\left(\frac{3.48}{\sqrt{12}}\right) = 2.92 \pm 2.21$
Thus, we estimate, at a 95% level of confidence, that
the real difference is between 0.71 and 5.13 gallons
and we reject the Null Hypothesis and conclude the
campaign did affect gas consumption. (see page 324)
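The dependent-samples calculation can be checked directly from the data table. The Python sketch below recomputes the differences, their mean and standard deviation, and the interval. The critical value 2.201 (df = n - 1 = 11) is the one used on the slide; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Gasoline consumption for families A through L, from the data table.
before = [55, 43, 51, 62, 35, 48, 58, 45, 48, 54, 56, 32]
after = [48, 38, 53, 58, 36, 42, 55, 40, 49, 50, 58, 25]

# Treat the paired differences as a single sample.
d = [b - a for b, a in zip(before, after)]
n = len(d)

d_bar = mean(d)            # mean difference
s_d = stdev(d)             # sample standard deviation of the differences (n - 1)
se = s_d / math.sqrt(n)    # standard error of the mean difference

T_CRIT_DF11 = 2.201        # two-tailed .05 critical t for df = 11
low = d_bar - T_CRIT_DF11 * se
high = d_bar + T_CRIT_DF11 * se

t_stat = d_bar / se        # equivalent one-sample t statistic on the differences
```

Computed this way, the mean difference is about 2.92 with a standard deviation of about 3.48, and the interval (about 0.71 to 5.13) excludes 0. The t statistic of about 2.91 exceeds 2.201, so both approaches reject the null hypothesis.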
Summary of Two Sample Tests
• We can use confidence intervals to test a
hypothesis about the difference in two
independent samples.
• We can also use confidence intervals to test
a hypothesis about the difference in two
dependent samples.
• The conclusions reached using confidence
intervals are exactly the same as using the t-statistic.