Inferences about the Difference in Two Population Means
o Independent Samples
o Paired (or Related) Samples
When you finish these notes, for each procedure, you should know:
a. When to use it
b. Its requirements
c. How to determine if the requirements are met
d. How to test a hypothesis about the difference
e. How to estimate the difference in means
1. Inferences about Difference in Two Means for Independent Samples
1.1 Notes
1.1.1 When it should be used
o You wish to make an inference about the difference, on average, between two populations
o You have collected a sample from each population
Example of use:
Best Management Practices: (search for two sample)
https://my.sfwmd.gov/pls/portal/docs/PAGE/COMMON/NEWSR/rog_final_schedule_4_2_01_12_09.pdf
1.1.2 Requirements for t distribution
o Same variance: Both populations have the same unknown variance
o Independence: A random sample from population 1 and another, independent random sample from population 2
o Normality: Both populations are normally distributed, or the sample sizes are large
o Use the tests from NCSS to check the assumptions of normality and equal variance
1.1.3 Estimate of difference in population means
o Difference in sample means, $\bar{X}_1 - \bar{X}_2$
1.1.4 Degrees of freedom
o n1-1 from sample one and n2-1 from sample two
o Total degrees of freedom = (n1-1) + (n2-1) = n1+n2-2
1.1.5 Variance Estimate
o Since the population variances are the same, only one estimate is needed.
o Use information from both sample variances
o Weight the variances by the percent of information from each sample using the degrees of freedom
o Estimate is then a pooled variance:
$$S_p^2 = \left(\frac{n_1 - 1}{n_1 + n_2 - 2}\right) S_1^2 + \left(\frac{n_2 - 1}{n_1 + n_2 - 2}\right) S_2^2$$
or
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$$
o Example: the first sample of size 91 has a variance of 10 and the second sample of size 11 has a variance of 30
   total degrees of freedom = 90 + 10 = 100
   first variance is multiplied by 90/100 = 0.90 and second by 10/100 = 0.10
   pooled variance is then 0.90*10 + 0.10*30 = 12
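The weighting in this example can be checked with a few lines of Python. This is a minimal sketch; the function name `pooled_variance` is chosen here for illustration and does not come from the notes.

```python
def pooled_variance(n1, var1, n2, var2):
    """Pool two sample variances, weighting each by its degrees of freedom."""
    df1 = n1 - 1                      # degrees of freedom from sample one
    df2 = n2 - 1                      # degrees of freedom from sample two
    return (df1 * var1 + df2 * var2) / (df1 + df2)

# Values from the example: n1 = 91, variance 10; n2 = 11, variance 30
sp2 = pooled_variance(91, 10, 11, 30)
print(sp2)  # 12.0
```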

1.1.6 Standard error of difference in sample means



o Standard errors: standard deviation divided by square root of sample size
o Here we have two samples and therefore two sample sizes
o Estimate of standard error is
$$S_{(\bar{x}_1 - \bar{x}_2)} = S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
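As a quick numerical check of the standard error formula, the sketch below plugs in the pooled standard deviation (14.509) and sample sizes (25 and 25) from the box-filling example used in these notes. The function name `se_difference` is an illustrative choice.

```python
import math

def se_difference(sp, n1, n2):
    """Standard error of the difference in sample means, given the pooled standard deviation."""
    return sp * math.sqrt(1 / n1 + 1 / n2)

# Pooled standard deviation 14.509 with 25 boxes from each machine
se = se_difference(14.509, 25, 25)
print(round(se, 2))  # 4.1
```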
1.2 Hypothesis test of differences in population averages – Used when you are testing a given value
of the difference
1.2.1 Approach







o Overall: See whether the sample difference in means is too far from the hypothesized difference, measured in number of standard errors.
o Step 1: Determine what you wish to show; this goes in the alternative hypothesis.
o Step 2: Determine the null hypothesis. This hypothesis must contain an equal sign specifying the hypothesized difference.
o Step 3: Determine what values of sample differences would reject the null and support what you wish to show. Use a t-table to determine how far from the hypothesized difference you would have to go to reject the null.
o Step 4: Calculate how far the difference in sample means is from the hypothesized value in number of standard errors. Using step 3, decide if the data support the alternative.
o Step 4 alternative: Calculate how far your sample difference is from the hypothesized value in number of standard errors. If the likelihood of a result at least this far away is small (less than the significance level α), you can reject the null and support the alternative.
o Step 5: Restate your conclusion in terms of the problem.
1.2.2 Example

Is there a difference in average fill of two box filling machines? A random sample of 25
boxes from the first machine showed a mean of 379.5 ounces with a standard deviation of 15,
while a random sample of 25 boxes from the second machine showed a sample average of
374.5 with a standard deviation of 14. The pooled standard deviation is then 14.509. Test at
the 0.05 level.

H0: $\mu_1 - \mu_2 = 0$ (There is no difference in average fill between two machines)
H1: $\mu_1 - \mu_2 \neq 0$ (There is a difference in average fill between two machines)


This is a two-sided rejection region since only sample differences far above zero or far
below zero would cause you to reject the null and support the alternative.
Therefore degrees of freedom = 24 + 24 = 48 and $t_{48,\,0.025} = 2.0106$
Rejection Region: t > 2.0106 or t < -2.0106
Test statistic:
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_{(\bar{x}_1 - \bar{x}_2)}} = \frac{(379.5 - 374.5) - (0)}{14.509\sqrt{\frac{1}{25} + \frac{1}{25}}} = 1.22$$
One of the two sides for the p-value is found by finding 1.22 in the t-table on row 48.
It falls between the 0.10 and 0.25 columns. The p-value is then twice that, or between
0.20 and 0.50.
o Make statistical decision: Fail to reject that $\mu_1 - \mu_2 = 0$.
o Conclusion: We do not have enough evidence to conclude that there is a difference in average
fill of the two machines.
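The test in this example can be reproduced from the summary statistics alone. The following is a minimal sketch using only the Python standard library; the function name is illustrative, and the critical value 2.0106 is copied from the t-table lookup in the notes rather than computed.

```python
import math

def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Return (t, df) for the equal-variance two-sample t-test from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    # Standard error of the difference in sample means
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)
    t = ((mean1 - mean2) - hypothesized_diff) / se
    return t, df

t, df = pooled_t_statistic(379.5, 15, 25, 374.5, 14, 25)

T_CRITICAL = 2.0106  # t-table value for 48 df, 0.025 in each tail (from the notes)
reject = abs(t) > T_CRITICAL

print(f"t = {t:.2f}, df = {df}, reject H0: {reject}")
# t = 1.22, df = 48, reject H0: False
```

Because |t| = 1.22 does not exceed 2.0106, the sketch reaches the same decision as the worked example: fail to reject the null.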
1.3 Confidence interval for difference in population averages
1.3.1 Notes
o Uses the same parts and requirements as a hypothesis test
1.3.2 Margin of error
o Knowledge: Sample sizes
o Confidence: t-table value
o Variance: pooled estimate of variance
1.3.3 Formula:
o difference in sample averages plus and minus the margin of error
1.3.4 Example: What is the difference in average fill of two machines that fill boxes of cereal? A
random sample of 25 boxes from the first machine showed a mean of 379.5 ounces with a standard
deviation of 15, while a random sample of 25 boxes from the second machine showed a sample
average of 374.5 with a standard deviation of 14. The pooled standard deviation is then 14.509. Use a
90% confidence level.

o Formula: $(\bar{X}_1 - \bar{X}_2) \pm t_{n_1 + n_2 - 2}\, S_{(\bar{x}_1 - \bar{x}_2)}$
o t-table value: $t_{48,\,0.05} = 1.6772$
o Standard error: $S_{(\bar{x}_1 - \bar{x}_2)} = S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (14.509)\sqrt{\frac{1}{25} + \frac{1}{25}} = 4.10$
o Substitution: $(379.5 - 374.5) \pm 1.6772(4.10) = 5 \pm 6.9$
o Conclusion: With 90% confidence we can say that the average fill of machine one is 5
ounces more than the average fill of machine two with a margin of error of 6.9 ounces.
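The interval arithmetic can be sketched in Python as follows. The function name is illustrative, and the t-table value 1.6772 (48 degrees of freedom, 0.05 in each tail) is taken from the notes rather than computed.

```python
import math

def pooled_confidence_interval(mean1, sd1, n1, mean2, sd2, n2, t_value):
    """Return (low, high, margin) for the difference in means, assuming equal variances."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df   # pooled variance
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)          # standard error
    margin = t_value * se                                     # margin of error
    diff = mean1 - mean2
    return diff - margin, diff + margin, margin

low, high, margin = pooled_confidence_interval(
    379.5, 15, 25, 374.5, 14, 25, t_value=1.6772
)
print(f"margin = {margin:.2f}, interval = ({low:.2f}, {high:.2f})")
# margin = 6.88, interval = (-1.88, 11.88)
```

The interval contains zero, which agrees with the earlier test result that the data do not show a difference between the machines.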
2. Inferences about Difference in Two Means for Related Samples
2.1 Notes
2.1.1 When it should be used



o You wish to infer about a difference in means
o You have a random sample of pairs of values
o Examples:
   o exam 1 and exam 2 grades for each of 20 students
   o restaurant 1 sales and restaurant 2 sales on the same 10 days
   o Assessed value and sales prices for each of 15 houses
2.1.2 Requirements for t distribution



o Random sample of pairs
o Differences in paired values are normally distributed, or there is a large number of pairs
o Use NCSS to test the assumption of normality
2.1.3 Estimate of difference in population means
o Average of the differences between paired values in the sample
2.1.4 Degrees of freedom
o (n-1) where n is the number of pairs
2.1.5 Variance Estimate
o Find the sample variance of the differences in paired values
2.1.6 Standard error of difference in sample means
o Standard error: standard deviation of the differences divided by the square root of the number of pairs
2.2 Hypothesis test of differences in population averages
2.2.1 Approach







o Overall: See whether the sample average difference is too far from the hypothesized difference, measured in number of standard errors.
o Step 1: Determine what you wish to show; this goes in the alternative hypothesis.
o Step 2: Determine the null hypothesis. This hypothesis must contain an equal sign specifying the hypothesized difference.
o Step 3: Determine what values of sample average difference would reject the null and support what you wish to show. Use a t-table with n-1 degrees of freedom to determine how far from the hypothesized difference you would have to go to reject the null.
o Step 4: Calculate how far your sample mean difference is from the hypothesized mean difference in number of standard errors. Using step 3, decide if the data support the alternative.
o Step 4 alternative: Calculate how far your sample average difference is from the hypothesized value in number of standard errors. If the likelihood of a result at least this far away is small (less than the significance level α), you can reject the null and support the alternative.
o Step 5: Restate your conclusion in terms of the problem.
2.2.2 Example

Is there a difference in average number of customers served by two workers doing similar
jobs? Because the demand is not the same on different days, a random sample of ten days is
selected and the number of customers served is measured for both workers on those days. The
sample mean difference is 1.4 with a sample standard deviation of 3.5.
Day   Worker 1   Worker 2   Difference
  1      20         18           2
  2      19         21          -2
  3      14         11           3
  4       3          2           1
  5      24         14          10
  6      14         15          -1
  7       9          9           0
  8      14         16          -2
  9      11         10           1
 10      18         16           2

H0: $\mu_D = 0$ (The average difference between workers is zero)
H1: $\mu_D \neq 0$ (There is a difference between workers on average.)
This is a two-sided rejection region since only sample averages far above zero or far
below zero would cause you to reject the null and support the alternative.
Degrees of freedom = 9 and t-table value = 2.262
Rejection Region: Reject H0 if t < -2.262 or t > 2.262
Test statistic given the sample mean is 1.4 and the sample standard deviation is 3.47:
$$t = \frac{\bar{X} - \mu_D}{s / \sqrt{n}} = \frac{1.4 - 0}{3.47 / \sqrt{10}} = 1.28$$
One of the two sides for the p-value is found by finding 1.28 in the t-table on row 9. It falls
between the 0.10 and 0.25 columns. The p-value is then twice that, or between 0.20 and 0.50.
o Make statistical decision: Fail to reject that $\mu_D = 0$.
o Conclusion: We do not have enough evidence to conclude that there is a difference in average
number of customers served by two workers doing similar jobs.
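The paired calculation can be reproduced from the ten days of data in this example. This is a minimal sketch using the Python standard library; the variable names are illustrative, and the critical value 2.262 is copied from the t-table lookup in the notes.

```python
import math
import statistics

# Customers served on each of the ten sampled days
worker1 = [20, 19, 14, 3, 24, 14, 9, 14, 11, 18]
worker2 = [18, 21, 11, 2, 14, 15, 9, 16, 10, 16]

# Work with the paired differences, one per day
diffs = [a - b for a, b in zip(worker1, worker2)]
n = len(diffs)
mean_d = statistics.mean(diffs)    # sample mean of the differences
sd_d = statistics.stdev(diffs)     # sample standard deviation (n - 1 in the denominator)

# Distance from the hypothesized difference of 0, in standard errors
t = (mean_d - 0) / (sd_d / math.sqrt(n))

T_CRITICAL = 2.262  # t-table value for 9 df, 0.025 in each tail (from the notes)
reject = abs(t) > T_CRITICAL

print(f"mean = {mean_d:.1f}, sd = {sd_d:.2f}, t = {t:.2f}, reject H0: {reject}")
# mean = 1.4, sd = 3.47, t = 1.28, reject H0: False
```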
2.3 Confidence interval for difference in population averages
2.3.1 Notes
o Uses the same parts and requirements as a hypothesis test
2.3.2 Margin of error
o Knowledge: number of pairs
o Confidence: t-table value
o Variance: sample variance of differences
2.3.3 Formula:
o Sample average difference plus and minus the margin of error
2.3.4 Example:
o What is the difference in average number of customers served by two workers doing similar
jobs? The sample mean difference is 1.4 with a sample standard deviation of 3.5. Use a 99%
confidence level.

o Formula: $\bar{X} \pm t_{n-1}\, \frac{S}{\sqrt{n}}$
o t-table value: $t_9 = 3.250$
o Substitution: $1.4 \pm 3.25\left(\frac{3.5}{\sqrt{10}}\right) = 1.4 \pm 3.6$
o Conclusion: With 99% confidence we can say that the number of customers served by
worker one is, on average, 1.4 more than worker two with a margin of error of 3.6 customers.
3. Use in Business: Government reports on small business research
http://www.sba.gov/advo/research/rs205.pdf#search=%22t-tests%20business%20-.edu%22