
MGT 2120: Chapter 10
Statistical Inference about Means and Proportions with Two Populations
Two population means
Notations:
μ1 = Mean of population 1
σ1 = Standard deviation of population 1
μ2 = Mean of population 2
σ2 = Standard deviation of population 2
Sample 1 = Sample taken from population 1
n1 = Sample size of sample 1
x̄1 = Mean of sample 1
s1 = Standard deviation of sample 1
Sample 2 = Sample taken from population 2
n2 = Sample size of sample 2
x̄2 = Mean of sample 2
s2 = Standard deviation of sample 2
§10.1 Inferences about the difference between two population means: σ1 and σ2 are known
Parameter of interest = μ1 – μ2
Point estimate of μ1 – μ2 = x̄1 – x̄2
Expected value of x̄1 – x̄2 = E(x̄1 – x̄2) = μ1 – μ2
Standard error of x̄1 – x̄2:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
If n1 and n2 are both large (i.e. > 30), then x̄1 – x̄2 will follow an approximately normal distribution.
Confidence interval for μ1 – μ2:
Use formula 10.4, page 410:
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
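The interval can be computed directly once the two sample means, the known population standard deviations, and the sample sizes are in hand. The following is a minimal sketch using only the Python standard library; the function name `z_interval_two_means` and the input values are made up for illustration and do not come from the textbook.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_two_means(xbar1, xbar2, sigma1, sigma2, n1, n2, confidence=0.95):
    """Confidence interval for mu1 - mu2 when sigma1 and sigma2 are known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # critical value z_{alpha/2}
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # standard error of xbar1 - xbar2
    diff = xbar1 - xbar2                         # point estimate of mu1 - mu2
    return diff - z * se, diff + z * se


# Hypothetical inputs, chosen only to exercise the formula.
low, high = z_interval_two_means(xbar1=82, xbar2=78, sigma1=9, sigma2=12, n1=36, n2=49)
print(round(low, 2), round(high, 2))
```

If the resulting interval contains 0, the data are consistent with no difference between the two population means at the chosen confidence level.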
Hypothesis testing:
Ho: μ1 – μ2 ≥ D0
Ha: μ1 – μ2 < D0
One-tail (left) test
Ho: μ1 – μ2 ≤ D0
Ha: μ1 – μ2 > D0
One-tail (right) test
Ho: μ1 – μ2 = D0
Ha: μ1 – μ2 ≠ D0
Two-tail test
Test statistic:
$$z_{calc} = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
σ1 and σ2 are rarely, if ever, known, so we will not work out an example.
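For readers who still want to see the arithmetic, the test statistic and its p-value can be sketched in a few lines of Python. This is an illustration under assumptions, not a textbook example: the function name, the `tail` argument, and the numbers are all hypothetical. The one-tail p-values here are taken in the direction of Ha rather than from the absolute value of zcalc.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_means(xbar1, xbar2, sigma1, sigma2, n1, n2, d0=0.0, tail="two"):
    """z test for Ho about mu1 - mu2 when sigma1 and sigma2 are known."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z_calc = ((xbar1 - xbar2) - d0) / se
    cdf = NormalDist().cdf                   # standard normal CDF
    if tail == "left":                       # Ha: mu1 - mu2 < D0
        p_value = cdf(z_calc)
    elif tail == "right":                    # Ha: mu1 - mu2 > D0
        p_value = 1 - cdf(z_calc)
    elif tail == "two":                      # Ha: mu1 - mu2 != D0
        p_value = 2 * (1 - cdf(abs(z_calc)))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return z_calc, p_value


# Hypothetical inputs, for illustration only.
z_calc, p_two = z_test_two_means(82, 78, 9, 12, 36, 49, d0=0, tail="two")
print(round(z_calc, 3), round(p_two, 4))
```

Ho is rejected when the p-value is at most the chosen significance level α.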
§10.2 Inferences about the difference between two population means: σ1 and σ2 are unknown
Case 1: σ1 ≠ σ2
Parameter of interest = μ1 – μ2
Point estimate of μ1 – μ2 = x̄1 – x̄2
Expected value of x̄1 – x̄2 = E(x̄1 – x̄2) = μ1 – μ2
Standard error of x̄1 – x̄2:
$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
with df = Formula 10.7, page 416
If n1 and n2 are both large (i.e. > 30), then x̄1 – x̄2 will follow an approximately normal distribution.
Confidence interval for μ1 – μ2:
Use formula 10.6, page 416:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Note: df is given by the formula 10.7, page 416; Data Analysis will provide this number for us.
Hypothesis testing:
Ho: μ1 – μ2 ≥ D0
Ha: μ1 – μ2 < D0
One-tail (left) test
Ho: μ1 – μ2 ≤ D0
Ha: μ1 – μ2 > D0
One-tail (right) test
Ho: μ1 – μ2 = D0
Ha: μ1 – μ2 ≠ D0
Two-tail test
Test statistic:
$$t_{calc} = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
p-value:
T.DIST.RT(ABS(tcalc),df) for both the one-tail tests
T.DIST.2T(ABS(tcalc),df) for the two-tail test
See Formula 10.7, page 416, for df
We will use the Excel Data Analysis command for determining the p-value.
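The Case 1 quantities can also be computed by hand or in code. The sketch below assumes that formula 10.7 is the usual Welch-Satterthwaite approximation for df, which is consistent with the df expression in the summary of formulas; the function name `welch_t` and the input values are hypothetical.

```python
from math import sqrt


def welch_t(xbar1, xbar2, s1, s2, n1, n2, d0=0.0):
    """t statistic and approximate df when sigma1 and sigma2 are unknown and unequal."""
    v1 = s1**2 / n1                          # s1^2 / n1
    v2 = s2**2 / n2                          # s2^2 / n2
    se = sqrt(v1 + v2)                       # standard error of xbar1 - xbar2
    t_calc = ((xbar1 - xbar2) - d0) / se
    # Assumed form of formula 10.7 (Welch-Satterthwaite approximation).
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_calc, df


# Hypothetical inputs, for illustration only.
t_calc, df = welch_t(xbar1=1025, xbar2=910, s1=150, s2=125, n1=28, n2=22)
print(round(t_calc, 3), round(df, 2))
```

The df from this formula is generally not an integer. The p-value itself needs the t distribution, which the Python standard library does not provide, so the T.DIST.RT and T.DIST.2T calls or the Data Analysis output still fill that role.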
Case 2: σ1 = σ2 = σ
Parameter of interest = μ1 – μ2
Point estimate of μ1 – μ2 = x̄1 – x̄2
Expected value of x̄1 – x̄2 = E(x̄1 – x̄2) = μ1 – μ2
Pooled variance estimate:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Then, the standard error of x̄1 – x̄2 is
$$s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
with df = n1 + n2 - 2
If n1 and n2 are both large (i.e. > 30), then x̄1 – x̄2 will follow an approximately normal distribution.
Confidence interval for μ1 – μ2:
Use the formula:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Hypothesis testing:
Ho: μ1 – μ2 ≥ D0
Ha: μ1 – μ2 < D0
One-tail (left) test
Ho: μ1 – μ2 ≤ D0
Ha: μ1 – μ2 > D0
One-tail (right) test
Ho: μ1 – μ2 = D0
Ha: μ1 – μ2 ≠ D0
Two-tail test
Test statistic:
$$t_{calc} = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
with df = n1 + n2 - 2
p-value:
T.DIST.RT(ABS(tcalc),df) for both the one-tail tests
T.DIST.2T(ABS(tcalc),df) for the two-tail test
df = n1 + n2 - 2
We will use the Excel Data Analysis command for determining the p-value.
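As a rough illustration of Case 2, the pooled variance, the test statistic, and df can be computed as follows. The function name `pooled_t` and the numbers are hypothetical; the sketch only mirrors the pooled-variance formulas in this case and is not the output of the Data Analysis command.

```python
from math import sqrt


def pooled_t(xbar1, xbar2, s1, s2, n1, n2, d0=0.0):
    """Pooled-variance t statistic, assuming sigma1 = sigma2."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance estimate
    se = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)             # standard error of xbar1 - xbar2
    t_calc = ((xbar1 - xbar2) - d0) / se
    return sp2, t_calc, df


# Hypothetical inputs, for illustration only.
sp2, t_calc, df = pooled_t(xbar1=52, xbar2=48, s1=10, s2=10, n1=10, n2=15)
print(round(sp2, 2), round(t_calc, 4), df)
```

When the two sample standard deviations are equal, as in this made-up input, the pooled variance is simply their common square, which is a quick sanity check on the formula.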
§10.3 Inferences about the difference between two population means: Matched Samples
Sample 1 = Observations of a sample prior to the event
Sample 2 = Observations from the same subjects as sample 1 taken after the event
n = Sample size of samples 1 and 2
x1i = ith observation from sample 1
x2i = ith observation from sample 2
Define:
Sample difference = di = x1i – x2i
μd = Average of the population of differences
σd = Standard deviation of the population of differences
d̄ = Mean of the sample differences:
$$\bar{d} = \frac{\sum d_i}{n}$$
sd = Standard deviation of the sample differences:
$$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$$
Confidence interval for μd:
$$\bar{d} \pm t_{\alpha/2}\,\frac{s_d}{\sqrt{n}}$$
with df = n - 1
Hypothesis testing:
Ho: μd ≥ 0
Ha: μd < 0
One-tail (left) test
Ho: μd ≤ 0
Ha: μd > 0
One-tail (right) test
Ho: μd = 0
Ha: μd ≠ 0
Two-tail test
Test statistic:
$$t_{calc} = \frac{\bar{d} - \mu_0}{s_d/\sqrt{n}}$$
where μ0 is the hypothesized value of μd (0 in the hypotheses above).
p-value:
T.DIST.RT(ABS(tcalc),df) for both the one-tail tests
T.DIST.2T(ABS(tcalc),df) for the two-tail test
df = n - 1
We will use the Excel Data Analysis command for determining the p-value.
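The matched-sample calculation reduces to a one-sample t statistic on the differences. The sketch below uses only the Python standard library; the function name `paired_t` and the before/after values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after, mu0=0.0):
    """Matched-sample t statistic computed from the differences d_i = x1i - x2i."""
    if len(before) != len(after):
        raise ValueError("matched samples must have the same size")
    d = [x1 - x2 for x1, x2 in zip(before, after)]
    n = len(d)
    d_bar = mean(d)                          # mean of the sample differences
    s_d = stdev(d)                           # sample standard deviation (n - 1 denominator)
    t_calc = (d_bar - mu0) / (s_d / sqrt(n))
    return d_bar, s_d, t_calc, n - 1


# Hypothetical before/after measurements on the same six subjects.
before = [6.0, 5.0, 7.0, 6.2, 6.0, 6.4]
after = [5.4, 5.2, 6.5, 5.9, 6.0, 5.8]
d_bar, s_d, t_calc, df = paired_t(before, after)
print(round(d_bar, 3), round(s_d, 4), round(t_calc, 3), df)
```

Working on the differences is what distinguishes this design from the independent-sample cases in §10.2: the two columns are never treated as separate samples.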
§10.4 Inferences about the difference between two population proportions
Notations:
p1 = Proportion of “success” in population 1
p2 = Proportion of “success” in population 2
Sample 1 = Sample taken from population 1
Sample 2 = Sample taken from population 2
n1 = Sample size of sample 1
n2 = Sample size of sample 2
p̄1 = Proportion of “success” in sample 1
p̄2 = Proportion of “success” in sample 2
Parameter of interest = p1 – p2
Point estimate of p1 – p2 = p̄1 – p̄2
Expected value of p̄1 – p̄2 = E(p̄1 – p̄2) = p1 – p2
Standard error of p̄1 – p̄2:
$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$
Estimated standard error of p̄1 – p̄2:
$$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$$
p̄1 – p̄2 will follow an approximately normal distribution if all of the following four conditions are true: n1p1 ≥ 5; n1(1 – p1) ≥ 5; n2p2 ≥ 5; n2(1 – p2) ≥ 5
Confidence interval for p1 – p2:
Use formula 10.13, page 430:
$$(\bar{p}_1 - \bar{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$$
Hypothesis testing:
Ho: p1 – p2 ≥ 0
Ha: p1 – p2 < 0
One-tail (left) test
Ho: p1 – p2 ≤ 0
Ha: p1 – p2 > 0
One-tail (right) test
Ho: p1 – p2 = 0
Ha: p1 – p2 ≠ 0
Two-tail test
All three Ho include p1 – p2 = 0, i.e. p1 = p2 = p, so a pooled estimate p̄ of p can be found using the following formula.
Pooled estimate:
$$\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$$
Estimated standard error of p̄1 – p̄2:
$$s_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Test statistic:
$$z_{calc} = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
p-value:
1 - NORM.S.DIST(ABS(zcalc),1) for both the one-tail tests
2*(1 - NORM.S.DIST(ABS(zcalc),1)) for the two-tail test
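The two standard errors in this section serve different purposes: the unpooled one goes into the confidence interval, while the pooled one goes into the test statistic because Ho assumes p1 = p2. The sketch below shows both side by side; the function name `two_proportion_z`, its arguments (success counts `x1`, `x2`), and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval and two-tail z test for p1 - p2 (Ho: p1 - p2 = 0)."""
    p1_bar = x1 / n1
    p2_bar = x2 / n2
    diff = p1_bar - p2_bar                   # point estimate of p1 - p2

    # Confidence interval: unpooled estimated standard error.
    se_ci = sqrt(p1_bar * (1 - p1_bar) / n1 + p2_bar * (1 - p2_bar) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z_crit * se_ci, diff + z_crit * se_ci)

    # Hypothesis test: pooled estimate, since Ho assumes p1 = p2 = p.
    p_bar = (n1 * p1_bar + n2 * p2_bar) / (n1 + n2)
    se_test = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    z_calc = diff / se_test
    p_two_tail = 2 * (1 - NormalDist().cdf(abs(z_calc)))
    return ci, z_calc, p_two_tail


# Hypothetical counts: 35 successes out of 250 and 27 out of 300.
ci, z_calc, p_two_tail = two_proportion_z(35, 250, 27, 300)
print(ci, round(z_calc, 3), round(p_two_tail, 4))
```

The last line of the function corresponds to the two-tail p-value formula above; halving it gives the one-tail value when zcalc points in the direction of Ha.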
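Summary of formulas
Values of σ1 and σ2 are known
Confidence interval:
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Hypothesis testing:
$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$
Comments: x̄1 – x̄2 = Point estimate for μ1 – μ2, and
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \text{Standard error of } \bar{x}_1 - \bar{x}_2$$
Values of σ1 and σ2 are unknown
Confidence interval:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Hypothesis testing:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Comments: df from results of Data Analysis command, where
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
Values of σ1 and σ2 are unknown; but σ1 = σ2 = σ
Confidence interval:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Hypothesis testing:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Comments: Sp = Pooled estimate of the common σ if σ1 = σ2, with
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
df = n1 + n2 – 2
Matched sample
Confidence interval:
$$\bar{d} \pm t_{\alpha/2}\,\frac{s_d}{\sqrt{n}}$$
Hypothesis testing:
$$t = \frac{\bar{d} - \mu_0}{s_d/\sqrt{n}}$$
Comments: d̄ = Mean of paired differences, and Sd = Standard deviation of paired differences; df = n - 1
Two population proportions
Confidence interval:
$$(\bar{p}_1 - \bar{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$$
Hypothesis testing:
$$Z = \frac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad \text{where } \bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$$
Comments: p̄1 – p̄2 = Point estimate for p1 – p2, and
$$\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \text{Standard error of } \bar{p}_1 - \bar{p}_2$$
for hypothesis testing where we assume p1 = p2 = p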