MIS 331
Data Mining
2016/2017 Fall
Chapter 2
Sampling Distribution
Confidence Interval Estimation
Hypothesis Testing
for Variance of a Population
Copyright © 2013 Pearson Education, Inc. Publishing as Prentice Hall
Outline




Sampling Distribution of Sample Variances
Confidence Interval Estimation for the Variance
Tests of the Variance of a Normal Distribution
Tests of Equality of Two Variances
6.4 Sampling Distributions of Sample Variances
[Diagram: Sampling Distributions, branching into Sampling Distributions of Sample Means, Sampling Distributions of Sample Proportions, and Sampling Distributions of Sample Variances]
Sample Variance

Let $x_1, x_2, \ldots, x_n$ be a random sample from a population. The sample variance is

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$

The square root of the sample variance is called the sample standard deviation.

The sample variance is different for different random samples from the same population.
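As a minimal sketch of the definition above, the sample variance can be computed directly from the formula and compared with Python's standard library. The four data values are hypothetical and chosen only for illustration.

```python
import math
import statistics

# Hypothetical sample, invented for illustration only.
sample = [7.0, 8.0, 9.0, 12.0]

n = len(sample)
x_bar = sum(sample) / n

# Sample variance with the n - 1 divisor, written out from the definition.
s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# The sample standard deviation is the square root of the sample variance.
s = math.sqrt(s2)

# statistics.variance uses the same n - 1 divisor.
assert math.isclose(s2, statistics.variance(sample))
print(s2, s)
```

A different random sample from the same population would generally give a different value of s2, which is why s2 has a sampling distribution of its own.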
Sampling Distribution of
Sample Variances

The sampling distribution of $s^2$ has mean $\sigma^2$:

$$ E[s^2] = \sigma^2 $$

If the population distribution is normal, then

$$ \mathrm{Var}(s^2) = \frac{2\sigma^4}{n-1} $$
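These two results can be checked roughly by simulation. The sketch below draws many samples from a normal population and compares the average and the variance of the simulated sample variances with $\sigma^2$ and $2\sigma^4/(n-1)$. The population standard deviation, sample size, number of replications and seed are all assumptions made for the illustration.

```python
import random

random.seed(0)      # fixed seed so the run is repeatable
sigma = 4.0         # assumed population standard deviation
n = 10              # assumed sample size
reps = 20000        # assumed number of simulated samples


def sample_variance(values):
    """Sample variance with the n - 1 divisor."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


# One sample variance per simulated normal sample of size n.
variances = [
    sample_variance([random.gauss(0.0, sigma) for _ in range(n)])
    for _ in range(reps)
]

mean_s2 = sum(variances) / reps
var_s2 = sample_variance(variances)

print(mean_s2, sigma ** 2)                 # simulated vs. theoretical mean
print(var_s2, 2 * sigma ** 4 / (n - 1))    # simulated vs. theoretical variance
```

The simulated values should land near the theoretical ones but will not match them exactly, since they are themselves estimates.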
Chi-Square Distribution of
Sample and Population Variances

If the population distribution is normal, then

$$ \chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2} $$

has a chi-square ($\chi^2$) distribution with n – 1 degrees of freedom
The Chi-square Distribution


The chi-square distribution is a family of distributions,
depending on degrees of freedom:
d.f. = n – 1
[Figure: chi-square density curves for d.f. = 1, d.f. = 5, and d.f. = 15, each plotted against $\chi^2$ on an axis running from 0 to 28]
Text Appendix Table 7 contains chi-square probabilities




The expected value of a chi-square distribution with $v$ degrees of freedom is $v$:
$$ E[\chi^2_v] = v $$
The variance of a chi-square distribution with $v$ degrees of freedom is $2v$:
$$ \mathrm{Var}[\chi^2_v] = 2v $$








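Since $(n-1)s^2/\sigma^2$ has a chi-square distribution with n – 1 degrees of freedom:
$$ E\left[\frac{(n-1)s^2}{\sigma^2}\right] = n-1 $$
$$ \frac{n-1}{\sigma^2} E[s^2] = n-1 $$
$$ E[s^2] = \sigma^2 $$
Similarly,
$$ \mathrm{Var}\left[\frac{(n-1)s^2}{\sigma^2}\right] = 2(n-1) $$
$$ \frac{(n-1)^2}{\sigma^4} \mathrm{Var}[s^2] = 2(n-1) $$
$$ \mathrm{Var}[s^2] = \frac{2\sigma^4}{n-1} $$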
Degrees of Freedom (df)
Idea: Number of observations that are free to vary
after sample mean has been calculated
Example: Suppose the mean of 3 numbers is 8.0
Let X1 = 7
Let X2 = 8
What is X3?
If the mean of these three
values is 8.0,
then X3 must be 9
(i.e., X3 is not free to vary)
Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2
(2 values can be any numbers, but the third is not free to vary
for a given mean)
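The arithmetic behind this example can be written out in a few lines. This is only a sketch of the idea that, once the mean is fixed, the last value is determined by the others.

```python
n = 3
mean = 8.0
x1, x2 = 7.0, 8.0   # two values chosen freely

# The values must sum to n * mean, so the third value is forced.
x3 = n * mean - (x1 + x2)

degrees_of_freedom = n - 1
print(x3, degrees_of_freedom)
```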








Table 7 in the Appendix
Degrees of freedom versus probabilities for critical values. For 10 degrees of freedom:
$P(\chi^2_{10} < K_L) = 0.05$
$K_L = 3.940$, hence
$P(\chi^2_{10} < 3.940) = 0.05$
$P(\chi^2_{10} > K_U) = 0.05$
$K_U = 18.31$, hence
$P(\chi^2_{10} > 18.31) = 0.05$
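The two table values can be checked without statistical software. The sketch below relies on a closed form for the upper-tail probability that holds only when the degrees of freedom are even, $P(\chi^2_{2k} > x) = e^{-x/2} \sum_{j=0}^{k-1} (x/2)^j / j!$. The function name is hypothetical, and this is a spot check rather than a replacement for the table.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with `df` degrees of freedom > x), for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = x / 2.0
    k = df // 2
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))


lower = 1.0 - chi2_upper_tail_even_df(3.940, 10)   # P(chi2_10 < 3.940)
upper = chi2_upper_tail_even_df(18.31, 10)         # P(chi2_10 > 18.31)
print(lower, upper)
```

Both printed probabilities should be close to 0.05, in line with the table.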
Chi-square Example

A commercial freezer must hold a selected temperature with little variation. Specifications call for a standard deviation of no more than 4 degrees (a variance of 16 degrees²).
A sample of 14 freezers is to be tested.
What is the upper limit (K) for the sample variance such that the probability of exceeding this limit, given that the population standard deviation is 4, is less than 0.05?
Finding the Chi-square Value
$$ \chi^2 = \frac{(n-1)s^2}{\sigma^2} $$
is chi-square distributed with (n – 1) = 13 degrees of freedom.
Use the chi-square distribution with area 0.05 in the upper tail:
$\chi^2_{13} = 22.36$ (α = .05 and 14 – 1 = 13 d.f.)
[Figure: chi-square density with upper-tail probability α = .05 to the right of $\chi^2_{13} = 22.36$]
Chi-square Example
(continued)
$\chi^2_{13} = 22.36$ (α = .05 and 14 – 1 = 13 d.f.)
So:
$$ P(s^2 > K) = P\left( \frac{(n-1)s^2}{16} > \chi^2_{13} \right) = 0.05 $$
or
$$ \frac{(n-1)K}{16} = 22.36 \quad \text{(where n = 14)} $$
so
$$ K = \frac{(22.36)(16)}{(14-1)} = 27.52 $$
If $s^2$ from the sample of size n = 14 is greater than 27.52, there is strong evidence to suggest the population variance exceeds 16.
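The calculation of K can be reproduced in a few lines. This sketch takes the critical value 22.36 from the table, as in the example, rather than computing it.

```python
n = 14               # number of freezers sampled
sigma2 = 16.0        # population variance under the specification
chi2_crit = 22.36    # table value: 13 d.f., upper-tail area 0.05

# Solve (n - 1) * K / sigma2 = chi2_crit for K.
K = chi2_crit * sigma2 / (n - 1)
print(round(K, 2))
```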
7.5 Confidence Interval Estimation for the Variance
[Diagram: Confidence Intervals, branching into Population Mean (σ² Known, σ² Unknown), Population Proportion, and Population Variance (from a normally distributed population)]
Confidence Intervals for the
Population Variance
Goal: Form a confidence interval for the population variance, $\sigma^2$
The confidence interval is based on the sample variance, $s^2$
Assumed: the population is normally distributed
Confidence Intervals for the
Population Variance
(continued)
The random variable
$$ \chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2} $$
follows a chi-square distribution with (n – 1) degrees of freedom,
where the chi-square value $\chi^2_{n-1,\alpha}$ denotes the number for which
$$ P(\chi^2_{n-1} > \chi^2_{n-1,\alpha}) = \alpha $$





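$P(\chi^2_{n-1} > \chi^2_{n-1,\alpha/2}) = \alpha/2$
$P(\chi^2_{n-1} > \chi^2_{n-1,1-\alpha/2}) = 1 - \alpha/2$, or
$P(\chi^2_{n-1} < \chi^2_{n-1,1-\alpha/2}) = \alpha/2$
Finally,
$P(\chi^2_{n-1,1-\alpha/2} < \chi^2_{n-1} < \chi^2_{n-1,\alpha/2}) = 1 - \alpha/2 - \alpha/2 = 1 - \alpha$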






Example: find two numbers such that the probability that a chi-square variable with 6 d.f. lies between them is 0.90:
$P(\chi^2_{6,0.95} < \chi^2_6 < \chi^2_{6,0.05}) = 0.90$
The two numbers are
$\chi^2_{6,0.95} = 1.635$
$\chi^2_{6,0.05} = 12.592$, hence
$P(1.635 < \chi^2_6 < 12.592) = 0.90$
Confidence Intervals for the
Population Variance
(continued)
The 100(1 – α)% confidence interval for the population variance is given by
$$ \mathrm{LCL} = \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} $$
$$ \mathrm{UCL} = \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}} $$
Example
You are testing the speed of a batch of computer processors. You collect the following data (in MHz):
Sample size: 17
Sample mean: 3004
Sample std dev: 74
Assume the population is normal.
Determine the 95% confidence interval for $\sigma_x^2$
Finding the Chi-square Values


n = 17 so the chi-square distribution has (n – 1) = 16 degrees of freedom
α = 0.05, so use the chi-square values with area 0.025 in each tail:
$$ \chi^2_{n-1,\alpha/2} = \chi^2_{16,0.025} = 28.85 $$
$$ \chi^2_{n-1,1-\alpha/2} = \chi^2_{16,0.975} = 6.91 $$
[Figure: chi-square density with 16 d.f., with probability α/2 = .025 in the lower tail below $\chi^2_{16} = 6.91$ and probability α/2 = .025 in the upper tail above $\chi^2_{16} = 28.85$]
Calculating the Confidence Limits

The 95% confidence interval is
$$ \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}} $$
$$ \frac{(17-1)(74)^2}{28.85} < \sigma^2 < \frac{(17-1)(74)^2}{6.91} $$
$$ 3037 < \sigma^2 < 12680 $$
Converting to standard deviation, we are 95% confident that the population standard deviation of CPU speed is between 55.1 and 112.6 MHz
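The confidence limits in this example can be reproduced with a short helper. This is a sketch only: the function name and its parameters are hypothetical, and the chi-square values are taken from the table as in the example.

```python
import math


def variance_confidence_interval(n, s, chi2_upper, chi2_lower):
    """Return (LCL, UCL) for the population variance.

    chi2_upper: table value with area alpha/2 to its right.
    chi2_lower: table value with area 1 - alpha/2 to its right.
    """
    numerator = (n - 1) * s ** 2
    return numerator / chi2_upper, numerator / chi2_lower


# Processor example: n = 17, s = 74, 16 d.f., alpha = 0.05.
lcl, ucl = variance_confidence_interval(17, 74.0, 28.85, 6.91)

print(round(lcl), round(ucl))                               # limits for the variance
print(round(math.sqrt(lcl), 1), round(math.sqrt(ucl), 1))   # limits for the std dev
```

Note that the larger chi-square value produces the lower limit, because it appears in the denominator.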
9.6 Tests of the Variance of a Normal Distribution
Goal: Test hypotheses about the population variance, $\sigma^2$ (e.g., $H_0: \sigma^2 = \sigma_0^2$)
If the population is normally distributed,
$$ \chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2} $$
has a chi-square distribution with (n – 1) degrees of freedom
Tests of the Variance of a
Normal Distribution
(continued)
The test statistic for hypothesis tests about one population variance is
$$ \chi^2_{n-1} = \frac{(n-1)s^2}{\sigma_0^2} $$
Decision Rules: Variance
Population variance
Lower-tail test: $H_0: \sigma^2 \ge \sigma_0^2$, $H_1: \sigma^2 < \sigma_0^2$
Reject $H_0$ if $\chi^2_{n-1} < \chi^2_{n-1,1-\alpha}$
Upper-tail test: $H_0: \sigma^2 \le \sigma_0^2$, $H_1: \sigma^2 > \sigma_0^2$
Reject $H_0$ if $\chi^2_{n-1} > \chi^2_{n-1,\alpha}$
Two-tail test: $H_0: \sigma^2 = \sigma_0^2$, $H_1: \sigma^2 \ne \sigma_0^2$
Reject $H_0$ if $\chi^2_{n-1} > \chi^2_{n-1,\alpha/2}$ or $\chi^2_{n-1} < \chi^2_{n-1,1-\alpha/2}$
[Figure: three chi-square densities showing the rejection regions: area α in the lower tail, area α in the upper tail, and area α/2 in each tail]
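The three decision rules can be sketched as a small function. The function names and the `tail` labels are hypothetical, and critical values are assumed to be read from a chi-square table. The sample variances 30 and 20 used below are invented, applied to the freezer setting (n = 14, $\sigma_0^2 = 16$, upper-tail critical value 22.36).

```python
def chi_square_statistic(n, s2, sigma0_sq):
    """Test statistic (n - 1) * s^2 / sigma_0^2."""
    return (n - 1) * s2 / sigma0_sq


def reject_h0(stat, tail, crit_upper=None, crit_lower=None):
    """Apply the lower-tail, upper-tail, or two-tail decision rule.

    crit_upper / crit_lower are table critical values supplied by the caller.
    """
    if tail == "lower":
        return stat < crit_lower
    if tail == "upper":
        return stat > crit_upper
    if tail == "two":
        return stat > crit_upper or stat < crit_lower
    raise ValueError("tail must be 'lower', 'upper', or 'two'")


# Hypothetical sample variances for the freezer setting.
stat_high = chi_square_statistic(14, 30.0, 16.0)
stat_low = chi_square_statistic(14, 20.0, 16.0)

print(stat_high, reject_h0(stat_high, "upper", crit_upper=22.36))
print(stat_low, reject_h0(stat_low, "upper", crit_upper=22.36))
```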
Newbold 9.47






Test the hypothesis
$H_0: \sigma^2 \le 100$ against $H_1: \sigma^2 > 100$
a) $s^2 = 165$, n = 25
b) $s^2 = 165$, n = 29
c) $s^2 = 159$, n = 25
d) $s^2 = 67$, n = 38
Solution
9.47 a. $H_0: \sigma^2 \le 100$; $H_1: \sigma^2 > 100$;
$$ \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{24(165)}{100} = 39.6 $$
$\chi^2_{(24,.025)} = 39.36$, $\chi^2_{(24,.010)} = 42.98$
Therefore, reject $H_0$ at the 2.5% level but not at the 1% level of significance.
b. $H_0: \sigma^2 \le 100$; $H_1: \sigma^2 > 100$;
$$ \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{28(165)}{100} = 46.2 $$
$\chi^2_{(28,.025)} = 44.46$, $\chi^2_{(28,.010)} = 48.28$
Therefore, reject $H_0$ at the 2.5% level but not at the 1% level of significance.
c. $H_0: \sigma^2 \le 100$; $H_1: \sigma^2 > 100$;
$$ \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{24(159)}{100} = 38.16 $$
$\chi^2_{(24,.050)} = 36.42$, $\chi^2_{(24,.025)} = 39.36$
Therefore, reject $H_0$ at the 5% level but not at the 2.5% level of significance.
d. $H_0: \sigma^2 \le 100$; $H_1: \sigma^2 > 100$;
$$ \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{37(67)}{100} = 24.79 $$
$\chi^2_{(37,.100)} = 48.36$, $\chi^2_{(37,.05)} = 52.19$
Therefore, do not reject $H_0$ at any common level of significance.
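The four parts can be worked through in one loop. This is a sketch under the assumption that the critical values quoted in the solution are used as given. For each part it reports the test statistic and the quoted significance levels at which $H_0$ is rejected.

```python
sigma0_sq = 100.0

# (part, s^2, n, {significance level: critical value quoted in the solution})
cases = [
    ("a", 165.0, 25, {0.025: 39.36, 0.010: 42.98}),
    ("b", 165.0, 29, {0.025: 44.46, 0.010: 48.28}),
    ("c", 159.0, 25, {0.050: 36.42, 0.025: 39.36}),
    ("d", 67.0, 38, {0.100: 48.36, 0.050: 52.19}),
]

results = {}
for part, s2, n, table in cases:
    stat = (n - 1) * s2 / sigma0_sq
    # Upper-tail test: reject H0 when the statistic exceeds the critical value.
    rejected_at = sorted(alpha for alpha, crit in table.items() if stat > crit)
    results[part] = (stat, rejected_at)
    print(part, round(stat, 2), rejected_at)
```

An empty list for a part means $H_0$ is not rejected at either of the quoted levels.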
Newbold 9.48






A new safety device is being evaluated.
A random sample of 8 days gives the following observations:
618 660 638 625 571 598 639 582
Management is concerned about variability.
Test the null hypothesis that the population variance is at most 500 at a significance level of 10%.
9.48
Solution
$H_0: \sigma^2 \le 500$; $H_1: \sigma^2 > 500$; reject $H_0$ if $\chi^2 > \chi^2_{(7,.10)} = 12.02$
$$ \chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{7(933.982)}{500} = 13.0757 $$
Therefore, reject $H_0$ at the 10% level
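The sample variance used in this solution can be recomputed from the eight observations. The sketch below uses Python's standard library for $s^2$ and takes the critical value 12.02 from the solution rather than computing it.

```python
import statistics

data = [618, 660, 638, 625, 571, 598, 639, 582]
sigma0_sq = 500.0
chi2_crit = 12.02    # table value quoted in the solution: 7 d.f., alpha = 0.10

n = len(data)
s2 = float(statistics.variance(data))    # n - 1 divisor
stat = (n - 1) * s2 / sigma0_sq

print(round(s2, 3), round(stat, 4), stat > chi2_crit)
```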
10.4 Tests of Equality of Two Variances
[Diagram: Tests for Two Population Variances, using the F test statistic]
Goal: Test hypotheses about two population variances
Lower-tail test: $H_0: \sigma_x^2 \ge \sigma_y^2$, $H_1: \sigma_x^2 < \sigma_y^2$
Upper-tail test: $H_0: \sigma_x^2 \le \sigma_y^2$, $H_1: \sigma_x^2 > \sigma_y^2$
Two-tail test: $H_0: \sigma_x^2 = \sigma_y^2$, $H_1: \sigma_x^2 \ne \sigma_y^2$
The two populations are assumed to be independent and normally distributed
Hypothesis Tests for
Two Variances
(continued)
The random variable
$$ F = \frac{s_x^2 / \sigma_x^2}{s_y^2 / \sigma_y^2} $$
has an F distribution with $(n_x - 1)$ numerator degrees of freedom and $(n_y - 1)$ denominator degrees of freedom.
Denote an F value with $\nu_1$ numerator and $\nu_2$ denominator degrees of freedom by $F_{\nu_1,\nu_2}$
Test Statistic
The test statistic for a hypothesis test about two population variances is
$$ F = \frac{s_x^2}{s_y^2} $$
where F has $(n_x - 1)$ numerator degrees of freedom and $(n_y - 1)$ denominator degrees of freedom
Decision Rules: Two Variances
Use sx2 to denote the larger variance.
Upper-tail test: $H_0: \sigma_x^2 \le \sigma_y^2$, $H_1: \sigma_x^2 > \sigma_y^2$
Reject $H_0$ if $F > F_{n_x-1,n_y-1,\alpha}$
[Figure: F density with a rejection region of area α to the right of $F_{n_x-1,n_y-1,\alpha}$]
Two-tail test: $H_0: \sigma_x^2 = \sigma_y^2$, $H_1: \sigma_x^2 \ne \sigma_y^2$
The rejection region for a two-tail test is:
Reject $H_0$ if $F > F_{n_x-1,n_y-1,\alpha/2}$, where $s_x^2$ is the larger of the two sample variances
[Figure: F density with a rejection region of area α/2 to the right of $F_{n_x-1,n_y-1,\alpha/2}$]
Example: F Test
You are a financial analyst for a brokerage firm. You want to compare dividend yields between stocks listed on the NYSE & NASDAQ. You collect the following data:
          NYSE   NASDAQ
Number    21     25
Mean      3.27   2.53
Std dev   1.30   1.16
Is there a difference in the variances between the NYSE & NASDAQ at the α = 0.10 level?
F Test: Example Solution


Form the hypothesis test:
$H_0: \sigma_x^2 = \sigma_y^2$ (there is no difference between variances)
$H_1: \sigma_x^2 \ne \sigma_y^2$ (there is a difference between variances)
Find the F critical value for α/2 = .10/2:
Degrees of Freedom:
Numerator (NYSE has the larger standard deviation): $n_x - 1 = 21 - 1 = 20$ d.f.
Denominator: $n_y - 1 = 25 - 1 = 24$ d.f.
$$ F_{n_x-1,n_y-1,\alpha/2} = F_{20,24,0.10/2} = 2.03 $$
F Test: Example Solution
(continued)

The test statistic is:
$$ F = \frac{s_x^2}{s_y^2} = \frac{1.30^2}{1.16^2} = 1.256 $$
$H_0: \sigma_x^2 = \sigma_y^2$
$H_1: \sigma_x^2 \ne \sigma_y^2$
[Figure: F density with a rejection region of area α/2 = .05 to the right of $F_{20,24,0.10/2} = 2.03$]
F = 1.256 is not in the rejection region, so we do not reject $H_0$
Conclusion: There is not sufficient evidence of a difference in variances at α = .10
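The F test in this example can be sketched as follows. The variable names are hypothetical and the critical value 2.03 is taken from the example rather than computed. The larger sample standard deviation is placed in the numerator, as the decision rule requires.

```python
# (sample standard deviation, sample size) for each exchange
nyse = (1.30, 21)
nasdaq = (1.16, 25)
f_crit = 2.03    # F(20, 24) with upper-tail area 0.10/2, as quoted in the example

# Put the sample with the larger standard deviation in the numerator.
(s_x, n_x), (s_y, n_y) = sorted([nyse, nasdaq], reverse=True)

f_stat = s_x ** 2 / s_y ** 2
df = (n_x - 1, n_y - 1)
reject = f_stat > f_crit

print(round(f_stat, 3), df, reject)
```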