4-1
Statistical Inference
• Statistical inference is the process of making decisions or drawing conclusions about a population using the information contained in a sample from that population.
Its two major areas are:
1. Parameter Estimation
2. Hypothesis Testing
4-2
Point Estimation
• A point estimate is an observed value of a point estimator (a statistic).
4-2' Interval Estimation: Confidence Interval
Note: L (lower confidence limit) and U (upper confidence limit) are statistics and hence random variables.
Ex) Confidence level = 95% (1 − α = 0.95)
⇒ P(L ≤ θ ≤ U) = 0.95, i.e., P(the C.I. will contain the true parameter θ) = 0.95
⇒ In repeated sampling, about 95% of all the C.I.s constructed this way will contain the true parameter.
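• The general formula for the two-sided confidence intervals based on the normal and t distributions is:
Point Estimate ± (Critical Value)(Standard Error)
– Lower confidence limit: L = Point Estimate − Critical Value × S.E.
– Upper confidence limit: U = Point Estimate + Critical Value × S.E.
– Width of the confidence interval = 2 × Critical Value × S.E.
(The interval for the variance of a normal population is not symmetric and does not have this form.)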
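The symmetric formula above can be sketched as a small helper. This is a minimal illustration, assuming the critical value and standard error have already been computed; the function name `confidence_interval` and the sample inputs are hypothetical.

```python
def confidence_interval(point_estimate, critical_value, std_error):
    """Return (L, U, width) for an interval of the form estimate ± critical value × S.E."""
    margin = critical_value * std_error  # half-width of the interval
    lower = point_estimate - margin      # L = point estimate − critical value × S.E.
    upper = point_estimate + margin      # U = point estimate + critical value × S.E.
    return lower, upper, 2 * margin      # width = 2 × critical value × S.E.

# Hypothetical inputs: estimate 10.0, critical value 2.0, standard error 0.5
L, U, width = confidence_interval(10.0, 2.0, 0.5)
print(L, U, width)  # 9.0 11.0 2.0
```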
Confidence Interval for a Mean μ when the variance σ² is known
Assumptions
– Population standard deviation σ is known
– Population is normally distributed
– If the population is not normal, use a large sample (CLT)
100(1−α)% (two-sided) Confidence Interval for μ:
\( \bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \), or equivalently \( \bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \)
(where \( z_{\alpha/2} \) is the standard normal critical value with a probability of α/2 in each tail)
Chap 8-6

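100(1−α)% Upper-Confidence Bound for μ: \( \mu \le \bar{X} + z_{\alpha}\,\sigma/\sqrt{n} \)
100(1−α)% Lower-Confidence Bound for μ: \( \bar{X} - z_{\alpha}\,\sigma/\sqrt{n} \le \mu \)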
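Critical Value: \( z_{\alpha/2} \)
Consider a 95% confidence interval: 1 − α = 0.95, so α/2 = 0.025 in each tail.
\( z_{\alpha/2} = z_{0.025} = 1.96 \) and \( z_{1-\alpha/2} = z_{0.975} = -1.96 \)
Commonly used confidence levels are 90%, 95%, and 99%.
Note: \( z_{1-\alpha/2} = -z_{\alpha/2} \)
[Figure: standard normal curve centered at 0, with area α/2 = 0.025 below −1.96 and above 1.96.]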
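As a quick check of the critical values, the sketch below computes \( z_{\alpha/2} \) for the three common confidence levels with Python's standard-library `statistics.NormalDist`, used here in place of a printed z table. The helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided critical value z_{alpha/2} for a confidence level of 1 - alpha."""
    alpha = 1 - confidence_level
    # z_{alpha/2} leaves an area of alpha/2 in the upper tail of N(0, 1)
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576
```

The lower-tail point `NormalDist().inv_cdf(0.025)` is the negative of `z_critical(0.95)`, matching \( z_{1-\alpha/2} = -z_{\alpha/2} \).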
Example
A sample of 11 circuits from a normal population has a mean resistance of
2.20 ohms. We know from past testing that the population standard deviation
is 0.35 ohms. Determine a 95% confidence interval for the true mean
resistance of the population.
\( \bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96\,(0.35/\sqrt{11}) = 2.20 \pm 0.2068 \)
⇒ (1.9932, 2.4068)
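The circuit-resistance example can be reproduced numerically as below. This is a sketch using the standard library only; it uses the exact quantile (about 1.95996) instead of the rounded table value 1.96, which does not change the result at four decimals.

```python
import math
from statistics import NormalDist

# Data from the example: n = 11 circuits, sample mean 2.20 ohms, known sigma 0.35 ohms
n, x_bar, sigma = 11, 2.20, 0.35
confidence = 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}, about 1.96
margin = z * sigma / math.sqrt(n)                    # z_{alpha/2} * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"({lower:.4f}, {upper:.4f})")  # (1.9932, 2.4068)
```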
Confidence Interval for a Mean μ when the variance σ² is unknown
• If the population standard deviation σ is unknown, we can substitute the sample standard deviation, S.
• This introduces extra uncertainty, since S varies from sample to sample ⇒ use the t distribution instead of the normal distribution.
Assumptions
– Population standard deviation σ is unknown
– Population is normally distributed
– If the population is not normal, use a large sample
• 100(1−α)% Confidence Interval for μ:
\( \bar{X} \pm t_{\alpha/2,n-1}\,\frac{S}{\sqrt{n}} \), or equivalently \( \bar{X} - t_{\alpha/2,n-1}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,n-1}\frac{S}{\sqrt{n}} \)
(where \( t_{\alpha/2,n-1} \) is the critical value of the t distribution with n − 1 d.f. and an area of α/2 in each tail)

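100(1−α)% Upper-Confidence Bound for μ: \( \mu \le \bar{X} + t_{\alpha,n-1}\,S/\sqrt{n} \)
100(1−α)% Lower-Confidence Bound for μ: \( \bar{X} - t_{\alpha,n-1}\,S/\sqrt{n} \le \mu \)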
Student's t Distribution
• t distributions are symmetric and bell-shaped but have heavier tails than the normal distribution.
• The t value depends on the degrees of freedom (d.f.).
• As d.f. → ∞, the t distribution → N(0, 1).
[Figure: standard normal density (t with df = ∞) compared with the t densities for df = 13 and df = 5, all centered at 0.]
Table of the t distribution
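To see the heavier tails and the convergence to N(0, 1) numerically, the sketch below evaluates the t density from its standard closed form and compares it with the standard normal density at the center (x = 0) and in the tail (x = 3). The helper `t_pdf` is a hypothetical name; `math.lgamma` is used so that large d.f. do not overflow.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # log of Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)); lgamma avoids overflow for large df
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

std_normal = NormalDist()
for df in (5, 13, 1000):
    print(f"df={df:>4}: f(0)={t_pdf(0, df):.4f}  f(3)={t_pdf(3, df):.5f}")
print(f"N(0,1):    f(0)={std_normal.pdf(0):.4f}  f(3)={std_normal.pdf(3):.5f}")
```

With fewer d.f. the peak is lower and the tail density at x = 3 is higher; by df = 1000 the values are close to the normal ones.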
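Example
A random sample of n = 25 has sample mean 50 and sample variance 8. Form a 95% confidence interval for μ.
– d.f. = n − 1 = 24, so \( t_{\alpha/2,n-1} = t_{0.025,24} = 2.0639 \)
– The confidence interval is
\( \bar{X} \pm t_{0.025,24}\,\frac{S}{\sqrt{n}} = 50 \pm 2.0639\,\frac{\sqrt{8}}{\sqrt{25}} = 50 \pm 1.168 \)
⇒ (48.832, 51.168)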
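The same interval can be computed as follows. Python's standard library has no t quantile function, so this sketch hard-codes \( t_{0.025,24} = 2.0639 \) from the t table; if SciPy is available, `scipy.stats.t.ppf(0.975, 24)` returns the same critical value.

```python
import math

# Data from the example: n = 25, sample mean 50, sample variance 8
n, x_bar, s2 = 25, 50.0, 8.0
t_crit = 2.0639  # t_{0.025, 24} read from a t table (hard-coded)

s = math.sqrt(s2)                   # sample standard deviation S
margin = t_crit * s / math.sqrt(n)  # t_{alpha/2, n-1} * S / sqrt(n)
print(f"({x_bar - margin:.3f}, {x_bar + margin:.3f})")  # (48.832, 51.168)
```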
(16.457, 17.483)
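Confidence Intervals for the Variance of a Normal Population
100(1−α)% Confidence Interval for σ²: \( \frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}} \)
100(1−α)% Upper-Confidence Bound for σ²: \( \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha,n-1}} \)
100(1−α)% Lower-Confidence Bound for σ²: \( \frac{(n-1)S^2}{\chi^2_{\alpha,n-1}} \le \sigma^2 \)
(where \( \chi^2_{\alpha,n-1} \) is the critical value of the chi-square distribution with n − 1 d.f. that has an area of α in the upper tail)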
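A sketch of the two-sided interval for σ², assuming the standard chi-square interval \( (n-1)S^2/\chi^2_{\alpha/2,n-1} \le \sigma^2 \le (n-1)S^2/\chi^2_{1-\alpha/2,n-1} \). The sample values (n = 10, S² = 4.0) are hypothetical, and the 95% chi-square points for 9 d.f. (19.023 and 2.700) are typed in from a table because the standard library has no chi-square quantiles.

```python
def variance_ci(n, s2, chi2_upper, chi2_lower):
    """Two-sided CI for sigma^2 of a normal population.

    chi2_upper: chi^2_{alpha/2, n-1}, the larger table value (area alpha/2 above it)
    chi2_lower: chi^2_{1-alpha/2, n-1}, the smaller table value
    """
    ss = (n - 1) * s2  # (n - 1) S^2
    # Dividing by the larger chi-square value gives the lower limit
    return ss / chi2_upper, ss / chi2_lower

# Hypothetical data: n = 10, S^2 = 4.0; 95% table values for 9 d.f.
low, high = variance_ci(10, 4.0, chi2_upper=19.023, chi2_lower=2.700)
print(f"({low:.3f}, {high:.3f})")  # (1.892, 13.333)
```

Unlike the intervals for μ and p, this interval is not centered at the point estimate S², because the chi-square distribution is skewed.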
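Confidence Intervals for the Population Proportion, p
Standard error of \( \hat{p} \): \( \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \)
100(1−α)% Confidence Interval for p (large-sample normal approximation): \( \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \)
100(1−α)% Upper-Confidence Bound for p: \( p \le \hat{p} + z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \)
100(1−α)% Lower-Confidence Bound for p: \( \hat{p} - z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \)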
[Example] A random sample of 100 people shows that 25 wear glasses. Form a 95% confidence interval for the true proportion of the population who wear glasses.
\( \hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} = 25/100 \pm 1.96\sqrt{0.25(0.75)/100} = 0.25 \pm 1.96\,(0.0433) \)
⇒ (0.1651, 0.3349)
Note: We are 95% confident that the true percentage of people wearing glasses in the population is between 16.51% and 33.49%. Although the interval from 0.1651 to 0.3349 may or may not contain the true proportion, about 95% of the intervals formed in this manner from samples of size 100 will contain the true proportion.
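The glasses example can be checked with the sketch below, which applies the large-sample interval \( \hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \) using only the standard library.

```python
import math
from statistics import NormalDist

# Data from the example: 25 of n = 100 people wear glasses
x, n, confidence = 25, 100, 0.95
p_hat = x / n
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96
se = math.sqrt(p_hat * (1 - p_hat) / n)             # estimated standard error of p_hat
lower, upper = p_hat - z * se, p_hat + z * se
print(f"SE = {se:.4f}, CI = ({lower:.4f}, {upper:.4f})")  # SE = 0.0433, CI = (0.1651, 0.3349)
```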
4-3
Hypothesis Testing
4-3.1 Statistical Hypotheses
• A (statistical) hypothesis is a statement or claim about a population parameter (not about a sample statistic):
Ex) The mean electric bill per household in this city is μ = $132.
The proportion of adults in this city with full-time jobs is p = 0.61.
• Hypothesis testing is a procedure leading to a decision about a hypothesis based on a random sample.
• The null hypothesis (H0) states the assumption to be tested. Hypothesis testing begins with the assumption that H0 is true.
• The alternative hypothesis (H1) is the opposite of the null hypothesis. It is the hypothesis that the researcher is trying to prove.
Ex) H0: The mean age of smartphone users is 28. (H0: μ = 28)
H1: The mean age of smartphone users is not 28. (H1: μ ≠ 28)
Example: Insight into Hypothesis Testing
Suppose that we are interested in the burning rate of a solid propellant used to power aircrew escape systems.
• Suppose that our interest focuses on the mean burning rate μ (a parameter of the distribution of the burning rate).
• If we are interested in deciding whether or not the mean burning rate is 50 centimeters per second:
Two-sided alternative hypothesis
H0: μ = 50 cm/s
H1: μ ≠ 50 cm/s
• If we are trying to prove that the mean burning rate is less than 50 centimeters per second:
One-sided alternative hypothesis
H0: μ = 50 cm/s
H1: μ < 50 cm/s
Note: If H1: μ < 50 cm/s, then we can write the null hypothesis as H0: μ = 50 cm/s or H0: μ ≥ 50 cm/s. Both expressions lead to the same testing procedure and the same decision.
4-3.2 Testing Statistical Hypotheses
• Hypothesis-testing procedures rely on using the information in a random sample from the population of interest.
• If this information is consistent with the hypothesis, we do not reject the hypothesis; if this information is inconsistent with the hypothesis, we conclude that the hypothesis is false and reject it.
• For example, suppose the claim is that the population mean age is 50 (H0: μ = 50). Sample the population and find the sample mean.
• Suppose the sample mean age was x̄ = 20.
• This is much lower than the claimed population mean of 50.
• If the null hypothesis were true, the probability of getting a sample mean this far from 50 would be very small, so we reject the null hypothesis. In other words, a sample mean of 20 is so unlikely if the population mean were 50 that we conclude the population mean is not 50.
[Figure: sampling distribution of X̄ if H0 is true, centered at μ = 50; the observed sample mean 20 lies far in the lower tail.]
The Test Statistic and Rejection Region
• If the sample mean is close to the assumed population mean, the null hypothesis is not rejected.
• If the sample mean is far from the assumed population mean, the null hypothesis is rejected.
• How far is "far enough" to reject H0?
• A test statistic is a statistic computed from the sample data to make a decision about the hypothesis.
Ex) sample mean, sample variance, sample proportion, etc.
• If the test statistic value falls in the rejection region, we reject H0.
• The boundaries that define the rejection regions are called the critical values.
[Figure: distribution of the test statistic, with the shaded rejection region(s) bounded by the critical values.]
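How do we determine the rejection region (critical values)?
The critical values are determined by
i) the distribution of the test statistic
ii) the significance level α (see Errors in Decision Making)
[Figure: rejection regions (shaded) and critical values for three kinds of tests]
– Two-tail test: H0: μ = 50, H1: μ ≠ 50; rejection region of area α/2 in each tail
– Upper-tail test: H0: μ ≤ 50, H1: μ > 50; rejection region of area α in the upper tail
– Lower-tail test: H0: μ ≥ 50, H1: μ < 50; rejection region of area α in the lower tail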
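For a test statistic that is standard normal under H0, the three rejection regions can be written out as in the sketch below. It assumes a Z statistic; the function name `z_rejection_region` and the labels "two-sided", "greater", and "less" are hypothetical conventions chosen for this illustration.

```python
from statistics import NormalDist

def z_rejection_region(alpha, alternative):
    """Describe the rejection region of a z-test for H1: 'two-sided', 'greater', or 'less'."""
    z = NormalDist()
    if alternative == "two-sided":           # area alpha/2 in each tail
        c = z.inv_cdf(1 - alpha / 2)
        return f"Z0 < {-c:.3f} or Z0 > {c:.3f}"
    if alternative == "greater":             # upper-tail test: area alpha above the critical value
        return f"Z0 > {z.inv_cdf(1 - alpha):.3f}"
    if alternative == "less":                # lower-tail test: area alpha below the critical value
        return f"Z0 < {z.inv_cdf(alpha):.3f}"
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

for h1 in ("two-sided", "greater", "less"):
    print(h1, "->", z_rejection_region(0.05, h1))
# two-sided -> Z0 < -1.960 or Z0 > 1.960
# greater -> Z0 > 1.645
# less -> Z0 < -1.645
```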
Errors in Decision Making
• The conclusion from a hypothesis test may be in error, since it is based on a random sample (a random experiment).
• Type I Error
– Rejecting the null hypothesis when it is true.
– The probability of a Type I error is called the significance level or size of the test, denoted by α.
– The significance level is usually set by the researcher in advance.
• Type II Error
– Failing to reject the null hypothesis when it is false.
– The probability of a Type II error is denoted by β.
– 1 − β is called the power of the test.

Decision          | Actual situation: H0 true     | Actual situation: H0 false
Do not reject H0  | No error (probability 1 − α)  | Type II error (probability β)
Reject H0         | Type I error (probability α)  | No error (probability 1 − β)
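To make β and power concrete, the sketch below computes β for a two-sided z-test of H0: μ = μ0 when the true mean is some other value. It uses the standard result that, under the true mean, Z0 is normal with mean \( (\mu - \mu_0)\sqrt{n}/\sigma \) and variance 1. The scenario (μ0 = 3, σ = 0.8, n = 100, α = 0.05, true μ = 2.8) and the helper name `type_ii_error` are hypothetical.

```python
import math
from statistics import NormalDist

def type_ii_error(mu_true, mu0, sigma, n, alpha):
    """beta for the two-sided z-test of H0: mu = mu0 when the true mean is mu_true."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu0) * math.sqrt(n) / sigma  # mean of Z0 when mu = mu_true
    # beta = P(-z_crit <= Z0 <= z_crit) when Z0 ~ N(shift, 1), i.e. H0 is not rejected
    return z.cdf(z_crit - shift) - z.cdf(-z_crit - shift)

# Hypothetical scenario: mu0 = 3, sigma = 0.8, n = 100, alpha = 0.05, true mean 2.8
beta = type_ii_error(2.8, 3.0, 0.8, 100, 0.05)
print(f"beta = {beta:.4f}, power = {1 - beta:.4f}")
```

When the true mean equals μ0, the same function returns 1 − α, the probability of correctly not rejecting a true H0.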
Hypothesis Testing Procedure Using the Rejection Region
1. State the null hypothesis, H0, and the alternative hypothesis, H1.
2. Choose the significance level, α.
3. Determine the test statistic to use: convert the sample statistic (e.g., X̄) to a test statistic (e.g., a Z statistic).
4. Find the critical values and determine the rejection region(s).
5. Collect data and compute the test statistic value from the sample result.
6. Compare the test statistic to the critical value(s) to determine whether it falls in the rejection region, and make the statistical decision: reject H0 if the test statistic falls in the rejection region.
4-3.3 P-Values in Hypothesis Testing
The p-value is the probability of obtaining a test statistic equal to or more extreme than the observed sample value when H0 is true.
• It is sometimes referred to as "the observed level of significance" or "the smallest value of α for which H0 can be rejected."
• The p-value indicates how consistent the data are with the null hypothesis, H0: "The smaller the p-value, the less plausible the null hypothesis."
Hypothesis Testing Procedure Using the P-value
1. State the null hypothesis, H0, and the alternative hypothesis, H1.
2. Choose the significance level, α.
3. Determine the test statistic to use: convert the sample statistic (e.g., X̄) to a test statistic (e.g., a Z statistic).
4. Collect data and compute the test statistic from the sample result.
5. Obtain the p-value from a distribution table of the test statistic (or by using Excel, Minitab, etc.).
6. Compare the p-value with α:
• If p-value < α, reject H0.
• If p-value ≥ α, do not reject H0.
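Steps 5 and 6 of the p-value procedure can be sketched for a Z statistic as follows. This is a minimal illustration assuming the test statistic is standard normal under H0; the function names `z_p_value` and `decide` and the observed value z0 = 1.5 are hypothetical.

```python
from statistics import NormalDist

def z_p_value(z0, alternative="two-sided"):
    """p-value of an observed Z statistic z0, assuming Z ~ N(0, 1) under H0."""
    nd = NormalDist()
    if alternative == "two-sided":
        return 2 * (1 - nd.cdf(abs(z0)))  # 2 P(Z > |z0|)
    if alternative == "greater":
        return 1 - nd.cdf(z0)             # P(Z > z0)
    if alternative == "less":
        return nd.cdf(z0)                 # P(Z < z0)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

def decide(p_value, alpha):
    """Step 6: reject H0 if p-value < alpha, otherwise do not reject."""
    return "reject H0" if p_value < alpha else "do not reject H0"

p = z_p_value(1.5)  # hypothetical observed statistic
print(f"p-value = {p:.4f}: {decide(p, 0.05)}")  # p-value = 0.1336: do not reject H0
```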
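Hypothesis Testing on the Mean
4-4 Inference on the Mean of a Population, Variance Known
Assumptions
– Population standard deviation σ is known
– Population is normally distributed
– If the population is not normal, use a large sample (CLT)
4-4.1 Hypothesis Testing on the Mean, Variance Known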
Ex: Hypothesis Testing: σ Known, two-sided
H0: μ = μ0
H1: μ ≠ μ0
• Convert the sample statistic (X̄) to the test statistic: \( Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \)
• Determine the critical Z values for a specified level of significance α.
• Decision rule: If the test statistic falls in the rejection region, reject H0; otherwise, do not reject H0.
[Figure: distribution of X̄ (centered at μ0) and of Z0 (centered at 0) under H0; reject H0 if Z0 < −z_{α/2} (lower critical value) or Z0 > +z_{α/2} (upper critical value), each tail having area α/2; do not reject H0 in between.]
Example: To test the claim that the mean weight of chocolate bars manufactured in a factory is 3 ounces, we weighed 100 chocolate bars, and the average weight was 2.84 ounces. Suppose that, from past records, the standard deviation is known to be 0.8 ounces.
1) State the null and alternative hypotheses
H0: μ = 3, H1: μ ≠ 3 (two-sided test)
2) Choose the desired level of significance
Suppose that α = 0.05 is chosen for this test.
3) Determine the test statistic
σ is known, so this is a Z-test:
\( z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{2.84 - 3}{0.8/\sqrt{100}} = \frac{-0.16}{0.08} = -2.0 \)
4) Find the critical values and determine the rejection region(s)
For α = 0.05, the critical Z-values are ±1.96.
Reject H0 if z0 < −1.96 or z0 > 1.96.
5) Reach a decision and interpret the result
Since z0 = −2.0 < −1.96, we reject the null hypothesis.
(That is, there is sufficient evidence that the mean weight of chocolate bars is not equal to 3 ounces.)
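The rejection-region steps for the chocolate-bar example can be reproduced as below, a sketch using the standard library's `NormalDist` for the critical value instead of a z table.

```python
import math
from statistics import NormalDist

# Data from the example: n = 100 bars, x_bar = 2.84 oz, known sigma = 0.8, H0: mu = 3
n, x_bar, sigma, mu0, alpha = 100, 2.84, 0.8, 3.0, 0.05

z0 = (x_bar - mu0) / (sigma / math.sqrt(n))   # (2.84 - 3) / 0.08
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
reject = z0 < -z_crit or z0 > z_crit          # two-sided rejection region
print(f"z0 = {z0:.2f}, critical values = ±{z_crit:.2f}, reject H0: {reject}")
# z0 = -2.00, critical values = ±1.96, reject H0: True
```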
Example (revisited): For the chocolate-bar example above (n = 100, X̄ = 2.84, σ = 0.8, claimed mean 3 ounces), test at α = 0.05 using the p-value.
X̄ = 2.84 is translated to a Z score:
\( z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{2.84 - 3}{0.8/\sqrt{100}} = \frac{-0.16}{0.08} = -2.0 \)
p-value = 2P(Z > |z0|) = 2P(Z > 2.0) = 2(0.0228) = 0.0456
p-value = 0.0456 < α (= 0.05)
Thus, we reject the null hypothesis.
[Figure: standard normal curve with α/2 = 0.025 beyond −1.96 and 1.96, and tail areas 0.0228 beyond −2.0 and 2.0.]
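The two-sided p-value can also be computed directly rather than read from a table. In this sketch the exact value is about 0.0455; the 0.0456 above comes from doubling the rounded table entry 0.0228. The decision is the same.

```python
from statistics import NormalDist

z0 = -2.0                                      # test statistic from the example
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))  # two-sided: 2 P(Z > |z0|)
print(f"p-value = {p_value:.4f}")              # p-value = 0.0455
print("reject H0" if p_value < 0.05 else "do not reject H0")
```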
Example: A phone industry manager thinks that customers' monthly cell phone bills have increased and now average more than $52 per month. Past company records indicate that the standard deviation is about $10. He collects a sample of n = 64, and the sample mean is $53.10. Test this claim at α = 0.10.
1) H0: μ ≤ 52 vs. H1: μ > 52
2) Test statistic:
\( z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{53.1 - 52}{10/\sqrt{64}} = 0.88 \)
3) Rejection region: critical value \( z_{0.10} = 1.28 \); if z0 > 1.28, reject H0.
4) Since z0 = 0.88 < 1.28, we cannot reject H0.
5) We cannot conclude that the mean bill is greater than $52.
[Figure: standard normal curve with the rejection region of area α = 0.10 above 1.28 (area 1 − α = 0.90 below it); z0 = 0.88 falls in the non-rejection region.]
P-value method: Let's calculate the p-value and compare it to α.
\( P(\bar{X} \ge 53.1) = P\left(Z \ge \frac{53.1 - 52.0}{10/\sqrt{64}}\right) = P(Z \ge 0.88) = 1 - 0.8106 = 0.1894 \)
We do not reject H0, since p-value = 0.1894 > α (= 0.10).
[Figure: standard normal curve; the p-value 0.1894 is the area above z = 0.88, which is larger than the area α = 0.10 above the critical value 1.28.]
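Both methods for the cell-phone-bill example fit in one short sketch: the upper-tail critical value \( z_{0.10} \) and the p-value \( P(Z \ge z_0) \), both from the standard library's `NormalDist`.

```python
import math
from statistics import NormalDist

# Data from the example: n = 64, x_bar = 53.1, sigma = 10, H0: mu <= 52 vs H1: mu > 52
n, x_bar, sigma, mu0, alpha = 64, 53.1, 10.0, 52.0, 0.10
nd = NormalDist()

z0 = (x_bar - mu0) / (sigma / math.sqrt(n))  # 1.1 / 1.25 = 0.88
z_crit = nd.inv_cdf(1 - alpha)               # upper-tail critical value, about 1.28
p_value = 1 - nd.cdf(z0)                     # P(Z >= z0)

print(f"z0 = {z0:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.4f}")
print("reject H0" if z0 > z_crit else "do not reject H0")
# z0 = 0.88, critical value = 1.28, p-value = 0.1894
# do not reject H0
```

Both rules agree here: z0 < z_crit exactly when p-value > α for an upper-tail test.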
4-5 Inference on the Mean of a Population, Variance Unknown
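4-5.1 Hypothesis Testing on the Mean, Variance Unknown
Assumptions
– Population standard deviation σ is unknown
– Population is normally distributed; if the population is not normal, use a large sample
Test statistic: \( T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \), which has a t distribution with n − 1 d.f. when H0 is true.
Calculating the P-value: use the t distribution with n − 1 d.f.; for example, for H1: μ ≠ μ0, p-value = \( 2P(T_{n-1} > |t_0|) \).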
Example: The mean cost of a hotel room in LA is said to be $168 per night. A random sample of 25 hotels resulted in X̄ = 172.50 and S = 15.40. Test at the α = 0.05 level, assuming the data are normally distributed.
H0: μ = 168, H1: μ ≠ 168
• σ is unknown, so use a t-statistic:
\( t_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{172.50 - 168}{15.40/\sqrt{25}} = 1.46 \)
• Critical values: \( \pm t_{0.025,24} = \pm 2.0639 \)
• Reject H0 if t0 > 2.0639 or t0 < −2.0639.
• Since t0 = 1.46 does not fall in the rejection region, we cannot reject H0.
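The hotel-room t-test can be reproduced as below. Because the standard library has no t quantiles, this sketch hard-codes \( t_{0.025,24} = 2.0639 \) from the t table; with SciPy installed, `scipy.stats.t.ppf(0.975, 24)` gives the same value.

```python
import math

# Data from the example: n = 25, x_bar = 172.50, s = 15.40, H0: mu = 168 (two-sided)
n, x_bar, s, mu0 = 25, 172.50, 15.40, 168.0
t_crit = 2.0639  # t_{0.025, 24} from a t table (hard-coded)

t0 = (x_bar - mu0) / (s / math.sqrt(n))  # 4.5 / 3.08
reject = abs(t0) > t_crit                # two-sided rejection region
print(f"t0 = {t0:.2f}, reject H0: {reject}")  # t0 = 1.46, reject H0: False
```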
Relationship between Tests of Hypotheses and Confidence Intervals
• The test of significance level α of the hypothesis H0: θ = θ0 vs. H1: θ ≠ θ0 will lead to rejection of H0
⇔ the hypothesized value θ0 is not in the 100(1 − α)% two-sided confidence interval [l, u].
• The test of significance level α of the hypothesis H0: θ = θ0 vs. H1: θ < θ0 will lead to rejection of H0
⇔ the hypothesized value θ0 is not in the 100(1 − α)% upper-confidence interval (−∞, u].
• The test of significance level α of the hypothesis H0: θ = θ0 vs. H1: θ > θ0 will lead to rejection of H0
⇔ the hypothesized value θ0 is not in the 100(1 − α)% lower-confidence interval [l, ∞).
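The two-sided case can be checked on the chocolate-bar example (n = 100, X̄ = 2.84, σ = 0.8, μ0 = 3, α = 0.05). This sketch builds the 95% z-interval for μ and runs the two-sided z-test; both reject H0 because 3 lies just outside the interval.

```python
import math
from statistics import NormalDist

# Chocolate-bar example: n = 100, x_bar = 2.84, sigma = 0.8, H0: mu = 3, alpha = 0.05
n, x_bar, sigma, mu0, alpha = 100, 2.84, 0.8, 3.0, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
se = sigma / math.sqrt(n)

lower, upper = x_bar - z_crit * se, x_bar + z_crit * se  # 95% two-sided CI for mu
reject_by_ci = not (lower <= mu0 <= upper)               # mu0 outside the CI
reject_by_test = abs((x_bar - mu0) / se) > z_crit        # two-sided z-test
print(f"CI = ({lower:.4f}, {upper:.4f}); reject via CI: {reject_by_ci}; via test: {reject_by_test}")
# CI = (2.6832, 2.9968); reject via CI: True; via test: True
```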
4-6
Inference on the Variance of a Normal Population
4-6.1 Hypothesis Testing on the Variance of a Normal Population
4-7
Inference on Population Proportion
4-7.1 Hypothesis Testing on a Binomial Proportion
We will consider testing: