Hypothesis testing
A common aim in many studies is to check whether the data agree
with certain predictions. These predictions are hypotheses about
variables measured in the study.
A hypothesis is a statement about some characteristic of a variable
or a collection of variables.
A significance test is a way of statistically testing a hypothesis by
comparing the data to values predicted by the hypothesis. Data that
fall far from the predicted values provide evidence against the
hypothesis. All significance tests have five elements: assumptions,
hypotheses, test statistic, p-value, and conclusion.
All significance tests require certain assumptions for the tests to be
valid. These assumptions refer, e.g., to the type of data, the form of
the population distribution, method of sampling, and sample size.
Example: a firm produces metal boxes and wants
to evaluate its production process. It wants to
be sure that the longest side of each box is 368
mm, so it draws a sample of 25 boxes. If the
length of the side turns out to differ from 368 mm,
the whole production process will need to be corrected.
A significance test considers two hypotheses about the value of a
population parameter: the null hypothesis and the alternative
hypothesis.
The null hypothesis H0 is the hypothesis that is directly tested. This
is usually a statement that the parameter has a value corresponding
to, in some sense, no effect. The alternative hypothesis Ha is a
hypothesis that contradicts the null hypothesis. It states that the
parameter falls in some set of values other than the one the null
hypothesis specifies.
A significance test analyzes the strength of sample evidence against the
null hypothesis. The test is conducted to investigate whether the data
contradict the null hypothesis, hence suggesting that the alternative
hypothesis is true. The alternative hypothesis is judged acceptable if
the sample data are inconsistent with the null hypothesis. That is, the
alternative hypothesis is supported if the null hypothesis appears to
be incorrect. The hypotheses are formulated before collecting or
analyzing the data.
The test statistic is a statistic calculated from the sample data to test
the null hypothesis. This statistic typically involves a point estimate of
the parameter to which the hypotheses refer.
The sampling distribution of the test statistic is divided into two regions:
•Region of rejection
•Region of acceptance
Decision rule:
•If the value of the test statistic falls in the region of acceptance, the null hypothesis cannot be rejected.
•If the value of the test statistic falls in the region of rejection, the null hypothesis must be rejected.
To decide on the null hypothesis, we need to find the critical value
of the test statistic.
This is the value that separates the acceptance region from the rejection region.
[Figure: sampling distribution of the test statistic, with a central acceptance region bounded by two critical values and a rejection region in each tail.]
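Consider the collection of possible test statistic values that contradict H0 at
least as strongly as the value actually observed. The p-value is the probability,
if H0 were true, that the test statistic would fall in this collection of values.
In other words, the p-value is the probability, when H0 is true, of a test statistic
value at least as contradictory to H0 as the value actually observed.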
The smaller the p-value, the more strongly the data contradict H0.
For example, a p-value such as 0.3 or 0.8 indicates that the observed data would
not be unusual if H0 were true. But a p-value such as 0.001 means that such
data would be very unlikely, if H0 were true. This provides strong evidence
against H0.
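As an illustration of this definition, the sketch below computes a two-sided p-value under the assumption that the test statistic follows a standard normal distribution when H0 is true. That distributional assumption, the function name `two_sided_p_value`, and the sample values passed to it are chosen here for illustration only; they are not part of the slides.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for an observed statistic z.

    Assumes the statistic is standard normal under H0 (illustrative assumption).
    "At least as contradictory" is read as "at least as far from 0", in either tail.
    """
    upper_tail = 1.0 - NormalDist().cdf(abs(z))
    return 2.0 * upper_tail


# A statistic close to 0 gives a large p-value (data not unusual under H0);
# a statistic far from 0 gives a small p-value (strong evidence against H0).
for z in (0.5, 1.96, 3.3):
    print(f"z = {z:>4}: p-value = {two_sided_p_value(z):.4f}")
```

The p-value shrinks as the observed statistic moves away from the value expected under H0, which matches the interpretation given above.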
Test for the mean (known variance)
To test whether the mean of a population is equal to a certain
value μ, against the alternative hypothesis that it differs from
that value, when σ is known we can use the test statistic Z:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
where X̄ is the sample mean and n is the sample size.
If X is normally distributed then, under H0, Z follows the
standard normal distribution.
If Z takes a value near 0 we do not reject H0; otherwise we reject H0
(two-sided test).
Critical value approach (level of significance of 0.05)
Decision rule:
Reject H0 if Z > +1.96 or if Z < -1.96; otherwise do not reject H0.
[Figure: acceptance region between the two critical values, with a rejection region in each tail.]
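The critical value 1.96 used in this decision rule is the 97.5th percentile of the standard normal distribution, so it can be recovered from the level of significance. The following is a minimal sketch using only the Python standard library; the function names `critical_value` and `reject_h0` are hypothetical and introduced only for this illustration.

```python
from statistics import NormalDist


def critical_value(alpha: float) -> float:
    """Upper critical value of a two-sided Z test at significance level alpha."""
    # alpha is split equally between the two tails.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def reject_h0(z: float, alpha: float = 0.05) -> bool:
    """Decision rule: reject H0 when Z falls in either rejection region."""
    return abs(z) > critical_value(alpha)


print(f"critical value at alpha = 0.05: {critical_value(0.05):.2f}")
print("Z = 2.50 rejected:", reject_h0(2.50))
print("Z = 0.80 rejected:", reject_h0(0.80))
```

Changing `alpha` moves the boundary between the acceptance and rejection regions: a smaller level of significance gives a larger critical value and a wider acceptance region.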
Example: a firm produces metal boxes and wants to evaluate its production process.
It wants to be sure that the longest side of each box is 368 mm, so it draws a sample
of 25 boxes. The standard deviation is 15 mm and the sample mean is 372.5 mm.
H0: μ = 368
H1: μ ≠ 368
$$Z = \frac{372.5 - 368}{15 / \sqrt{25}} = 1.5$$
Since -1.96 < Z = 1.5 < 1.96, the test statistic falls in the
acceptance region and H0 cannot be rejected.
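The arithmetic of the example above can be checked with a short script. This is a sketch that assumes a two-sided test at the 0.05 level with critical value 1.96; the function name `z_statistic` and the variable names are hypothetical.

```python
from math import sqrt


def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Z = (sample mean - hypothesized mean) / (sigma / sqrt(n)), sigma known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


# Values taken from the metal box example.
z = z_statistic(sample_mean=372.5, mu0=368.0, sigma=15.0, n=25)
z_critical = 1.96  # two-sided test, level of significance 0.05

in_rejection_region = abs(z) > z_critical
print(f"Z = {z:.2f}, reject H0: {in_rejection_region}")
# prints: Z = 1.50, reject H0: False
```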
P-value approach
Decision rule (α is the level of significance):
• if the p-value is greater than or equal to α, the null hypothesis cannot be rejected.
• if the p-value is less than α, the null hypothesis is rejected.
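The rule above can be written as a small function. The sketch below applies it to the metal box example, where the figures given (sample mean 372.5, hypothesized mean 368, σ = 15, n = 25) yield Z = 1.5; the function name `decide` and the use of a two-sided standard normal p-value are assumptions made for this illustration.

```python
from statistics import NormalDist


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the p-value decision rule at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    return "reject H0" if p_value < alpha else "do not reject H0"


# Two-sided p-value for the metal box example (Z = 1.5, sigma known).
z = 1.5
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(f"p-value = {p_value:.4f}: {decide(p_value)}")
# prints: p-value = 0.1336: do not reject H0
```

The p-value approach and the critical value approach lead to the same decision at a given level of significance; the p-value additionally shows how strong the evidence against H0 is.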
Test for the mean (unknown variance)
Usually we do not know σ, so we estimate it with the sample standard deviation S.
In this case the test statistic to be used is t:
$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
It has the Student's t distribution with n − 1 degrees of freedom
if H0 is true.
In this case too we can use either the critical value approach or the
p-value approach. The tables to be used are those of the Student's t distribution.
Example: t distribution with a level of significance of 0.05 and 11 degrees of
freedom
[Figure: t distribution with a central acceptance region bounded by two critical values and a rejection region in each tail.]
Example: The following data are the amounts in dollars in a random
sample of 12 sales invoices.
108.98 152.22 111.45 110.59 127.46 107.26
93.32 91.97 111.56 75.71 128.58 135.11
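H0: μ = 120   H1: μ ≠ 120   α = 0.05
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 112.85$$
$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = 20.80$$
$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}} = \frac{112.85 - 120}{20.80 / \sqrt{12}} = -1.19$$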
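Since -2.201 < t = -1.19 < 2.201, where ±2.201 are the critical values of the t distribution with 11 degrees of freedom at α = 0.05, we do not reject H0.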
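The invoice example can be reproduced from the raw data with the Python standard library. In this sketch the critical value 2.201 is taken from the text rather than computed, since the standard library has no Student's t quantile function; the variable names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Amounts in dollars of the 12 sampled sales invoices.
invoices = [
    108.98, 152.22, 111.45, 110.59, 127.46, 107.26,
    93.32, 91.97, 111.56, 75.71, 128.58, 135.11,
]
mu0 = 120.0         # value of the mean under H0
t_critical = 2.201  # from the text: alpha = 0.05, 11 degrees of freedom

n = len(invoices)
x_bar = mean(invoices)
s = stdev(invoices)  # sample standard deviation (divides by n - 1)
t = (x_bar - mu0) / (s / sqrt(n))

reject = abs(t) > t_critical
print(f"mean = {x_bar:.2f}, S = {s:.2f}, t = {t:.2f}, reject H0: {reject}")
# prints: mean = 112.85, S = 20.80, t = -1.19, reject H0: False
```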