Elementary hypothesis testing
• Purpose of hypothesis testing
• Types of hypotheses
• Types of errors
• Critical regions
• Significance levels
• Power of tests
• Hypothesis testing vs intervals
• R commands for tests
Example
This example will be used throughout this lecture. The data set comes with R (it is the "shoes" dataset in the package MASS).
The purpose of this experiment was to test the wear of shoes made of two materials, A and B. Ten boys were selected, and for each boy a pair of shoes was made using both materials: one material was used for the shoe on the right foot and the other for the shoe on the left foot. This type of experiment is called a paired design.
A: 13.2 8.2 10.9 14.3 10.7 6.6 9.5 10.8 8.8 13.3
B: 14.0 8.8 11.2 14.2 11.8 6.4 9.8 11.3 9.3 13.6
For simplicity let us consider the differences between the A and B vectors:
-0.8 -0.6 -0.3 0.1 -1.1 0.2 -0.3 -0.5 -0.5 -0.3
The sample mean is -0.41 and the sample standard deviation is 0.39 (the sample variance is about 0.15).
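These summary statistics can be checked directly in R; a minimal sketch using the data above (equivalently, the `shoes` dataset from MASS):

```r
# Wear measurements for the two materials (shoes data, MASS package)
A <- c(13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3)
B <- c(14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6)
d <- A - B       # paired differences
mean(d)          # -0.41
sd(d)            # about 0.39 (sample standard deviation)
var(d)           # about 0.15 (sample variance)
```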
Purpose of hypothesis testing
Statistical hypotheses are in general different from scientific ones. Scientific hypotheses deal with the behaviour of scientific subjects, such as interactions between all particles in the universe, and in general cannot be tested statistically. Statistical hypotheses deal with the behaviour of observable random variables: they are testable by observing some set of random variables, and they are usually related to the distribution(s) of the observed random variables.
For example, if we have observed two sets of random variables x=(x1,x2,...,xn) and y=(y1,y2,...,ym), then one natural question is: are the means of these two sets different? That is a statistically testable hypothesis. Other questions may arise: do these two sets of random variables come from populations with the same variance? Do they come from populations with the same distribution? These questions can be tested using the observed samples.
Types of hypotheses
Hypotheses can in general be divided into two categories: a) parametric and b) non-parametric. Parametric hypotheses concern situations where the distribution of the population is known; they depend on the value of one or several parameters of this distribution. Non-parametric hypotheses concern situations where none of the parameters of the distribution is specified in the statement of the hypothesis. For example, the hypothesis that two sets of random variables come from the same distribution is a non-parametric one.
Parametric hypotheses can in turn be divided into two families: 1) Simple hypotheses are those in which all parameters of the distribution are specified. For example, the hypothesis that a set of random variables comes from a normally distributed population with known variance and known mean is a simple hypothesis. 2) Composite hypotheses are those in which some parameters of the distribution are specified and others remain unspecified. For example, the hypothesis that a set of random variables comes from a normally distributed population with a given mean value but unknown variance is a composite hypothesis.
Errors in hypothesis testing
A hypothesis is usually not tested alone; it is tested against some alternative. The hypothesis being tested is called the null hypothesis and denoted H0, and the alternative hypothesis is denoted H1 (subscripts may differ and may reflect the nature of the alternative hypothesis). The null hypothesis gets the "benefit of the doubt". There are two possible conclusions: reject the null hypothesis or do not reject it. H0 is rejected only if the sample data contain sufficiently strong evidence that it is not true. Usually testing a hypothesis comes down to evaluating some test statistic (a function of the sample points). If this value belongs to some region w, the hypothesis is rejected; this region is called the critical region. The region complementary to the critical region, W−w, is called the acceptance region. By rejecting or accepting a hypothesis we can make two types of errors:
Type I error: reject H0 when it is true.
Type II error: accept H0 when it is false.
Type I errors are usually considered more serious than Type II errors. Type I errors define significance levels and Type II errors define the power of the test. In an ideal world we would like to minimise both of these errors.
Power of a test
The probability of a Type I error is equal to the size of the critical region, α. The probability of a Type II error is a function of the alternative hypothesis (say H1); this probability is usually denoted β. Using the notation of probability we can write:
P(x ∈ w | H0) = α
P(x ∈ W−w | H1) = β, or P(x ∈ w | H1) = 1−β
where x is the sample points, w is the critical region and W−w is the acceptance region. If the sample points belong to the critical region then we reject the null hypothesis. The equations above are nothing more than the Type I and Type II errors written in probabilistic language.
The complement of the probability of a Type II error, 1−β, is called the power of the test of the null hypothesis against the alternative hypothesis. β is the probability of accepting the null hypothesis when the alternative is true, and 1−β is the probability of rejecting H0 when H1 is true.
The power of the test is a function of α, the alternative hypothesis H1, and the probability distributions conditional on H0 and H1.
Critical region
Let us assume that we want to test whether some parameter of the population is equal to a given value, against an alternative hypothesis. Then we can write (for example):
H0: θ = θ0 against H1: θ < θ0
The test statistic is usually a point estimate of θ, or somehow related to it. If the critical region defined by this hypothesis is an interval (−∞; cu], then cu is called the critical value: it defines the upper limit of the critical region. Any value of the statistic to the left of cu leads to rejection of the null hypothesis; a value to the right of cu leads to not rejecting it. This type of hypothesis is called a left one-sided hypothesis. The problem of hypothesis testing is either to find cu for a given significance level, or to find the observed significance level (p-value) for a given sample statistic.
Significance level
It is common in hypothesis testing to set the probability of a Type I error, α, to some predefined value called the significance level. These levels are usually set to 0.1, 0.05 or 0.01. If the null hypothesis is true and the probability of observing the current value of the test statistic (or a more extreme one) is lower than the significance level, then the hypothesis is rejected.
Consider an example. Let us say we have a sample from a population with normal distribution N(μ, σ²). We want to test the following null hypothesis against an alternative:
H0: μ = μ0 against H1: μ < μ0
This is a left one-sided hypothesis. Because all parameters of the distribution (the mean and variance of the normal distribution) have been specified, it is a simple hypothesis. A natural test statistic for this case is the sample mean. We know that the sample mean has a normal distribution; under the null hypothesis its mean is μ0 and its standard deviation is σ/√n. Then we can write:
  P ( X  cu )  P(
0
X  0 cu  0
c  0

)  P( Z  u
)
/ n / n
/ n
Using the fact that Z has the standard normal distribution (mean 0 and variance 1), we can solve this equation with the tables of the standard normal distribution.
Significance level: Cont.
Let us define:
zα = (μ0 − cu)/(σ/√n)
Then we need to solve the following equation (using standard tables or programs):
α = P(Z ≤ −zα)
Having found zα we can solve the equation with respect to cu:
(μ0 − cu)/(σ/√n) = zα, and therefore cu = μ0 − zα σ/√n
If the sample mean is less than this value of cu, we reject with significance level α; if the sample mean is greater, we do not reject the null hypothesis. If we reject (the sample mean is smaller than cu), then we are saying: if the population mean were equal to μ0, the probability of observing a sample mean this small or smaller would be at most α.
To find the power of the test we need to find the corresponding probability under the condition that the alternative hypothesis is true.
Significance level: An example.
Let us look at the example from the beginning of the lecture and consider the differences between A and B. The sample has size 10 and sample mean -0.41. We assume that this sample comes from a normally distributed population with standard deviation 0.39. We want to test the following hypothesis:
H0: μ = 0 against H1: μ < 0
We have μ0 = 0. Let us set the significance level to 0.05. From the table we find z0.05 = 1.645, and then:
cu = μ0 − z0.05 σ/√n = 0 − 1.645 × 0.39/√10 = −1.645 × 0.39/3.16 ≈ −0.2
Since the value of the sample mean (-0.41) belongs to the critical region (i.e. it is less than -0.2), we reject the null hypothesis at significance level 0.05 (and also at the level 0.01).
Note that we could have used an R function to get the same critical value:
qnorm(0.05, sd=0.39/sqrt(10))
The test we performed was a left one-sided test, i.e. we wanted to know whether the population mean is less than the assumed value (μ0 = 0). Similarly, we can build right one-sided tests and, by combining the two, two-sided tests. A right one-sided test looks like:
H0: μ = μ0 against H1: μ > μ0
Then the critical region is the interval [cl; ∞), where cl is the lower bound of the critical region.
A two-sided test looks like:
H0: μ = μ0 against H1: μ ≠ μ0
Then the critical region is the union of two intervals, (−∞; cu] ∪ [cl; ∞).
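The three kinds of critical values can be sketched in R with qnorm, assuming the same known σ = 0.39 and n = 10 as in the example (the variable names below are ours):

```r
alpha <- 0.05
se <- 0.39 / sqrt(10)        # standard deviation of the sample mean
# Left one-sided test: reject if the sample mean is <= cu
cu <- qnorm(alpha, mean = 0, sd = se)        # about -0.20
# Right one-sided test: reject if the sample mean is >= cl
cl <- qnorm(1 - alpha, mean = 0, sd = se)    # about  0.20
# Two-sided test: reject outside the interval below (about -0.24, 0.24)
c(qnorm(alpha / 2, sd = se), qnorm(1 - alpha / 2, sd = se))
```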
Power of a test
The power of a test depends on the alternative hypothesis.
To see the power of the current test, let us use the normal distribution. The null hypothesis is that the true mean is equal to 0. Let us set the alternative hypothesis to be that the true mean equals the observed difference, -0.41. Again we assume that the distribution is normal with known sd = 0.39 and that the sample size is 10, and we test at significance level 0.05 as before. We found that the upper bound of the critical region is -0.2. To find the power of the test we need to find P(X̄ < -0.2 | μ1 = -0.41, σ = 0.39). We can use an R function to calculate this:
pnorm(-0.2, mean=-0.41, sd=0.39/sqrt(10))
We divided the standard deviation by the square root of 10 because of the sample size. This test is very powerful: the power is equal to 0.96.
Again, if we do not know the standard deviation of the population, then the t distribution is more appropriate (it is implemented in the command power.t.test).
Note that if the distribution of the population (and therefore of each sample point and of the sample statistics) is known, then the power of the test for a given difference can be calculated even before the sample is available. The power of the test depends on the alternative hypothesis; in the case of the mean of a normal distribution it is expressed through the difference between the alternative and null hypotheses (μ1 − μ0, where μ0 is the mean under the null hypothesis and μ1 under the alternative), the sample size and the standard deviation.
Power of test
A power calculation can be used before as well as after the experimental data have been collected. Before the experiment, it is performed to find the sample size needed to detect a given effect; it can be used as part of the design of an experiment. After the experiment, it uses the sample size, the effect (e.g. the observed difference between means) and the standard deviation to calculate the power of the test.
For example, if we want to detect a difference between means equal to 1 (delta) in a paired design, with power 0.8 at significance level 0.05 in a one-sided test, then we need around 8 observations.
This is done in R using the command
power.t.test(delta=1, sd=1, power=0.8, type='paired', alt='one.sided')
The result of the R function:
     Paired t test power calculation
              n = 7.7276
          delta = 1
             sd = 1
      sig.level = 0.05
          power = 0.8
    alternative = one.sided
Critical regions and power
The table shows schematically the relation between the relevant probabilities under the null and alternative hypotheses.
                             do not reject        reject
Null hypothesis is true      1−α                  α (Type I error)
Null hypothesis is false     β (Type II error)    1−β
Composite hypothesis
In the example above we assumed that the population variance is known; it was a simple hypothesis (all parameters of the normal distribution had been specified). But in real life it is unusual to know the population variance. If the population variance is not known, the hypothesis becomes composite (it specifies the population mean, but the population variance is unknown). In this case the variance is calculated from the sample and replaces the population variance. Then, instead of the normal distribution, the t distribution with n−1 degrees of freedom is used, and the value of z is found from the table of the tn−1 distribution. If n is large (>100) then, as can be expected, the normal distribution approximates the t distribution very well.
The example above is easily extended to testing the difference between the means of two samples. If we have two samples from populations with equal but unknown variances, then the test of the difference between the two means leads to a t distribution with (n1+n2−2) degrees of freedom, where n1 is the size of the first sample and n2 is the size of the second. If the variances of both populations were known, the test statistic for the difference between two means would have a normal distribution.
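As an illustration (a sketch of ours, not part of the original analysis), a pooled two-sample t test on the shoes data, ignoring the pairing, has n1+n2−2 = 18 degrees of freedom; because the boy-to-boy variation is large, it fails to detect the difference that the paired design reveals:

```r
A <- c(13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3)
B <- c(14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6)
res <- t.test(A, B, var.equal = TRUE)   # pooled two-sample t test, ignoring pairing
res$parameter                           # df = 18
res$p.value                             # large: the difference is not detected
```

This is one reason the paired design was used in the first place: differencing removes the between-boy variation.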
P-value of the test
Usually, instead of setting a predefined significance level, the observed p-value is reported; it is also called the observed significance level. Let us analyse it using the example above, where we had a sample of size 10 with sample mean -0.41 and assumed that we knew the population standard deviation, 0.39. The p-value is calculated as follows:
Pμ0(X̄ ≤ −0.41) = P( Z ≤ (−0.41 − μ0)/(σ/√n) ) = P(Z ≤ −3.32) ≈ 0.0004
We would reject the null hypothesis at significance level 0.05, 0.01, etc. If the population mean were 0 with standard deviation 0.39, then observing -0.41 or less would have probability 0.0004. In other words, if we were to draw a sample of size 10 ten thousand times, then around four times we would observe a mean value less than or equal to -0.41.
In this example we assumed that we know the standard deviation of the population; that is why we used the normal distribution. If we do not know it, we should use the t distribution (with 10−1 = 9 degrees of freedom), and the p-value becomes 0.004.
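The t-based p-value can be reproduced in R; a sketch using the differences from the example (the one-sample test on the differences is the same as a paired test on A and B):

```r
A <- c(13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3)
B <- c(14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6)
d <- A - B
# One-sample t test of H0: mu = 0 against H1: mu < 0 (9 degrees of freedom)
p_one <- t.test(d, alternative = "less")$p.value
p_one                                               # about 0.004
# Equivalently, as a paired test:
t.test(A, B, paired = TRUE, alternative = "less")$p.value
```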
Likelihood ratio test
The likelihood ratio test is one of the techniques for constructing test statistics. Let us assume that we have a sample of size n, x=(x1,...,xn), and a parameter vector θ=(θ1, θ2), where both θ1 and θ2 may themselves be vectors. We want to test a null hypothesis against an alternative:
H0: θ1 = θ10 against H1: θ1 ≠ θ10
Let the likelihood function be L(x | θ). The likelihood ratio test works as follows: 1) maximise the likelihood function under the null hypothesis (i.e. with the parameter(s) θ1 fixed at θ10) and record the value of the likelihood at the maximum; 2) maximise the likelihood under the alternative hypothesis (i.e. unconstrained maximisation) and record the value of the likelihood at the maximum; then form the ratio:
w = L(x | θ10, θ̃2) / L(x | θ̂1, θ̂2)
where θ̃2 is the value of θ2 after constrained (θ1 = θ10) maximisation, and θ̂1, θ̂2 are the values of both parameters after unconstrained maximisation.
w is the likelihood ratio statistic; tests carried out using this statistic are called likelihood ratio tests. In this case it is clear that:
0 ≤ w ≤ 1
If the value of w is small, the null hypothesis is rejected. If g(w) is the density of the distribution of w under the null hypothesis, then the critical value c is found from:
∫₀ᶜ g(w) dw = α
and the critical region is w ≤ c.
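A sketch of this recipe for the example data, testing H0: μ = 0 for a normal sample with unknown variance (the function and variable names are ours; it uses the closed-form normal maximum-likelihood estimates, namely the mean and the mean squared deviation):

```r
A <- c(13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3)
B <- c(14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6)
d <- A - B
# Log-likelihood of N(mu, sigma^2) with sigma^2 profiled out at its MLE for the given mu
loglik <- function(mu) sum(dnorm(d, mean = mu, sd = sqrt(mean((d - mu)^2)), log = TRUE))
# Constrained (mu = 0) versus unconstrained (mu = sample mean) maximised likelihoods
w <- exp(loglik(0) - loglik(mean(d)))
w   # about 0.017, small: reject H0
```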
Hypothesis testing vs intervals
Some modern authors in statistics think that significance testing is an overused procedure and does not make much sense once we have observed the sample; it is then better to work with confidence intervals. Since we can calculate statistics related to the parameter we want to estimate, we can make inferences about where the "true" value of the parameter may lie.
R commands produce predefined confidence intervals as well as p-values. Usually p-values are used in rejecting or not rejecting a hypothesis.
R commands for tests
t.test - one-sample, two-sample and paired t-tests
var.test - test for equality of variances
power.t.test - calculate the power of a t-test
Some other tests (these are nonparametric):
wilcox.test - test for differences in location (works for one-sample, two-sample and paired cases)
ks.test - Kolmogorov-Smirnov test for equality of distributions
Further reading
A full exposition of hypothesis testing and other statistical tests can be found in:
Stuart, A., Ord, J.K. and Arnold, S. (1991) Kendall's Advanced Theory of Statistics, Volume 2A: Classical Inference and the Linear Model. Arnold, London.
Box, G.E.P., Hunter, W.G. and Hunter, J.S. (1978) Statistics for Experimenters. Wiley.
Dalgaard, P. (2008) Introductory Statistics with R. Springer.