Testing the hypothesis: unknown variance
we have assumed so far that
the random sample (the result of the experiment) was
taken from a normal distribution
with known mean and standard deviation
therefore, the test statistic z compared the sample mean with the hypothesised mean, using the standard error of the mean s_Y/√n:

    z = (Ȳ_sample − Y) / (s_Y / √n)
Testing the hypothesis: unknown variance (2)
what if the standard deviation was NOT in fact known?
this is a more realistic assumption
we need to have an estimate, on the basis of the experiment
we have already seen how to calculate it:

    s_sample = √( (1/(N − 1)) · Σ_{i=1}^{N} (Y_i − Ȳ)² )

the test statistic is therefore:

    t = (Ȳ_sample − Y) / (s_sample / √n)

only, this time we cannot compare this with the z_α values
as we are not sure of the s_Y value, we cannot say that t follows a normal distribution
in fact, it can be shown that t follows a t distribution
COMP106 - lecture 23 – p.1/19
t distribution
the t distribution has a shape similar to the normal (like a bell), but it has an extra parameter
it's an integer number, which is called the degrees of freedom (DF)
depending on this parameter, the bell has longer or shorter tails
[Figure: t distributions with DF = 1, 10, 20 and 30 respectively]
for DF big enough (greater than 30), the t distribution is quite similar to the normal

t tests
no panic: the procedure is still the same!
only, instead of looking for z_α values, we need to look for a t_α value in the table of values for the t distribution
these tables are given for various degrees of freedom
the DF to consider is the size of the sample minus one (N − 1)
this is coherent with the idea that if N is big enough, then s_sample is more and more similar to s_Y (the value of the standard deviation of the normal distribution)
so, for N big enough, a t distribution with N − 1 DF approximates the normal, and we are in fact using the normal distribution for our estimates
the decision is taken as usual: we reject the H0 hypothesis if

    H1 : Y > Y_0    →  t ≥ t_α(N − 1)
    H1 : Y < Y_0    →  t ≤ −t_α(N − 1)
    H1 : Y ≠ Y_0    →  |t| ≥ t_{α/2}(N − 1)

What if we want to test the standard deviation?
so far we have investigated changes in the mean
this is not the only parameter we may test
for instance:
the interface we are studying has an average time for completing a task, with standard deviation s_0 = 7
this means that the completion times are quite scattered around the mean
we want to test a new interface, to check if it's more consistent, with the completion times clustering more around the mean

The procedure is still the same
we need to choose H0 and H1
in this case the null hypothesis is H0 : s_Y = s_0
while the alternative hypothesis is one of:
1. H1 : s_Y > s_0
2. H1 : s_Y < s_0
3. H1 : s_Y ≠ s_0
and we need to choose a confidence level α
we perform our experiment with N users and we calculate the standard deviation of the sample:

    s_sample = √( (1/(N − 1)) · Σ_{i=1}^{N} (Y_i − Ȳ)² )

now we need a formula that puts together s_0 and s_sample
it can be shown that the best test statistic in this case is:

    χ² = (N − 1) · s²_sample / s²_0
χ² distribution
the χ² or chi-square distribution is basically obtained when a number of independent normal distributions are squared and summed
the number of distributions is again called the degrees of freedom (DF) of the χ²
the shape of the χ² changes considerably with different DFs
but you always have an asymmetrical shape, and all values are positive
[Figure: χ² distributions with DF = 1, 2, 5 and 10 respectively]

χ² test
once again, we simply need to look for the appropriate χ²_α value in the right table
the degrees of freedom to look for are again N − 1
the only slight difference is that we cannot have negative numbers, so the rejection zones take this into account
the decision is: we reject the H0 hypothesis if

    H1 : s_Y > s_0   →  χ² ≥ χ²_α(N − 1)
    H1 : s_Y < s_0   →  χ² ≤ χ²_{1−α}(N − 1)
    H1 : s_Y ≠ s_0   →  χ² ≥ χ²_{α/2}(N − 1) or χ² ≤ χ²_{1−α/2}(N − 1)
Back to DOE: single factor experiments
in single factor experiments we want to test the impact of one
input factor on the output variable
e.g. how the screen size affects the typing speed
we need to decide the number of "treatments" or "levels" we
want to study for the input factors
e.g. only two: Large screen and Small screen
Note: the examples given for the hypothesis tests could be seen as single factor experiments with only one treatment, as we compared the new interface feature against the old, existing one
we then perform our randomised experiment, with two groups
of users
let’s say we obtain that the average typing speed for the two groups is
8 keys per second with small screens
6 keys per second with large screens
Comparison of two means
let's say that the typing speeds obtained in the experiment are:

    Y_L1, Y_L2, ..., Y_LN for the large screen group
    Y_S1, Y_S2, ..., Y_SN for the small screen group

so the average speeds are Ȳ_L = (1/N) Σ_{i=1}^{N} Y_Li and Ȳ_S = (1/N) Σ_{i=1}^{N} Y_Si
these should be the estimates of the "real" average speeds for large and small screen users; let's call them Y_L0 and Y_S0
while the standard deviations are:

    s_YL = √( (1/(N − 1)) Σ_{i=1}^{N} (Y_Li − Ȳ_L)² )
    s_YS = √( (1/(N − 1)) Σ_{i=1}^{N} (Y_Si − Ȳ_S)² )

the averages are different, but are they significantly different statistically?
we should now test the null hypothesis H0 : Y_L0 = Y_S0
against one of the alternative hypotheses:
1. H1 : Y_L0 > Y_S0
2. H1 : Y_L0 < Y_S0
3. H1 : Y_L0 ≠ Y_S0
as usual, after choosing the confidence level α we need to find a formula that combines the two items of investigation
assuming the standard deviation of the typing speed distributions is the same (this should be tested first), then we can use:

    t = (Ȳ_L − Ȳ_S) / ( s_YLS · √(2/N) )

where s_YLS is a "combined" standard deviation, calculated as:

    s_YLS = √( (s²_YL + s²_YS) / 2 )

this is a t statistic, with degrees of freedom DF = 2(N − 1) = 2N − 2
so we use the t_α value table, and we decide to reject H0 (that the typing speed is not affected by the screen size) if:

    H1 : Y_L0 > Y_S0   →  t ≥ t_α(2N − 2)
    H1 : Y_L0 < Y_S0   →  t ≤ −t_α(2N − 2)
    H1 : Y_L0 ≠ Y_S0   →  |t| ≥ t_{α/2}(2N − 2)

Important note 1: we have considered an experiment in which both groups have the same number of participants
the formulae are a bit more complicated if this is not the case, but the overall procedure is the same
Important note 2: we have only considered two treatments, or levels, for the input factor
for more than two levels (e.g. Screen size = 12in, 14in, 17in, 24in) the t test cannot be used. A procedure called ANOVA (ANalysis Of VAriance) should be used instead
DOE techniques: Factorial Design
when there are several factors to take into account, one needs to consider all possible combinations of levels
for instance: we want to establish if menu length, familiarity of the menu items, and order of the menu items affect the search time
Independent variables:
· Menu length: 4 treatments (5, 10, 15 and 20 items per menu)
· Word familiarity: 2 treatments (familiar and unfamiliar words)
· Order of items: 2 treatments (alphabetical and random)
Dependent variable:
· search time
a full factorial design considering all possibilities should lead to an experiment with 4 × 2 × 2 = 16 groups of people, as follows:

    group   length   famil.   order
    1       5        F        A
    2       10       F        A
    3       15       F        A
    4       20       F        A
    5       5        U        A
    6       10       U        A
    7       15       U        A
    8       20       U        A
    9       5        F        R
    10      10       F        R
    11      15       F        R
    12      20       F        R
    13      5        U        R
    14      10       U        R
    15      15       U        R
    16      20       U        R

Factorial design
to study the effect of the three factors on the search time means to estimate all the eight β coefficients in the formula:

    Y = β0 + β1·X_length + β2·X_famil + β3·X_order
        + β12·X_length·X_famil + β13·X_length·X_order + β23·X_famil·X_order
        + β123·X_length·X_famil·X_order

there are therefore seven possible null hypotheses, each of which could be tested against the usual alternative hypotheses:
1. the mean, when considering length, does not change
2. the mean, when considering familiarity, does not change
3. the mean, when considering order, does not change
4. the mean, when considering length and familiarity, does not change
5. the mean, when considering length and order, does not change
6. the mean, when considering order and familiarity, does not change
7. the mean, when considering length, familiarity, and order, does not change
the entire experiment is performed with the ANOVA procedure
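The 16-group table above can be generated mechanically; here is a minimal Python sketch using itertools (the enumeration order may differ from the table, but the set of combinations is the same).

```python
# Sketch: enumerating the full factorial design above.
from itertools import product

lengths = [5, 10, 15, 20]   # menu length treatments
familiarity = ["F", "U"]    # familiar / unfamiliar words
order = ["A", "R"]          # alphabetical / random

# product() yields every combination of levels: 4 * 2 * 2 = 16 groups
groups = list(product(lengths, familiarity, order))
for i, (length, famil, ordr) in enumerate(groups, start=1):
    print(i, length, famil, ordr)
```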
Fractional factorial design
even if each factor only had two levels (say, High and Low), the number of groups soon becomes very large
a fractional factorial experiment is a factorial experiment in which only an adequately chosen fraction of the treatment combinations required for the complete factorial experiment is selected to be run
in general, we pick a fraction such as 1/2, 1/4, etc. of the runs determined by the full factorial
there are various techniques for choosing the combinations to consider so that the result of the experiment is still significant
although of course the precision of the result will not be as good as the full factorial

Design techniques: Blocking and Screening
blocking is used to eliminate the influence of nuisance factors when running an experiment
these are factors that may affect the measured result, but are not of primary interest
for example, the specific machine on which the experiment was run, the time of day the experiment was run, etc.
the reason for blocking is to isolate a systematic effect and prevent it from obscuring the main effects
blocking is a schedule for conducting treatment combinations such that any effects on the experimental results due to a known nuisance factor become concentrated in the levels of the blocking variable
the basic concept is to create homogeneous blocks in which
the nuisance factors are held constant and the factor of interest
is allowed to vary
within blocks, the effect of different levels of the factor of
interest is assessed without having to worry about variations
due to changes of the block factors
a randomized block experiment is a collection of completely
randomized experiments, each run within one of the blocks of
the total experiment.
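A minimal sketch of such a schedule, assuming a hypothetical experiment with screen size as the factor of interest and time of day as the blocking (nuisance) factor:

```python
# Sketch: a randomised block schedule. The factor of interest
# (screen size) and the blocks (time of day) are illustrative.
import random

treatments = ["Large", "Small"]
blocks = ["morning", "afternoon", "evening"]  # nuisance-factor levels

random.seed(0)  # fixed seed only so the illustration is reproducible
schedule = {}
for block in blocks:
    # within each block, run a completely randomised mini-experiment
    runs = treatments[:]
    random.shuffle(runs)
    schedule[block] = runs
```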
screening is another technique aimed at finding the few
significant factors from a list of many potential ones
when the experimental goal is to eventually fit a model
(modelling experiment), the first experiment should be a
screening design when there are many factors to consider
special designs (e.g., Plackett-Burman designs) have been developed to screen such
large numbers of factors in an efficient manner, that is, with the least number of
observations necessary
Central Composite Design (CCD)
after deciding which are the important factors (e.g. with a screening technique), you want to find more precisely the factor values that produce the response you want
a CCD is a fractional two-level factorial design
i.e. a factorial design with each factor having two levels, which is fractionalised to eliminate some of the combinations
to which some more combinations are added:
center points: you add a "zero" level to all factors
axial points: you consider the combinations where all factors but one are zero
centerpoint runs are not randomised:
they should begin and end the experiment, and should be dispersed as evenly as possible throughout the experiment
this is because they are there as "guardians" against process instability, and the best way to find instability is to sample the process on a regular basis
as a rough guide, you should generally add approximately 3 to 5 centerpoint runs to a full or fractional factorial design
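A minimal sketch of how the CCD runs could be enumerated in coded units (−1/+1 for the two factor levels, 0 for the center). The helper ccd_points and its defaults (axial distance a, 3 centerpoints) are illustrative assumptions, and the two-level factorial part is shown full rather than fractionated:

```python
# Sketch: CCD points in coded units for k factors.
from itertools import product

def ccd_points(k, a=1.0, n_center=3):
    """Corners at +/-1, axial points at +/-a on one axis, centers at 0."""
    corners = list(product([-1.0, 1.0], repeat=k))  # two-level factorial part
    axial = []
    for i in range(k):                              # all factors zero but one
        for sign in (-a, a):
            pt = [0.0] * k
            pt[i] = sign
            axial.append(tuple(pt))
    centers = [(0.0,) * k] * n_center               # "zero" level everywhere
    return corners, axial, centers

corners, axial, centers = ccd_points(k=2)
```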