COMP106 - lecture 23

Testing the hypothesis: unknown variance

We have assumed so far that the random sample (the result of the experiment) was taken from a normal distribution with known mean and standard deviation. The test statistic, based on the standard error of the mean, was therefore given by the formula:

$$z = \frac{\bar{Y}_{\mathrm{sample}} - \bar{Y}}{s_Y / \sqrt{n}}$$

What if the standard deviation was NOT in fact known? This is a more realistic assumption. In that case we need an estimate, on the basis of the experiment. We have already seen how to calculate it:

$$s_{\mathrm{sample}} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2}$$

The test statistic is therefore:

$$t = \frac{\bar{Y}_{\mathrm{sample}} - \bar{Y}}{s_{\mathrm{sample}} / \sqrt{n}}$$

Only, this time we cannot compare this with the $z_\alpha$ values: as we are not sure of the $s_Y$ value, we cannot say that $t$ can be estimated with a normal distribution. In fact, it can be shown that $t$ can be estimated with a t distribution.

t tests and the t distribution

The t distribution has a shape similar to the normal (like a bell), but it has an extra parameter. No panic: the procedure is still the same!
Only, instead of looking for $z_\alpha$ values, we need to look for a $t_\alpha$ value in the table of values for the t distribution. These tables are given for various degrees of freedom: the extra parameter is an integer number, which is called the degrees of freedom (DF). Depending on this parameter, the bell has longer or shorter tails. (The slides show the graphs of t distributions with DF = 1, 10, 20 and 30 respectively.)

The DF to consider is the size of the sample minus one ($N - 1$). This is coherent with the idea that if $N$ is big enough, then $s_{\mathrm{sample}}$ is more and more similar to $s_Y$ (the value of the standard deviation of the normal distribution). So, for $N$ big enough, a t distribution with $N - 1$ DF approximates the normal, and we are in fact using the normal distribution for our estimates. For DF big enough (greater than 30), the t distribution is quite similar to the normal.

The decision is taken as usual: we reject the $H_0$ hypothesis if

Rejection zone:
- $H_1: \bar{Y} > \bar{Y}_0$: reject if $t \ge t_\alpha(N-1)$
- $H_1: \bar{Y} < \bar{Y}_0$: reject if $t \le -t_\alpha(N-1)$
- $H_1: \bar{Y} \ne \bar{Y}_0$: reject if $|t| \ge t_{\alpha/2}(N-1)$

What if we want to test the standard deviation?

So far we have investigated changes in the mean; this is not the only parameter we may test. For instance: the interface we are studying has an average number of errors per task, with standard deviation $s_0 = 7$. This means that the number of errors is quite scattered around the mean. We want to test a new interface, to check if it's more consistent, with the number of errors clustering more around the mean. In this case the null hypothesis is $H_0: s_Y = s_0$.

The procedure is still the same

We need to choose $H_0$ and $H_1$, and we need to choose a confidence level $\alpha$. We perform our experiment with $N$ users and we calculate the standard deviation of the sample:

$$s_{\mathrm{sample}} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar{Y})^2}$$

Now we need a formula that puts together $s_0$ and $s_{\mathrm{sample}}$, while the alternative hypothesis is one of:
1. $H_1: s_Y > s_0$
2. $H_1: s_Y < s_0$
3.
$H_1: s_Y \ne s_0$

It can be shown that the best test statistic in this case is:

$$\chi^2 = \frac{(N-1)\, s_{\mathrm{sample}}^2}{s_0^2}$$

χ² distribution and χ² test

The χ² or chi-square distribution is basically obtained when a number of independent normal distributions are squared and summed. The number of distributions is again called the degrees of freedom (DF) of the χ². The shape of the χ² changes considerably with different DFs, but you always have an asymmetrical shape, and all values are positive. (The slides show the graphs of χ² distributions with DF = 1, 2, 5 and 10 respectively.)

Once again, we simply need to look for the appropriate $\chi^2_\alpha$ value in the right table. The degrees of freedom to look for are again $N - 1$. The only slight difference is that we cannot have negative numbers, so the rejection zones take this into account. The decision is: we reject the $H_0$ hypothesis if

Rejection zone:
- $H_1: s_Y > s_0$: reject if $\chi^2 \ge \chi^2_\alpha(N-1)$
- $H_1: s_Y < s_0$: reject if $\chi^2 \le \chi^2_{1-\alpha}(N-1)$
- $H_1: s_Y \ne s_0$: reject if $\chi^2 \ge \chi^2_{\alpha/2}(N-1)$ OR $\chi^2 \le \chi^2_{1-\alpha/2}(N-1)$

Back to DOE: single factor experiments

In single factor experiments we want to test the impact of one input factor on the output variable, e.g. how the screen size affects the typing speed. We need to decide the number of "treatments" or "levels" we want to study for the input factor, e.g. only two: Large screen and Small screen. Note: the examples given for the hypothesis tests could be seen as single factor experiments with only one treatment, as we compared the new interface feature against the old, given one.

We then perform our randomised experiment, with two groups of users. Let's say we obtain that the average typing speed for the two groups is:
- 8 keys per second with small screens
- 6 keys per second with large screens

We should now test the null hypothesis $H_0: \bar{Y}_{L0} = \bar{Y}_{S0}$ against one of the alternative hypotheses:
1. $H_1: \bar{Y}_{L0} > \bar{Y}_{S0}$
2. $H_1: \bar{Y}_{L0} < \bar{Y}_{S0}$
3.
$H_1: \bar{Y}_{L0} \ne \bar{Y}_{S0}$

Comparison of two means

Let's say that the typing speeds obtained in the experiment are $Y_{L1}, Y_{L2}, \ldots, Y_{LN}$ for the large screen group, so the average speed is $\bar{Y}_L = \frac{1}{N}\sum_{i=1}^{N} Y_{Li}$. This should be the estimate of the "real" average speed for large screen users; let's call it $\bar{Y}_{L0}$. The standard deviation is:

$$s_{Y_L} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(Y_{Li} - \bar{Y}_L)^2}$$

Similarly, we have $Y_{S1}, Y_{S2}, \ldots, Y_{SN}$ for the small screen group, so the average speed is $\bar{Y}_S = \frac{1}{N}\sum_{i=1}^{N} Y_{Si}$, the estimate of the "real" average speed for small screen users; let's call it $\bar{Y}_{S0}$. The standard deviation is:

$$s_{Y_S} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(Y_{Si} - \bar{Y}_S)^2}$$

The two averages are different, but are they significantly different statistically? We need to find a formula that combines the two items of investigation. Assuming the standard deviation of the typing speed distributions is the same (this should be tested first), we can use:

$$t = \frac{\bar{Y}_L - \bar{Y}_S}{s_{Y_{LS}}\sqrt{2/N}}$$

where $s_{Y_{LS}}$ is a "combined" standard deviation, calculated as:

$$s_{Y_{LS}} = \sqrt{\frac{s_{Y_L}^2 + s_{Y_S}^2}{2}}$$

This is a t statistic, with degrees of freedom DF = 2(N − 1) = 2N − 2. As usual, after choosing the confidence level $\alpha$, we use the $t_\alpha$ value table, and we decide to reject $H_0$ (that the typing speed is not affected by the screen size) if:

Rejection zone:
- $H_1: \bar{Y}_{L0} > \bar{Y}_{S0}$: reject if $t \ge t_\alpha(2N-2)$
- $H_1: \bar{Y}_{L0} < \bar{Y}_{S0}$: reject if $t \le -t_\alpha(2N-2)$
- $H_1: \bar{Y}_{L0} \ne \bar{Y}_{S0}$: reject if $|t| \ge t_{\alpha/2}(2N-2)$

Important note 1: we have considered an experiment in which both groups have the same number of participants. The formulae are a bit more complicated if this is not the case, but the overall procedure is the same.

Important note 2: we have only considered two treatments, or levels, for the input factor. For more than two levels (e.g. Screen size = 12in, 14in, 17in, 24in) the t test cannot be used.
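The two-sample comparison can be sketched as follows; the keys-per-second data are hypothetical, and the function assumes equal group sizes and (roughly) equal standard deviations, as the formula requires:

```python
import math

def two_sample_t(ys_l, ys_s):
    """t = (Ybar_L - Ybar_S) / (s_LS * sqrt(2/N)), with DF = 2N - 2, where
    s_LS = sqrt((s_L^2 + s_S^2) / 2) is the combined standard deviation.
    Assumes equal group sizes and equal underlying standard deviations
    (the latter should be tested first, e.g. with the chi-square test)."""
    n = len(ys_l)
    assert len(ys_s) == n, "this formula assumes equal group sizes"
    mean = lambda ys: sum(ys) / len(ys)
    def var(ys):
        m = mean(ys)
        return sum((y - m) ** 2 for y in ys) / (len(ys) - 1)
    s_ls = math.sqrt((var(ys_l) + var(ys_s)) / 2)
    t = (mean(ys_l) - mean(ys_s)) / (s_ls * math.sqrt(2 / n))
    return t, 2 * n - 2

# hypothetical typing speeds (keys per second) for N = 5 users per group
large = [6.1, 6.8, 5.4, 6.2, 5.9]   # large screens
small = [8.2, 8.9, 7.3, 8.1, 7.8]   # small screens
t, df = two_sample_t(large, small)
```

The returned $t$ is compared with $\pm t_\alpha(2N-2)$, or $|t|$ with $t_{\alpha/2}(2N-2)$, depending on the chosen alternative hypothesis.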
A procedure called ANOVA (ANalysis Of VAriance) should be used instead.

DOE techniques: Factorial design

When there are several factors to take into account, one needs to consider all possible combinations of levels. For instance: we want to establish if menu length, familiarity of the menu items, and order of the menu items affect the search time.

Independent variables:
· Menu length: 4 treatments (5, 10, 15 and 20 items per menu)
· Word familiarity: 2 treatments (familiar and unfamiliar words)
· Order of items: 2 treatments (alphabetical and random)

Dependent variable:
· search time

A full factorial design considering all possibilities should lead to an experiment with 4 * 2 * 2 = 16 groups of people, as follows:

group   length  famil.  order
 1       5      F       A
 2      10      F       A
 3      15      F       A
 4      20      F       A
 5       5      U       A
 6      10      U       A
 7      15      U       A
 8      20      U       A
 9       5      F       R
10      10      F       R
11      15      F       R
12      20      F       R
13       5      U       R
14      10      U       R
15      15      U       R
16      20      U       R

To study the effect of the three factors on the search time means to estimate all the eight β coefficients in the formula:

$$Y = \beta_0 + \beta_1 X_{\mathrm{length}} + \beta_2 X_{\mathrm{famil}} + \beta_3 X_{\mathrm{order}} + \beta_{12} X_{\mathrm{length}} X_{\mathrm{famil}} + \beta_{13} X_{\mathrm{length}} X_{\mathrm{order}} + \beta_{23} X_{\mathrm{famil}} X_{\mathrm{order}} + \beta_{123} X_{\mathrm{length}} X_{\mathrm{famil}} X_{\mathrm{order}}$$

There are therefore seven possible null hypotheses, each of which could be tested against the usual alternative hypotheses:
1. the mean, when considering length, does not change
2. the mean, when considering familiarity, does not change
3. the mean, when considering order, does not change
4. the mean, when considering length and familiarity, does not change
5. the mean, when considering length and order, does not change
6. the mean, when considering order and familiarity, does not change
7.
the mean, when considering length, familiarity, and order, does not change

The entire experiment is performed with the ANOVA procedure.

Fractional factorial design

Even if each factor only had two levels (say, High and Low), the number of groups soon becomes very large. A fractional factorial experiment is a factorial experiment in which only an adequately chosen fraction of the treatment combinations required for the complete factorial experiment is selected to be run. In general, we pick a fraction such as 1/2, 1/4, etc. of the runs determined by the full factorial. There are various techniques for choosing the combinations to consider so that the result of the experiment is still significant, although of course the precision of the result will not be as good as with the full factorial.

Design techniques: Blocking and Screening

Blocking is used to eliminate the influence of nuisance factors when running an experiment. These are factors that may affect the measured result, but are not of primary interest: for example, the specific machine on which the experiment was run, the time of day the experiment was run, etc. The reason for blocking is to isolate a systematic effect and prevent it from obscuring the main effects. Blocking is a schedule for conducting treatment combinations such that any effects on the experimental results due to a known nuisance factor become concentrated in the levels of the blocking variable.

The basic concept is to create homogeneous blocks in which the nuisance factors are held constant and the factor of interest is allowed to vary. Within blocks, the effect of different levels of the factor of interest is assessed without having to worry about variations due to changes of the block factors. A randomized block experiment is a collection of completely randomized experiments, each run within one of the blocks of the total experiment.
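The counting of treatment combinations, and the idea of a half fraction, can be sketched with `itertools`. This is an illustrative sketch, not the slides' prescription: the defining relation I = ABC used below is one standard way to pick a half fraction of a 2³ design:

```python
from itertools import product

# full factorial for the menu example: 4 * 2 * 2 = 16 groups
lengths = [5, 10, 15, 20]
familiarity = ["F", "U"]          # familiar / unfamiliar words
order = ["A", "R"]                # alphabetical / random
full_menu = list(product(lengths, familiarity, order))

# half fraction of a 2^3 design (three factors at coded levels -1 / +1):
# with the defining relation I = ABC, keep only the runs where the
# product of the coded levels is +1, halving 8 runs down to 4
full_2k = list(product([-1, 1], repeat=3))
half_2k = [run for run in full_2k if run[0] * run[1] * run[2] == 1]
```

The half fraction confounds each main effect with a two-factor interaction, which is the price paid for running fewer groups.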
Screening is another technique, aimed at finding the few significant factors from a list of many potential ones. When the experimental goal is to eventually fit a model (modelling experiment) and there are many factors to consider, the first experiment should be a screening design. Special designs (e.g., Plackett-Burman designs) have been developed to screen such large numbers of factors in an efficient manner, that is, with the least number of observations necessary.

Central Composite Design (CCD)

After deciding which are the important factors (e.g. with a screening technique), you want to find more precisely the factor values that produce the response you want. A CCD is a fractional two-level factorial design (i.e. a factorial design with each factor having two levels, fractionalised to eliminate some of the combinations) to which some more combinations are added:
· center points: you add a "zero" level to all factors
· axial points: you consider the combinations where all factors but one are zero

Centerpoint runs are not randomised: they should begin and end the experiment, and should be dispersed as evenly as possible throughout the experiment. This is because they are there as "guardians" against process instability, and the best way to find instability is to sample the process on a regular basis. As a rough guide, you should generally add approximately 3 to 5 centerpoint runs to a full or fractional factorial design.
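A minimal sketch of generating CCD points in coded units. For simplicity it uses the full set of two-level corners (whereas, as noted above, the factorial part may itself be fractionated), a hypothetical axial distance `alpha`, and 3 center runs; the run ordering and scheduling of the center points is left out:

```python
from itertools import product

def ccd_points(k, alpha=1.0, n_center=3):
    """Design points of a central composite design for k factors, in coded
    units: two-level factorial corners (+-1), 2k axial points (all factors
    zero except one at +-alpha), and n_center center points (all zeros)."""
    corners = list(product([-1.0, 1.0], repeat=k))
    axial = []
    for i in range(k):
        for a in (-alpha, alpha):
            pt = [0.0] * k
            pt[i] = a
            axial.append(tuple(pt))
    centers = [(0.0,) * k] * n_center
    # in practice the center runs should open and close the experiment and
    # be spread evenly through it, as a check on process stability
    return corners + axial + centers

design = ccd_points(3)   # 8 corners + 6 axial + 3 center = 17 runs
```

With k = 3 this gives 2³ = 8 corners, 2·3 = 6 axial points, and the chosen number of center runs.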