COMP106 - lecture 22

Measuring errors

Let's start with an example.
With the current interface for a system, the average time to complete a task is t = 5 sec.
A new interface is proposed that should improve on this.
To test the new interface, it was agreed that n = 200 users would try out the new system.
Let's say that Y is the average time in which the 200 users completed the same task.
We establish that if Y < 5 then the new interface is better than the old.

BUT: we must consider that, even if we have observed Y < 5 for these 200 users, the real average time could still be 5 sec or more!
OR: even if we have observed Y > 5 for these 200 users, the real average time could be less than 5 sec.
In the first case, we accept the new interface, while we should have rejected it (this is called a Type I error).
In the second case, we reject the new interface, while we should have accepted it (this is called a Type II error).
We want to measure the probability that these errors occur.

Hypothesis test

A hypothesis test is a method for establishing whether a claim about an experiment is reasonable.
The first hypothesis we want to consider is that nothing has changed (i.e. the new and the old interfaces have exactly the same performance).
This is called the null hypothesis and is indicated with H0.
In the example, H0 : t = 5 sec.
The second hypothesis is that there has in fact been a change in performance.
This is called the alternative hypothesis and is indicated with H1.
In the example, H1 : t < 5 sec, but one could also test H1 : t > 5 or even H1 : t != 5.

Hypothesis test (2)

The test we want to perform is therefore: we reject H0 and accept H1 if Y < 5 sec.
The errors are then:
Type I error: rejecting H0 and accepting H1 when H0 is true
Type II error: accepting H0 and rejecting H1 when H0 is false
Of course we will never be able to tell for sure if an error has been made.
We can only estimate how likely an error is, with a certain "degree of certainty".

Significance Level

The degree of certainty one requires in order to reject the null hypothesis in favor of the alternative is called the significance level and is indicated with α.
This is a probability measure (so it's a number from 0 to 1, or from 0% to 100%).
For instance, we can say that the significance level must be α = 5% = 0.05.
That is, we require that the probability p of seeing a result at least this extreme when H0 is in fact true (so that rejecting H0 would be a Type I error) be less than 0.05.
The smaller α, the more you are "protected" against a Type I error.
So, the rule would be that we reject H0 if Y < 5 and p < α, that is p < 0.05.

Power of the test

The probability that we accept the null hypothesis when it is in fact false (a Type II error) is indicated with β.
The probability of rejecting the null hypothesis when it is in fact false is called the power of the test and is denoted by 1 − β.
The more powerful a test is, the better.
β cannot be chosen by the user but it can be calculated, although only in some cases.
For a fixed sample size, a decrease in α leads to an increase in β and vice versa.
But the trade-off between α and β depends on the sample size: intuitively, the more users we contact for our experiment, the better the test.

Confidence intervals

Another output that can be given for a hypothesis test is the confidence interval.
We give a range, or interval, in which the true value is estimated to lie, instead of giving the probability that the true value differs from the observed one.
This is more useful when you repeat the experiment many times, so the observed value can vary.
The interval estimate gives an indication of how much uncertainty there is in our estimate of the true value: the narrower the interval, the more precise our estimate.
Confidence intervals are again given with an associated level α, which corresponds to a confidence level of 1 − α.

Confidence intervals (2)

So, if α = 5%, we say that we have a 95% confidence interval for the value.
In the example, this means that if we repeat the experiment a sufficiently large number of times, each time with 200 users, and each time we calculate the average time to perform the task and a confidence interval for the true average, then in the long run about 95% of these confidence intervals will in fact contain the true average.
Note: a 95% confidence interval does not mean that there is a 95% probability that one particular interval contains the true average...

Yes, but... how do I go about it?

We need to make some assumptions to make life easier.
For instance, suppose we want to concentrate on the value of the mean (as in the example before).
We assume that the data comes from a normal distribution, for which we know the standard deviation.
This is a reasonable assumption, by the Central Limit Theorem, provided that the sample size is big enough (5 users would not really do...).
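The long-run reading of a 95% confidence interval can be checked with a small simulation under the normality and known-standard-deviation assumptions. The sketch below is illustrative only: the true mean of 5.0 sec, the standard deviation of 1.5 sec, the number of repetitions, and the function names `z_confidence_interval` and `coverage` are assumptions made for this example and are not taken from the lecture. The critical value 1.96 is the standard normal quantile for a two-sided 95% interval.

```python
import math
import random


def z_confidence_interval(sample, sigma, z_crit=1.96):
    """Two-sided interval for the mean when the standard deviation sigma is known."""
    n = len(sample)
    mean = sum(sample) / n
    half_width = z_crit * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width


def coverage(true_mean, sigma, n, repetitions, seed=0):
    """Fraction of repeated experiments whose interval contains the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(repetitions):
        # One "experiment": n simulated task times (assumed normal, illustrative values).
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_confidence_interval(sample, sigma)
        if low <= true_mean <= high:
            hits += 1
    return hits / repetitions


# Each repetition uses 200 users, as in the example; the result should be close to 0.95.
print(coverage(true_mean=5.0, sigma=1.5, n=200, repetitions=2000))
```

Each repetition produces a different interval; what stays fixed is the true mean, which is why the 95% refers to the procedure over many repetitions and not to any single interval.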
The null hypothesis is of the form H0 : Y = Y0, where Y is the true mean and Y0 its hypothesised value.
The alternative hypothesis is one of:
H1 : Y > Y0 (the mean has increased)
H1 : Y < Y0 (the mean has decreased)
H1 : Y != Y0 (the mean has changed, we don't know if it has increased or decreased)
The first two hypotheses are said to be single tailed, while the third one is said to be double tailed.

Steps in testing the hypothesis

1. We specify H0 and H1, e.g. H0 : Y = 0.5 sec and H1 : Y < 0.5 sec.
2. We choose a significance level, e.g. α = 0.05.
3. We perform a randomised experiment with a sample of users, e.g. n = 100.
4. We calculate the mean of this sample:
   $Y_{sample} = \frac{1}{n} \sum_{i=1}^{n} Y_i$
   Let's suppose $Y_{sample} = 0.45$.

Steps in testing the hypothesis (2)

5. We calculate the z value of the sample mean, using the standard error of the mean.
   Suppose the average time comes from a normal (bell shaped) distribution with mean Y = 0.5 and standard deviation $s_Y = 1.5$.
   The standard error of the mean is $s_Y / \sqrt{n}$, and z is given by the formula:
   $z = \frac{Y_{sample} - Y}{s_Y / \sqrt{n}}$
   In our case:
   $z = \frac{0.45 - 0.5}{1.5 / \sqrt{100}} = -0.33$

Steps in testing the hypothesis (3)

6. We compare this with the standard normal value for z according to the level of significance chosen, $z_\alpha$.
   These values can be found in published tables, for instance $z_{0.05} = 1.645$.
   If $z \le -z_\alpha$ we reject H0.
   In our case $-0.33 \ge -1.645$, so we cannot reject H0 and have to accept it.
   For H1 : Y > Y0 the rejection zone is $z \ge z_\alpha$.
   For H1 : Y != Y0 the rejection zone is $|z| \ge z_{\alpha/2}$, where |z| is the absolute value of z.
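The six steps can be reproduced in a few lines of Python. This is a minimal sketch using the numbers from the worked example (sample mean 0.45, hypothesised mean 0.5, standard deviation 1.5, n = 100, α = 0.05). The function names `z_statistic`, `reject_null` and `p_value_less` are hypothetical helpers introduced for illustration; the critical values come from `statistics.NormalDist` in the Python standard library (Python 3.8 or later) in place of a printed table.

```python
import math
from statistics import NormalDist


def z_statistic(sample_mean, null_mean, sigma, n):
    """Step 5: distance of the sample mean from the null mean, in standard errors."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - null_mean) / standard_error


def reject_null(z, alpha, alternative):
    """Step 6: compare z with the critical value for the chosen alternative."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "less":        # H1 : Y < Y0
        return z <= -std_normal.inv_cdf(1 - alpha)
    if alternative == "greater":     # H1 : Y > Y0
        return z >= std_normal.inv_cdf(1 - alpha)
    if alternative == "two-sided":   # H1 : Y != Y0
        return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


def p_value_less(z):
    """Probability of a z value at least this low when H0 is true (for H1 : Y < Y0)."""
    return NormalDist().cdf(z)


z = z_statistic(sample_mean=0.45, null_mean=0.5, sigma=1.5, n=100)
print(round(z, 2))                     # -0.33
print(reject_null(z, 0.05, "less"))    # False: H0 is not rejected
print(p_value_less(z) < 0.05)          # False: same decision via the rule p < alpha
```

Comparing z with the critical value and comparing p with α are two forms of the same rule, so the two printed decisions always agree for a given alternative.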