Hypothesis Testing
Week 9

Objectives
On completion of this module you should be able to:
• understand and demonstrate the process required for hypothesis testing,
• explain the difference between one- and two-tailed tests,
• perform a hypothesis test for the mean and proportion, and
• consider ethical issues relating to hypothesis testing.

Hypothesis testing methodology
• We test a hypothesis (a theory, claim, assertion etc.) about a parameter of a population (the mean or proportion).
• The null hypothesis (denoted H0) is used to indicate the status quo.
• It always contains the equals sign.
• For example, if we assume that the mean income of accountants is $75,000, the null hypothesis is H0: μ = 75000.
• The alternative hypothesis (denoted H1 or Ha) must cover all cases where the null hypothesis is false.
• It represents the conclusion reached if the null hypothesis is found to be false (based on sample information).
• In our mean income of accountants example the alternative hypothesis is H1: μ ≠ 75000.
• We reject the null hypothesis when there is evidence in sample data that the alternative is far more likely to be true.
• Failing to reject the null hypothesis does not prove that it is true, but rather that there is insufficient evidence in the sample to prove that it is not true.
• Because the conclusion is based only on a sample, we never prove that the null hypothesis is true.
• We reject (or fail to reject) the null hypothesis based on a test statistic (found using sample data) and on rejection regions.

[Figure: sampling distribution with a central region of non-rejection bounded by two critical values, and a region of rejection in each tail.]

Risks in decision making
• Type I error: rejecting the null hypothesis when it should not be rejected. The probability of a type I error is α.
• Type II error: not rejecting the null hypothesis when it should be rejected. The probability of a type II error is β.
• Confidence coefficient: the probability of not rejecting the null hypothesis when it should not be rejected, 1 - α.
• Power of a test: the probability of rejecting the null hypothesis when it should be rejected, 1 - β.

Example 9-1
In the Australian legal system, the accused is considered innocent until proven guilty. Using this information, state the null and alternative hypotheses and discuss the type I and II errors (which should be the larger value, which should be the smaller value and why?).

Solution
• We assume innocence, so this forms the null hypothesis.
• We try to prove guilt, so this forms the alternative hypothesis.
• So the hypotheses are:
  H0: the accused is innocent
  H1: the accused is guilty
• Type I and type II errors can best be understood with a picture:

                       Verdict: Guilty       Verdict: Not guilty
  Truth: Guilty        Acceptable outcome    Type II error
  Truth: Not guilty    Type I error          Acceptable outcome

• We want to minimise all errors and so keep both α and β small.
• But they are inversely related: for a given sample size, as one increases, the other decreases.
• Australian society normally demands minimising type I errors since we are less tolerant of convicting innocent people.
• Consequence: a (hopefully small) proportion of the accused will be found not guilty when they are in fact guilty.

Z test of hypothesis for the mean (σ known)
• If the population standard deviation, σ, is known, then for large enough samples the sampling distribution of the mean follows the normal distribution.
• Then, the test statistic is given by:
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]

Example 9-2
A recent graduate from a business degree is considering the benefits of working for various companies. A particular company, Touccancy Inc., has a reputation for treating its employees well. In fact, the company’s website gives some statistics about starting salaries for recent graduates. It claims (in large print) that the mean starting salary for recent graduates is $45,000.
In much smaller print, buried in a report on statistics, the website states that the known standard deviation of starting salaries is $5000 and that in a random sample of fifty recently employed graduates, the mean salary was $39,000.
(a) At the 5% level of significance, determine if there is any evidence that the claim given in large print is valid, based on the sample data. Use both the critical value and p-value approaches as part of your answer.

Solution 9-2
Following Exhibit 8.2 from the text (p. 339):
1. The null hypothesis is H0: μ = 45000
2. and the alternative hypothesis is H1: μ ≠ 45000.
3. The level of significance is 5%, so α = 0.05.
4. n = 50 (fifty recently employed graduates were sampled).
5. Because σ = 5000 is known, we can use the Z test.

The critical value approach
• The hypothesis test in Example 9-2 is a two-tailed test.
• We will reject the null hypothesis if the sample mean is significantly different from $45000.
• This could be if it is too big, or too small (hence the two tails).
• To find the critical regions for a two-tailed test, we divide the level of significance into two equal parts.

[Figure: normal curve centred on Z = 0 (X̄ = 45000) with a central acceptance (non-rejection) region of area 0.95 and a rejection region of area α/2 = 0.025 in each tail, beyond the critical values Z = -1.96 and Z = +1.96.]

Solution 9-2 (continued)
6. From this graph we can see that the decision rule is: reject H0 if Z > +1.96 or if Z < -1.96. Z is the test statistic obtained from sample data.
7. Given that X̄ = 39000, the test statistic is:
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{39000 - 45000}{5000 / \sqrt{50}} = -8.49 \]
8. Since -8.49 < -1.96, the test statistic falls in the rejection region.
9. We reject the null hypothesis.
10. We can conclude that the mean starting salary is significantly different from $45,000.

10 steps of hypothesis testing
1. State the null hypothesis.
2. State the alternative hypothesis.
3. Choose the level of significance.
4. Find the sample size.
5. Determine the appropriate test statistic.
6. Set up critical values and define rejection region.
7. Compute test statistic.
8. Compare test statistic to rejection region.
9. Make statistical decision.
10. Express statistical decision in the context of the problem.

Example 9-2: the p-value approach
• Using the p-value approach, steps one to six are the same as for the critical value approach.
7. The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true:
\[ P(Z < -8.49) + P(Z > 8.49) \approx 0 \]
8. Determine if the p-value is less than α (in this case 0 < 0.05).
9. Since the p-value is less than α, reject the null hypothesis.
10. The conclusion is as before.

10 steps of hypothesis testing: p-value approach
1. State the null hypothesis.
2. State the alternative hypothesis.
3. Choose the level of significance.
4. Find the sample size.
5. Determine the appropriate test statistic.
6. Compute sample value of test statistic.
7. Compute the p-value based on the test statistic.
8. Compare p-value to α.
9. Make statistical decision.
10. Express statistical decision in the context of the problem.

Example 9-2
(b) Determine the 95% confidence interval for the population mean starting salary and compare this to your answer to (a).

Solution
• The confidence interval is given by:
\[ \bar{X} \pm Z \frac{\sigma}{\sqrt{n}} = 39000 \pm 1.96 \times \frac{5000}{\sqrt{50}} = 39000 \pm 1385.93 \]
\[ \$37{,}614.07 \le \mu \le \$40{,}385.93 \]
• The confidence interval does not contain the company’s claimed mean starting salary.
• It provides further evidence that the company’s claim is incorrect based on this sample.

One-tail tests
• With one-tail tests, the hypotheses focus on a particular direction.
• The null hypothesis is rejected only if the test statistic is significantly large or significantly small (depending on the hypothesis).
• The rejection region is only in one tail of the distribution.

Example 9-3
An investment advisory company has been having problems with the printery which is responsible for the printing of the company’s weekly stock reports.
The company requires that the printing of the report be completed within twenty-four hours of receipt of the necessary files and documentation and specifies a standard deviation of two hours. They are prepared to employ a different printery if they can establish statistically that the printery is not meeting their requirements.
A sample of thirty recent printing times for reports has a mean printing time of twenty-five hours.
(a) If the company tests the hypothesis at the 1% level of significance, what decision would be made using the p-value approach to hypothesis testing? Interpret the meaning of the p-value in this problem.

Solution 9-3
Following the ten-step procedure:
1. H0: μ ≤ 24 (we assume the printery is doing okay and try to prove otherwise).
2. H1: μ > 24 (note: this is what we are trying to prove).
3. α = 0.01
4. n = 30
5. σ = 2 is known, so use the Z test.
6. Given X̄ = 25, the test statistic is:
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{25 - 24}{2 / \sqrt{30}} = 2.74 \]
7. The p-value is:
\[ P(Z > 2.74) = 1 - 0.9969 = 0.0031 \]
That is, if the true mean printing time were 24 hours, the probability of observing a sample mean of 25 hours or more would be about 0.0031.
8. 0.0031 < 0.01
9. Since the p-value is less than α, we reject the null hypothesis.

[Figure: standard normal curve with an acceptance (non-rejection) region of area 0.99 and an upper-tail rejection region of area α = 0.01.]

10. We conclude that there is evidence that the printery is taking more than the company’s required 24 hours. Based on this sample data, the company appears to be justified in seeking another printery for their reports.

(b) How would your answer in (a) change if the standard deviation had been three hours?

Solution 9-3 (b)
• If σ = 3, the test statistic would be:
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{25 - 24}{3 / \sqrt{30}} = 1.83 \]
• and the p-value would be:
\[ P(Z > 1.83) = 1 - 0.9664 = 0.0336 \]
• Then, since 0.0336 > 0.01, we would not reject the null hypothesis at the 1% level of significance.
• In this case, the company would not be justified in seeking another printery.
• Important note: we could still have rejected this null hypothesis at the 5% level of significance (since 0.0336 < 0.05).
• When a test is significant at the 1% level it is described as highly significant.
• When a test is significant at the 5% level it is described as significant.
• A decision such as this one can therefore be based on how confident the company wanted to be that they were making the right decision (based on this sample data).

t test of hypothesis for the mean (σ unknown)
• If the population standard deviation, σ, is unknown, we estimate it using S, the sample standard deviation.
• If the population is assumed to be normally distributed, the sampling distribution of the mean follows a t distribution with n - 1 degrees of freedom.
• Then, the test statistic is given by:
\[ t = \frac{\bar{X} - \mu}{S / \sqrt{n}} \]

Example 9-4
A firm of accountants has been established in a small regional town for twenty years. Part of their service includes regular visits to the firms they service to ensure they keep a customer service focus. The accountants must still be contactable while they are away from their desks, so mobile phones are provided.
They have discovered, however, that the useful life of the phones is very much reduced by the shortness of the phones’ battery life. In the past, the batteries gave an average talk time of twenty hours, after which time the phone needed recharging. A sample of forty batteries recently revealed an average talk time of eighteen hours, with a standard deviation of four hours.
(a) At the 0.05 significance level, is there evidence that the mean talk time of the batteries has changed from twenty hours?
(b) What assumptions did you make regarding the population distribution in answering (a)? Explain how you would test these assumptions if you had the talk time data.

Solution 9-4
Following the 10 steps of hypothesis testing:
1. H0: μ ≥ 20
2. H1: μ < 20
3. Given α = 0.05.
4. A random sample of n = 40 batteries has been drawn.
5. If we assume the population of talk times of the batteries is normally distributed, then the t test is appropriate since the population standard deviation of the talk times is unknown (we’re given the sample value).
6. Given a sample size of n, the test statistic follows a t distribution with n - 1 = 39 degrees of freedom. At α = 0.05, the critical value will be t(39, 0.05) = -1.6849 and so the rejection rule is: reject H0 if t < -1.6849, otherwise do not reject H0.

[Figure: t distribution with 39 degrees of freedom, showing a lower-tail region of rejection of area 0.05 below the critical value -1.6849 and a region of non-rejection of area 0.95.]

7. The test statistic is:
\[ t = \frac{\bar{X} - \mu}{S / \sqrt{n}} = \frac{18 - 20}{4 / \sqrt{40}} = -3.16 \]
Note that we are using the critical value approach for this hypothesis test. Check the p-value approach for yourself.
8. Since -3.16 < -1.6849, the test statistic is in the rejection region.
9. Reject the null hypothesis.
10. The data provides sufficient evidence to conclude that the mean talk time provided by the batteries is significantly less than 20 hours.

Solution 9-4 (b)
Assumption: the random sample of talk times of batteries comes from a normally distributed population. If the sample size is reasonably large and the population is not too skewed, then (for unknown σ) the t distribution is a good approximation to the sampling distribution of the mean. We test the assumption of normality by examining the sample data. If it appears normal we infer that the population is likely to be normally distributed.
We can test the assumption of normality by producing:
• a histogram or stem-and-leaf plot and checking for the bell shape,
• a normal probability plot and checking for departures from the straight line,
• a box-and-whisker plot and checking for symmetry.

Z test of hypothesis for the proportion
• We often want to test hypotheses regarding the population proportion using the sample proportion pS = X/n.
• If the number of successes (X) and the number of failures (n - X) are each at least five, the sampling distribution of a proportion approximately follows a normal distribution.
• The test statistic is:
\[ Z = \frac{p_S - p}{\sqrt{\dfrac{p(1 - p)}{n}}} \]
• If we were to conduct a Z test for the number of successes, the test statistic is:
\[ Z = \frac{X - np}{\sqrt{np(1 - p)}} \]

Example 9-5
For many years, women have been under-represented in management positions. A twenty-year-old study revealed that only 18% of companies had at least one woman in their management team. A recent study was conducted to discover whether this situation had changed. A survey was sent out to 500 of the largest companies across Australia.
Of these, 23% (115) indicated that they had at least one woman in their management team. At the 5% level of significance, can you state that there has been an increase in the proportion of companies who include women in their management teams?

Solution 9-5
H0: p ≤ 0.18
H1: p > 0.18
At α = 0.05, the critical value will be Z = 1.64 and so the rejection rule is: reject H0 if Z > 1.64, otherwise do not reject H0.
The test statistic is:
\[ Z = \frac{p_S - p}{\sqrt{\dfrac{p(1 - p)}{n}}} = \frac{0.23 - 0.18}{\sqrt{\dfrac{0.18 \times 0.82}{500}}} = 2.91 \]
• Note we have used the critical value approach.
• Check for yourself that the p-value approach gives the same conclusion.
• Since 2.91 > 1.64, the test statistic is in the rejection region and so we reject the null hypothesis.
• Therefore, the data provides sufficient evidence to conclude that there has been an increase in the proportion of companies who include women in their management teams.

Pitfalls and ethical issues
• It is always advisable to consult an experienced statistician in the planning stage of a study. This assists in avoiding biased results (due to poor planning, faulty sampling frame etc.).
• Poor research methods, however, are not necessarily an indication of unethical behaviour.
• Unethical behaviour involves intentional manipulation of analysis and results.

To avoid ethical problems consider:
• Data collection method: ensure appropriate randomisation techniques are used in selecting a sample.
• Informed consent from human respondents: individuals who are subjected to a treatment should be made aware of the research purpose and any potential side-effects, and give consent to their involvement.
• Choosing a one- or two-tailed test:
  – If you are only interested in differences (and not whether the result is smaller or larger) then use a two-tailed test.
  – If you are interested in showing a result is too large or too small then use a one-tailed test.
• Choice of significance level (α): select before data is collected.
• It is good practice to include the p-value in results.
• Data snooping:
  – Don’t perform a test and then change one/two-tailed or significance level to get a desired result!
  – Don’t discard outliers to change the test result!
  – Always decide on hypotheses, whether a test is one- or two-tailed and the significance level, before collecting data.
• Data cleansing:
  – Prior to analysis, check unusual observations for validity or special causes.
  – Only remove data where you can prove there is an error or unusual behaviour unrelated to the study (this is rare).
  – Decide on rules for data cleansing prior to data collection.
• Reporting findings:
  – Report good and bad results.
  – Note that if a null hypothesis is not rejected, this does not prove it is true, just that there is insufficient evidence to prove that it is not true.
• Statistical significance does not imply practical significance in the field of application: discuss results with the experts in the field!
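
The advice to fix the significance level in advance and to report the p-value can be illustrated with the numbers from Examples 9-2 and 9-3. The following is a minimal sketch and not part of the original lecture: the helper name `z_test_mean` and its `tail` argument are hypothetical, and it relies only on `statistics.NormalDist` from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, tail="two"):
    """Z test for a mean with known sigma (hypothetical helper).

    tail: "two" for H1: mu != mu0, "upper" for H1: mu > mu0,
          "lower" for H1: mu < mu0.
    Returns (z, p_value).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        p = 2 * std_normal.cdf(-abs(z))
    elif tail == "upper":
        p = 1 - std_normal.cdf(z)
    elif tail == "lower":
        p = std_normal.cdf(z)
    else:
        raise ValueError("tail must be 'two', 'upper' or 'lower'")
    return z, p


# Example 9-2: two-tailed test of H0: mu = 45000.
z_92, p_92 = z_test_mean(39000, 45000, 5000, 50, tail="two")

# Example 9-3 (a) and (b): upper-tail test of H0: mu <= 24.
z_93a, p_93a = z_test_mean(25, 24, 2, 30, tail="upper")
z_93b, p_93b = z_test_mean(25, 24, 3, 30, tail="upper")

results = [("9-2", z_92, p_92), ("9-3(a)", z_93a, p_93a), ("9-3(b)", z_93b, p_93b)]
for label, z, p in results:
    # Report the p-value itself, then the decision at each pre-chosen alpha.
    print(f"Example {label}: Z = {z:.2f}, p-value = {p:.4f}, "
          f"reject at 1%: {p < 0.01}, reject at 5%: {p < 0.05}")
```

Because the script uses the unrounded Z value, the p-value for Example 9-3 (b) comes out slightly different from the table-based 0.0336, but the decision at each level is unchanged: not rejected at 1%, rejected at 5%. This is exactly the situation in which the significance level must be chosen before looking at the data.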

After the lecture each week…
• Review the lecture material
• Complete all readings
• Complete all of the recommended problems (listed in SG) from the textbook
• Complete at least some of the additional problems
• Consider (briefly) the discussion points prior to tutorials
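
When working through the recommended problems, hand calculations can be checked with a few lines of code. The sketch below is an illustration rather than part of the lecture: it recomputes the t statistic from Example 9-4 and the Z statistic for the proportion from Example 9-5. The function names are hypothetical, and the t critical value of -1.6849 is copied from the lecture rather than computed, because the Python standard library does not provide the t distribution.

```python
from math import sqrt
from statistics import NormalDist


def t_statistic(xbar, mu0, s, n):
    """t statistic for a mean when sigma is unknown (n - 1 degrees of freedom)."""
    return (xbar - mu0) / (s / sqrt(n))


def z_statistic_proportion(successes, n, p0):
    """Z statistic for a proportion under H0: p = p0."""
    p_s = successes / n  # sample proportion
    return (p_s - p0) / sqrt(p0 * (1 - p0) / n)


# Example 9-4: H0: mu >= 20 against H1: mu < 20, alpha = 0.05.
t_94 = t_statistic(xbar=18, mu0=20, s=4, n=40)
t_critical = -1.6849  # lower-tail critical value for 39 df, as quoted in the lecture
reject_94 = t_94 < t_critical

# Example 9-5: H0: p <= 0.18 against H1: p > 0.18, alpha = 0.05.
z_95 = z_statistic_proportion(successes=115, n=500, p0=0.18)
p_value_95 = 1 - NormalDist().cdf(z_95)  # upper-tail p-value
reject_95 = p_value_95 < 0.05

print(f"Example 9-4: t = {t_94:.2f}, reject H0: {reject_94}")
print(f"Example 9-5: Z = {z_95:.2f}, p-value = {p_value_95:.4f}, reject H0: {reject_95}")
```

The upper-tail p-value for Example 9-5 is well below 0.05, so the p-value approach agrees with the critical value approach used in the lecture.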