8. Hypothesis Testing

Hypothesis testing is a process in which sample statistics are used to determine whether a statement about the value of a population parameter should be accepted or rejected. A statement about a population parameter is called a statistical hypothesis. Traditionally, two hypotheses are considered: $H_0$, the null hypothesis, and $H_a$, the alternative hypothesis.

Definition. The null hypothesis $H_0$ contains a statement of equality, such as $\le$, $=$, or $\ge$. The alternative hypothesis $H_a$ is the complement of the null hypothesis, and it contains a statement of strict inequality, such as $>$, $\ne$, or $<$.

The conclusion that $H_a$ is true can be made if the sample data indicate that $H_0$ is false. Generally, a hypothesis test about the value of the population mean $\mu$ may take one of the following three forms:

$H_0: \mu \le \mu_0$ versus $H_a: \mu > \mu_0$
$H_0: \mu \ge \mu_0$ versus $H_a: \mu < \mu_0$
$H_0: \mu = \mu_0$ versus $H_a: \mu \ne \mu_0$

Similar statements can be formulated to test other population parameters. No matter which of the three forms is used, you always begin by assuming that the equality condition in the null hypothesis is true. After performing the hypothesis test, you make one of two decisions:
1. reject the null hypothesis, or
2. accept (fail to reject) the null hypothesis.

Since the decision is based on sample information, it is always possible to make a wrong decision: the null hypothesis might be rejected when it is actually true, or it might be accepted when it is actually false. Accordingly, two kinds of errors can be made in hypothesis testing.

Definition. A Type I error is made if the null hypothesis is rejected when it is true. A Type II error is made if the null hypothesis is not rejected when it is false.

Thus, a hypothesis test has the following four possible outcomes.
| Decision / State of things | $H_0$ is true | $H_0$ is false |
|---|---|---|
| Do not reject $H_0$ | Correct decision | Type II error |
| Reject $H_0$ | Type I error | Correct decision |

Although we cannot eliminate the possibility of errors when performing a hypothesis test, we can specify the probability of their occurrence. The common notation is:
$\alpha$: the probability of making a Type I error;
$\beta$: the probability of making a Type II error.

Most applications of hypothesis testing control the probability of making a Type I error but do not always control the probability of making a Type II error. That is why, to avoid the risk of making a Type II error, the statement "do not reject $H_0$" is used instead of "accept $H_0$".

Definition. The maximum allowable probability of making a Type I error is called the level of significance of the test. Common choices for the level of significance are $\alpha = 0.1$, $\alpha = 0.05$, and $\alpha = 0.01$. The lower the level of significance, the smaller the probability of rejecting a true null hypothesis.

After stating the null and alternative hypotheses and specifying the level of significance, the next step is to select a random sample from the population and calculate the sample statistics. The statistic used to estimate the parameter in the null hypothesis is called the test statistic. The following table shows the population parameters, the corresponding test statistics, and the standardized test statistics.

| Population parameter | Test statistic | Standardized test statistic |
|---|---|---|
| $\mu$ | $\bar{x}$ | $z$ ($n \ge 30$) or $t$ ($n < 30$) |
| $p$ | $\bar{p}$ | $z$ |
| $\sigma^2$ | $s^2$ | $\chi^2$ |

One way to decide whether to reject the null hypothesis is to check whether the standardized test statistic falls within a rejection region of the sampling distribution.

Definition. A rejection region (or critical region) of the sampling distribution is the range of values of the test statistic for which the null hypothesis is rejected. The value of the test statistic that establishes the boundary of the rejection region is called the critical value.

* Tests about a Population Mean.
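The large- and small-sample procedures described below both reduce to computing a standardized statistic and comparing it with a critical value. The Python sketch that follows illustrates this pattern with the standard library only. The function names (`z_critical`, `in_rejection_region`, `mean_test_statistic`) are hypothetical, the numbers are invented, and because the standard library has no $t$-distribution, the $t$ critical value is assumed to be read from a t-Distribution Table and passed in by hand.

```python
import math
from statistics import NormalDist


def z_critical(alpha: float, tail: str) -> float:
    """Positive critical value z0 from the standard normal distribution.

    tail is "left", "right" or "two". For a two-tailed test alpha is split
    equally between the tails, so the rejection region is |z| > z0.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)
    if tail in ("left", "right"):
        return std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'left', 'right' or 'two'")


def in_rejection_region(stat: float, critical: float, tail: str) -> bool:
    """Decision rule: True means 'reject H0', False means 'fail to reject H0'."""
    if tail == "left":
        return stat < -critical
    if tail == "right":
        return stat > critical
    if tail == "two":
        return abs(stat) > critical
    raise ValueError("tail must be 'left', 'right' or 'two'")


def mean_test_statistic(x_bar: float, mu0: float, sd: float, n: int) -> float:
    """(x_bar - mu0) / (sd / sqrt(n)).

    Pass sigma when it is known, otherwise the sample standard deviation s.
    The result is read as z for n >= 30 and as t (n - 1 d.f.) for n < 30.
    """
    return (x_bar - mu0) / (sd / math.sqrt(n))


# Large sample (invented numbers): H0: mu = 50 vs Ha: mu != 50, alpha = 0.05
z = mean_test_statistic(x_bar=52, mu0=50, sd=6, n=36)   # 2.0
z0 = z_critical(alpha=0.05, tail="two")                 # about 1.96
print(z, in_rejection_region(z, z0, "two"))             # True -> reject H0

# Small sample (invented numbers): n = 16, d.f. = 15; t0 = 2.131 read from a t table
t = mean_test_statistic(x_bar=52, mu0=50, sd=5, n=16)   # 1.6
print(t, in_rejection_region(t, 2.131, "two"))          # False -> fail to reject H0
```

Critical values are returned as positive numbers, so a left-tailed rejection region is $z < -z_0$ and a two-tailed one is $|z| > z_0$.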
The following is the procedure for conducting a hypothesis test for a population mean (large-sample case, $n \ge 30$).
1. State the null and alternative hypotheses $H_0$ and $H_a$.
2. Specify the level of significance $\alpha$.
3. Use the Standard Normal Table to determine the critical value $z_0$ (or the values $\pm z_0$ for a two-tailed test).
4. Sketch the graph of the normal curve and indicate the rejection region. If the rejection region lies only in the lower tail (or only in the upper tail) of the sampling distribution, the test is called a left-tailed (or right-tailed) hypothesis test. If the rejection region lies in both the lower and the upper tails of the sampling distribution, the test is called a two-tailed hypothesis test.
5. Find the standardized test statistic (for the population mean it is called the z-statistic): $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ if $\sigma$ is known, or $z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ if $\sigma$ is unknown.
6. Make a decision: if $z$ is in the rejection region, reject $H_0$; otherwise, fail to reject $H_0$.

Now assume that the sample size is small ($n < 30$) and the population standard deviation $\sigma$ is unknown. If the population is normally distributed, you can use the $t$-distribution to make inferences about the population mean. The test statistic for the mean is
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
(called the t-statistic), where $s$ is the sample standard deviation. This statistic has a $t$-distribution with $n - 1$ degrees of freedom. The procedure for conducting a hypothesis test for a population mean in the small-sample case is as follows.
1. State the null and alternative hypotheses $H_0$ and $H_a$.
2. Specify the level of significance $\alpha$.
3. Identify the degrees of freedom, d.f. $= n - 1$.
4. Use the t-Distribution Table to determine the critical value $t_0$ (or the values $\pm t_0$ for a two-tailed test).
5. Determine the rejection region.
6. Find the standardized test statistic (t-statistic): $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
7. Make a decision: if $t$ is in the rejection region, reject $H_0$; otherwise, fail to reject $H_0$.

* Tests about a Population Proportion.
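The $z$-test for a population proportion described below follows the same pattern as the test for a mean. This minimal sketch assumes, as in the formula for $\sigma_{\bar{p}}$ given below, that the standard error is estimated from the sample proportion $\bar{p}$; the function name `proportion_z` is hypothetical and the counts are invented.

```python
import math
from statistics import NormalDist


def proportion_z(p_bar: float, p0: float, n: int) -> float:
    """z = (p_bar - p0) / sigma_p_bar, where sigma_p_bar = sqrt(p_bar * (1 - p_bar) / n)."""
    # The normal approximation is only used when n*p_bar >= 5 and n*(1 - p_bar) >= 5.
    if n * p_bar < 5 or n * (1 - p_bar) < 5:
        raise ValueError("sample too small for the normal approximation")
    sigma_p_bar = math.sqrt(p_bar * (1 - p_bar) / n)
    return (p_bar - p0) / sigma_p_bar


# Right-tailed test of H0: p <= 0.4 vs Ha: p > 0.4 with 50 successes in n = 100 trials
z = proportion_z(p_bar=50 / 100, p0=0.4, n=100)   # about 2.0
z0 = NormalDist().inv_cdf(1 - 0.05)               # about 1.645 for alpha = 0.05
print(round(z, 3), z > z0)                        # True -> reject H0
```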
The three forms of a hypothesis test about a population proportion $p$ are as follows:

$H_0: p \le p_0$ versus $H_a: p > p_0$
$H_0: p \ge p_0$ versus $H_a: p < p_0$
$H_0: p = p_0$ versus $H_a: p \ne p_0$

If $n\bar{p} \ge 5$ and $n(1 - \bar{p}) \ge 5$, then the sampling distribution of $\bar{p}$ is approximately normal with expected value $E(\bar{p}) = p$ and standard deviation (standard error)
$$\sigma_{\bar{p}} = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}.$$
Consequently, the following $z$-test about a population proportion can be used:
$$z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}}.$$

* Inference about Means and Proportions with Two Populations.

Next, you will learn how to test a claim about the difference between the same parameter in two populations. For instance, you may want to conduct a hypothesis test to determine whether there is any difference between the quality of education provided at two high schools, or to test the difference between the proportions of defective parts supplied by two factories. The type of test to use is determined by the sizes of the samples selected from the two populations and by whether the samples are dependent or independent.

Definition. Two samples are called independent if they are selected from two different populations and are not related to one another. Two samples are called dependent (or matched) if each element of one sample corresponds to an element of the other sample.

For instance, if you randomly select 100 graduates from university A and 90 graduates from university B and test their qualification level, you obtain two independent samples. But if you randomly select 70 freshmen from a university, measure their qualification level, and then, after 3 years, test the same students' qualification level again, you obtain dependent (matched) samples.

For a claim about two population parameters $\mu_1$ and $\mu_2$, the possible pairs of null and alternative hypotheses are

$H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 \ne \mu_2$
$H_0: \mu_1 \ge \mu_2$ versus $H_a: \mu_1 < \mu_2$
$H_0: \mu_1 \le \mu_2$ versus $H_a: \mu_1 > \mu_2$

Difference between the means of two populations. Independent samples.
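The statistics for independent samples described below can be computed as in the following minimal sketch. The function names are hypothetical and the data are invented. `two_sample_z` uses $\sigma_{\bar{x}_i}^2 = \sigma_i^2/n_i$ (with $s_i$ substituted for $\sigma_i$ when it is unknown and the samples are large); `two_sample_t` uses the pooled variance $s^2$ when the population variances are assumed equal and the conservative d.f. $= \min(n_1 - 1, n_2 - 1)$ otherwise. The $t$ critical value is assumed to come from a table.

```python
import math
from statistics import mean, variance


def two_sample_z(x_bar1, x_bar2, sd1, sd2, n1, n2, delta0=0.0):
    """Large independent samples: z = ((x_bar1 - x_bar2) - delta0) / sqrt(sd1^2/n1 + sd2^2/n2).

    delta0 is the hypothesized difference mu1 - mu2 (0 for H0: mu1 = mu2).
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # sigma_{x_bar_i}^2 = sigma_i^2 / n_i
    return ((x_bar1 - x_bar2) - delta0) / se


def two_sample_t(sample1, sample2, equal_var=True, delta0=0.0):
    """Small independent samples from normal populations; returns (t, d.f.)."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)  # sample variances, divisor n - 1
    if equal_var:
        # Pooled (weighted) estimate s^2 of the common variance
        s_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
        se = math.sqrt(s_sq * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        se = math.sqrt(s1_sq / n1 + s2_sq / n2)
        df = min(n1 - 1, n2 - 1)  # conservative d.f. as stated in the text
    t = ((mean(sample1) - mean(sample2)) - delta0) / se
    return t, df


# Large samples (invented summary statistics): x_bar1 = 75, x_bar2 = 72, s1 = 8, s2 = 6
print(round(two_sample_z(75, 72, 8, 6, n1=64, n2=36), 3))  # about 2.121

# Small samples (invented data); two-tailed, alpha = 0.05, t0 = 2.306 for d.f. = 8
t, df = two_sample_t([12, 14, 15, 11, 13], [10, 9, 12, 11, 8])
print(t, df, abs(t) > 2.306)  # t = 3.0 with 8 d.f. -> reject H0
```

With equal sample variances, as in this invented example, both branches give the same $t$; they differ only in the degrees of freedom used to look up the critical value.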
Let us consider hypothesis tests about the difference between the means of two populations for independent samples. If each sample size is at least 30, then the sampling distribution of the difference of the sample means, $\bar{x}_1 - \bar{x}_2$, can be approximated by a normal distribution with the following mean and standard deviation:
$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2, \qquad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2},$$
where $\sigma_{\bar{x}_1}^2 = \sigma_1^2/n_1$ and $\sigma_{\bar{x}_2}^2 = \sigma_2^2/n_2$. Since the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal, we can use the $z$-test with the standardized test statistic
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2}}.$$

In real life it is often impractical to collect large samples. To test the difference between the means of two populations using small independent samples, we assume that both populations are normally distributed. Under this condition, the sampling distribution of the difference of sample means $\bar{x}_1 - \bar{x}_2$ is approximated by a $t$-distribution with mean $\mu_1 - \mu_2$. The standard deviation and the degrees of freedom depend on whether the population standard deviations $\sigma_1$ and $\sigma_2$ are equal. The t-statistic for the difference between two population means $\mu_1$ and $\mu_2$ is
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}},$$
where $s_{\bar{x}_1 - \bar{x}_2}$ is the estimated standard deviation (standard error) of $\bar{x}_1 - \bar{x}_2$.

If the population variances are equal, then
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$
where $s^2$ is a weighted average of the two sample variances $s_1^2$ and $s_2^2$:
$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$
and d.f. $= n_1 + n_2 - 2$.

If the population variances are not equal, then the standard error is
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$
and d.f. $= \min(n_1 - 1, n_2 - 1)$.

Difference between the means of two populations. Matched samples.

To perform a two-sample hypothesis test with dependent samples, we must use a different technique. First, find the difference $d_i$ for each data pair. Then determine the mean of these differences:
$$\bar{d} = \frac{\sum d_i}{n}.$$
If both populations are normally distributed, then the sampling distribution of $\bar{d}$ is approximated by a $t$-distribution with $n - 1$ degrees of freedom, where $n$ is the number of data pairs. Let $\mu_d$ denote the mean of the difference values in the population, and formulate the null and alternative hypotheses as follows:

$H_0: \mu_d = 0$ versus $H_a: \mu_d \ne 0$, or
$H_0: \mu_d \ge 0$ versus $H_a: \mu_d < 0$, or
$H_0: \mu_d \le 0$ versus $H_a: \mu_d > 0$.

The sample mean and sample standard deviation of the difference values are
$$\bar{d} = \frac{\sum d_i}{n}, \qquad s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}.$$
To test the null hypothesis, we use the t-statistic
$$t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}}$$
with $n - 1$ degrees of freedom.

Difference between the proportions of two populations.

The difference between two population proportions $p_1$ and $p_2$ can be tested using a sample proportion from each population. The following pairs of null and alternative hypotheses are considered:

$H_0: p_1 = p_2$ versus $H_a: p_1 \ne p_2$
$H_0: p_1 \ge p_2$ versus $H_a: p_1 < p_2$
$H_0: p_1 \le p_2$ versus $H_a: p_1 > p_2$

If the samples are randomly selected, independent, and large enough to use a normal sampling distribution, that is, $n_1 p_1 \ge 5$, $n_1 q_1 \ge 5$, $n_2 p_2 \ge 5$, and $n_2 q_2 \ge 5$ (where $q_i = 1 - p_i$), then the sampling distribution of the difference between the sample proportions, $\bar{p}_1 - \bar{p}_2$, is approximately normal with mean $E(\bar{p}_1 - \bar{p}_2) = p_1 - p_2$ and standard error
$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p}\,\bar{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$
where $\bar{p}$ is the weighted mean of the sample proportions,
$$\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2},$$
and $\bar{q} = 1 - \bar{p}$. If the sampling distribution of $\bar{p}_1 - \bar{p}_2$ is normal, you can use the following $z$-statistic to test the difference between two population proportions:
$$z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{\sigma_{\bar{p}_1 - \bar{p}_2}}.$$
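The matched-samples $t$-statistic and the two-proportion $z$-statistic above can be sketched as follows. This is a minimal example under stated assumptions: $d_i$ is taken as (first sample) minus (second sample); `two_proportion_z` tests a null hypothesis with $p_1 = p_2$, so $p_1 - p_2 = 0$ in the numerator; the sample counts stand in for the unknown $n_i p_i$ and $n_i q_i$ when checking the $\ge 5$ conditions; the function names are hypothetical and all data are invented.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t(sample1, sample2, mu_d=0.0):
    """Matched samples: t = (d_bar - mu_d) / (s_d / sqrt(n)), with n - 1 d.f."""
    if len(sample1) != len(sample2):
        raise ValueError("matched samples need the same number of data pairs")
    d = [x1 - x2 for x1, x2 in zip(sample1, sample2)]  # difference d_i for each pair
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)  # sample standard deviation, divisor n - 1
    return (d_bar - mu_d) / (s_d / math.sqrt(n)), n - 1


def two_proportion_z(x1, n1, x2, n2):
    """z for H0 with p1 = p2, using the weighted mean p_bar of the sample proportions."""
    # Sample counts stand in for n*p and n*q in the >= 5 conditions.
    if min(x1, n1 - x1, x2, n2 - x2) < 5:
        raise ValueError("samples too small for the normal approximation")
    p1_bar, p2_bar = x1 / n1, x2 / n2
    p_bar = (n1 * p1_bar + n2 * p2_bar) / (n1 + n2)
    q_bar = 1 - p_bar
    se = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    return (p1_bar - p2_bar) / se  # (p1 - p2) = 0 under H0


# Matched samples: the same 5 students after 3 years vs. as freshmen (invented scores).
# Ha: mu_d > 0 (right-tailed); t0 = 2.132 from a t table with d.f. = 4, alpha = 0.05.
t, df = paired_t([71, 75, 70, 78, 66], [70, 72, 68, 74, 66])
print(round(t, 3), df, t > 2.132)  # t is about 2.828 with 4 d.f. -> reject H0

# Two proportions: 60 of 200 defective vs. 40 of 200 defective (invented counts).
z = two_proportion_z(60, 200, 40, 200)
z0 = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-tailed critical value, about 1.96
print(round(z, 3), abs(z) > z0)          # z is about 2.309 -> reject H0
```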