8. Hypothesis Testing.

Hypothesis testing is a process in which sample statistics are used to decide whether a statement about the value of a population parameter should be rejected. A statement about a population parameter is called a statistical hypothesis. Traditionally, two hypotheses are considered: the null hypothesis $H_0$ and the alternative hypothesis $H_a$.

Definition. A null hypothesis $H_0$ contains a statement of equality, such as $\le$, $=$, or $\ge$. The alternative hypothesis $H_a$ is the complement of the null hypothesis, and it contains a statement of strict inequality, such as $>$, $\ne$, or $<$.

The conclusion that $H_a$ is true can be made if the sample data indicate that $H_0$ is false. Generally, a hypothesis test about the value of the population mean $\mu$ takes one of the following three forms:

$$H_0: \mu \le \mu_0, \quad H_a: \mu > \mu_0$$
$$H_0: \mu \ge \mu_0, \quad H_a: \mu < \mu_0$$
$$H_0: \mu = \mu_0, \quad H_a: \mu \ne \mu_0$$

Similar statements can be formulated to test other population parameters. No matter which of the three forms is used, you always begin by assuming that the equality condition in the null hypothesis is true. After performing the hypothesis test, you make one of two decisions:

1. Reject the null hypothesis, or
2. Accept (fail to reject) the null hypothesis.

Since the decision is based on sample information, it is always possible to make a wrong decision. The null hypothesis might be rejected when it is actually true, or it might not be rejected when it is actually false. Accordingly, two kinds of errors can be made in hypothesis testing.

Definition. A Type I error is made if the null hypothesis is rejected when it is true. A Type II error is made if the null hypothesis is not rejected when it is false.

Thus, a hypothesis test has the following four possible outcomes.
| Decision | $H_0$ is true | $H_0$ is false |
|---|---|---|
| Do not reject $H_0$ | Correct decision | Type II error |
| Reject $H_0$ | Type I error | Correct decision |

Although we cannot eliminate the possibility of errors while performing a hypothesis test, we can state the probability of their occurrence. The common notations are:

$\alpha$ - the probability of making a Type I error.
$\beta$ - the probability of making a Type II error.

Most applications of hypothesis testing control the probability of making a Type I error but do not always control the probability of making a Type II error. Because $\beta$ is usually not controlled, the conclusion "accept $H_0$" would carry an unknown risk of a Type II error. For this reason the statement "do not reject $H_0$" is used instead of "accept $H_0$".

Definition. The maximum allowable probability of making a Type I error is called the level of significance of the test.

Common choices for the level of significance are $\alpha = 0.1$, $\alpha = 0.05$, and $\alpha = 0.01$. The lower the level of significance, the smaller the probability of rejecting a true null hypothesis.

After stating the null and alternative hypotheses and specifying the level of significance, the next step is to select a random sample from the population and calculate the sample statistics. The statistic used to estimate the parameter in the null hypothesis is called the test statistic. The following table shows the population parameters and the corresponding test statistics.

| Population parameter | Test statistic | Standardized test statistic |
|---|---|---|
| $\mu$ | $\bar{x}$ | $z$ ($n \ge 30$) or $t$ ($n < 30$) |
| $p$ | $\hat{p}$ | $z$ |
| $\sigma^2$ | $s^2$ | $\chi^2$ |

One way to decide whether to reject the null hypothesis is to check whether the standardized test statistic falls within a rejection region of the sampling distribution.

Definition. A rejection region (or critical region) of the sampling distribution is the range of values of the test statistic for which the null hypothesis is rejected. The value of the test statistic that establishes the boundary of the rejection region is called the critical value.

* Tests about a Population Mean.
The procedure for conducting a hypothesis test for a population mean in the large-sample case ($n \ge 30$) is as follows.

1. State the null and alternative hypotheses $H_0$ and $H_a$.
2. Specify the level of significance $\alpha$.
3. Use the Standard Normal Table to determine the critical value $z_0$ (or the values $\pm z_0$ for a two-tailed test).
4. Sketch the graph of the normal curve and indicate the rejection region. If the rejection region lies only in the lower tail (or only in the upper tail) of the sampling distribution, the test is a left-tailed (or right-tailed) hypothesis test. If the rejection region lies in both the lower and the upper tails, the test is a two-tailed hypothesis test.
5. Find the standardized test statistic (for the population mean it is called the z-statistic):

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \ \text{ if } \sigma \text{ is known, or } \quad z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \ \text{ if } \sigma \text{ is unknown.}$$

6. Make a decision: if $z$ is in the rejection region, reject $H_0$; otherwise fail to reject $H_0$.

Assume now that the sample size is small ($n < 30$) and the population standard deviation $\sigma$ is unknown. If the population has a normal distribution, you can use the $t$-distribution to make inferences about the population mean. The test statistic for the mean is

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

(called the t-statistic), where $s$ is the sample standard deviation. This statistic has a $t$-distribution with $n - 1$ degrees of freedom.

The procedure for conducting a hypothesis test for a population mean in the small-sample case is as follows.

1. State the null and alternative hypotheses $H_0$ and $H_a$.
2. Specify the level of significance $\alpha$.
3. Identify the degrees of freedom, d.f. $= n - 1$.
4. Use the t-Distribution Table to determine the critical value $t_0$ (or the values $\pm t_0$ for a two-tailed test).
5. Determine the rejection region.
6. Find the standardized test statistic (t-statistic):

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}.$$

7. Make a decision: if $t$ is in the rejection region, reject $H_0$; otherwise fail to reject $H_0$.

* Tests about a Population Proportion.
The three forms of a hypothesis test about a population proportion $p$ are as follows.

$$H_0: p \le p_0, \quad H_a: p > p_0$$
$$H_0: p \ge p_0, \quad H_a: p < p_0$$
$$H_0: p = p_0, \quad H_a: p \ne p_0$$

If $n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$, then the sampling distribution of the sample proportion $\hat{p}$ is approximately normal with expected value $E(\hat{p}) = p$ and standard error

$$\sigma_{\hat{p}} = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}.$$

Consequently, the following $z$-test about a population proportion can be used:

$$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}}.$$

* Inference about Means and Proportions with two Populations.

This section describes how to test a claim about the difference between the same parameter in two populations. For instance, you may want to conduct a hypothesis test to find out whether there is any difference in the quality of education provided at two high schools, or to test the difference between the proportions of defective parts supplied by two factories. The type of test to be used is determined by the sizes of the samples selected from the two populations and by whether the samples are dependent or independent.

Definition. Two samples are called independent if they are selected from two different populations and are not related to one another. Two samples are called dependent or matched if each element of one sample corresponds to an element of the other sample.

For instance, if you randomly select 100 graduates from university A and 90 graduates from university B and test their qualification level, you obtain two independent samples. But if you randomly select 70 freshmen from a university and measure their qualification level, and then, after 3 years, test the same students again, you have dependent (or matched) samples.

For a claim about two population parameters $\mu_1$ and $\mu_2$, the possible pairs of null and alternative hypotheses are

$$H_0: \mu_1 = \mu_2, \quad H_a: \mu_1 \ne \mu_2$$
$$H_0: \mu_1 \ge \mu_2, \quad H_a: \mu_1 < \mu_2$$
$$H_0: \mu_1 \le \mu_2, \quad H_a: \mu_1 > \mu_2$$

Difference between the means of two populations. Independent samples.
Let us consider hypothesis tests about the difference between the means of two populations for independent samples. If each sample size is at least 30, then the sampling distribution of the difference of the sample means, $\bar{x}_1 - \bar{x}_2$, can be approximated by a normal probability distribution with the following mean and standard deviation:

$$E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2, \qquad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2},$$

where $\sigma_{\bar{x}_i}^2 = \sigma_i^2 / n_i$ is the variance of the sample mean $\bar{x}_i$. Since the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal, we can use the $z$-test with the standardized test statistic

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2}}.$$

In real life it is often impractical to collect large samples. To test the difference between the means using two small independent samples, we assume that both populations have a normal probability distribution. Under this condition, the sampling distribution of the difference of the sample means $\bar{x}_1 - \bar{x}_2$ is approximated by a $t$-distribution with mean $\mu_1 - \mu_2$. The standard error and the degrees of freedom depend on whether the population variances $\sigma_1^2$ and $\sigma_2^2$ are equal or not.

The t-statistic for the difference between two population means $\mu_1$ and $\mu_2$ is

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}},$$

where $s_{\bar{x}_1 - \bar{x}_2}$ is the standard error of $\bar{x}_1 - \bar{x}_2$.

If the population variances are equal, then

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)},$$

where $s^2$ is a weighted average (the pooled estimate) of the two sample variances $s_1^2$ and $s_2^2$:

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$

and d.f. $= n_1 + n_2 - 2$.

If the population variances are not equal, then the standard error is

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

and d.f. $= \min(n_1 - 1, n_2 - 1)$.

Difference between the means of two populations. Matched samples.

To perform a two-sample hypothesis test with dependent samples, we must use a different technique. First find the difference $d_i$ for each data pair. Then determine the mean of these differences: $\bar{d} = \frac{\sum d_i}{n}$.
If both populations are normally distributed, then the sampling distribution of $\bar{d}$ is approximated by a $t$-distribution with $n - 1$ degrees of freedom, where $n$ is the number of data pairs. Let us denote by $\mu_d$ the mean of the difference values in the population and formulate the null and alternative hypotheses as follows.

$$H_0: \mu_d = 0, \quad H_a: \mu_d \ne 0$$
$$H_0: \mu_d \ge 0, \quad H_a: \mu_d < 0$$
$$H_0: \mu_d \le 0, \quad H_a: \mu_d > 0$$

The sample mean and sample standard deviation of the difference values are

$$\bar{d} = \frac{\sum d_i}{n} \qquad \text{and} \qquad s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}.$$

To test the null hypothesis, we use the t-statistic

$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$$

with $n - 1$ degrees of freedom.

Difference between the proportions of two populations.

The difference between two population proportions $p_1$ and $p_2$ can be tested using a sample proportion from each population. The following pairs of null and alternative hypotheses are considered.

$$H_0: p_1 = p_2, \quad H_a: p_1 \ne p_2$$
$$H_0: p_1 \ge p_2, \quad H_a: p_1 < p_2$$
$$H_0: p_1 \le p_2, \quad H_a: p_1 > p_2$$

If the samples are randomly selected, independent, and large enough to use a normal sampling distribution, that is, $n_1 p_1 \ge 5$, $n_1 q_1 \ge 5$, $n_2 p_2 \ge 5$, and $n_2 q_2 \ge 5$ (where $q_i = 1 - p_i$), then the sampling distribution of the difference between the sample proportions, $\hat{p}_1 - \hat{p}_2$, is a normal distribution with mean $E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$ and standard error

$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\bar{p}\,\bar{q} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)},$$

where $\bar{p}$ is the weighted mean of the sample proportions, that is,

$$\bar{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} \qquad \text{and} \qquad \bar{q} = 1 - \bar{p}.$$

If the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is normal, you can use the following $z$-statistic to test the difference between two population proportions:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sigma_{\hat{p}_1 - \hat{p}_2}}.$$
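The two-sample statistics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the function names (`z_critical`, `pooled_t`, `paired_t`, `two_proportion_z`) are hypothetical, the counts in the example are made up, and every test assumes the null hypothesis of no difference ($\mu_1 - \mu_2 = 0$, $\mu_d = 0$, $p_1 - p_2 = 0$). Only the standard library is used, so the critical value is computed for the $z$-test only; for the $t$-statistics the critical value $t_0$ would still be read from a t-distribution table.

```python
import math
from statistics import NormalDist, mean, stdev


def z_critical(alpha, tail="two"):
    """Critical value z0 of the standard normal distribution for level alpha."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)  # reject H0 if |z| > z0
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)      # reject H0 if z > z0
    if tail == "left":
        return std_normal.inv_cdf(alpha)          # reject H0 if z < z0 (z0 < 0)
    raise ValueError("tail must be 'two', 'right' or 'left'")


def pooled_t(sample1, sample2):
    """t-statistic and d.f. for H0: mu1 = mu2 (independent samples, equal variances)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances (stdev uses the n - 1 divisor).
    pooled_var = ((n1 - 1) * stdev(sample1) ** 2 + (n2 - 1) * stdev(sample2) ** 2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample1) - mean(sample2)) / std_error, df


def paired_t(first, second):
    """t-statistic and d.f. for H0: mu_d = 0 (matched samples)."""
    diffs = [a - b for a, b in zip(first, second)]  # d_i for each data pair
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def two_proportion_z(x1, n1, x2, n2):
    """z-statistic for H0: p1 = p2; x1 and x2 are the numbers of successes."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)  # weighted mean of the sample proportions
    q_bar = 1 - p_bar
    std_error = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_error


# Made-up data: 60 of 100 versus 40 of 100 successes, two-tailed test at alpha = 0.05.
z = two_proportion_z(60, 100, 40, 100)
z0 = z_critical(0.05, tail="two")
decision = "reject H0" if abs(z) > z0 else "fail to reject H0"
print(f"z = {z:.3f}, z0 = {z0:.3f}: {decision}")
```

Each function returns only the standardized statistic (and the degrees of freedom where relevant), so the decision step stays explicit: compare the statistic with the critical value for the chosen tail, exactly as in the procedures described earlier.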