TEST OF HYPOTHESIS FOR MEANS AND PROPORTIONS

Up to this point we have used sample means and proportions to estimate population means and proportions. Some inference about the population mean can be made with a confidence interval. Testing a statistical hypothesis is a different approach to making inference about a parameter on the basis of a random sample. Instead of estimating the population parameter, a claim is made regarding its value. This claim is called the Null Hypothesis.

For example, a property developer claims that the average annual rental income per room in student accommodation is at most £5000. That is, the mean is less than or equal to 5000: $\mu \le 5000$.

The Null Hypothesis ($H_0$) is a claim or conjecture made about a population characteristic.

To test the null hypothesis a random sample is selected from the population under investigation. The sample mean is calculated, and if it is sufficiently close to the claimed population mean we conclude that the sample is consistent with the Null Hypothesis. To determine what we mean by sufficiently close, we use the most extreme value of $\mu$ in the statement of the null hypothesis, that is $\mu = \mu_{H_0} = 5000$.

We will consider two cases:
1) Sample sizes are 30 or more. In this case sample means are approximately Normally distributed, according to the Central Limit Theorem.
2) Sample sizes are less than 30. Provided that the variable of interest is Normally distributed in the population and $\sigma$ is known, sample means are Normally distributed.

For large samples, if $\sigma$ is unknown it may be estimated by $s$; the sampling distribution of $\bar{x}$ is then approximately Normal, $\bar{x} \sim N\left(\mu_{\bar{x}} = \mu;\ s_{\bar{x}} = \dfrac{s}{\sqrt{n}}\right)$.
The rationale for hypothesis testing is summarized as follows:
1) State the Null Hypothesis.
2) Select a random sample of size $n$.
3) Calculate $\bar{x}$ and $s$ (if $\sigma$ is not known).
4) Calculate $Z_{\bar{x}} = \dfrac{\bar{x} - \mu_{H_0}}{\sigma_{\bar{x}}}$.
5) If $P(Z \ge Z_{\bar{x}})$ is small, reject $H_0$; if $P(Z \ge Z_{\bar{x}})$ is large, accept $H_0$.

The alternative hypothesis

The null hypothesis is usually a statement of the status quo, and its statement always includes the equality. The alternative hypothesis is sometimes called the research hypothesis. The alternative hypothesis ($H_1$) is the complement of the null hypothesis; for example, if $H_0$: $\mu \le 5000$ then $H_1$: $\mu > 5000$.

EXAMPLE

The same property developer claims that the average rental income per room in student accommodation is at most £5000 per year. The mean rent paid by a sample of 36 students is £5200 per year, with a population standard deviation of £735. Do the sample results support the developer's claim?

$H_0$: $\mu = 5000$ (the most extreme value allowed by the claim $\mu \le 5000$)
$H_1$: $\mu > 5000$

Calculate the probability of selecting the present sample or a more extreme one, assuming $H_0$ is true. Since $n = 36$, according to the Central Limit Theorem the sample means are distributed as $\bar{x} \sim N\left(\mu,\ \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\right)$. In our example the standard error of the mean is $735/\sqrt{36} = 122{,}5$ and $\bar{x} = 5200$. A more extreme sample is one that lends less credibility to $H_0$, that is, a sample whose mean is greater than 5200.

The probability that $\bar{x} \ge 5200$, given that the null hypothesis is true, is calculated by the usual Normal probability method:

$$P(\bar{x} \ge 5200) = P(Z \ge Z_{\bar{x}}) = P\left(Z \ge \frac{\bar{x} - \mu_{H_0}}{\sigma_{\bar{x}}}\right) = P\left(Z \ge \frac{5200 - 5000}{122{,}5}\right) = P(Z \ge 1{,}63) = 0{,}0515$$

The probability of selecting the given sample or a more extreme sample, assuming $H_0$ is true, is called the p-value of the test. It is an indication of the support for the null hypothesis provided by the sample.

The classical method of testing hypotheses
(Significance testing)

The classical method of testing hypotheses involves declaring in advance that the null hypothesis will be rejected if the probability of selecting the present sample, or a more extreme one, is less than a given value, usually 0,05 or 0,01. In other words, the null hypothesis will be rejected if the p-value is less than a given probability. This probability is referred to as $\alpha$ and is called the level of significance of the test. Graphically, $\alpha$ is the tail area (or areas) under the Normal curve. If the probability of a difference between the sample mean and the hypothesised population mean at least as large as the one observed is less than $\alpha$, then $H_0$ is rejected and the difference is said to be significant.

EXAMPLE

The same hypotheses as in the previous example, with $\alpha = 0{,}05$. For $\alpha = 0{,}05$ (one-sided), $Z_{\alpha} = 1{,}6449$.

$H_0$: $\mu \le 5000$
$H_1$: $\mu > 5000$

The test statistic is

$$Z = \frac{\bar{x} - \mu_{H_0}}{\sigma_{\bar{x}}} = \frac{5200 - 5000}{122{,}5} = 1{,}63$$

The null hypothesis is not rejected because the test statistic is less than the critical value (1,6449).

Type I and Type II errors

Given the decision rule for the test (based on a given value of $\alpha$), if a sample mean falls within the acceptance region we say that there is no significant difference between the sample mean and the hypothesised population mean. The difference between the sample mean and the population mean is then attributed to sampling error, that is, to the chance variation of sample means about the population mean. On the other hand, if a sample mean falls in the rejection region we say that there is a significant difference between the sample mean and the population mean: the difference is too great to attribute to chance.

When the true population mean is 5000, 95% of the sample means will be less than 5202 (since $5000 + 1{,}6449 \times 122{,}5 \approx 5202$). The remaining 5% of sample means will fall in the upper tail of the distribution. But, according to the decision rule, when a sample mean falls in the critical region the null hypothesis will be rejected.
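The one-sided test of the rental income claim can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function name `one_sided_z_test` and the dictionary keys are illustrative choices, and it assumes that $\sigma$ is known and that the alternative is the upper-tail one, $H_1$: $\mu > \mu_{H_0}$. It uses `statistics.NormalDist` from the standard library in place of Normal tables, so Z is not rounded to two decimals and the p-value comes out near 0.051 (Python writes decimals with a point), slightly below the table value.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_test(x_bar, mu_0, sigma, n, alpha=0.05):
    """Upper-tail Z test of H0: mu <= mu_0 against H1: mu > mu_0 (sigma known)."""
    std_normal = NormalDist()                  # standard Normal, N(0, 1)
    std_error = sigma / sqrt(n)                # standard error of the mean
    z = (x_bar - mu_0) / std_error             # test statistic
    p_value = 1.0 - std_normal.cdf(z)          # P(Z >= z), assuming H0 is true
    z_crit = std_normal.inv_cdf(1.0 - alpha)   # critical value Z_alpha
    x_crit = mu_0 + z_crit * std_error         # sample mean at the edge of the critical region
    return {
        "std_error": std_error,
        "z": z,
        "p_value": p_value,
        "z_crit": z_crit,
        "x_crit": x_crit,
        "reject_h0": z > z_crit,
    }


# Rental income example: n = 36, sample mean 5200, sigma = 735, hypothesised mean 5000.
result = one_sided_z_test(x_bar=5200, mu_0=5000, sigma=735, n=36)
for name, value in result.items():
    print(f"{name}: {value}")
```

The two decision rules agree here: the p-value is larger than $\alpha$ and Z is below the critical value, so `reject_h0` is `False`.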
Since 5% of the sample means fall in the critical region, the null hypothesis will be rejected when it is true 5% of the time. Rejecting $H_0$ when it is true is called a Type I error. In a classical test of hypothesis, the level of significance is $\alpha$. Since $\alpha$ is the area of the rejection region, $\alpha$ is the probability of rejecting $H_0$ when it is true. Hence $\alpha$ is the probability of making a Type I error.

The level of significance of the test ($\alpha$) is the maximum probability of making a Type I error: $\alpha = P(\text{Reject } H_0 \mid H_0 \text{ is true})$.

The size of the level of significance of the test is decided by the researcher, who takes into account the consequences of rejecting a true null hypothesis.

Type II error

The error made by rejecting the alternative hypothesis when it is true (that is, accepting a false null hypothesis) is called a Type II error. The probability of making a Type II error is denoted by $\beta$. The relationship between Type I and Type II errors is summarized as follows:

| | Accept $H_0$ | Accept $H_1$ |
|---|---|---|
| $H_0$ is true | Correct: $1-\alpha = P(\text{Accept } H_0 \mid H_0 \text{ is true})$ | Type I error: $\alpha = P(\text{Reject } H_0 \mid H_0 \text{ is true})$ |
| $H_1$ is true | Type II error: $\beta = P(\text{Reject } H_1 \mid H_1 \text{ is true})$ | Correct: $1-\beta = P(\text{Accept } H_1 \mid H_1 \text{ is true})$ |

The Power of the test is the probability of accepting a true alternative hypothesis or, equivalently, the probability of rejecting a false null hypothesis: $1-\beta = P(\text{Reject } H_0 \mid H_0 \text{ is false})$.

One- and two-sided tests of hypothesis

If the researcher is investigating whether the population mean is equal to 5000 or not, then $H_0$ will be rejected when the sample mean is either too high or too low. Hence the test will have two critical regions. Such a test is called a two-tailed test or a two-sided test. In two-sided tests the level of significance, $\alpha$, is divided equally between the upper and lower tails. The critical Z values are referred to as $\pm Z_{\alpha/2}$ for two-sided tests.

EXAMPLE

$H_0$: $\mu = 5000$
$H_1$: $\mu \ne 5000$
$\alpha = 0{,}05$

The size of each rejection region is $\alpha/2 = 0{,}025$, so $Z_{\alpha/2} = 1{,}96$. Accept the null hypothesis if the sample Z lies between $-1{,}96$ and $1{,}96$ inclusive.
Reject $H_0$ if the sample Z is less than $-1{,}96$ or greater than $1{,}96$.

The sample statistic is $Z_{\bar{x}} = \dfrac{\bar{x} - \mu_{H_0}}{\sigma_{\bar{x}}}$, where $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$. With $n = 36$, $\sigma = 735$ and $\bar{x} = 5200$:

$$Z_{\bar{x}} = \frac{5200 - 5000}{122{,}5} = 1{,}63$$

The sample Z = 1,63 is in the acceptance region, therefore accept the null hypothesis.

In two-sided tests the p-value is the probability that a sample mean will differ from $\mu_{H_0}$ by at least the observed difference (here 200 units) in either direction. Hence

$$\text{p-value} = P(\bar{x} \ge 5200) + P(\bar{x} \le 4800) = P(Z \ge 1{,}63) + P(Z \le -1{,}63) = 2(0{,}0515) = 0{,}1030$$

In conclusion, at the 5% level the sample does not provide sufficient evidence to reject the claim that the mean rental income is £5000. There is a 10,30% chance of getting the present sample or a more extreme sample when $\mu = 5000$.

HYPOTHESIS TESTS FOR PROPORTIONS

The distribution of sample proportions $p$ ($n \ge 30$) was stated to be approximately Normal with mean $\mu_p = \pi$ and standard error $\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$. Hence the test statistic for a proportion is

$$Z_p = \frac{p - \pi_{H_0}}{\sigma_p}$$

Since the population proportion $\pi$ is unknown, the standard error is evaluated at the hypothesised value $\pi_{H_0}$:

$$s_p = \sqrt{\frac{\pi_{H_0}(1 - \pi_{H_0})}{n}}$$

EXAMPLE

A budget airline claims that 96% of its flights depart on time. A researcher working for a competitor records departure information for 80 randomly selected flights and discovers that five departed late. Test the airline's claim at the 5% level of significance.

The sample proportion of late departures is $p = 5/80 = 0{,}0625$. The claim that 96% of flights depart on time is equivalent to the claim that 4% depart late.

$H_0$: $\pi = 0{,}04$
$H_1$: $\pi \ne 0{,}04$
$\alpha = 0{,}05$; two-sided test, $Z_{\alpha/2} = 1{,}96$

The standard error and the test statistic are

$$s_p = \sqrt{\frac{\pi_{H_0}(1 - \pi_{H_0})}{n}} = \sqrt{\frac{0{,}04(1 - 0{,}04)}{80}} = 0{,}0219$$

$$Z_p = \frac{p - \pi_{H_0}}{s_p} = \frac{0{,}0625 - 0{,}04}{0{,}0219} = 1{,}0274$$

Since Z is in the acceptance region we accept the null hypothesis.

The difference between the sample proportion and the hypothesized proportion is $|0{,}04 - 0{,}0625| = 0{,}0225$. The p-value may be described as the probability that the difference between $\pi_{H_0}$ and $p$ will be equal to or greater than 0,0225 in either direction, that is, that $p$ falls outside the interval $0{,}0175 < p < 0{,}0625$:

$$\text{p-value} = P(p \ge 0{,}0625) + P(p \le 0{,}0175) = P(Z \ge 1{,}03) + P(Z \le -1{,}03) = 2(0{,}1515) = 0{,}3030$$
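The airline calculation can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions, not part of the original notes: the function name `proportion_z_test` and its argument names are hypothetical, the test is two-sided, and the standard error is evaluated at the hypothesised proportion. Because `statistics.NormalDist` works with the unrounded Z, the p-value differs from the table-based figure in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(count, n, pi_0, alpha=0.05):
    """Two-sided Z test of H0: pi = pi_0 using the Normal approximation."""
    std_normal = NormalDist()                          # standard Normal, N(0, 1)
    p = count / n                                      # sample proportion
    std_error = sqrt(pi_0 * (1.0 - pi_0) / n)          # standard error under H0
    z = (p - pi_0) / std_error                         # test statistic
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))     # both tails
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)     # critical value Z_{alpha/2}
    return p, std_error, z, p_value, abs(z) > z_crit


# Airline example: 5 late departures out of 80 flights, H0: pi = 0.04.
p, std_error, z, p_value, reject_h0 = proportion_z_test(count=5, n=80, pi_0=0.04)
print(f"p = {p}, standard error = {std_error:.4f}, Z = {z:.4f}")
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```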
In conclusion, the null hypothesis that 4% of the airline's departures are late is accepted. The p-value = 0,3030 states that there is a 30,30% chance of selecting the given sample or a more extreme sample when the true percentage is 4.

HYPOTHESIS TESTS FOR THE DIFFERENCE BETWEEN MEANS AND PROPORTIONS

The sampling distribution for the difference between sample means (for independent samples, $n \ge 30$) is

$$(\bar{X}_1 - \bar{X}_2) \sim N\left(\mu_{\bar{X}_1-\bar{X}_2} = \mu_1 - \mu_2;\ \sigma^2_{\bar{X}_1-\bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

Since the sampling distribution is Normal, the test statistic used to test the difference between two population means is

$$Z_{\bar{X}_1-\bar{X}_2} = \frac{(\bar{x}_1 - \bar{x}_2) - (H_0\ \text{claim})}{\sigma_{\bar{X}_1-\bar{X}_2}}, \qquad \sigma_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

The sample variances may be used when the population variances are unknown, for $n_1, n_2 \ge 30$. Hence the test statistic is

$$Z_{\bar{X}_1-\bar{X}_2} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1-\bar{X}_2}}, \qquad s_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where $(\mu_1 - \mu_2)$ is the difference claimed under $H_0$.

The sampling distribution for the difference between two sample proportions (for $n \ge 30$) is

$$p_1 - p_2 \sim N\left(\mu_{p_1-p_2} = \pi_1 - \pi_2;\ \sigma^2_{p_1-p_2} = \frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}\right)$$

Hence the test statistic for the difference between two population proportions is

$$Z_{p_1-p_2} = \frac{\text{Point estimate} - (H_0\ \text{claim})}{\text{Standard error}} = \frac{(p_1 - p_2) - (H_0\ \text{claim})}{\sigma_{p_1-p_2}}$$

Since the population proportions are unknown, the standard error must be estimated by

$$s_{p_1-p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

Hence the statistic is

$$Z_{p_1-p_2} = \frac{(p_1 - p_2) - (H_0\ \text{claim})}{s_{p_1-p_2}}$$

EXAMPLE

The sample mean and standard deviation for annual rental expenditure are £5200 and £735 respectively for a sample of 36 students (group A). A sample of 45 young professionals (group B) was also surveyed; the sample mean and standard deviation for their annual expenditure on rent are £4920 and £225 respectively. Is there a difference in the average rental expenditure between the two groups?

$H_0$: $\mu_A = \mu_B$, or $\mu_A - \mu_B = 0$
$H_1$: $\mu_A \ne \mu_B$
Level of significance $\alpha = 0{,}05$.

The test statistic is

$$Z_{\bar{X}_1-\bar{X}_2} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{X}_1-\bar{X}_2}}$$

with standard error

$$s_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{(735)^2}{36} + \frac{(225)^2}{45}} = 127{,}0089$$
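Hence

$$Z_{\bar{X}_1-\bar{X}_2} = \frac{(5200 - 4920) - 0}{127{,}0089} = 2{,}2046$$

The value of $Z_{0{,}025}$ is 1,96. Since 2,2046 is greater than 1,96, the null hypothesis is rejected.

EXAMPLE

The results of polls on the voting intentions of voters were recorded as follows:

| | Constituency A | Constituency B |
|---|---|---|
| Sample size | 200 | 160 |
| Vote for Green party | 88 | 54 |

a) Calculate a combined estimate of the proportion of votes for the Green party for the sample data from constituencies A and B. Hence calculate the sample standard error for the difference between proportions.
b) Test the hypothesis that
i) the support for the Green party is the same in both constituencies (1% level of significance);
ii) the support for the Green party is 5% higher in constituency A.

a) The combined estimate is

$$p_c = \frac{\text{Total voters for Green}}{\text{Total voters polled}} = \frac{88 + 54}{200 + 160} = \frac{142}{360} = 0{,}3944$$

The sample proportions are $p_A = 88/200 = 0{,}44$ and $p_B = 54/160 = 0{,}3375$. Using the combined estimate, the standard error of the difference is

$$s_{p_1-p_2} = \sqrt{p_c(1-p_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0{,}3944(1-0{,}3944)\left(\frac{1}{200} + \frac{1}{160}\right)} = 0{,}0518$$

Using the separate sample proportions, it is

$$s_{p_1-p_2} = \sqrt{\frac{0{,}44(1-0{,}44)}{200} + \frac{0{,}3375(1-0{,}3375)}{160}} = 0{,}0513$$

The tests below use the value 0,0513.

b) i) The difference between the sample proportions is $0{,}44 - 0{,}3375 = 0{,}1025$. The claim is that the two percentages are equal, so the $H_0$ claim is a difference of 0. The test statistic is

$$Z_{p_1-p_2} = \frac{(p_1 - p_2) - (H_0\ \text{claim})}{s_{p_1-p_2}} = \frac{0{,}1025}{0{,}0513} = 1{,}9981$$

For $\alpha = 0{,}01$ (two-sided) the critical value of Z is 2,5758. Since 1,9981 is less than 2,5758, the null hypothesis is not rejected.

ii) If $\pi_A$ is at least 5% greater than $\pi_B$ then $\pi_A - \pi_B \ge 0{,}05$, so the $H_0$ claim is a difference of 0,05:

$$Z_{p_1-p_2} = \frac{(p_1 - p_2) - (H_0\ \text{claim})}{s_{p_1-p_2}} = \frac{0{,}1025 - 0{,}05}{0{,}0513} = 1{,}0234$$

The value of Z for $\alpha = 0{,}01$ (one-sided) is 2,33. Since 1,0234 is less than 2,33, $H_0$ is accepted.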
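Both two-sample examples (rental expenditure and the constituency polls) follow the same pattern of point estimate minus $H_0$ claim, divided by a standard error. The Python sketch below illustrates that pattern; it is not part of the original notes, and all function and variable names are hypothetical. The pooled standard error is included only as an assumed reading of "combined estimate" in part a). Since the standard errors are not rounded, the Z values for the polls come out close to, but not exactly equal to, the hand-calculated ones.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Normal, N(0, 1)


def z_statistic(estimate, h0_claim, std_error):
    """(point estimate - H0 claim) / standard error."""
    return (estimate - h0_claim) / std_error


def se_diff_means(s1, n1, s2, n2):
    """Standard error of the difference between two independent sample means."""
    return sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def se_diff_props(p1, n1, p2, n2):
    """Standard error of p1 - p2 from the separate sample proportions."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def se_diff_props_pooled(x1, n1, x2, n2):
    """Assumed pooled variant: uses the combined proportion (x1 + x2) / (n1 + n2)."""
    pc = (x1 + x2) / (n1 + n2)
    return sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))


# Rental expenditure: students (5200, 735, n=36) vs young professionals (4920, 225, n=45).
se_means = se_diff_means(735, 36, 225, 45)
z_means = z_statistic(5200 - 4920, 0, se_means)
reject_means = abs(z_means) > STD_NORMAL.inv_cdf(1 - 0.05 / 2)   # two-sided, alpha = 0.05

# Constituency polls: 88 of 200 in A, 54 of 160 in B.
p_a, p_b = 88 / 200, 54 / 160
se_props = se_diff_props(p_a, 200, p_b, 160)
se_pooled = se_diff_props_pooled(88, 200, 54, 160)
z_equal = z_statistic(p_a - p_b, 0.0, se_props)     # claim: no difference
z_five = z_statistic(p_a - p_b, 0.05, se_props)     # claim: difference of 0.05
reject_equal = abs(z_equal) > STD_NORMAL.inv_cdf(1 - 0.01 / 2)   # two-sided, alpha = 0.01
reject_five = z_five > STD_NORMAL.inv_cdf(1 - 0.01)              # one-sided, alpha = 0.01

print(f"means: SE = {se_means:.4f}, Z = {z_means:.4f}, reject H0: {reject_means}")
print(f"proportions: SE = {se_props:.4f} (pooled {se_pooled:.4f})")
print(f"equal support: Z = {z_equal:.4f}, reject H0: {reject_equal}")
print(f"5% difference: Z = {z_five:.4f}, reject H0: {reject_five}")
```

Keeping the standard error as a separate function makes it easy to compare the pooled and unpooled variants; in this example the decision at the 1% level is the same with either.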