University of Jordan
Faculty of Agriculture
Dept. of Agri. Econ. & Agribusiness
Agricultural Statistic (605150)
Dr. Amer Salman

Part (09): Testing Hypotheses (Statistical Inference)

A hypothesis is a prediction about some aspect of a variable or a collection of variables. Hypotheses are derived from theory, and they serve as guides to research. When a hypothesis can be stated in terms of one or more parameters of the appropriate population distribution(s), statistical methods can be used to test its validity.

A statistical test involves comparing what is expected according to the hypothesis with what is actually observed in the data.

Elements of a statistical test:

There are five basic elements of a statistical test of a hypothesis about a parameter: assumptions, hypotheses, test statistic, attained significance level, and conclusion.

1. Assumptions: All statistical tests are based on certain assumptions that must be met for the tests to be valid. These assumptions usually involve considerations such as the following:
   a. The assumed scale of measurement of the variable: as with other statistical procedures, each test is designed for a specific level of measurement.
   b. The form of the population distribution: for many tests, the variable must be continuous, or even normally distributed, in the population.
   c. The method of sampling: the formulas for nearly every test we consider require random sampling.
   d. The sample size: many tests rely on results similar to the central limit theorem and require a certain minimum sample size to be valid.

2. Hypotheses: A statistical test focuses on two hypotheses about the value of a parameter.
The null hypothesis is the hypothesis that is usually tested. The alternative hypothesis is accepted when the test results in rejection of the null hypothesis; it consists of an alternative set of parameter values to those given in the null hypothesis.

Notation: the symbol \(H_o\) represents the null hypothesis and the symbol \(H_a\) represents the alternative hypothesis.

3. Test statistic: After obtaining the sample, we form a sample statistic with a known sampling distribution to help us test the null hypothesis. The testing procedure is such that the null hypothesis is either rejected or accepted; if it is rejected, the alternative hypothesis is accepted. The purpose of the test is to analyze, in probabilistic terms, how strong the sample evidence is for rejecting the null hypothesis and hence accepting the alternative hypothesis.

4. Attained significance level: Consider the collection of values of the test statistic that are as favorable or more favorable to \(H_a\) than the value actually observed. The attained significance level is the probability that the test statistic would fall in this collection of values if \(H_o\) were true.

Notation: the attained significance level is denoted by \(P\) and is sometimes referred to as the P-value of the test. The smaller the value of \(P\), the more the sample results contradict \(H_o\).

5. Conclusion: If the attained significance level is sufficiently small, we may decide to reject \(H_o\) and therefore accept \(H_a\); the smaller the attained significance level, the stronger the grounds for rejecting \(H_o\) and accepting \(H_a\). To illustrate, we might decide to reject \(H_o\) if the attained significance level is \(P < 0.05\), and conclude that there is not enough evidence to reject \(H_o\) if \(P \ge 0.05\).
The value 0.05 would then be referred to as the α-level of the test: the α-level is a number such that \(H_o\) is rejected if the attained significance level is less than it. Researchers usually choose an α-level of 0.10, 0.05, or 0.01. The smaller the α-level, the stronger the evidence must be before \(H_o\) is rejected.

Table (9 – 1): Possible conclusions in a test of a hypothesis with α-level 0.05

| Attained significance level | Conclusion about \(H_o\) | Conclusion about \(H_a\) |
|---|---|---|
| \(P < 0.05\) | Reject | Accept |
| \(P \ge 0.05\) | Do not reject | Do not accept |

The collection of values of the test statistic that would lead a researcher to reject \(H_o\) at a particular α-level is referred to as the rejection region. For example, the rejection region for a test with α = 0.05 is the set of values of the test statistic that produce \(P < 0.05\).

Large-Sample Test of Hypothesis about µ

Table (9 – 2): Large-sample tests about µ

One-tailed test (one-sided):
\(H_o: \mu = \mu_o\), \(H_a: \mu < \mu_o\) (or \(H_a: \mu > \mu_o\))
Test statistic: \(z = \frac{\bar{X} - \mu_o}{\sigma_{\bar{X}}}\), where \(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}\)
Rejection region: \(z_{cal} > z_\alpha\) (for \(H_a: \mu > \mu_o\)) or \(z_{cal} < -z_\alpha\) (for \(H_a: \mu < \mu_o\)): reject \(H_o\), accept \(H_a\).

Two-tailed test (two-sided):
\(H_o: \mu = \mu_o\), \(H_a: \mu \ne \mu_o\)
Test statistic: \(z = \frac{\bar{X} - \mu_o}{\sigma_{\bar{X}}}\), with \(\sigma_{\bar{X}}\) as above
Rejection region: \(|z_{cal}| > z_{\alpha/2}\): reject \(H_o\), accept \(H_a\).

The correction factor \(\sqrt{(N - n)/(N - 1)}\) is needed only for a finite population of size \(N\) when \(n \ge 0.05N\); otherwise \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\).

Here \(z_\alpha\) is chosen so that \(P(z > z_\alpha) = \alpha\), and \(z_{\alpha/2}\) so that \(P(z > z_{\alpha/2}) = \alpha/2\). In a z-table that gives the area between 0 and z, look up the area \(0.5 - \alpha\) to find \(z_\alpha\) and the area \(0.5 - \alpha/2\) to find \(z_{\alpha/2}\).

Equivalently, the hypotheses can be written in terms of the difference \(\mu - \mu_o\):
Two-tailed: \(H_o: \mu - \mu_o = 0\) (the difference between them is zero), \(H_a: \mu - \mu_o \ne 0\)
Right one-tailed: \(H_o: \mu - \mu_o = 0\), \(H_a: \mu - \mu_o > 0\)
Left one-tailed: \(H_o: \mu - \mu_o = 0\), \(H_a: \mu - \mu_o < 0\)

Example (9 – 1): The mean number of prior convictions in a county in 1970 was 2.0; the police chief believes that the mean has increased.
A small investigation is organized to check this hunch: a random sample of 36 records is drawn from the set of all conviction records in the county in 1978, giving \(\bar{X} = 2.8\) and \(\sigma = 2.0\). Can the police chief conclude, at the 99% confidence level, that the mean number of convictions has increased?

Here \(1 - \alpha\) is the confidence level and \(\alpha\) is the significance level, so \(\alpha = 0.01\).

\(\mu_o = 2.0\), \(\bar{X} = 2.8\), \(\sigma = 2.0\), \(n = 36\)

The standard error of the sampling distribution of \(\bar{X}\) is:
\(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{2.0}{\sqrt{36}} = 0.333\)

\(H_o: \mu = \mu_o = 2.0\)
\(H_a: \mu > 2.0\)

The value of the test statistic is therefore:
\(z_{cal} = \frac{\bar{X} - \mu_o}{\sigma_{\bar{X}}} = \frac{2.8 - 2.0}{0.333} = 2.4\)

One-tailed test with \(1 - \alpha = 0.99\), so \(\alpha = 0.01\): the table area is \(0.5 - \alpha = 0.5 - 0.01 = 0.49\), which gives \(z_{tab} = 2.33\). Since \(z_{cal} = 2.4 > z_{tab} = 2.33\), we reject \(H_o\) and accept \(H_a\): the mean number of prior convictions has increased.

[Figure: upper-tailed test; acceptance region (accept \(H_o\), reject \(H_a\)) to the left of \(z_{tab} = 2.33\), rejection region of area \(\alpha\) to the right; \(z_{cal} = 2.4\) falls in the rejection region.]

Note: in a one-tailed test, the tabulated z takes the same sign as the calculated z.

Important: the location of the rejection region depends on the form of \(H_a\).
[Figure: for \(H_a: \mu > \mu_o\) (one-tailed), accept \(H_o\) to the left of \(z_{tab}\) and reject \(H_o\) (accept \(H_a\)) to the right; for \(H_a: \mu < \mu_o\) (one-tailed), reject \(H_o\) (accept \(H_a\)) to the left of \(-z_{tab}\) and accept \(H_o\) to the right.]
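The one-tailed calculation of Example (9 – 1) can be checked with a short script. This is an illustrative sketch only, assuming Python 3.8+ (for the standard-library `statistics.NormalDist`); the function name `z_test_mean` and its `tail` argument are hypothetical and not part of these notes. It uses exact normal quantiles, so the critical value comes out as about 2.326 rather than the rounded table value 2.33, and it also reports the attained significance level \(P\).

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_test_mean(xbar, mu0, sigma, n, alpha, tail="upper"):
    """Large-sample z test for a population mean (illustrative sketch).

    tail = "upper" for H_a: mu > mu0, "lower" for H_a: mu < mu0,
    "two" for H_a: mu != mu0.
    Returns (z_cal, z_tab, p_value, reject_h0).
    """
    se = sigma / sqrt(n)                  # standard error of the sample mean
    z_cal = (xbar - mu0) / se
    if tail == "upper":
        z_tab = STD_NORMAL.inv_cdf(1 - alpha)       # P(z > z_tab) = alpha
        p_value = 1 - STD_NORMAL.cdf(z_cal)
        reject = z_cal > z_tab
    elif tail == "lower":
        z_tab = STD_NORMAL.inv_cdf(alpha)           # negative critical value
        p_value = STD_NORMAL.cdf(z_cal)
        reject = z_cal < z_tab
    elif tail == "two":
        z_tab = STD_NORMAL.inv_cdf(1 - alpha / 2)   # P(z > z_tab) = alpha / 2
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z_cal)))
        reject = abs(z_cal) > z_tab
    else:
        raise ValueError("tail must be 'upper', 'lower' or 'two'")
    return z_cal, z_tab, p_value, reject


# Example (9 - 1): X-bar = 2.8, mu0 = 2.0, sigma = 2.0, n = 36, alpha = 0.01
z_cal, z_tab, p_value, reject = z_test_mean(2.8, 2.0, 2.0, 36, 0.01, tail="upper")
print(f"z_cal = {z_cal:.2f}, z_tab = {z_tab:.3f}, P = {p_value:.4f}, reject H_o: {reject}")
```

The same function covers the lower-tailed and two-tailed forms of the test through the `tail` argument; only the critical value and the direction of the comparison change.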
Suppose instead that the police chief did not claim that the mean had increased, but only stated that he could not tell whether it was higher or lower than 2.0. The test is then two-tailed:

\(H_o: \mu = \mu_o = 2.0\)
\(H_a: \mu \ne 2.0\)

[Figure: for \(H_a: \mu \ne \mu_o\) (two-tailed), accept \(H_o\) between \(-z_{\alpha/2}\) and \(z_{\alpha/2}\); reject \(H_o\) (accept \(H_a\)) in both tails.]

\(z_{cal} = \frac{\bar{X} - \mu_o}{\sigma_{\bar{X}}} = \frac{2.8 - 2.0}{0.333} = 2.4\)

Table area: \(0.5 - \frac{\alpha}{2} = 0.5 - \frac{0.01}{2} = 0.495\), so \(z_{\alpha/2} = \pm 2.575\).

Since \(-2.575 < z_{cal} = 2.4 < 2.575\), \(z_{cal}\) falls in the acceptance region: accept \(H_o\), reject \(H_a\).

[Figure: acceptance region between \(z_{tab} = -2.575\) and \(z_{tab} = 2.575\); \(z_{cal} = 2.4\) lies inside it.]

Note: the larger \(\alpha\) is, the larger the rejection region (and the probability of rejecting a true \(H_o\)), so \(H_o\) is rejected more readily; the smaller \(\alpha\) is, the more readily \(H_o\) is retained.

In the previous example, suppose that \(\bar{X} = 2.8\) and \(\sigma = 2.0\) had been calculated from a sample of size \(n = 50\) instead of \(n = 36\).

\(H_o: \mu = 2.0\), \(H_a: \mu \ne 2.0\)

\(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{2.0}{\sqrt{50}} = 0.283\)

\(z_{cal} = \frac{\bar{X} - \mu_o}{\sigma_{\bar{X}}} = \frac{2.8 - 2.0}{0.283} = 2.83\)

Two-tailed test at the 99% confidence level (\(1 - \alpha\) = confidence level, \(\alpha\) = significance level = 0.01): table area \(0.5 - \frac{0.01}{2} = 0.495\), so \(z_{\alpha/2} = 2.575\).

Since \(z_{cal} = 2.83 > z_{\alpha/2} = 2.575\): reject \(H_o\), accept \(H_a\).

Note: the larger the sample size, the more precise the test, because the standard error \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\) decreases.

[Figure: acceptance region between \(z = -2.575\) and \(z = 2.575\) (in the \(\bar{X}\) scale, between 1.272 and 2.728); \(z_{cal} = 2.83\) (\(\bar{X} = 2.8\)) falls in the upper rejection region.]

Confidence interval (acceptance region for \(\bar{X}\)) \(= \mu_o \pm z_{tab} \frac{\sigma}{\sqrt{n}} = 2.0 \pm 2.575 \times \frac{2.0}{\sqrt{50}} = 2.0 \pm 0.728\)
\(1.272 < \bar{X} < 2.728\)

Since \(\bar{X} = 2.8\) lies outside this interval, \(H_o\) is rejected, in agreement with the z test.

Example (9 – 2): Suppose that a firm wants to test whether it can claim that the light bulbs it produces last 1000 burning hours (\(\mu\)).
The firm takes a random sample of \(n = 100\) bulbs and finds that the sample mean is \(\bar{X} = 980\) hr and the sample standard deviation is \(s = 80\) hr. The firm wants to conduct the test at the 5% level of significance to determine whether the mean life of the bulbs differs from 1000 hr.

Solution: Since \(\mu\) could be equal to, larger than, or smaller than 1000 hr, the firm should set the null and alternative hypotheses as:

\(H_o: \mu = 1000\)
\(H_a: \mu \ne 1000\)

Since \(n > 30\), the sampling distribution of the mean is approximately normal (and we can use \(s\) as an estimate of \(\sigma\)). The acceptance region of the test at the 5% level of significance lies within \(\pm 1.96\) (95%) under the standard normal curve; since the rejection region is in both tails, we have a two-tailed test:

\(0.5 - \frac{\alpha}{2} = 0.5 - 0.025 = 0.475 \Rightarrow z_{tab} = \pm 1.96\)

The third step is to find the z value corresponding to \(\bar{X}\):

\(z_{cal} = \frac{\bar{X} - \mu_o}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_o}{s/\sqrt{n}} = \frac{980 - 1000}{80/\sqrt{100}} = \frac{-20}{8} = -2.5\)

Since \(|z_{cal}| = 2.5 > z_{\alpha/2} = 1.96\) (\(z_{cal} = -2.5\) falls in the lower rejection region): reject \(H_o\), accept \(H_a\).

[Figure: two-tailed test with rejection regions beyond \(z = \pm 1.96\) (in the \(\bar{X}\) scale, below 984.32 and above 1015.68); \(\bar{X} = 980\) falls in the lower rejection region.]

Confidence interval (acceptance region for \(\bar{X}\)) \(= \mu_o \pm z_{tab} \frac{s}{\sqrt{n}} = 1000 \pm 1.96 \times \frac{80}{10} = 1000 \pm 15.68\)
\(984.32 < \bar{X} < 1015.68\)

Since \(\bar{X} = 980\) is not in this interval, \(H_o\) is rejected: the sample does not support the claim that the bulbs last 1000 hr on average.

Example (9 – 3): From past experience, an army recruiting center knows that the weight of army recruits is normally distributed with a mean \(\mu\) of 80 kg (about 176 lb) and a standard deviation \(\sigma\) of 10 kg. The recruiting center wants to test, at the 1% level of significance, whether the average weight of this year's recruits is above 80 kg. To do this, it takes a sample of 25 recruits and finds that the average for this sample is 85 kg. How can this test be performed?
Solution: Since the center is interested in testing whether \(\mu > 80\) kg, it sets up the following hypotheses:

\(H_o: \mu = 80\) kg
\(H_a: \mu > 80\) kg

With \(H_a: \mu > 80\) kg we have a right-tailed test, with the rejection region to the right of \(z = 2.33\) at the 1% level of significance: table area \(0.5 - 0.01 = 0.49\), so \(z_\alpha = 2.33\).

\(z_{cal} = \frac{\bar{X} - \mu_o}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_o}{\sigma/\sqrt{n}} = \frac{85 - 80}{10/\sqrt{25}} = 2.5\)

Note: in a one-tailed test, the sign of \(z_\alpha\) always follows the sign of \(z_{cal}\).

Since \(z_{cal} = 2.5 > z_\alpha = 2.33\): reject \(H_o\), accept \(H_a\).

Since the calculated value of z falls within the rejection region, we reject \(H_o\) and accept \(H_a\) (that \(\mu > 80\) kg). This means that if \(\mu = 80\) kg, the probability of getting a random sample from this population that gives \(\bar{X} = 85\) kg or more is less than 1%; that would be an unusual sample indeed. Thus we reject \(H_o\) at the 1% level of significance (i.e., with 99% confidence of making the right decision).

[Figure: right-tailed test; rejection region (reject \(H_o\), accept \(H_a\)) to the right of \(z = 2.33\).]

Example (9 – 4): A producer of steel cables wants to test whether the cables it produces have a breaking strength of 5000 lb. A breaking strength of less than 5000 lb would be inadequate, and producing steel cables with a breaking strength of more than 5000 lb would unnecessarily increase production costs. The producer takes a random sample of 64 pieces and finds that the average breaking strength is 5100 lb and the sample standard deviation is 480 lb. Should the producer accept the hypothesis that its steel cable has a breaking strength of 5000 lb at the 5% level of significance?
Solution: Since \(\mu\) could be equal to, larger than, or smaller than 5000 lb, we set up the null and alternative hypotheses as follows:

\(H_o: \mu = 5000\) lb
\(H_a: \mu \ne 5000\) lb

Since \(n > 30\), the sampling distribution of the mean is approximately normal. The acceptance region of the test at the 5% level of significance lies within \(\pm 1.96\) under the standard normal curve, and the rejection (critical) region lies outside it; since the rejection region is in both tails, we have a two-tailed test. The third step is to find the z value corresponding to \(\bar{X}\):

\(z_{cal} = \frac{\bar{X} - \mu_o}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_o}{s/\sqrt{n}} = \frac{5100 - 5000}{480/\sqrt{64}} = \frac{100}{60} = 1.67\)

Since \(z_{cal} = 1.67 < z_{\alpha/2} = 1.96\): accept \(H_o\), reject \(H_a\).

Since the calculated value of z falls within the acceptance region, the producer should accept the null hypothesis and reject \(H_a\) at the 5% level of significance (or with 95% confidence). Note that this does not prove that \(\mu\) is indeed equal to 5000 lb; it only shows that there is no statistical evidence that \(\mu\) differs from 5000 lb at the 5% level of significance.

Confidence interval: \(C.I. = \mu_o \pm z_{tab}\,\sigma_{\bar{X}} = \mu_o \pm z_{tab} \frac{s}{\sqrt{n}} = 5000 \pm 1.96 \times \frac{480}{\sqrt{64}} = 5000 \pm 117.6\)
\(4882.4 < \bar{X} < 5117.6\)

[Figure: two-tailed test; acceptance region between \(z = -1.96\) and \(z = 1.96\) (\(\bar{X}\) between 4882.4 and 5117.6); \(z_{cal} = 1.67\) lies inside it.]

Example (9 – 5): Assume that a population is composed of 900 elements with a mean of 20 units and a standard deviation of 12. Find the mean, the standard deviation, and the interval at the 1% level of significance for the sampling distribution of the mean for a sample size of 36.

\(N = 900\), \(\mu = 20\), \(\sigma = 12\), \(n = 36\), \(\alpha = 1\%\)

\(\mu_{\bar{X}} = \mu = 20\)
\(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{6} = 2\)

(Note: the correction factor is used only if \(n \ge 0.05N\); here \(0.05(900) = 45\) and \(n = 36 < 45\), so it is not needed.)

Table area: \(0.5 - \frac{\alpha}{2} = 0.495\), so \(z_{\alpha/2} = 2.575\).

Confidence interval \(= \mu \pm z_{tab}\,\sigma_{\bar{X}} = 20 \pm 2.575 \times 2 = 20 \pm 5.15\)
\(14.85 < \bar{X} < 25.15\)
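Examples (9 – 2), (9 – 4) and (9 – 5) all compute an interval of the form \(\mu_o \pm z_{tab}\,\sigma_{\bar{X}}\) and check whether \(\bar{X}\) falls inside it. The sketch below is illustrative only, assuming Python 3.8+; the helper names `standard_error` and `acceptance_region` are hypothetical. It applies the finite population correction under the rule \(n \ge 0.05N\) used in these notes, and it uses exact normal quantiles (1.95996..., 2.5758...), so results can differ from the hand calculations in the second decimal.

```python
from math import sqrt
from statistics import NormalDist


def standard_error(sigma, n, N=None):
    """Standard error of the sample mean; applies the finite population
    correction sqrt((N - n) / (N - 1)) when N is given and n >= 0.05 * N."""
    se = sigma / sqrt(n)
    if N is not None and n >= 0.05 * N:
        se *= sqrt((N - n) / (N - 1))
    return se


def acceptance_region(mu0, sigma, n, alpha, N=None):
    """Two-tailed acceptance region for X-bar: mu0 +/- z_{alpha/2} * SE."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * standard_error(sigma, n, N)
    return mu0 - half_width, mu0 + half_width


# Example (9 - 2): light bulbs, s = 80 used in place of sigma, alpha = 0.05
low, high = acceptance_region(1000, 80, 100, 0.05)
print(f"bulbs:  {low:.2f} < X-bar < {high:.2f}; 980 inside? {low < 980 < high}")

# Example (9 - 4): steel cables, s = 480, alpha = 0.05
low, high = acceptance_region(5000, 480, 64, 0.05)
print(f"cables: {low:.1f} < X-bar < {high:.1f}; 5100 inside? {low < 5100 < high}")

# Example (9 - 5): N = 900, n = 36 < 45, so no correction is applied
print(acceptance_region(20, 12, 36, 0.01, N=900))
```

Keeping the standard error in its own function makes the correction-factor rule easy to reuse: the same helper gives \(\sigma_{\bar{X}} = 2\) for \(n = 36\) and about 1.45 for \(n = 64\) from the same population.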
Example (9 – 6): A random sample of 144 with a mean of 100 and \(s = 60\) is taken from a population of \(N = 1000\). Compute the 95% confidence interval for the unknown population mean.

Solution: \(N = 1000\), \(n = 144\), \(\bar{X} = 100\), \(s = 60\), \(\alpha = 5\%\)

Since \(n = 144 > 0.05N = 50\), the correction factor is applied:
\(\sigma_{\bar{X}} = \frac{s}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{60}{\sqrt{144}} \sqrt{\frac{1000 - 144}{1000 - 1}} = 5 \times 0.926 = 4.63\)

Confidence interval \(= \bar{X} \pm z_{tab}\,\sigma_{\bar{X}}\):
\(\mu = \bar{X} \pm 1.96\,\sigma_{\bar{X}} = 100 \pm 1.96 \times 4.63 = 100 \pm 9.07\)
\(90.93 < \mu < 109.07\)

(Here \(1 - \alpha\) is the confidence level.)

Large-Sample Estimation of a Population Mean

The sample mean \(\bar{X}\) is a point estimate of the population mean \(\mu\). How can we assess the accuracy of this point estimate? By attaching an interval to it:

\(\bar{X} \pm z\,\sigma_{\bar{X}} = \bar{X} \pm z \frac{\sigma}{\sqrt{n}}\)

Definition 1: An interval estimator is a formula that tells us how to use sample data to calculate an interval that estimates a population parameter.

Definition 2: The confidence coefficient is the probability that an interval estimator encloses the population parameter if the estimator is used repeatedly a very large number of times. The confidence level is the confidence coefficient expressed as a percentage.

Large-sample \(100(1 - \alpha)\%\) confidence interval for \(\mu\):

\(\bar{X} \pm z_{\alpha/2}\,\sigma_{\bar{X}}\)

where \(z_{\alpha/2}\) is the z value with an area \(\alpha/2\) to its right, \(\sigma_{\bar{X}} = \sigma/\sqrt{n}\), \(\sigma\) is the standard deviation of the sampled population, and \(n\) is the sample size. When \(\sigma\) is unknown (as is almost always the case) and \(n\) is large (say \(n \ge 30\)), \(\sigma\) can be approximated by the sample standard deviation \(s\).

To illustrate, for a confidence coefficient of 0.90, \(1 - \alpha = 0.90\), \(\alpha = 0.10\), \(\alpha/2 = 0.05\), and \(z_{0.05}\) is the value that locates an area of 0.05 in one tail of the sampling distribution.

| Confidence level \(100(1 - \alpha)\%\) | \(\alpha\) | \(\alpha/2\) | \(z_{\alpha/2}\) |
|---|---|---|---|
| 90% | 0.10 | 0.05 | 1.645 |
| 95% | 0.05 | 0.025 | 1.96 |
| 99% | 0.01 | 0.005 | 2.575 |

Example (9 – 7): Unoccupied seats on flights cause airlines to lose revenue.
Suppose a large airline wants to estimate its average number of unoccupied seats per flight over the past year. To accomplish this, the records of 225 flights are randomly selected and the number of unoccupied seats is noted for each of the sampled flights. The sample mean and standard deviation are \(\bar{X} = 11.6\) seats and \(s = 4.1\) seats. Estimate \(\mu\), the mean number of unoccupied seats per flight during the past year, using a 90% confidence interval.

Solution: The general form of the large-sample 90% confidence interval for a population mean is (\(\alpha = 0.10\), \(\alpha/2 = 0.05\)):

\(\bar{X} \pm z_{\alpha/2}\,\sigma_{\bar{X}} = \bar{X} \pm z_{0.05}\,\sigma_{\bar{X}} = \bar{X} \pm 1.645 \left(\frac{\sigma}{\sqrt{n}}\right)\)

For the 225 records sampled, using \(s\) in place of \(\sigma\):

\(11.6 \pm 1.645 \left(\frac{4.1}{\sqrt{225}}\right) = 11.6 \pm 0.45\)

or from 11.15 to 12.05. That is, the airline can be 90% confident that the mean number of unoccupied seats per flight was between 11.15 and 12.05 during the sampled year.

Sampling distribution of the mean: If we take repeated random samples from a population and measure the mean of each sample, we find that most of these sample means, the \(\bar{X}\)'s, differ from each other. The probability distribution of these sample means is called the sampling distribution of the mean. The sampling distribution of the mean itself has a mean, denoted \(\mu_{\bar{X}}\), and a standard deviation, or standard error, \(\sigma_{\bar{X}}\).

Two important theorems relate the sampling distribution of the mean to the parent population.

Theorem 1: If we take repeated random samples of size \(n\) from a population, then

\(\mu_{\bar{X}} = \mu\)   (4.1)

and

\(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\)   or   \(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}\)   (4.2 a, b)

where equation (4.2 b) is used for finite populations of size \(N\) when \(n \ge 0.05N\).

Theorem 2: As the sample size increases (that is, as \(n \to \infty\)), the sampling distribution of the mean approaches the normal distribution regardless of the shape of the parent population. The approximation is sufficiently good for \(n \ge 30\).
This is the central limit theorem. We can find the probability that a random sample has a mean \(\bar{X}\) in a given interval by first calculating the z values for the interval, where

\(z_{cal} = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}\)

and then looking up these values in the z-table.

Note: the greater \(n\) is, the smaller the spread, or standard error of the mean, \(\sigma_{\bar{X}}\). If the parent population is normal, the sampling distribution of the mean is also normal, even for small samples. According to the central limit theorem, even if the parent population is not normally distributed, the sampling distribution of the mean is approximately normal for \(n \ge 30\).

[Figure: sampling distributions of the mean for \(n = 5\) and \(n = 20\), both centered at \(\mu_{\bar{X}} = \mu\) on the \(\bar{X}\) scale; the distribution for \(n = 20\) is narrower.]

Example (9 – 8): Assume that a population is composed of 900 elements with a mean of 20 units and a standard deviation of 12. The mean and standard error of the sampling distribution of the mean for a sample size of 36 are:

\(\mu_{\bar{X}} = \mu = 20\)
\(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{36}} = 2\)

(no correction factor is needed, since \(n = 36 < 0.05N = 0.05(900) = 45\))

If \(n\) had been 64 instead of 36 (so that \(n \ge 0.05N\)), then

\(\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{12}{\sqrt{64}} \sqrt{\frac{900 - 64}{900 - 1}} = \frac{12}{8} \sqrt{\frac{836}{899}} = (1.5)(0.964) = 1.45\)

instead of \(\sigma_{\bar{X}} = 1.5\) without the correction factor.

Example (9 – 9): The probability that the mean \(\bar{X}\) of a random sample of 36 elements from the population in the previous example falls between 18 and 24 units is computed as follows:

\(z_1 = \frac{\bar{X}_1 - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{18 - 20}{2} = -1\)   and   \(z_2 = \frac{\bar{X}_2 - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{24 - 20}{2} = 2\)

Looking up \(z_1\) and \(z_2\) in the z-table, we get

\(P(18 < \bar{X} < 24) = 0.3413 + 0.4772 = 0.8185\), or 81.85%

[Figure: normal curve with the area between \(\bar{X} = 18\) (\(z = -1\)) and \(\bar{X} = 24\) (\(z = 2\)) shaded; center at \(\bar{X} = 20\) (\(z = 0\)).]

Estimation using the Normal Distribution:

A point estimate is a single number.
Such a point estimate is unbiased if, in repeated random sampling from a population, the expected (mean) value of the corresponding statistic equals the population parameter. For example, \(\bar{X}\) is an unbiased (point) estimate of \(\mu\) because \(\mu_{\bar{X}} = \mu\), where \(\mu_{\bar{X}}\) is the expected value of \(\bar{X}\). Similarly, the sample variance \(s^2\) (computed with divisor \(n - 1\)) is an unbiased estimate of \(\sigma^2\), and the sample proportion \(\hat{P}\) is an unbiased estimate of the population proportion \(P\).

An interval estimate refers to a range of values together with the probability, or confidence level, that the interval includes the unknown population parameter. Given the population standard deviation or its estimate, and given that the population is normal or that the random sample size is at least 30, we can find the 95% confidence interval for the unknown population mean as:

\(P(\bar{X} - 1.96\,\sigma_{\bar{X}} < \mu < \bar{X} + 1.96\,\sigma_{\bar{X}}) = 0.95\)

This means that we expect 95 out of 100 such intervals to include the unknown population mean, and we are 95% confident that our interval is one of them.

A confidence interval can be constructed similarly for the population proportion, where:

\(\mu_{\hat{P}} = P\)   (the proportion of successes in the population)
\(\sigma_{\hat{P}} = \sqrt{\frac{P(1 - P)}{n}}\)   (the standard error of the proportion)

Example (9 – 10): A random sample of 144 with a mean of 100 and a standard deviation of 60 is taken from a population of 1000. The 95% confidence interval for the unknown population mean is (using \(s\) as an estimate of \(\sigma\), since \(n > 30\), and the correction factor, since \(n > 0.05N\)):

\(\mu = \bar{X} \pm 1.96\,\sigma_{\bar{X}} = \bar{X} \pm 1.96 \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = 100 \pm 1.96 \times \frac{60}{\sqrt{144}} \times \sqrt{\frac{1000 - 144}{1000 - 1}}\)
\(= 100 \pm 1.96 \times 5 \times 0.926 = 100 \pm 9.07 \Rightarrow 90.93 < \mu < 109.07\)

Example (9 – 11): A manager wishes to estimate the mean number of minutes that workers take to complete a particular manufacturing process to within 3 min, with 90% confidence. From past experience, the manager knows that the standard deviation \(\sigma\) is 15 min.
The minimum required sample size (\(n > 30\)) is found as follows. Assuming \(n < 0.05N\) (so no correction factor is needed):

\(z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \Rightarrow z\,\sigma_{\bar{X}} = \bar{X} - \mu\)

\(1.645\,\sigma_{\bar{X}} = 1.645 \frac{\sigma}{\sqrt{n}} = \bar{X} - \mu\)

Since the required margin of error, \(\bar{X} - \mu\), is 3 min:

\(1.645 \frac{15}{\sqrt{n}} = 3 \Rightarrow \sqrt{n} = \frac{1.645 \times 15}{3} = 8.225 \Rightarrow n = 67.65\), or 68 (rounded up to the next higher integer)

Determining the Sample Size Necessary for Making Inference about a Population Mean

In this section we will see how the appropriate sample size for making an inference about the population mean depends on the desired reliability. Consider the following example: a sample of 100 delinquent accounts produced an estimate \(\bar{X}\) that was within $18 of the true mean amount due, \(\mu\), for all delinquent accounts at the 95% confidence level; that is, the 95% confidence interval for \(\mu\) was $36 wide when 100 accounts were sampled.

[Figure: interval extending \(1.96\,\sigma_{\bar{X}} = 18\) on each side of \(\bar{X}\), for \(n = 100\).]

Now suppose that we want to estimate \(\mu\) to within $5 with 95% confidence. That is, we want to narrow the width of the confidence interval from $36 to $10, as shown in the following figure:

[Figure: interval extending \(1.96\,\sigma_{\bar{X}} = 5\) on each side of \(\bar{X}\), for the larger sample.]

How much will the sample size have to be increased to accomplish this? If we want the estimator \(\bar{X}\) to be within $5 of \(\mu\), we need \(1.96\,\sigma_{\bar{X}} = 5\), or equivalently \(1.96 \left(\frac{\sigma}{\sqrt{n}}\right) = 5\).

Using \(s = 90\) from the sample of 100 as an approximation of \(\sigma\):

\(1.96 \left(\frac{\sigma}{\sqrt{n}}\right) \approx 1.96 \left(\frac{s}{\sqrt{n}}\right) = 1.96 \left(\frac{90}{\sqrt{n}}\right) = 5\)

\(\sqrt{n} = \frac{1.96(90)}{5} = 35.28\)

\(n = (35.28)^2 = 1244.68\)

Approximately 1,245 accounts will have to be sampled to estimate the mean overdue amount \(\mu\) to within $5 with approximately 95% confidence; the confidence interval resulting from a sample of this size will be approximately $10 wide.
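The interval formula \(\bar{X} \pm z_{\alpha/2}\,\sigma_{\bar{X}}\) and the sample-size calculation obtained by solving \(z_{\alpha/2}\,\sigma/\sqrt{n} = E\) for \(n\) can be sketched as two small Python functions. This is an illustrative sketch only, assuming Python 3.8+; the names `z_crit`, `confidence_interval` and `required_sample_size` are hypothetical. Exact quantiles are used (about 1.6449 for 90% and 1.9600 for 95%), so intermediate values differ slightly from the rounded table values, but the rounded-up sample sizes agree with the hand calculations (68 and 1,245).

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_crit(confidence):
    """z_{alpha/2} for a two-sided interval with the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def confidence_interval(xbar, s, n, confidence, N=None):
    """Large-sample interval X-bar +/- z_{alpha/2} * s / sqrt(n), with the
    finite population correction when N is given and n >= 0.05 * N."""
    se = s / sqrt(n)
    if N is not None and n >= 0.05 * N:
        se *= sqrt((N - n) / (N - 1))
    margin = z_crit(confidence) * se
    return xbar - margin, xbar + margin


def required_sample_size(sigma, margin, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin (rounded up)."""
    return ceil((z_crit(confidence) * sigma / margin) ** 2)


print(confidence_interval(11.6, 4.1, 225, 0.90))        # Example (9 - 7)
print(confidence_interval(100, 60, 144, 0.95, N=1000))  # Example (9 - 10)
print(required_sample_size(15, 3, 0.90))                # Example (9 - 11)
print(required_sample_size(90, 5, 0.95))                # delinquent accounts
```

Rounding up with `ceil` mirrors the rule of taking the next higher integer: rounding down would give a sample slightly too small to guarantee the desired margin.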
Testing Hypotheses about a Population Proportion (Large-Sample Test for a Population Proportion P)

1. Null hypothesis:
   One-tailed: \(H_o: P = P_o\)
   Two-tailed: \(H_o: P = P_o\)
2. Alternative hypothesis:
   One-tailed test: \(H_a: P > P_o\) or \(H_a: P < P_o\)
   Two-tailed test: \(H_a: P \ne P_o\)
3. Test statistic:
   \(z = \frac{\hat{P} - P_o}{\sigma_{\hat{P}}} = \frac{\hat{P} - P_o}{\sqrt{P_o q_o / n}}\), where \(\hat{P} = \frac{y}{n}\) and \(q_o = 1 - P_o\)
4. Rejection region: \(|z_{cal}| > z_{\alpha/2}\) (two-tailed), or \(z_{cal}\) beyond \(z_\alpha\) in the direction of \(H_a\) (one-tailed): reject \(H_o\), accept \(H_a\).

[Figure: two-tailed test; acceptance region between \(-z_{\alpha/2}\) and \(z_{\alpha/2}\), with a rejection region of area \(\alpha/2\) in each tail.]
[Figure: one-tailed (upper) test; acceptance region to the left of \(z_\alpha\), rejection region of area \(\alpha\) to the right.]

Note 1: The estimate \(\hat{P}\) takes the place of \(\bar{Y}\), the hypothesized value \(P_o\) takes the place of \(\mu_o\), and the standard error \(\sigma_{\hat{P}}\) takes the place of \(\sigma_{\bar{Y}}\).

Note 2: \(\hat{P} = \bar{Y} = \frac{y}{n}\), the mean of observations coded 1 for a success and 0 otherwise, so
\(z = \frac{\hat{P} - P_o}{\sigma_{\hat{P}}}\), with \(\sigma_{\hat{P}} = \sqrt{\frac{P_o(1 - P_o)}{n}}\) when \(H_o\) is true.
In general:
\(z = \frac{\text{estimate of parameter} - \text{null hypothesis value of parameter}}{\text{standard error of estimate}}\)

Example (9 – 12): A public organization is set up for the purpose of banning smoking in public restaurants in Greenvale, Colorado. In an attempt to convince the city council of its support, the organization conducts a statistical test of \(H_o: P = 0.5\) against \(H_a: P > 0.5\), where \(P\) denotes the proportion of adults in the community who support the ban on smoking. \(H_a\) represents the organization's claim that this proportion exceeds one-half; the null hypothesis states that the proportion is just one-half.
Suppose that out of a random sample of 25 adults in this community, 15 support the smoking ban. Should we accept \(H_o: P = 0.5\) or \(H_a: P > 0.5\) at the \(\alpha = 5\%\) level of significance? The point estimate of \(P\) is:

\(H_o: P = 0.5\)
\(H_a: P > 0.5\)
\(\hat{P} = \frac{y}{n} = \frac{15}{25} = 0.6\)

Now, \(5 / \min(P_o, 1 - P_o) = 5 / \min(0.5, 0.5) = 10\). Since \(n = 25\) exceeds 10, the sampling distribution of \(\hat{P}\) is approximately normal when \(H_o\) is true, so the assumptions of the large-sample test are fulfilled. The standard error of \(\hat{P}\) when \(H_o: P = 0.5\) is true is:

\(\sigma_{\hat{P}} = \sqrt{\frac{P_o(1 - P_o)}{n}} = \sqrt{\frac{(0.5)(0.5)}{25}} = 0.1\)

\(z_{cal} = \frac{\hat{P} - P_o}{\sigma_{\hat{P}}} = \frac{0.6 - 0.5}{0.1} = \frac{0.1}{0.1} = 1\)

One-tailed test: table area \(0.5 - \alpha = 0.45\), so \(z_\alpha = 1.645\) (positive, following the sign of \(z_{cal}\)).

Since \(z_{cal} = 1 < z_\alpha = 1.645\): accept \(H_o\), reject \(H_a\).

Confidence interval (in general) \(= \hat{P} \pm z_{\alpha/2}\,\sigma_{\hat{P}} = \hat{P} \pm z_{\alpha/2} \sqrt{\frac{\hat{P}\hat{q}}{n}}\)

Confidence interval (one-tailed case, using \(z_\alpha\)) \(= \hat{P} \pm z_\alpha \sqrt{\frac{\hat{P}\hat{q}}{n}} = 0.6 \pm 1.645 \sqrt{\frac{0.6 \times 0.4}{25}} = 0.6 \pm 0.161\), i.e., from 0.439 to 0.761. This interval contains \(P_o = 0.5\), which agrees with not rejecting \(H_o\).

[Figure: upper-tailed test; \(z_{cal} = 1\) (\(\hat{P} = 0.6\)) lies in the acceptance region, to the left of \(z_\alpha = 1.645\).]

Example (9 – 13): A hospital wants to test whether 90% of the dosages of a drug it purchases contain 100 mg of the drug (1 mg = 1/1000 g). To do this, the hospital takes a sample of \(n = 100\) dosages and finds that only 85 of them contain the appropriate amount. How can the hospital test this at:
a. \(\alpha = 1\%\)
b. \(\alpha = 5\%\)
c. \(\alpha = 10\%\)

a. This problem involves the binomial distribution.
However, since \(n > 30\) and both \(nP_o\) and \(n(1 - P_o)\) exceed 5, we can use the normal distribution with \(P_o = 0.90\):

\(H_o: P = 0.90\)
\(H_a: P \ne 0.90\)

\(\hat{P} = \frac{y}{n} = \frac{85}{100} = 0.85\)   and   \(\sigma_{\hat{P}} = \sqrt{\frac{P_o(1 - P_o)}{n}} = \sqrt{\frac{(0.9)(0.1)}{100}} = 0.03\)

Since we are interested in whether \(P\) differs from 90% in either direction, we test \(H_o: P = 0.90\) against \(H_a: P \ne 0.90\). At the 1% level of significance, the acceptance region lies within \(z_{tab} = \pm 2.575\) standard deviation units. Since

\(z_{cal} = \frac{\hat{P} - P_o}{\sigma_{\hat{P}}} = \frac{0.85 - 0.90}{0.03} = -1.67\)

lies inside this region, the hospital should accept \(H_o\) (that \(P = 0.90\)) at the 1% level of significance.

[Figure: acceptance region between \(z_{tab} = -2.575\) and \(z_{tab} = 2.575\); \(z_{cal} = -1.67\) lies inside it.]

Confidence interval \(= \hat{P} \pm z_{\alpha/2} \sqrt{\frac{\hat{P}\hat{q}}{n}} = 0.85 \pm 2.575 \sqrt{\frac{(0.85)(0.15)}{100}} = 0.85 \pm 0.092\), i.e., from 0.758 to 0.942, which contains 0.90.

b. At the 5% level of significance, the acceptance region for \(H_o\) lies within \(\pm 1.96\) standard deviation units, and thus the hospital should again accept \(H_o\) and reject \(H_a\) (at the 95% level of confidence).

[Figure: acceptance region between \(z_{tab} = -1.96\) and \(z_{tab} = 1.96\); \(z_{cal} = -1.67\) lies inside it.]

c. At the 10% level of significance, the acceptance region lies within \(\pm 1.645\) standard deviation units. Since \(|z_{cal}| = 1.67 > z_{tab} = 1.645\), the hospital should reject \(H_o\) and accept \(H_a\), that \(P \ne 0.90\).

[Figure: acceptance region between \(z_{tab} = -1.645\) and \(z_{tab} = 1.645\); \(z_{cal} = -1.67\) falls in the lower rejection region.]

Note that larger values of \(\alpha\) enlarge the rejection region for \(H_o\) (i.e., increase the probability of accepting \(H_a\)). Furthermore, the greater the value of \(\alpha\) (i.e., the greater the probability of rejecting \(H_o\) when it is true), the smaller is \(\beta\) (the probability of accepting a false hypothesis).

Example (9 – 14): Find the probability of accepting \(H_o\) for the previous example about the army recruiting center if:
a. \(\mu = \mu_o = 80\) kg
b. \(\mu = 82\) kg
c. \(\mu = 84\) kg
d. \(\mu = 85\) kg
e. \(\mu = 86\) kg
f. \(\mu = 87\) kg

Here \(\bar{X} = 85\) kg is treated as the boundary of the acceptance region, and \(\sigma_{\bar{X}} = 10/\sqrt{25} = 2\).

a. If \(\mu = \mu_o = 80\) kg, with \(\bar{X} = 85\), \(\sigma = 10\) kg and \(n = 25\):

\(z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{85 - 80}{10/\sqrt{25}} = \frac{5}{2} = 2.5\)

The probability of accepting \(H_o\) when \(\mu = \mu_o = 80\) kg is 0.9938 (found by looking up \(z = 2.5\) in the z-table and adding 0.5). Therefore, the probability of rejecting \(H_o\) when \(H_o\) is in fact true equals \(1 - 0.9938 = 0.0062\).

b. If \(\mu = 82\) kg instead,

\(z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{85 - 82}{10/\sqrt{25}} = \frac{3}{2} = 1.5\)

Therefore, the probability of accepting \(H_o\) when \(H_o\) is false (\(\beta\)) equals 0.9332 (found by looking up \(z = 1.5\) and adding 0.5).

c. If \(\mu = 84\) kg, \(z = (85 - 84)/2 = 0.5\) and \(\beta = 0.6915\).
d. If \(\mu = 85\) kg, \(z = (85 - 85)/2 = 0\) and \(\beta = 0.5\).
e. If \(\mu = 86\) kg, \(z = (85 - 86)/2 = -0.5\) and \(\beta = 0.5 - 0.1915 = 0.3085\).
f. If \(\mu = 87\) kg, \(z = (85 - 87)/2 = -1\) and \(\beta = 0.5 - 0.3413 = 0.1587\).

[Figure (a), form of \(H_a\): \(<\); rejection region of area \(\alpha\) in the lower tail, acceptance region to its right.]
[Figure (b), form of \(H_a\): \(>\); rejection region of area \(\alpha\) in the upper tail, acceptance region to its left.]
(a), (b): one-tailed rejection regions for lower- and upper-tailed tests.

[Figure (c), form of \(H_a\): \(\ne\); rejection regions of area \(\alpha/2\) in each tail, acceptance region in the middle.]
(c): the two-tailed rejection region.

The rejection regions corresponding to typical values selected for \(\alpha\) are shown in the following table for one- and two-tailed tests. Note that the smaller the \(\alpha\) you select, the more evidence (the larger \(|z|\)) is required before you can reject \(H_o\).

Table (9 – 3): Rejection regions for common values of \(\alpha\)

| \(\alpha\) | Lower-tailed \(H_a\) | Upper-tailed \(H_a\) | Two-tailed \(H_a\) |
|---|---|---|---|
| 0.10 | z < -1.28 | z > 1.28 | z < -1.645 or z > 1.645 |
| 0.05 | z < -1.645 | z > 1.645 | z < -1.96 or z > 1.96 |
| 0.01 | z < -2.33 | z > 2.33 | z < -2.575 or z > 2.575 |

Example (9 – 15): A manufacturer of cereal wants to test the performance of its filling machine.
The machine is designed to discharge a mean amount of \(\mu\) ounces per box, and the manufacturer wants to detect any departure from this setting. The quality control experiment calls for sampling 100 boxes to determine whether the machine is performing to specifications. Set up a test of hypothesis for this quality control experiment using \(\alpha = 0.01\) (the sample gives \(\bar{X} = 11.85\) and \(s = 0.5\)).

Solution: Since the manufacturer wishes to detect a departure from the setting of \(\mu = 12\) in either direction (\(\mu < 12\) or \(\mu > 12\)), it will conduct a two-tailed test.

\(H_o: \mu = 12\)
\(H_a: \mu \ne 12\) (i.e., \(\mu < 12\) or \(\mu > 12\))

Test statistic: \(z = \frac{\bar{X} - 12}{\sigma_{\bar{X}}}\)

\(\alpha = 0.01\), so \(\alpha/2 = 0.005\) is placed in each tail. This area in the tails corresponds to \(z = -2.575\) and \(z = 2.575\).

Rejection region: \(z < -2.575\) or \(z > 2.575\)

If the test statistic computed from the sample falls in the rejection region, the manufacturer can reject \(H_o\) at the 1% level and conclude, with 99% confidence, that the machine needs adjustment.

[Figure: two-tailed test; acceptance region (\(H_o\)) between \(-2.575\) and \(2.575\), rejection regions (\(H_a\)) of area \(\alpha/2 = 0.005\) in each tail.]

Example (9 – 16): Referring to the previous example, if \(n = 100\), \(\bar{X} = 11.85\), and \(s = 0.5\) ounce:

\(z = \frac{\bar{X} - 12}{\sigma_{\bar{X}}} = \frac{11.85 - 12}{0.5/\sqrt{100}} = -3.0\)

The value \(z = -3.0\) is less than \(-2.575\), which provides evidence to reject \(H_o\) and conclude, at the \(\alpha = 0.01\) level of significance, that the mean fill differs from the specification of \(\mu = 12\) ounces. It appears that the machine is, on average, underfilling the boxes.
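The proportion tests of Examples (9 – 12) and (9 – 13) follow the same pattern as the tests for a mean, with \(\hat{P}\), \(P_o\) and \(\sigma_{\hat{P}}\) in place of \(\bar{X}\), \(\mu_o\) and \(\sigma_{\bar{X}}\). The sketch below is illustrative only, assuming Python 3.8+; the function name `z_test_proportion` is hypothetical. It uses the null-hypothesis standard error \(\sqrt{P_o q_o / n}\) and refuses to run when \(nP_o\) or \(n(1 - P_o)\) is 5 or less, following the large-sample condition used in these notes.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes, n, p0, alpha, tail="two"):
    """Large-sample z test for a population proportion (illustrative sketch).

    tail = "upper" (H_a: P > p0), "lower" (H_a: P < p0) or "two" (H_a: P != p0).
    Returns (p_hat, z_cal, reject_h0).
    """
    if min(n * p0, n * (1 - p0)) <= 5:
        raise ValueError("normal approximation not justified: need n*p0 and n*(1-p0) > 5")
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)          # standard error when H_o is true
    z_cal = (p_hat - p0) / se0
    std = NormalDist()
    if tail == "upper":
        reject = z_cal > std.inv_cdf(1 - alpha)
    elif tail == "lower":
        reject = z_cal < std.inv_cdf(alpha)
    elif tail == "two":
        reject = abs(z_cal) > std.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'upper', 'lower' or 'two'")
    return p_hat, z_cal, reject


# Example (9 - 12): 15 of 25 adults, H_o: P = 0.5 vs H_a: P > 0.5, alpha = 0.05
print(z_test_proportion(15, 25, 0.5, 0.05, tail="upper"))

# Example (9 - 13): 85 of 100 dosages, H_o: P = 0.90 vs H_a: P != 0.90
for alpha in (0.01, 0.05, 0.10):
    p_hat, z_cal, reject = z_test_proportion(85, 100, 0.90, alpha)
    print(f"alpha = {alpha:.2f}: z_cal = {z_cal:.2f}, reject H_o: {reject}")
```

Running the loop shows the point made in Example (9 – 13): the same \(z_{cal} \approx -1.67\) leads to acceptance of \(H_o\) at \(\alpha = 0.01\) and \(0.05\) but to rejection at \(\alpha = 0.10\), because a larger \(\alpha\) enlarges the rejection region.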