INFERENTIAL STATISTICS

Sample information (statistic) → Population information (parameter)

Statistical inference falls into two branches:
(i) Statistical estimation
(ii) Hypothesis/significance testing

In estimation, a sample statistic is used to provide an approximation (estimate) of the population parameter, e.g. the sample mean ($\bar{x}$) can be used to estimate the population mean ($\mu$), and the sample standard deviation ($s$) to estimate the population standard deviation ($\sigma$).

In hypothesis testing, the population characteristic(s), in the form of parameter(s), is/are known or assumed. Sample result(s) are used to verify or confirm the population characteristics.

ESTIMATION

Sample → Population

Estimation can take two forms:
(a) Point estimation
(b) Interval estimation

Point estimation

In point estimation we calculate a single value (a single point) from sample data to approximate some unknown population parameter. The single value is calculated using a formula referred to as a point estimator, e.g. the point estimator for the population mean is

$\bar{x} = \frac{\sum f x}{\sum f}$

and the estimator for $\sigma$ is

$s = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}}$

The value obtained by substituting the data values into the estimator is known as a point estimate, i.e. a specific value, e.g. $\bar{x} = 15$.

An estimator ($\hat{\theta}$) is derived on the basis of the characteristics of a "good" estimator. These characteristics include:
Unbiasedness: $E(\hat{\theta}) = \theta$, e.g. $E(\bar{x}) = \mu$.
Efficiency: the most efficient estimator has the smallest variance, i.e. if $\mathrm{var}(\hat{\theta}_1) < \mathrm{var}(\hat{\theta}_2)$ then $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$.
Consistency: the estimate approaches the population parameter as the sample size increases.
Sufficiency: a sufficient estimator uses all the values in the data.

Situations suitable for point estimates
1. Comparative studies, e.g. comparing the wealth of nations (per capita).
2. Where an almost exact estimate is required, e.g. the approximate order size for an expensive item.
3. Where error may lead to dangerous implications, e.g. approximation of voltages.

Interval Estimation

A point estimate may be right or wrong.
It does not incorporate the possible error or precision of the estimate, and yet a sample value can never be an exact representation of the population value. It is always associated with some level of error or uncertainty. Interval estimation incorporates the degree of error or precision in the estimator.

An interval estimator consists of two values, a lower value $L$ and an upper value $U$, within which some unknown population parameter $\theta$ lies with some specified probability $(1 - \alpha)100\%$:

$P(L < \theta < U) = 1 - \alpha$

The interval limits $L$ and $U$ are called the $(1 - \alpha)100\%$ confidence limits, and the interval $(L, U)$ is called the $(1 - \alpha)100\%$ confidence interval. It simply indicates that we are $(1 - \alpha)100\%$ confident that the unknown population parameter lies between $L$ and $U$.

The theory of interval estimation is based on the concept of a sampling distribution. For instance, if the population size is $N = 6$ and the sample size is $n = 3$, how many possible samples can be selected? Let the number be $S$:

$S = \binom{6}{3} = \frac{6!}{3!\,(6 - 3)!} = 20$

The number of possible sample means (or $\hat{\theta}$ values) is 20, i.e. $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_{20}$ or $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_{20}$. The $\hat{\theta}$'s are numerical values, so they can be arranged as a frequency table, which can be developed into a probability distribution with mean $\mu_{\hat{\theta}}$ and standard error $\sigma_{\hat{\theta}}$. From this distribution:

$P(\mu_{\hat{\theta}} - Z_{\alpha/2}\,\sigma_{\hat{\theta}} < \hat{\theta} < \mu_{\hat{\theta}} + Z_{\alpha/2}\,\sigma_{\hat{\theta}}) = 1 - \alpha$

For an unbiased estimator $\mu_{\hat{\theta}} = \theta$, so

$P(\theta - Z_{\alpha/2}\,\sigma_{\hat{\theta}} < \hat{\theta} < \theta + Z_{\alpha/2}\,\sigma_{\hat{\theta}}) = 1 - \alpha$

But since the intention is to estimate $\theta$, the expression is rearranged to reflect this. It becomes:

$P(\hat{\theta} - Z_{\alpha/2}\,\sigma_{\hat{\theta}} < \theta < \hat{\theta} + Z_{\alpha/2}\,\sigma_{\hat{\theta}}) = 1 - \alpha$

The interval $\hat{\theta} \pm Z_{\alpha/2}\,\sigma_{\hat{\theta}}$ is the $(1 - \alpha)100\%$ confidence interval for $\theta$.

Interval estimation of the population mean, $\mu$

Here $\theta = \mu$, $\hat{\theta} = \bar{x}$ and the standard error is $SE(\bar{x}) = \sigma_{\bar{x}} = s/\sqrt{n}$. The confidence limits are:

$\mu = \bar{x} \pm Z_{\alpha/2}\, s/\sqrt{n}$

Suppose
(a) $\alpha = 5\% = 0.05 \Rightarrow 1 - \alpha = 0.95$ (95%), with $\alpha/2 = 0.025$ in each tail. Then $Z_{\alpha/2} = 1.96$, so $\mu = \bar{x} \pm 1.96\, s/\sqrt{n}$.
(b) $\alpha = 0.01$ (1%) $\Rightarrow 1 - \alpha = 0.99$ (99%). Using the reasoning in (a) above:
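$Z_{\alpha/2} = 2.58$, so $\mu = \bar{x} \pm 2.58\, s/\sqrt{n}$, i.e.

$P[\bar{x} - 2.58\, s/\sqrt{n} \le \mu \le \bar{x} + 2.58\, s/\sqrt{n}] = 0.99$

The quantity $Z\, s/\sqrt{n}$ is called the error of estimate, which can be reduced by increasing the sample size $n$. In real-life applications the maximum error allowed is usually specified, and the task becomes that of determining the sample size that will guarantee this maximum error. Let the maximum allowed error be $E$:

$Z\, s/\sqrt{n} \le E \Rightarrow Z^2 s^2 \le n E^2 \Rightarrow n \ge \frac{Z^2 s^2}{E^2}$

Illustrations

1. An organization wishes to estimate its average monthly profit. The accountant picks a random sample of 35 monthly profits from previous records. The sample indicates a mean value of Ksh 105M and a standard deviation of Ksh 25M.
(a) Determine the (i) 95% (ii) 99% confidence intervals for the actual monthly mean profit.
(b) The company policy states that any estimate for the monthly mean profit should be within a maximum error of Ksh 30M. What sample size should the accountant pick to be within the requirements of the company policy at (i) 95% (ii) 99%?

Using $\mu = \bar{x} \pm Z\, s/\sqrt{n}$:

(a) (i) $\mu = 105 \pm 1.96 \times 25/\sqrt{35} = 105 \pm 8.28 \Rightarrow P[96.72 \le \mu \le 113.28] = 0.95$, i.e. the interval (96.72, 113.28). This means that we are 95% confident that the actual mean monthly profit lies between Ksh 96.72M and Ksh 113.28M.
(ii) $\mu = 105 \pm 2.58 \times 25/\sqrt{35} = 105 \pm 10.9 \Rightarrow P[94.1 \le \mu \le 115.9] = 0.99$, i.e. the interval (94.1, 115.9). This means that we are 99% confident that the actual mean monthly profit lies between Ksh 94.1M and Ksh 115.9M.

(b) $n \ge \frac{Z^2 s^2}{E^2}$
(i) $n \ge \frac{1.96^2 \times 25^2}{30^2} = 2.67 \Rightarrow n_{min} = 3$
(ii) $n \ge \frac{2.58^2 \times 25^2}{30^2} = 4.6225 \Rightarrow n_{min} = 5$

SAMPLING DISTRIBUTION

For each population parameter, the sample statistic, its standard error and the resulting confidence limits are:

1. Population mean, $\mu$: sample statistic $\bar{x}$; standard error $s/\sqrt{n}$; confidence limits $\bar{x} \pm Z\, s/\sqrt{n}$.
2. Population proportion, $\Pi$: sample statistic $p$; standard error $\sqrt{\Pi(1 - \Pi)/n}$, estimated by $\sqrt{p(1 - p)/n}$; confidence limits $p \pm Z\sqrt{p(1 - p)/n}$. When $p$ is expressed as a percentage, the standard error is $\sqrt{p(100 - p)/n}$.
3. Difference between two population means, $\mu_1 - \mu_2$: sample statistic $\bar{x}_1 - \bar{x}_2$; standard error $\sqrt{s_1^2/n_1 + s_2^2/n_2}$; confidence limits $(\bar{x}_1 - \bar{x}_2) \pm Z\sqrt{s_1^2/n_1 + s_2^2/n_2}$.
4. Difference between two population proportions, $\Pi_1 - \Pi_2$: sample statistic $p_1 - p_2$; standard error $\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}$; confidence limits $(p_1 - p_2) \pm Z\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}$.

Exercise

1. A management company wished to estimate the proportion of defective output from their latest production line.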
For this purpose, a random sample of 800 units of output from the line resulted in 150 defective units.
(a) Determine the estimate of the defective level of the new production line at (i) 95% (ii) 99% (iii) 90% confidence levels.
(b) Determine the minimum sample size to guarantee a maximum error of estimate of 5% at (i) 95% (ii) 99% confidence levels.

2. National Supermarket Co. Ltd operate their supermarkets in two regions, i.e. the CBD and the outskirts of the city. They wish to estimate the average difference in their daily sales between the two regions. A random sample of 55 daily sales from one region resulted in a mean of Ksh 2M and a standard deviation of Ksh 200,000. A random sample of 40 daily sales from the other region resulted in a mean of Ksh 1.6M and a standard deviation of Ksh 150,000. Determine an estimate for the difference between the mean daily sales for the two regions using a 95% confidence level.

3. A presidential candidate wishes to determine the difference in his popularity between the Central and Western regions of the country. His campaign advisors picked a random sample of 2,000 registered voters from the Central region, and 1,200 favored his candidature. A random sample of 1,400 voters from the Western region indicated that 800 were in favor of the candidate. Based on these results, determine the difference in the candidate's popularity in the two regions at (i) 95% (ii) 99% confidence levels.

HYPOTHESIS/SIGNIFICANCE TESTING

Sample ($\bar{x}$, $s$) → Population ($\mu$, $\sigma$)

In hypothesis testing the sample statistic is used to confirm or ascertain the population parameter. Does the sample statistic differ significantly from the population value? If they differ significantly, then the population is not the one we had in mind; it has changed.

e.g. The coin is fair: $P(H) = \tfrac{1}{2}$. Toss the coin 100 times. Suppose:
(a) $n(H) = 48$: accept that the coin is fair. The sample value is not significantly different from the expected population value.
(b) $n(H) = 30$: reject that the coin is fair. Reason: the sample value is significantly different from the expected population value.
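The coin example can be checked numerically. The sketch below is only an illustration: it assumes the one-sample proportion statistic $Z = (p - \Pi_0)/\sqrt{\Pi_0(1 - \Pi_0)/n}$ that these notes give under the test statistics, and it assumes a two-tail test at $\alpha = 0.05$ (critical value 1.96), a level the coin example itself does not state. The function name `proportion_z` is hypothetical.

```python
import math


def proportion_z(successes, n, p0):
    """Standardized statistic Z = (p - p0) / sqrt(p0 * (1 - p0) / n)."""
    p = successes / n                          # sample proportion
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p - p0) / standard_error


# Assumption: two-tail test at alpha = 0.05, so the critical value is 1.96.
Z_CRITICAL = 1.96

for heads in (48, 30):
    z = proportion_z(heads, n=100, p0=0.5)
    decision = "reject" if abs(z) > Z_CRITICAL else "fail to reject"
    print(f"heads={heads}: Z={z:.2f} -> {decision} H0: P(H) = 1/2")
```

Under these assumptions 48 heads gives Z = -0.40, well inside the acceptance region, while 30 heads gives Z = -4.00, far into the rejection region, which agrees with the two decisions stated in the example.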
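The statement being tested is the null hypothesis, denoted $H_0$, e.g. $H_0: P(H) = \tfrac{1}{2}$. The statement accepted when $H_0$ is rejected or nullified is the alternative hypothesis, denoted $H_A$ or $H_1$, e.g. $H_A: \mu < 2$ kg.

Steps in hypothesis testing

1. Statement of the hypothesis, i.e. state the hypothesis. This takes two forms:
(a) Null hypothesis, denoted $H_0$
(b) Alternative hypothesis, denoted $H_1$ or $H_A$

2. Choose the significance level, $\alpha$. For a two-tail test the rejection region has area $\alpha/2$ in each tail, below $\mu - Z\sigma$ and above $\mu + Z\sigma$.

Common significance levels, $\alpha$:
(i) $\alpha = 0.05$ (5%); confidence level $1 - \alpha = 0.95$ (95%). If $H_0$ is rejected at 5%, then the difference (sample − population) is said to be significant.
(ii) $\alpha = 0.01$ (1%); confidence level 0.99 (99%). The difference is said to be very significant.
(iii) $\alpha = 0.001$ (0.1%); confidence level 0.999 (99.9%). The difference is said to be very highly significant.

The level chosen depends on the implications of rejecting the null hypothesis. If the implications are serious, e.g. dismissing an employee, then the significance level should be small, like 0.01 or 0.001.

3. Decision rule. Decide whether to use a two-tail test or a one-tail test, depending on the alternative hypothesis, e.g. for $H_0: \mu = \mu_0 = 2$ kg:

(a) $H_A: \mu \ne \mu_0$ (either too high or too low) $\Rightarrow$ two-tail test.
If $\alpha = 0.05$, there is an area of 0.025 in each tail and the critical values are $\mu_0 - 1.96\,\sigma_{\bar{x}}$ and $\mu_0 + 1.96\,\sigma_{\bar{x}}$. Reject $H_0$ when the sample statistic satisfies $\bar{x} > \mu_0 + 1.96\,\sigma_{\bar{x}}$ or $\bar{x} < \mu_0 - 1.96\,\sigma_{\bar{x}}$. In standardized form, reject $H_0$ when $|Z| = \frac{|\bar{x} - \mu_0|}{\sigma_{\bar{x}}} > 1.96$.

(b) $H_A: \mu > \mu_0$ (only too high) $\Rightarrow$ one-tail test (right tail).
If $\alpha = 0.05$, the area between $\mu_0$ and the critical value is 0.45, which gives $Z = 1.645$ and the critical value $\mu_0 + 1.645\,\sigma_{\bar{x}}$. Reject $H_0$ when $\bar{x} > \mu_0 + 1.645\,\sigma_{\bar{x}}$, i.e. when $Z > 1.645$.

(c) $H_A: \mu < \mu_0$ (only too low, e.g. a customer's concern) $\Rightarrow$ one-tail test (left tail).
The critical value is $\mu_0 - 1.645\,\sigma_{\bar{x}}$. Reject $H_0$ when $\bar{x} < \mu_0 - 1.645\,\sigma_{\bar{x}}$, i.e. when $Z < -1.645$.

4.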
Obtain the necessary sample data and then determine the required test value/test statistic in either (i) absolute form, i.e. $\hat{\theta}$, or (ii) standardized form, $Z = \frac{\hat{\theta} - \theta_0}{\sigma_{\hat{\theta}}}$, e.g.

(a) $H_0: \mu = \mu_0$. Test statistic: $Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$

(b) $H_0: \Pi = \Pi_0$. Test statistic: $Z = \frac{p - \Pi_0}{\sqrt{\Pi_0(1 - \Pi_0)/n}}$

(c) $H_0: \mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$. Sample statistic: $\bar{x}_1 - \bar{x}_2$. Test statistic: $Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$. Since $\mu_1 - \mu_2 = 0$, this becomes $Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$

(d) $H_0: \Pi_1 = \Pi_2$ or $\Pi_1 - \Pi_2 = 0$. Sample statistic: $p_1 - p_2$. Test statistic: $Z = \frac{(p_1 - p_2) - (\Pi_1 - \Pi_2)}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}}$. Since $\Pi_1 - \Pi_2 = 0$, this becomes $Z = \frac{p_1 - p_2}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}}$

5. Conclusion/inference. It is categorized into two stages:
i. Statistical conclusion: (a) reject $H_0$, or (b) fail to reject/nullify $H_0$, i.e. the evidence from the sample is not sufficient to reject it.
ii. Managerial conclusion: express the statistical conclusion in layman's language that will be understood by other stakeholders.

EXERCISE

1. The mean time taken for the setting of jam is 70 minutes with a standard deviation of 8 minutes. It is known that the quality of the jam can be improved by changing one of the ingredients, but it is not clear whether this would affect the setting time. To investigate this concern, 40 batches of the new jam are produced and their measurements indicate an average setting time of 78 minutes. Respond to the concern at (i) $\alpha = 0.05$ (ii) $\alpha = 0.01$.

2. The manufacturer of a certain pelt believes that the pelt has a 45% share of the market. A market research survey was conducted and showed that, out of a random sample of 1000 consumers, 400 bought the manufacturer's pelt and the rest bought other brands. Test the manufacturer's belief.

3. In the testing of cake recipes, a quality assessment is made by awarding marks for the cake quality. Two recipes are tested to find out if they differ in quality. The 1st recipe is tested by 40 people, whose average mark was 50 with a standard deviation of 10. The 2nd recipe was tested by 30 people, and the average mark was 45 with a standard deviation of 8.
Do these results indicate any difference in appeal between the two recipes?

4. The Ministry of Transport has recently carried out an intensive advertising campaign to encourage vehicle drivers to use their seat belts. A survey was carried out in Nairobi and Kisumu with a view to determining the percentage of vehicle drivers who use seat belts after the campaign. A random sample of 1000 drivers was observed in Nairobi and 260 of them used seat belts. In Kisumu, 500 drivers were observed and 100 of them used seat belts. Do these results indicate that a greater proportion of drivers in Nairobi use seat belts than drivers in Kisumu?
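Returning to the estimation section, the confidence-limit formula $\bar{x} \pm Z\, s/\sqrt{n}$ and the sample-size formula $n \ge Z^2 s^2/E^2$ can be checked against the monthly-profit illustration (n = 35, mean Ksh 105M, standard deviation Ksh 25M, maximum error Ksh 30M). The following is a minimal sketch, not part of the original notes: the function names are hypothetical, and the Z values 1.96 and 2.58 are the rounded table values used in the notes rather than exact normal quantiles.

```python
import math


def mean_confidence_interval(mean, s, n, z):
    """Confidence limits: mean +/- z * s / sqrt(n)."""
    margin = z * s / math.sqrt(n)      # the error of estimate
    return mean - margin, mean + margin


def min_sample_size(s, max_error, z):
    """Smallest whole n satisfying n >= z^2 * s^2 / E^2."""
    return math.ceil((z * s / max_error) ** 2)


# Rounded table values used in the notes (assumption: normal approximation).
Z_95, Z_99 = 1.96, 2.58

low95, high95 = mean_confidence_interval(mean=105, s=25, n=35, z=Z_95)
low99, high99 = mean_confidence_interval(mean=105, s=25, n=35, z=Z_99)
print(f"95% CI: ({low95:.2f}, {high95:.2f})")
print(f"99% CI: ({low99:.1f}, {high99:.1f})")

n95 = min_sample_size(s=25, max_error=30, z=Z_95)
n99 = min_sample_size(s=25, max_error=30, z=Z_99)
print(f"minimum n at 95%: {n95}, at 99%: {n99}")
```

With these inputs the 95% interval is (96.72, 113.28) and the 99% interval is (94.1, 115.9). For the sample size, Z = 1.96 gives n ≥ 2.67, so at least 3 observations, and Z = 2.58 gives n ≥ 4.6225, so at least 5. Rounding up with `math.ceil` reflects that n must be a whole number no smaller than the bound.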