Chapter 7 Frequency Distributions
© Ray Panko

Probability Distributions
[Figure: normal curves with the same mean and different standard deviations, and normal curves with different means]

The event is the estimation of the mean (X̄) from a sample of size n.
[Figure: sampling moves from the frequency distribution for a variable (mean µ) to the sampling distribution used to find the mean of the variable (mean μX̄)]

Population distribution: mean µ, standard deviation σ
Sampling distribution for µ: mean $\mu_{\bar{X}} = \mu$, standard deviation $\sigma_{\bar{X}} = \sigma / \sqrt{n}$

Forty percent of voters call themselves independents.
◦ 40% is a proportion (π)
◦ Take a sample to estimate π
◦ The sample proportion, p, is an unbiased estimator of π
◦ The sampling standard deviation, σp, is given by: $\sigma_p = \sqrt{\pi(1-\pi)/n}$

“Based on a sample of 1,500 households, the percentage of voters in favor of Proposition X is 40%, with a sampling error of plus or minus 3%.”

The sample mean (X̄) or proportion (p) is not likely to be exactly the population mean (µ) or proportion (π). However, they should be close. Confidence intervals allow us to estimate how close.
Example: “It is estimated that the proportion of independent voters is 49%, with a sampling error of plus or minus 3%.”

A confidence interval is a range around the sample mean X̄ that, with a stated degree of confidence, contains the true population mean µ.
[Figure: 95% confidence interval centered on X̄]

If the confidence level is 95%, then the area outside the confidence interval, which we call α, is 0.05. The upper and lower tails are each α/2, or 0.025.
$1 - \alpha = 0.95$, so $\alpha = 0.05$ and $\alpha/2 = 0.025$
[Figure: normal curve centered on X̄ with an area of α/2 = 0.025 in each tail]

Find the Z values for α/2.
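The Z lookup for α/2 can also be done numerically instead of from a table. The following is a minimal sketch, assuming Python's standard library; `z_critical` is a hypothetical helper name that does not come from the slides.

```python
from statistics import NormalDist


def z_critical(confidence_level: float) -> float:
    """Return Z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1.0 - confidence_level
    # inv_cdf returns the Z value that has cumulative probability 1 - alpha/2 to its left
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


# Print the upper critical value for a few common confidence levels;
# the lower critical value is the negative of each one.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = +/-{z_critical(level):.3f}")
```

The lower critical value is simply the negative of the upper one because the standard normal distribution is symmetric about zero.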
For a cumulative probability of 1 − 0.025 = 0.975, Z is 1.96. So the Z values are −1.96 and 1.96.
[Figure: normal curve with α/2 = 0.025 in each tail. Z units: −Zα/2 = −1.96, 0, Zα/2 = 1.96. X units: lower confidence limit, point estimate, upper confidence limit]

Confidence Level   Confidence Coefficient (1 − α)   Zα/2 value
80%                0.80                             1.28
90%                0.90                             1.645
95%                0.95                             1.96
98%                0.98                             2.33
99%                0.99                             2.58
99.8%              0.998                            3.09
99.9%              0.999                            3.29

A sample of 11 circuits from a large normal population has a mean resistance of 2.20 ohms. We know from past testing that the population standard deviation is 0.35 ohms.
95% confidence interval for the true mean:
$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.96 \, (0.35/\sqrt{11}) = 2.20 \pm 0.2068$
$1.9932 \le \mu \le 2.4068$

For the same sample of 11 circuits (mean resistance 2.20 ohms, population standard deviation 0.35 ohms), the 90% confidence interval for the true mean is:
$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 2.20 \pm 1.645 \, (0.35/\sqrt{11}) = 2.20 \pm 0.1736$
$2.0264 \le \mu \le 2.3736$

Confidence Intervals
◦ Population Mean, σ Known: use the normal distribution with σ
◦ Population Mean, σ Unknown: use the t distribution, based on the sample standard deviation S computed from the sample instead of σ
◦ Population Proportion

Assumptions
◦ Population standard deviation is unknown
◦ Population is normally distributed
◦ If population is not normal, use large sample
Use Student’s t distribution instead of the normal distribution.
Confidence Interval Estimate:
$\bar{X} \pm t_{\alpha/2} \frac{S}{\sqrt{n}}$
(where tα/2 is the critical value of the t distribution with n − 1 degrees of freedom and an area of α/2 in each tail)

Degrees of freedom
Idea: the number of observations that are free to vary after the sample mean has been calculated.
Example: Suppose the mean of 3 numbers is 8.0.
X1 = 7
X2 = 8
X3 = ?
If the mean of these three values is 8.0, then X3 must be 9 (i.e., X3 is not free to vary).
Here, the sample size (n) = 3, so degrees of freedom = n − 1 = 3 − 1 = 2.

For confidence intervals based on sample standard deviations, d.f.
= n − 1, where n is the sample size.

Note: t → Z as n increases (n − 1 → n).
t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal.
[Figure: standard normal curve (t with df = ∞) compared with t (df = 13) and t (df = 5), all centered on 0]

90% confidence level: α = 0.10, α/2 = 0.05

Upper Tail Area
df    .25     .10     .05
1     1.000   3.078   6.314
2     0.817   1.886   2.920
3     0.765   1.638   2.353

Sample size = 3, so df = n − 1 = 2 and, for α/2 = 0.05, t = 2.920.
The body of the table contains t values, not probabilities.

Confidence Level   t (10 d.f.)   t (20 d.f.)   t (30 d.f.)   z
.90                1.812         1.725         1.697         1.645
.95                2.228         2.086         2.042         1.96
.99                3.169         2.845         2.750         2.58

As sample size n increases, df (n − 1) increases. As df increases, t approaches z. So at large sample sizes, t and z are nearly the same.

A random sample of n = 25 has X̄ = 50 and S = 8. Form a 95% confidence interval for μ.
◦ d.f. = n − 1 = 24, and α/2 = .025
◦ From Table E.1, tα/2 = 2.0639
◦ So the confidence interval is
$\bar{X} \pm t_{\alpha/2} \frac{S}{\sqrt{n}} = 50 \pm (2.0639) \frac{8}{\sqrt{25}}$
$46.698 \le \mu \le 53.302$

TINV(Probability, df)
For a 95% confidence level, sample size of 25, and a standard deviation S of 8:
◦ df is 24 (n − 1)
◦ Probability is α (.05), not α/2 (.025)
◦ Equation is =TINV(.05,24)
◦ Its value is 2.063899
◦ This is the same value found with the table lookup

Confidence Intervals
◦ Population Mean: σ Known, σ Unknown
◦ Population Proportion
Based on a sample of 70, 95% of our faculty members have PhDs.
Recall that the distribution of the sample proportion is approximately normal if the sample size is large, with standard deviation
$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$
We will estimate this with sample data:
$\sqrt{\frac{p(1-p)}{n}}$

Upper and lower confidence limits for the population proportion are calculated with the formula
$p \pm Z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$
where
◦ Zα/2 is the standard normal value for the level of confidence desired
◦ p is the sample proportion
◦ n is the sample size
Note: must have np > 5 and n(1 − p) > 5

A random sample of 100 people shows that 25 are left-handed. Form a 95% confidence interval for the true proportion of left-handers.
$p \pm Z_{\alpha/2} \sqrt{p(1-p)/n} = 25/100 \pm 1.96 \sqrt{0.25(0.75)/100} = 0.25 \pm 1.96 \, (0.0433)$
$0.1651 \le \pi \le 0.3349$

Required sample size for a desired error size and confidence level
To determine the required sample size for the mean, you must know:
◦ The desired level of confidence (1 − α), which determines the critical value, Zα/2
◦ The acceptable sampling error, e (the plus or minus in the estimate)
◦ The population standard deviation, σ

If σ = 45, what sample size is needed to estimate the mean within ± 5 with 90% confidence?
$n = \frac{Z^2 \sigma^2}{e^2} = \frac{(1.645)^2 (45)^2}{5^2} = 219.19$
So the required sample size is n = 220 (always round up).

If unknown, σ can be estimated when using the required sample size formula:
◦ Use a value for σ that is expected to be at least as large as the true σ
◦ Select a pilot sample and estimate σ with the sample standard deviation, S

Reporting
◦ A confidence interval estimate (reflecting sampling error) should always be included when reporting a point estimate
◦ The level of confidence should always be reported
◦ The sample size should be reported
◦ An interpretation of the confidence interval estimate should also be provided
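The proportion interval and the required-sample-size calculation above can be checked with a short script. This is a minimal sketch, assuming Python's standard library; the function names `z_critical`, `proportion_interval`, and `required_sample_size` are hypothetical and do not come from the slides. It uses the unrounded Z value, so intermediate figures may differ slightly from the hand calculations that use 1.96 or 1.645.

```python
import math
from statistics import NormalDist


def z_critical(confidence_level: float) -> float:
    """Return Z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1.0 - confidence_level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def proportion_interval(successes: int, n: int, confidence_level: float):
    """Return (lower, upper) limits for p +/- Z * sqrt(p(1-p)/n)."""
    p = successes / n
    # The normal approximation assumes np > 5 and n(1-p) > 5
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("sample too small for the normal approximation")
    margin = z_critical(confidence_level) * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


def required_sample_size(sigma: float, error: float, confidence_level: float) -> int:
    """Return n = Z^2 * sigma^2 / e^2, always rounded up to a whole observation."""
    z = z_critical(confidence_level)
    return math.ceil((z ** 2) * (sigma ** 2) / (error ** 2))


# Left-handers example: 25 of 100 people, 95% confidence
low, high = proportion_interval(25, 100, 0.95)
print(f"{low:.4f} <= pi <= {high:.4f}")

# Sample size example: sigma = 45, error = 5, 90% confidence
print(required_sample_size(45, 5, 0.90))
```

Rounding up with `math.ceil` mirrors the "always round up" rule: rounding down would give a sampling error slightly larger than the one requested.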