Inference for the Mean of a Population
Chapter 7 and Chapter 8

Inference for the Mean of a Population – Part 1
Chapter 7.1 (omit sign test pp 469 – 470)

The situation where σ is not known
• If σ is known, then the standard deviation of the sample mean is σ/√n.
• We now consider the more realistic situation where σ is not known. In effect, we estimate σ using s, the sample standard deviation.

t-table (Table D): critical values t* with upper-tail probability p; the bottom row gives the confidence level C
                     Upper-tail probability p
  df    0.25    0.20    0.15    0.10    0.05   0.025    0.01   0.005  0.0025   0.001  0.0005
   1   1.000   1.376   1.963   3.078   6.314  12.706  31.821  63.656 127.321 318.289 636.578
   2   0.816   1.061   1.386   1.886   2.920   4.303   6.965   9.925  14.089  22.328  31.600
   3   0.765   0.978   1.250   1.638   2.353   3.182   4.541   5.841   7.453  10.214  12.924
   4   0.741   0.941   1.190   1.533   2.132   2.776   3.747   4.604   5.598   7.173   8.610
   5   0.727   0.920   1.156   1.476   2.015   2.571   3.365   4.032   4.773   5.894   6.869
   6   0.718   0.906   1.134   1.440   1.943   2.447   3.143   3.707   4.317   5.208   5.959
   7   0.711   0.896   1.119   1.415   1.895   2.365   2.998   3.499   4.029   4.785   5.408
   8   0.706   0.889   1.108   1.397   1.860   2.306   2.896   3.355   3.833   4.501   5.041
   9   0.703   0.883   1.100   1.383   1.833   2.262   2.821   3.250   3.690   4.297   4.781
  10   0.700   0.879   1.093   1.372   1.812   2.228   2.764   3.169   3.581   4.144   4.587
  11   0.697   0.876   1.088   1.363   1.796   2.201   2.718   3.106   3.497   4.025   4.437
  12   0.695   0.873   1.083   1.356   1.782   2.179   2.681   3.055   3.428   3.930   4.318
  13   0.694   0.870   1.079   1.350   1.771   2.160   2.650   3.012   3.372   3.852   4.221
  14   0.692   0.868   1.076   1.345   1.761   2.145   2.624   2.977   3.326   3.787   4.140
  15   0.691   0.866   1.074   1.341   1.753   2.131   2.602   2.947   3.286   3.733   4.073
  16   0.690   0.865   1.071   1.337   1.746   2.120   2.583   2.921   3.252   3.686   4.015
  17   0.689   0.863   1.069   1.333   1.740   2.110   2.567   2.898   3.222   3.646   3.965
  18   0.688   0.862   1.067   1.330   1.734   2.101   2.552   2.878   3.197   3.610   3.922
  19   0.688   0.861   1.066   1.328   1.729   2.093   2.539   2.861   3.174   3.579   3.883
  20   0.687   0.860   1.064   1.325   1.725   2.086   2.528   2.845   3.153   3.552   3.850
  21   0.686   0.859   1.063   1.323   1.721   2.080   2.518   2.831   3.135   3.527   3.819
  22   0.686   0.858   1.061   1.321   1.717   2.074   2.508   2.819   3.119   3.505   3.792
  23   0.685   0.858   1.060   1.319   1.714   2.069   2.500   2.807   3.104   3.485   3.768
  24   0.685   0.857   1.059   1.318   1.711   2.064   2.492   2.797   3.091   3.467   3.745
  25   0.684   0.856   1.058   1.316   1.708   2.060   2.485   2.787   3.078   3.450   3.725
  26   0.684   0.856   1.058   1.315   1.706   2.056   2.479   2.779   3.067   3.435   3.707
  27   0.684   0.855   1.057   1.314   1.703   2.052   2.473   2.771   3.057   3.421   3.689
  28   0.683   0.855   1.056   1.313   1.701   2.048   2.467   2.763   3.047   3.408   3.674
  29   0.683   0.854   1.055   1.311   1.699   2.045   2.462   2.756   3.038   3.396   3.660
  30   0.683   0.854   1.055   1.310   1.697   2.042   2.457   2.750   3.030   3.385   3.646
  60   0.679   0.848   1.045   1.296   1.671   2.000   2.390   2.660   2.915   3.232   3.460
1000   0.675   0.842   1.037   1.282   1.646   1.962   2.330   2.581   2.813   3.098   3.300
  z*   0.674   0.842   1.036   1.282   1.645   1.960   2.326   2.576   2.807   3.090   3.291
   C     50%     60%     70%     80%     90%     95%     98%     99%   99.5%   99.8%   99.9%
                     Confidence level C

Using the t-table
• Example: The following data are the amounts of vitamin C, measured in mg per 100 grams of blend (dry basis), for a random sample of size 8 from a production run: 26, 31, 23, 22, 11, 22, 14, 31. We want a 95% c.i. for µ, the mean vitamin C content produced during this run.
• Example: A random sample of 10 one-bedroom apartment rental ads from your local newspaper has these monthly rents (dollars): 500, 650, 600, 505, 450, 550, 515, 495, 650, 395. Do these data give good reason to believe that the mean rent of all advertised one-bedroom apartments is greater than $500 per month?

Matched Pairs
• Here are some sales before and after a motivational course.

Employee  Before  After
       1     212    237
       2     282    291
       3     203    191
       4     327    341
       5     165    192
       6     198    180

Does the course appear to be effective in increasing sales?

Robustness of the t procedures
• A statistical procedure is said to be robust if the required probability calculations are insensitive to violations of the assumptions made.
• For t:
  – n < 15: use t if the data are clearly close to normal. If the data are clearly nonnormal or outliers are present, do not use t.
  – 15 ≤ n < 40: t can be used except in the presence of outliers or strong skewness.
  – Large samples (roughly n ≥ 40): t procedures can be used even for clearly skewed data.

Inference for the Mean of a Population – Part 2: Comparing Two Means
Chapter 7.2 (omit pp 498 – 503)

Overview
• We want to compare the means of two populations.
• We can use confidence intervals or hypothesis tests.
• There are many specialized procedures, depending on the data and the underlying distributions.
• We'll look at some of the most important ones.

The idealized situation
• We assume both populations are normal with known variances.
• This doesn't happen often in practice.
• We can do hypothesis tests and compute p-values as in Chapter 6.
• Example: σ1 = 20, σ2 = 30, n1 = 120, n2 = 150, x̄1 = 67.3, x̄2 = 72.0
• H0: µ1 − µ2 = 0, Ha: µ1 − µ2 ≠ 0
  – (a) Compute the z statistic and p-value.
  – (b) Give a 95% c.i. for µ1 − µ2.

Two-sample t procedures
• This is the most common situation. We use the sample standard deviations s1 and s2 to estimate σ1 and σ2.

Example
The purchasing department has suggested that all new computer monitors for your company should have flat screens. You want to be sure employees like them. The next 20 employees needing screens are randomly divided into two groups of 10: 10 get flat screens, and the other 10 get conventional monitors. One month after receiving the monitors, the employees rate their satisfaction on a scale from 1 to 5 by responding to the statement “I like my new monitor” (1 = strongly disagree, 5 = strongly agree). The flat-screen employees have an average satisfaction of 4.6 with a standard deviation of 0.7. The employees with the standard monitors have an average of 3.2 with a standard deviation of 1.6.
(a) Give a 95% c.i. for the difference in mean satisfaction scores for all employees.
(b) What about a hypothesis test for comparing the two means?
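The t procedures above come down to a few lines of arithmetic. The following Python sketch illustrates them. It is not the course's prescribed method. It works through five examples: the vitamin C interval, the t statistics for the rent data and the matched pairs, the idealized two-sample z test, and the monitor interval. It uses only the standard library and copies t* from Table D instead of computing it. The names `one_sample_t`, `two_sample_se` and `T_STAR_95` are hypothetical, chosen for illustration. For the monitor interval it assumes the conservative df = min(n1 − 1, n2 − 1); Excel and most software use a fractional (Welch) df instead.

```python
import math
import statistics
from statistics import NormalDist

# t* values copied from Table D (upper-tail p = 0.025, i.e. 95% confidence).
# Only the degrees of freedom needed below are included (illustrative).
T_STAR_95 = {7: 2.365, 9: 2.262}


def one_sample_t(data, mu0=0.0):
    """Return (x-bar, SE = s / sqrt(n), t = (x-bar - mu0) / SE, df = n - 1)."""
    n = len(data)
    xbar = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # stdev divides by n - 1, giving s
    return xbar, se, (xbar - mu0) / se, n - 1


def two_sample_se(sd1, n1, sd2, n2):
    """sqrt(sd1^2/n1 + sd2^2/n2); pass the sigmas for z, the sample s's for t."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)


# Vitamin C: 95% c.i. for mu (n = 8, so df = 7)
xbar, se, _, df = one_sample_t([26, 31, 23, 22, 11, 22, 14, 31])
m = T_STAR_95[df] * se
print(f"vitamin C 95% c.i.: {xbar - m:.2f} to {xbar + m:.2f} mg")

# Rent: H0: mu = 500 vs Ha: mu > 500; compare t with the df = 9 row of Table D
rents = [500, 650, 600, 505, 450, 550, 515, 495, 650, 395]
_, _, t_rent, _ = one_sample_t(rents, mu0=500)
print(f"rent: t = {t_rent:.3f}, df = {len(rents) - 1}")

# Matched pairs: one-sample t on the differences (after - before), df = 5
before = [212, 282, 203, 327, 165, 198]
after = [237, 291, 191, 341, 192, 180]
_, _, t_pairs, _ = one_sample_t([a - b for a, b in zip(after, before)])
print(f"matched pairs: t = {t_pairs:.3f}, df = {len(before) - 1}")

# Idealized situation: sigmas known, so use z and the standard normal
diff_z = 67.3 - 72.0
se_z = two_sample_se(20, 120, 30, 150)
z = diff_z / se_z
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, two-sided p = {p_two_sided:.3f}")
print(f"95% c.i. for mu1 - mu2: {diff_z - 1.96 * se_z:.2f} to {diff_z + 1.96 * se_z:.2f}")

# Monitors: two-sample t with conservative df = min(n1 - 1, n2 - 1) = 9
diff_mon = 4.6 - 3.2
se_mon = two_sample_se(0.7, 10, 1.6, 10)
t_mon = diff_mon / se_mon
df_mon = min(10 - 1, 10 - 1)
m_mon = T_STAR_95[df_mon] * se_mon
print(f"monitors: t = {t_mon:.3f}, 95% c.i.: {diff_mon - m_mon:.2f} to {diff_mon + m_mon:.2f}")
```

Results from running the sketch:

* **Vitamin C:** the interval is about 16.49 to 28.51 mg.
* **Rent:** t ≈ 1.18 with df = 9. This lies between the 0.15 and 0.10 columns of Table D, so the one-sided p-value is between 0.10 and 0.15.
* **Matched pairs:** t ≈ 0.98 with df = 5, so the one-sided p-value is between 0.15 and 0.20.
* **Idealized z example:** z ≈ −1.54, the two-sided p ≈ 0.12, and the interval for µ1 − µ2 is about −10.69 to 1.29.
* **Monitors:** t ≈ 2.54, and the interval is about 0.15 to 2.65.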
Robustness of the two-sample procedures
• The two-sample procedures are generally quite robust.
• If the sample sizes are equal and the two populations have distributions of similar shape, p-values from the t table are quite accurate even when n1 and n2 are as small as 5.
• If the sample sizes are unequal, use the same guidelines as for one-sample t tests and confidence intervals, but replace n by n1 + n2:
  – n1 + n2 < 15: use t if the data are clearly close to normal. If the data are clearly nonnormal or outliers are present, do not use t.
  – n1 + n2 ≥ 15: t can be used except in the presence of outliers or strong skewness.
  – Large samples (roughly n1 + n2 ≥ 40): t procedures can be used even for clearly skewed data.

Small samples
• We have to be very careful.
  – There is substantial uncertainty in the estimates, but if the difference in means is large, we can often detect it.
• Specialized procedures:
  – If we can assume that the two populations have equal variances, we can use the pooled estimator.
  – We can test for equal variances (F test).
  – The numerical procedures (optional) appear in the text.

Excel
• The Analysis ToolPak can do the two-sample t tests we have discussed, plus the optional material:
  – Most important for us is the two-sample t test that does not assume equal variances.
  – Excel also does the calculation for a specialized test that assumes the two populations have equal variances.
• All are very easy to use.
• We should always plot the data, make normal quantile plots, etc.

Excel example
• Example: Do piano lessons improve spatial-temporal reasoning?
• The Excel output appears below.

t-Test: Two-Sample Assuming Unequal Variances
                                Variable 1  Variable 2
Mean                                 3.618       0.386
Variance                             9.334       5.871
Observations                            34          44
Hypothesized Mean Difference             0
df                                      62
t Stat                               5.059
P(T<=t) one-tail                     0.000
t Critical one-tail                  1.670
P(T<=t) two-tail                     0.000
t Critical two-tail                  1.999

Chapter 8: Inference for Proportions (Section 8.1)

How do sample proportions behave?
Chapter 5 tells us …

Example
• An SRS of 1600 BC residents found that 954 favored construction of a new highway to Whistler.
• Give a 95% c.i. for the true proportion of BC residents who favor a new highway to Whistler.

A variation that works better for small samples

Using the plus four estimator for small samples
• 9 of 15 people in an SRS of 15 Buec 232 students felt that the course workload was too heavy.
• Compute an approximate 90% c.i. for the proportion of students who felt the course workload was too heavy.

Hypothesis tests for proportions
• We use the sample proportion rather than the plus four estimate.

Example
• We found that 11 customers in a sample of 40 would be willing to buy a software upgrade that costs $100. If the upgrade is to be profitable, you will need to sell it to more than 20% of your customers. Do the sample data give good evidence that more than 20% are willing to buy?
• A poll (March 2, 2004) estimated that support for the BC Liberal party was 39%. Using this estimate as a “guessed value” for a follow-up study, how large a sample would I need to estimate Liberal support to within ±3%? I want a 95% level of confidence in my estimate.
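The proportion examples above follow the same pattern as the mean examples, with z* in place of t*. The following sketch is again illustrative Python that uses only the standard library. It covers four calculations:

* the large-sample interval for the highway survey;
* the plus four interval for the workload question, which adds two successes and two failures;
* the one-sided z test for the software upgrade, with the standard error computed from p0;
* the sample size for the poll, with margin m = 0.03 and guessed value p* = 0.39, using n = (z*/m)² p*(1 − p*) rounded up.

The function names are hypothetical. z* comes from `statistics.NormalDist`, so it agrees with the table's 1.960 and 1.645 to three decimals.

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Critical value z* for a two-sided confidence level, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


def proportion_ci(count, n, confidence):
    """Large-sample c.i.: p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = count / n
    m = z_star(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - m, p_hat + m


def plus_four_ci(count, n, confidence):
    """Plus four c.i.: add 2 successes and 2 failures, then use n + 4 observations."""
    p_tilde = (count + 2) / (n + 4)
    m = z_star(confidence) * math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde - m, p_tilde + m


def proportion_z_test(count, n, p0):
    """z for H0: p = p0 (SE built from p0) and the upper-tail p-value for Ha: p > p0."""
    z = (count / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, 1 - NormalDist().cdf(z)


def sample_size(p_guess, margin, confidence):
    """Smallest n with z* sqrt(p_guess (1 - p_guess) / n) <= margin."""
    return math.ceil((z_star(confidence) / margin) ** 2 * p_guess * (1 - p_guess))


highway = proportion_ci(954, 1600, 0.95)                 # BC highway survey
workload = plus_four_ci(9, 15, 0.90)                     # Buec 232 workload, small sample
z_upgrade, p_upgrade = proportion_z_test(11, 40, 0.20)   # upgrade, Ha: p > 0.20
n_poll = sample_size(0.39, 0.03, 0.95)                   # BC Liberal follow-up poll

print(f"highway 95% c.i.: {highway[0]:.3f} to {highway[1]:.3f}")
print(f"workload 90% plus four c.i.: {workload[0]:.3f} to {workload[1]:.3f}")
print(f"upgrade: z = {z_upgrade:.3f}, one-sided p = {p_upgrade:.3f}")
print(f"poll sample size: {n_poll}")
```

Results from running the sketch:

* **Highway survey:** the interval is about 0.572 to 0.620.
* **Workload (plus four):** the interval is about 0.393 to 0.765.
* **Software upgrade:** z ≈ 1.19 with a one-sided p ≈ 0.12, which is weak evidence at the usual 5% level.
* **Follow-up poll:** it needs n = 1016.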