
Math 160 - Cooley, Intro to Statistics, OCC
Section 8.4 – Confidence Intervals for One Population Mean When σ Is Unknown

In Section 8.2, we learned how to determine a confidence interval for a population mean, μ, when the population standard deviation, σ, is known.

Recall: Standardized Version of the Sample Mean
Suppose that a variable x of a population is normally distributed with mean μ and standard deviation σ. Then, for samples of size n, the variable $\bar{x}$ is also normally distributed and has mean μ and standard deviation $\sigma/\sqrt{n}$. Equivalently, the variable

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

has the standard normal distribution.

What if, as is usual in practice, the population standard deviation is unknown? Then we cannot base our confidence-interval procedure on the standardized version of $\bar{x}$, because computing z requires σ. The best we can do is estimate the population standard deviation, σ, by the sample standard deviation, s. Replacing σ with s gives the:

Studentized Version of the Sample Mean
Suppose that a variable x of a population is normally distributed with mean μ. Then, for samples of size n, the variable

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

has the t-distribution (or Student's t-distribution) with n – 1 degrees of freedom.

William Sealy Gosset introduced the t-statistic in 1908. Gosset was a statistician as well as a chemist for the Guinness brewery in Dublin, Ireland. The brewery had a policy of recruiting the best graduates from Oxford and Cambridge, selecting those who could apply biochemistry and statistics to the company's established industrial processes. Gosset was one such graduate, and in the course of this work he devised the t-test. It was originally envisioned as a cost-effective way to monitor the quality of stout (the dark beer the brewery produces). Gosset published the test in Biometrika in 1908 under the pen name 'Student'. He used a pen name because Guinness insisted on it: the company wanted to keep its use of statistics a trade secret.
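Why replacing σ with s changes the distribution can be seen in a short simulation. The Python sketch below is an illustration under assumed settings, not part of the course procedure: the helper name `tail_fractions` is hypothetical, and the population values μ = 100 and σ = 15, the sample size n = 5, the seed, and the 20,000 repetitions are arbitrary choices. For each simulated sample it computes both the standardized statistic $z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ and the studentized statistic $t = (\bar{x} - \mu)/(s/\sqrt{n})$, then counts how often each lands beyond ±1.96.

```python
import math
import random
import statistics


def tail_fractions(n=5, reps=20_000, mu=100.0, sigma=15.0, cutoff=1.96, seed=1):
    """Simulate samples of size n from a normal population and return the
    fractions of samples for which |z| and |t| exceed `cutoff`.

    mu, sigma, n, reps and seed are arbitrary illustration values (assumptions).
    """
    rng = random.Random(seed)
    z_hits = 0
    t_hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        s = statistics.stdev(sample)               # sample SD (divisor n - 1)
        z = (xbar - mu) / (sigma / math.sqrt(n))   # standardized: uses the true sigma
        t = (xbar - mu) / (s / math.sqrt(n))       # studentized: s replaces sigma
        if abs(z) > cutoff:
            z_hits += 1
        if abs(t) > cutoff:
            t_hits += 1
    return z_hits / reps, t_hits / reps


z_frac, t_frac = tail_fractions()
print(f"fraction with |z| > 1.96: {z_frac:.3f}")
print(f"fraction with |t| > 1.96: {t_frac:.3f}  (t has n - 1 = 4 df)")
```

With df = 4, the t-distribution puts roughly 12% of its area beyond ±1.96, compared with 5% for the standard normal curve, so the two printed fractions should land near those values. The extra variability that s introduces is why t critical values are larger than the corresponding z* values.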
Basic Properties of t-Curves
Property 1: The total area under any t-curve equals 1.
Property 2: A t-curve extends indefinitely in both directions, approaching, but never touching, the horizontal axis as it does so.
Property 3: A t-curve is symmetric about 0.
Property 4: As the number of degrees of freedom becomes larger, t-curves look increasingly like the standard normal curve.

Note: For df ≥ 1000, the t-density curves and the standard normal curve are virtually indistinguishable. This happens because s estimates σ more accurately as the sample size increases, so using s in place of σ causes little extra variation when the sample is large.

[Figure: t-curves with different degrees of freedom, each shown with a 95% confidence level]

INTERVAL PROCEDURE #2 – The One-Mean t-Interval Procedure
Purpose: To find a confidence interval for a population mean, μ.
Assumptions:
1) Simple random sample
2) Normal population or large sample (n ≥ 30)
3) σ unknown

Step 1 – For a confidence level of 1 – α, use Table IV to find $t_{\alpha/2}$ with df = n – 1, where n is the sample size.
Step 2 – The confidence interval for μ is from

$$\bar{x} - t_{\alpha/2}\cdot\frac{s}{\sqrt{n}} \quad\text{to}\quad \bar{x} + t_{\alpha/2}\cdot\frac{s}{\sqrt{n}},$$

where $t_{\alpha/2}$ is found in Step 1, and $\bar{x}$ and s are computed from the sample data.
Step 3 – Interpret the confidence interval.

The confidence interval is exact for normal populations and is approximately correct for large samples from non-normal populations.

Using the t procedures IN PRACTICE
1. Except in the case of small samples, the assumption that the data are an SRS from the population of interest is more important than the assumption that the population distribution is normal.
2. Sample size less than 15: Use t procedures if the data appear close to normal (symmetric, single peak, no outliers).
   If the data are skewed or if outliers are present, do not use t.
3. Sample size at least 15: The t procedures can be used except in the presence of outliers or strong skewness.
4. Large samples: The t procedures can be used even for clearly skewed distributions when the sample size is large, roughly n ≥ 30.

Table IV - Critical Values of Student's t-Distribution

          Area in one tail
df        0.10    0.05   0.025    0.01   0.005
1        3.078   6.314  12.706  31.821  63.657
2        1.886   2.920   4.303   6.965   9.925
3        1.638   2.353   3.182   4.541   5.841
4        1.533   2.132   2.776   3.747   4.604
5        1.476   2.015   2.571   3.365   4.032
6        1.440   1.943   2.447   3.143   3.707
7        1.415   1.895   2.365   2.998   3.499
8        1.397   1.860   2.306   2.896   3.355
9        1.383   1.833   2.262   2.821   3.250
10       1.372   1.812   2.228   2.764   3.169
11       1.363   1.796   2.201   2.718   3.106
12       1.356   1.782   2.179   2.681   3.055
13       1.350   1.771   2.160   2.650   3.012
14       1.345   1.761   2.145   2.624   2.977
15       1.341   1.753   2.131   2.602   2.947
16       1.337   1.746   2.120   2.583   2.921
17       1.333   1.740   2.110   2.567   2.898
18       1.330   1.734   2.101   2.552   2.878
19       1.328   1.729   2.093   2.539   2.861
20       1.325   1.725   2.086   2.528   2.845
21       1.323   1.721   2.080   2.518   2.831
22       1.321   1.717   2.074   2.508   2.819
23       1.319   1.714   2.069   2.500   2.807
24       1.318   1.711   2.064   2.492   2.797
25       1.316   1.708   2.060   2.485   2.787
26       1.315   1.706   2.056   2.479   2.779
27       1.314   1.703   2.052   2.473   2.771
28       1.313   1.701   2.048   2.467   2.763
29       1.311   1.699   2.045   2.462   2.756
30       1.310   1.697   2.042   2.457   2.750
31       1.309   1.696   2.040   2.453   2.744
32       1.309   1.694   2.037   2.449   2.738
33       1.308   1.692   2.035   2.445   2.733
34       1.307   1.691   2.032   2.441   2.728
35       1.306   1.690   2.030   2.438   2.724
36       1.306   1.688   2.028   2.434   2.719
37       1.305   1.687   2.026   2.431   2.715
38       1.304   1.686   2.024   2.429   2.712
39       1.304   1.685   2.023   2.426   2.708
40       1.303   1.684   2.021   2.423   2.704
41       1.303   1.683   2.020   2.421   2.701
42       1.302   1.682   2.018   2.418   2.698
43       1.302   1.681   2.017   2.416   2.695
44       1.301   1.680   2.015   2.414   2.692
45       1.301   1.679   2.014   2.412   2.690
46       1.300   1.679   2.013   2.410   2.687
47       1.300   1.678   2.012   2.408   2.685
48       1.299   1.677   2.011   2.407   2.682
49       1.299   1.677   2.010   2.405   2.680
50       1.299   1.676   2.009   2.403   2.678
51       1.298   1.675   2.008   2.402   2.676
52       1.298   1.675   2.007   2.400   2.674
53       1.298   1.674   2.006   2.399   2.672
54       1.297   1.674   2.005   2.397   2.670
55       1.297   1.673   2.004   2.396   2.668
56       1.297   1.673   2.003   2.395   2.667
57       1.297   1.672   2.002   2.394   2.665
58       1.296   1.672   2.002   2.392   2.663
59       1.296   1.671   2.001   2.391   2.662
60       1.296   1.671   2.000   2.390   2.660
61       1.296   1.670   2.000   2.389   2.659
62       1.295   1.670   1.999   2.388   2.657
63       1.295   1.669   1.998   2.387   2.656
64       1.295   1.669   1.998   2.386   2.655
65       1.295   1.669   1.997   2.385   2.654
66       1.295   1.668   1.997   2.384   2.652
67       1.294   1.668   1.996   2.383   2.651
68       1.294   1.668   1.995   2.382   2.650
69       1.294   1.667   1.995   2.382   2.649
70       1.294   1.667   1.994   2.381   2.648
71       1.294   1.667   1.994   2.380   2.647
72       1.293   1.666   1.993   2.379   2.646
73       1.293   1.666   1.993   2.379   2.645
74       1.293   1.666   1.993   2.378   2.644
75       1.293   1.665   1.992   2.377   2.643
80       1.292   1.664   1.990   2.374   2.639
85       1.292   1.663   1.988   2.371   2.635
90       1.291   1.662   1.987   2.368   2.632
95       1.291   1.661   1.985   2.366   2.629
100      1.290   1.660   1.984   2.364   2.626
200      1.286   1.653   1.972   2.345   2.601
300      1.284   1.650   1.968   2.339   2.592
400      1.284   1.649   1.966   2.336   2.588
500      1.283   1.648   1.965   2.334   2.586
600      1.283   1.647   1.964   2.333   2.584
700      1.283   1.647   1.963   2.332   2.583
800      1.283   1.647   1.963   2.331   2.582
900      1.282   1.647   1.963   2.330   2.581
1000     1.282   1.646   1.962   2.330   2.581
2000     1.282   1.646   1.961   2.328   2.578
z*       1.282   1.645   1.960   2.326   2.576
C          80%     90%     95%     98%     99%
          Confidence level C

Exercises:
1) According to Communications Industry Forecast, published by Veronis Suhler Stevenson of New York, NY, the average person watched 4.47 hours of television per day in 2000. A random sample of 40 people gave the following numbers of hours of television watched per day last year. Find a 98% confidence interval for the amount of television watched per day last year by the average person. (Note: $\bar{x}$ = 4.615 hr and s = 2.277 hr.)

2) The paper "Correlations between the Intrauterine Metabolic Environment and Blood Pressure in Adolescent Offspring of Diabetic Mothers" (Journal of Pediatrics, Vol. 136, Issue 5, pp. 587-592) by N. Cho et al. presented findings of research on children of diabetic mothers. Past studies showed that maternal diabetes results in obesity, blood pressure, and glucose tolerance complications in the offspring. Following are the arterial blood pressures, in millimeters of mercury (mm Hg), for a random sample of 16 children of diabetic mothers. Assume that the arterial blood pressures are normally distributed.

81.6   84.6   84.1   101.9   87.6   90.8   82.8   94.0
82.0   69.4   88.9    78.9   86.7   75.2   96.4   91.0

The mean and standard deviation of these data are 85.99 and 8.08 mm Hg, respectively.
a) If you were to find a 95% confidence interval for the mean arterial blood pressure of all children of diabetic mothers, would you use the z-interval procedure or the t-interval procedure?
b) Now, find that 95% confidence interval.

Sample Test Multiple Choice Questions:
3) What generally happens to the sampling error as the sample size is decreased? It gets
A) smaller   B) larger   C) more predictable   D) less predictable
4) Find the value of α that corresponds to a confidence level of 93%.
A) 0.1762   B) 1.48   C) –1.48   D) 0.93   E) 0.07   AB) 0.8238
5) For a t-curve with df = 24, find $t_{0.05}$.
A) 1.711   B) 2.797   C) 2.064   D) 2.069   E) 1.714   AB) 2.807
6) For a t-curve with a sample size of 10, find $t_{0.01}$.
A) 1.372   B) 2.821   C) 1.383   D) 3.169   E) 2.764   AB) 3.250
7) For a t-curve with df = 15, find the two t-values that divide the area under the curve into a middle 0.95 area and two outside areas of 0.025.
A) 0 & 2.131   B) –2.145 & 2.145   C) 0 & 2.145   D) –2.131 & 2.131
8) Which of the following statements regarding t-curves is/are false?
I. The total area under a t-curve with 20 degrees of freedom is greater than the area under the standard normal curve.
II. The t-curve with 1000 degrees of freedom is distinctly flatter and wider than the standard normal curve.
III. The t-curve with 20 degrees of freedom more closely resembles the standard normal curve than the t-curve with 25 degrees of freedom.
A) I and II   B) I and III   C) II and III   D) I, II, and III
9) Suppose that you wish to obtain a confidence interval for a population mean. Under the conditions described below, should you use the z-interval procedure, the t-interval procedure, or neither?
- The population standard deviation is known.
- The population is normally distributed.
- The sample size is small.
- The sample is a simple random sample.
A) t-interval procedure   B) neither   C) z-interval procedure
10) Suppose that you wish to obtain a confidence interval for a population mean. Under the conditions described below, should you use the z-interval procedure, the t-interval procedure, or neither?
- The sample size is small.
- The population standard deviation is unknown.
- The population comes from a binomial distribution.
- The sample is a simple random sample.
A) t-interval procedure   B) neither   C) z-interval procedure
11) Suppose that you wish to obtain a confidence interval for a population mean. Under the conditions described below, should you use the z-interval procedure, the t-interval procedure, or neither?
- The population standard deviation is unknown.
- The population comes from an unknown distribution.
- The sample is a simple random sample.
- The sample size is large.
A) t-interval procedure   B) neither   C) z-interval procedure
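Steps 1 and 2 of the one-mean t-interval procedure (Interval Procedure #2) can be sketched in Python using only the standard library. This is a minimal sketch under that assumption, not the course's official method: the helper names `t_pdf`, `t_upper_area`, `t_critical`, and `t_interval` are hypothetical, and instead of looking up Table IV, the sketch approximates $t_{\alpha/2}$ numerically. It integrates the t-density with Simpson's rule and solves for the cutoff by bisection. For the entries checked here, the computed values agree with Table IV to three decimals. With SciPy available, `scipy.stats.t.ppf(1 - alpha/2, df)` would replace `t_critical`.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom.

    lgamma is used instead of gamma so large df (e.g. 2000) does not overflow.
    """
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_area(x, df, steps=2000):
    """Area to the right of x (x >= 0) under the t-curve.

    By symmetry about 0, this is 0.5 minus the area from 0 to x,
    approximated with Simpson's rule (steps must be even).
    """
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(tail_area, df):
    """t value with `tail_area` to its right (Table IV notation: t_{tail_area})."""
    lo, hi = 0.0, 1.0
    while t_upper_area(hi, df) > tail_area:  # widen until the cutoff is bracketed
        hi *= 2
    for _ in range(50):  # bisection: the upper-tail area shrinks as t grows
        mid = (lo + hi) / 2
        if t_upper_area(mid, df) > tail_area:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(xbar, s, n, confidence):
    """One-mean t-interval from summary statistics (sigma unknown)."""
    alpha = 1 - confidence
    t_half = t_critical(alpha / 2, n - 1)  # Step 1: t_{alpha/2} with df = n - 1
    margin = t_half * s / math.sqrt(n)     # Step 2: margin of error
    return xbar - margin, xbar + margin


print(t_interval(4.615, 2.277, 40, 0.98))  # Exercise 1 summary statistics
print(t_interval(85.99, 8.08, 16, 0.95))   # Exercise 2 summary statistics
```

With the summary statistics from Exercise 1 ($\bar{x}$ = 4.615, s = 2.277, n = 40, 98%), the sketch gives roughly 3.742 to 5.488 hours. With those from Exercise 2 ($\bar{x}$ = 85.99, s = 8.08, n = 16, 95%), it gives roughly 81.68 to 90.30 mm Hg. Step 3, interpreting the interval, remains a written step.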