Download Document

Ch. 8: Estimation Section 1 2 3 4 Y. Butterworth Title Introduction to the Chapter Estimating Mu when Sigma is Known Estimating Mu when Sigma is Unknown Estimating p in the Binomial Distribution Estimating Mu1–Mu2 and p1–p2 Ch. 8 Brase’s 9th Edition Notes Pages 2 2–5 6–8 9 – 11 12 – 16 1 Overview of Chapter 8 Concepts The best point estimate of the population mean is the same mean, because: 1) It is more consistent, meaning that it has less variation, than any other estimator for the population 2) It is an unbiased estimator of the population mean, meaning that it symmetrically estimates over and under the population mean. However, a better estimate than a point estimate exists. It is a confidence interval, which is likely to contain the true population mean, based upon Normal theory. Here is how and what we will be doing in this chapter:  We will use the standard normal distribution to define the confidence interval with, alpha (symbol: α) This is the probability that the true value of the mean is outside the interval created.  We will say that the level of confidence/degree of confidence that the interval actually contains the mean is 1 – α. What this means is that if we were to create 100 confidence intervals based upon 100 different samples, then we should expect 1 –α of those intervals to contain the true population mens. §8.1 Estimating Mu when Sigma is Known We know that sample means are normally distributed and thus there is a small probability that they fall in the tails of the normal distribution. We will define the probability that a value falls within the tail of the normal distribution as α. Due to the symmetry of the normal distribution there is only 1/2 α probability that a value will fall in either tail. We’ll write that as α/2. 1–α % Confidence -zα/2 0 zα/2 The values that are associated with the 1–α % Confidence Region are called critical values and are denoted by -Zα/2 and Zα/2. We are going to practice finding critical values based the z-table. We will also use the table in your book to find the critical values, and I will also show you how to use a t-table to find critical values for the normal distribution. Note: Your book uses “c” in place of α, but that is not typical notation and I will use α in its place. Y. Butterworth Ch. 8 Brase’s 9th Edition 2 Example: Find the critical value corresponding to 90% level of confidence. Step 1: Find α (remember that 1–α% is the level of confidence) Step 2: Split α into two tails and draw a picture Step 3: Find the z-score P(Z < zα/2) = α/2 Note: This is the same as finding normal values given a pre-defined probability. On your calculator this is done with the invnormal(α/2 There are 2 alternate ways of finding the z-score associated with a 1–α% level of confidence. Alternate #1: Table of Critical Values (Table 5 on A23 in Appendix II or p. 332) Step 1: Look up the level of confidence desired Note: The draw back is that there are only values for 70-95 by increments of 5 and 98 & 99% levels of confidence. Another draw back is that this table is not always available. Alternate #2: Step 1: The t-table on A24 of Appendix II In the left column find the ∞ (the degrees of freedom where t is approximately a standard normal) Step 2: Along the top in the One-Tail find α/2 or in the Two-Tail find α Step 3: Pinpoint the value that corresponds to step 1 & step 2 in the “body” of the t-table. Note: The draw back is that there are only values for 50%, 75%, 80-95% by increments of 5, 98%, 99% & 99.9%. Another draw back, depending on the table, is that the values may differ from those in a z-table (although they are typically more accurate than a z-table). Y. Butterworth Ch. 8 Brase’s 9th Edition 3 Now, you might be wondering why we find critical values in the way that we do, and how that relates to the normal distribution, so I will include it in my notes, although I may not take the time in the class to go over the derivation as it is mostly Algebra. We are discussing the distribution of the sample means and therefore when we are discussing the standard normal we will be talking about a standard normal where μ = population mean and σ = σ / √n ∴ z = x-bar – μ σ / √n P(-zα/2 < Z < zα/2) = 1 – α and knowing that P(-zα/2 < x-bar – μ < zα/2) = 1 – α σ / √n z = x-bar – μ σ / √n and using a little Algebra to solve a compound inequality – for μ P(x-bar – zα/2 • (σ / √n) < μ < x-bar + zα/2 • (σ / √n)) = 1 – α Thus, you see that the mean is between the values of x-bar minus zα/2 • (σ / √n) and x-bar plus zα/2 • (σ / √n) with probability 1–α. We call the zα/2 • (σ / √n) the margin of error and denote it with a “E”. Finding Confidence Intervals when σ is known 1) You must know that the value comes from an approximately normal distribution unless n ≥ 30 2) Standard deviation of the population must be known 3) Find the critical values 4) Compute the margin of error. E = zα/2 • (σ / √n) 5) Compute the interval x-bar ± E Example: If a random sample of size n = 20 from a normal population with variance 225 has mean x-bar of 64.3. Construct a 95% confidence interval for the population mean. Example: The axial loads of 175 cans have a sample mean of 267.1 lbs and population standard deviation of 22.1 lbs. Find a 99% confidence interval for the population mean. Y. Butterworth Ch. 8 Brase’s 9th Edition 4 Sometimes we might need to conduct a study in order to get sample data by which to make inferences about the population. If this is the case, we need to make sure that we have a large enough sample in order to make the desired inferences. In order to decide how large of a sample to compute, we need only to rewrite the margin of error in order to find the desired sample size to make inferences about the population mean. E = zα/2 • (σ / √n) so this means that E√n = zα/2 • σ which means that √n = zα/2 • σ E 2 and this means that n = zα/2 • σ E [ ] What do we need to decide on sample size 1) Decide on the confidence level 2) Decide on the margin of error 3) Know σ Note: If σ is unknown one way we can get around the problem is to estimate σ using the range rule of thumb based upon previous studies. We will discuss another theoretical method in the next section however. 2 4) Find n based upon n= [ ] zα/2 • σ E Example: Suppose that the weights of all fox terriers dogs are normally distributed with population standard deviation 2.5 kg. How large a sample should be taken in order to be 95% confident that the sample mean doesn’t differ from the population mean by more than 0.5kg? Example: You want to find an estimate for the population height of men in the US. You want to be 90% confident that the true population mean is within 2 inches of the true population mean. How large of sample must you draw to have this level of confidence and this margin of error? Y. Butterworth Ch. 8 Brase’s 9th Edition 5 §8.2 Estimating Mu when Sigma is Unknown When the standard deviation of the population is unknown the sampling distribution of the sample means does not follow the normal distribution but another related distribution called Student’s t-distribution. This distribution was discovered by a brewer for Guiness Brewing Company, but he was not allowed to publish his findings under his true name and although it was really Gosset’s distribution, it has come to be known in the field of science as Student’s t-distribution and thus it shall always be known as that! Facts about Student’s t-distribution 1) When s is used to approximate σ, the distribution of x-bar’s follows this distribution 2) Student’s t-distribution is symmetric about its mean (zero) and approximately bell-shaped 3) Student’s t-distribution is wider, flatter with thicker tails than the normal distribution 4) As the number of samples increases the distribution becomes more and more like the normal distribution (approximately 1000 is fairly normal) 5) The degrees of freedom, d.f. is n – 1 (in advanced texts this is referred to as ν, the Greek letter nu) 6) Still must be known that the data comes from an approximately normal distribution and if this is unknown then must be shown that the variables are approximately normally distributed (see the last chapter) or n ≥ 30 First, let’s practice using the table and then learn how to find the values on your calculator. I’ll need to upload a nice little program to everyone’s calculator before we can use our calculators to do that, however. Finding critical values for a confidence interval when the population standard deviation is unknown is the equivalent of finding critical values for the normal distribution, we just need an extra piece of information – the degrees of freedom (n – 1). Example: Using the t-table in Appendix II on A24 find the critical values for a sample size of 23 for which we want a 90% confidence interval for the population mean. Step 1: Find α Y. Butterworth Step 2: Compute the degrees of freedom – Step 3: Look in the top row for area in two-tails to find α Step 4: Look along the left column to find the degrees of freedom Step 5: Find the critical value in the body of the table Ch. 8 Brase’s 9th Edition n–1 6 Example: Using the t-table in Appendix II on A24 find the critical values for a sample of size 48 for which we want a 85% confidence interval for the population mean. Note: There is no df = 47 in the table. Convention says that when there is no corresponding degrees of freedom that we will use the next lower degrees of freedom available. Now, let’s put it all together and find confidence intervals for the population mean based upon Student’s t-distribution. Finding Confidence Intervals when σ is unknown 1) You must know that the value comes from an approximately normal distribution unless n ≥ 30 2) Standard deviation of the sample must be computed/computable 3) Find the critical values, tn–1,α 4) Compute the margin of error. E = tn–1,α • (s / √n) 5) Compute the interval x-bar ± E Example: We have a sample of size 37 with a sample mean of 20 and sample standard deviation of 2. Find a 90% confidence interval for the true population mean. Example: The data below represents the ages of people when they were 1st diagnosed with cancer. Show that a CI is relevant and give a 95% CI for the true population mean. Stem (x10) Leaf(x1) 0 2 1 57 2 779 3 5569 4 01 5 0 Step 1: Y. Butterworth Calculate the sample mean & std. dev. & median Ch. 8 Brase’s 9th Edition 7 Step 2: Y. Butterworth Investigate normality a) View stem-and-leaf b) Normal probability plot c) Mean vs Median d) Pearson’s Skewness e) Outliers? Step 3: Step 4: Find the critical value: tn–1,α Calculate the margin of error, E Step 5: Compute the interval Ch. 8 Brase’s 9th Edition 8 §8.3 Estimating p in the Binomial Distribution Notation p = population proportion p^ = x n q^ = 1 – p^ the sample proportion (read: p-hat) x = # of successes (recall binomial) n = # of trials the complement of the sample proportion α = probability that r.v. is in the tails of the dist. 1– α = Level of Confidence (be careful of how you interpret this!) zα/2 the critical value for our confidence interval Given this notation, then we can create (1–α)% confidence interval for the true population mean in the following way: If the following assumptions are met: 1) Fixed number of trials { Binomial Assumptions 2) Trials are independent 3) Two categories for the outcomes 4) Probabilities remain constant for each trial Normality  5) Assumption np≥5 & nq≥5 (1– α)% Confidence Interval for Population Proportion With margin of error: The (1 – α)% CI for p is: E = zα/2 √ p^ ± E This interval can be written in the form: Y. Butterworth p^ q^ n which creates an interval for which one has a (1–α) level of confidence, that it contains the true population proportion. ^ – E, p^ + E) p^ – E < p < p^ + E or (p Ch. 8 Brase’s 9th Edition 9 Now we will do a quick example so you can get the hang of the computation and then we will do an example with the data that you created in our last lab on random number generation and finding sample proportions. Example: A sample survey at ta supermarket showed that 204 of 300 shoppers use Cent’s-Off coupons regularly. Find a 99% confidence interval for the true population proportion of shoppers that use Cent’s-Off coupons. Step 1:Determine the values of: n = ________ & x = _______ Step 2:Calculate p^ & q^ p^ = ________ & q^ = ________ Step 3:Find α [(1 – Level of confidence)] α = _________ Step 4:Find zα/2 by looking up in large t, in t-table using two-tailed test for α use critical value for 99% listed on z-table look up (Level of Confidence) + α/2 in the body of positive z-value table (right-tail) look up α/2 in the body of the negative z-table (left-tail) 1) 2) 3) 4) Step 5:Calculate the margin of error E = ___________ Step 6:Give the confidence interval in interval notation Here are a couple more examples that we can work together or you can practice on your own. Example: In a random sample of 250 T.V. viewers in a certain area, 190 had seen a certain controversial program. Find a 90% CI for the true population proportion. Example: A study is being made to estimate the proportion of voters in a sizeable community who favor the construction of a nuclear power plant. It is found that only 140 of the 400 voters selected at random favor the project, find a 95% CI for the proportion of all voters in this community who favor the project. Y. Butterworth Ch. 8 Brase’s 9th Edition 10 Our knowledge of confidence intervals can also be used in a different way. We can use it to calculate a sample size to poll in order to achieve a set level of confidence and margin of error. This formula is found by solving the margin of error for n. n = [zα/2]2 p^ q^ E2 You may wonder why you would want to do this! Here is a real life example of how I used it in my consulting work. A large city wanted to conduct a survey to find out customer satisfaction with their garbage service. They want me to come up with a plan for conducting the survey, so I asked them a few questions pertaining to margin of error and set a confidence level at a fairly hefty size of 90% (survey’s seldom achieve that great of level of confidence due to sampling errors). With my calculation I indicated to them that they should poll around a thousand people (mind you this was in an area of around 100,000). Example: If your client wants a 90% confidence level and a margin of error equal to 0.2, calculate the sample size needed to ascertain that 90% confidence is achieved if a previous poll showed that p-hat was 0.47. *Note: When a p-hat is not known, we use p-hat=q-hat=0.5. Y. Butterworth Ch. 8 Brase’s 9th Edition 11 §8.4 Estimating Mu1–Mu2 and p1–p2 We’ve got the idea of how to compute confidence intervals by now, so let’s discuss how we would use intervals for differences. First, we need to make sure that we have 2 independent samples. Independent samples means that we can’t make pairing from one data point to another. Examples of dependent data are time series data (same object is measured twice) or data gathered for related experimental units (such as data gathered for twins). Differences are interested in the differences between proportions or means. Proportions: √ E = zα/2 p^1 q^1 n1 Interval: (p^1 – p^2) ± E + p^2 q^2 n2 Means with σ known E = z α/2 √ Interval: (x1 – x2) ± E σ12 n1 + σ22 n2 Means with σ unknown E = tnsmall-1, α/2 Interval: √ s12 + n1 s2 2 n2 (x1 – x2) ± E Note: If σ1 = σ2, but still unknown, then there is another calculation. See exercise #27 on p. 385 if you are interested. Interpretations of Intervals We can make inferences based upon the confidence intervals created that we can refer back to after we have discussed hypothesis testing in Chapter 9. Your book begins to make the inferences very informally at this time. I want you to know that all inferences made based upon confidence intervals for means are 100% valid and can be used to conduct a hypothesis test. I also want to warn you, that although we can make inferences about proportions using confidence intervals they are not as valid as inferences made using hypothesis testing as it is introduced in Chapter 9. It is generally safe to make a Y. Butterworth Ch. 8 Brase’s 9th Edition 12 preliminary statement based upon a confidence interval for a proportion but an actual inference should only be made and stated as a conclusion to a hypothesis test based on the traditional or p-value method introduced in Chapter 9. The reason for this is that the distributions are not the same for the test statistic and the confidence interval and therefore sometimes conclusions can be invalid. If the interval for μ1 – μ2 is all positive values then it can be inferred, with 1-α% confidence that μ1 > μ2 If the interval for μ1 – μ2 is all negative values then it can be inferred, with 1-α% confidence that μ1 < μ2 If the interval for μ1 – μ2 has both positive and negative values then it can be inferred, with 1-α% confidence that μ1 is not different from μ2 Your book makes the same generalizations for proportions. See my caution above! Without further explanation, let’s do some examples. Let’s call your text into play. Example 1: #8 p. 378 from Brase/Brase’s 9th edition, Understandable Statistics Inorganic phosphorous is a naturally occurring element in all plants and animals, with concentrations increasing progressively up the food chain. Geochemical surveys take soil samples to determine phosphorous content (in ppm). A high phosphorous content may or may not indicate an ancient burial site, food storage site, or even a garbage dump. Independent random samples from two regions gave the following phosphorous measurements (in ppm). Assume the distribution of phosphorous is moundshaped and symmetric for these two regions. Region 1: 855 1550 1230 875 1080 2330 1850 1860 2340 1080 910 1130 1450 1260 1010 Region 2: Y. Butterworth 540 640 810 1180 790 1160 1230 1050 1770 1020 960 1650 860 890 a) Which should be used, a z-interval or a t-interval? Why? b) What are the degrees of freedom? c) Find the critical value for an 80% CI using your calculator. d) Write out the margin of error calculations for a confidence level of 80% for the difference between the mean in R1 & R2. Ch. 8 Brase’s 9th Edition 13 e) Using your TI-83/84 find an 80% CI for μ1 – μ2 Note: We can use the data to compute the CI without even finding the mean and standard deviation with the TI 83/84 calculators f) Explain what the confidence interval means in context of this problem. Use information about the interval containing all positive, all negative or both positive and negative. g) At the 80% level of confidence, is one region more interesting than the other from a geochemical perspective? Use the information stated in f). Example 2: #14 p. 381 in Brase/Brase’s 9th edition, Understandable Statistics Most married couples have two or three personality preferences in common (see the reference to the Myers-Briggs Test in #13). Myers used a random sample of 375 married couples and found that 132 has three preferences in common. Another random sample of 571 couples showed that 217 had two personality preferences in common. Let p1 be the population proportions of all married couples with 3 preferences in common and p2 be the population proportion of all married couples with 2 preferences in common. a) Find p-hat1 b) Find p-hat2 c) Compute the margin of error for the difference between population 1 & 2 at the 90% confidence level. Y. Butterworth Ch. 8 Brase’s 9th Edition 14 d) Use your TI-83/84 to calculate the 90% CI for the difference between population proportions for pop 1 & pop 2. e) What do the values contained within the interval seem to indicate about the difference in the couples with 3 preferences in common and the couples with 2 preferences in common? Example 3: #18 p. 381 in Brase/Brase’s 9th edition, Understandable Statistics “Parental Sensitivity to Infant Cues: Similarities and Differences Between Mothers and Fathers,” by MV Graham (Journal of Pediatric Nursing, Vol. 8, No. 6), reports a study of parental empathy for sensitivity cues and baby temperament (higher mean scores means more empathy). Let x1 be a random variable that represent the score of a mother on an empathy test (in regards to her baby). Let x2 be the empathy score of a father. A random sample of 32 mothers gave a sample mean of 69.44. Another random sample of 32 fathers gave a mean of 59. Assume the population standard deviation is 11.69 for mothers and 11.60 for fathers. a) Which should be used, a z-interval or a t-interval? Why? b) What is the correct critical value for a 99% CI? c) Compute the margin of error for the difference between mother’s mean empathy score and father’s mean empathy score. d) Use your calculator to compute the 99% CI for the difference in mother’s and father’s mean empathy scores. Y. Butterworth Ch. 8 Brase’s 9th Edition 15 e) What does the confidence interval tell about the relationship between empathy scores for mothers and fathers at the 99% confidence level? f) Do you think we could draw the same conclusions as in e) for the 95% confidence level? Why or why not? Y. Butterworth Ch. 8 Brase’s 9th Edition 16

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Document