Notes – Chapter 18 Sampling Distributions

A parameter is a number that describes the population. In statistical practice, the value of a parameter is unknown. (µ, σ, and now p or π)

A statistic is a number that can be computed from the sample data without making use of any unknown parameters. We often use a statistic to estimate an unknown parameter. (x̄, sx, and now p̂)

No longer is a proportion something we just compute from a set of data. We now see it as a random quantity that has a distribution. We call that distribution the sampling distribution model for the proportion.

Sampling variability is the idea that in repeated random sampling, the value of the statistic will vary. This makes sense: the proportions vary from sample to sample because the samples contain different values.

To describe a sampling distribution, use the same descriptions as for any other distribution: overall shape, outliers, center, and spread.

The term bias has been used to suggest that a sampling technique favors a certain outcome. When we use the term bias in relation to a sampling distribution, we mean that the center of the sampling distribution is not at the true value of the population parameter. A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.

The variability of a statistic is described by the spread of its sampling distribution. The spread is determined by the sampling design and the sample size. Larger samples give less variability.

Sampling Distribution of a Sample Proportion – Categorical Data

Choose an SRS of size n from a large population with population proportion p having some characteristic of interest. Let p̂ be the proportion of the sample having that characteristic. Then the sampling distribution of p̂ is approximately normal as long as the conditions below are met.
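To see sampling variability, unbiasedness, and the effect of sample size concretely, here is a minimal Python simulation sketch. It assumes a very large population, so each sampled individual independently has the characteristic with probability p. The value p = 0.30, the sample sizes, and the helper name simulate_phat are illustrative choices, not part of the notes.

```python
import random
import statistics


def simulate_phat(p: float, n: int, reps: int = 2000, seed: int = 1) -> list[float]:
    """Draw `reps` random samples of size n, where each individual independently
    has the characteristic with probability p, and return each sample's p-hat."""
    rng = random.Random(seed)
    phats = []
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        phats.append(successes / n)
    return phats


if __name__ == "__main__":
    p = 0.30  # hypothetical population proportion, for illustration only
    for n in (25, 100, 400):
        phats = simulate_phat(p, n)
        # Center: should sit near p (p-hat is unbiased).
        # Spread: should shrink as the sample size grows.
        print(f"n={n:4d}  mean of p-hat={statistics.mean(phats):.3f}  "
              f"SD of p-hat={statistics.stdev(phats):.4f}")
```

Since each sample size is 4 times the previous one, the printed SD of p̂ should roughly halve from one line to the next.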
Provided we meet the conditions, this sampling distribution will be $N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$.

Conditions:
1) Randomization. The sample should be a simple random sample (SRS) of the population. (This is often difficult to achieve in reality. At the very least, we need to be very confident that the sampling method was not biased and that the sample is representative of the population.)
2) 10% Rule. To ensure independence, we cannot take a sample that is too large without replacement. As long as our sample is no more than 10% of the population size, independence is protected.
3) Success/Failure. To ensure that the sample size is large enough for the normal approximation, we must expect at least 10 successes and at least 10 failures: $np \ge 10$ and $n(1-p) \ge 10$.

Examples
1) For the years 2000–2002, among mothers in the state of Texas under the age of 18, the proportion who gave birth to a child weighing less than 2500 grams was 9.6%.
A) Draw the sampling distribution of p̂ based on a random sample of 200.
B) What is the probability that more than 12% of the sample of mothers gave birth to children weighing less than 2500 grams?
C) What is the probability that less than 5% of the sample of mothers gave birth to children weighing less than 2500 grams?
2) Through the Census Bureau we know that approximately 64% of all US households have children under the age of 16. We take a random sample of 100 households in the GCISD attendance area and find that 68% of households have children under the age of 16. If the GCISD attendance area follows the national model, what is the probability that we will get a sample proportion as large as 68%?
3) A manufacturer of computer printers purchases plastic ink cartridges from a vendor. When a large shipment is received, a random sample of 200 cartridges is selected, and each is inspected. If the sample proportion of defectives is more than .02, the entire shipment will be returned to the vendor.
A) What is the approximate probability that the shipment will be returned if the true proportion of defectives in the shipment is .05? Be sure to check the conditions necessary for accurate probabilities using proportions.
B) What is the approximate probability that the shipment will not be returned when the true proportion of defectives in the shipment is .10?

Sampling Distribution of a Sample Mean – Quantitative Data

A sampling distribution of sample means is the distribution of the means of many samples of the same size. We study it because:
* Averages are less variable than individual observations.
* Averages are more normal than individual observations.

The mean and standard deviation of a population are µ and σ, respectively. These are parameters. The mean and standard deviation calculated from sample data are statistics. We write the sample mean as x̄ and the sample standard deviation as sx.

Suppose that x̄ is the mean of an SRS of size n drawn from a large population with mean µ and standard deviation σ. Then the mean of the sampling distribution of x̄ is µ and its standard deviation is $\sigma/\sqrt{n}$.

*** The values of x̄ are less spread out for larger samples. Their standard deviation shrinks in proportion to $1/\sqrt{n}$, so you must take a sample 4 times as large to cut the standard deviation of x̄ in half.

It makes sense that the shape of the distribution of x̄ depends on the shape of the population distribution.
** If the population distribution is normal, then so is the distribution of the sample mean, regardless of sample size. Even for skewed or oddly shaped distributions, if the sample size is large enough, the sampling distribution will still be approximately normal. This idea leads us to…

The Central Limit Theorem (CLT)

The mean of a random sample has a sampling distribution whose shape can be approximated by a normal model. The larger the sample, the better the approximation will be. The sampling distribution of the sample mean x̄ is close to the normal distribution $N(\mu, \sigma/\sqrt{n})$.
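The claims above about the mean, spread, and shape of x̄ can be checked with a small simulation. This Python sketch assumes a hypothetical, strongly right-skewed population (an exponential distribution with µ = 1 and σ = 1); sample_means and skewness are illustrative helper names, not part of any library. For each sample size it reports the mean of the x̄ values, their standard deviation next to σ/√n, and their skewness (about 0 for a symmetric distribution such as the normal).

```python
import math
import random
import statistics


def sample_means(n: int, reps: int = 2000, seed: int = 2) -> list[float]:
    """Means of `reps` samples of size n from an exponential population
    with mean 1 and standard deviation 1 (a hypothetical skewed population)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


def skewness(values: list[float]) -> float:
    """Sample skewness: near 0 for a symmetric (e.g. normal) distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


if __name__ == "__main__":
    sigma = 1.0  # population standard deviation of the assumed exponential population
    for n in (1, 4, 16, 64):
        means = sample_means(n)
        print(f"n={n:3d}  mean of x-bar={statistics.fmean(means):.3f}  "
              f"SD={statistics.stdev(means):.3f} (sigma/sqrt(n)={sigma / math.sqrt(n):.3f})  "
              f"skewness={skewness(means):.2f}")
```

With these settings, the skewness of the x̄ values should shrink toward 0 as n grows, and going from n = 16 to n = 64 (a sample 4 times as large) should roughly halve their standard deviation.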
The Law of Large Numbers

Draw observations at random from any population with finite mean µ. As the number of observations drawn increases, the mean x̄ of the observed values gets closer and closer to µ.

The Central Limit Theorem (CLT) allows us to use normal probability calculations to answer questions about sample means as long as we meet the following conditions.

Conditions:
1) Randomization. The sample should be a simple random sample (SRS) of the population. (This is often difficult to achieve in reality. At the very least, we need to be very confident that the sampling method was not biased and that the sample is representative of the population.)
2) 10% Rule. To ensure independence, we cannot take a sample that is too large without replacement. As long as our sample is no more than 10% of the population size, independence is protected.
3) Large Enough Sample. The truth is, it depends; there is no "for sure" way to tell. Common practice is to treat any sample with n ≥ 30 as large enough to assume the sampling distribution is approximately normal.

We said at the beginning that in most real-life cases we will not know the population parameters (µ, σ, p or π), so we will have to use sample statistics as estimates of them. Our terminology changes just a little. If we don't know:
µ – we estimate it with x̄
σ – we estimate it with sx
p (π) – we estimate it with p̂

If we use these estimates to calculate the variability of a sampling distribution, we call the result the standard error (instead of the standard deviation). So…
If $SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$, then $SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
And…
If $SD(\bar{x}) = \frac{\sigma}{\sqrt{n}}$, then $SE(\bar{x}) = \frac{s_x}{\sqrt{n}}$.

Examples:
4) A soft-drink bottler claims that can volume is normally distributed with a mean of 12 oz of soda and a standard deviation of .16 oz. A random sample of sixteen cans is selected and the soda volume is determined for each one. Because the population distribution is normal, the sampling distribution of x̄ is also normal.
A) Give the mean and standard deviation of the sampling distribution.
B) Find the probability that the sample mean soda volume falls between 11.94 and 12.06 ounces.
C) In this sample, the sample average soda volume is found to be 11.9 ounces. How likely is this to happen if the true population mean is 12 ounces?
5) The College Student Journal (Dec. 1992) investigated differences between traditional and nontraditional students, where nontraditional students are generally defined as those 25 years old or older. Based on the study results, we can assume that the population mean and standard deviation for the GPA of all nontraditional students are 3.5 and 0.5, respectively. Suppose that a random sample of 100 nontraditional students is selected from the population of all nontraditional students.
A) Find the mean and standard deviation of the sampling distribution.
B) What is the approximate probability that the nontraditional student sample has a mean GPA between 3.40 and 3.65?
C) What is the approximate probability that the sample of 100 nontraditional students has a mean GPA that exceeds 3.62?
6) As a graduate student, I randomly sampled 25 CHHS students and recorded their IQ scores. I got a mean of 136 with a standard deviation of 2.4. My professor tells me that, for the purposes of my research, that standard deviation is too high; I need to reduce it to no more than 0.6.
A) How can I reduce the standard deviation?
B) What sample size would I need?
7) The counselors are working with SAT scores and take a random sample of 60 CHHS students. Their data show a math SAT mean of 580 with a standard deviation of 12.8. If they need to reduce the variability to 1/8 of that, what sample size would they need?
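The normal-model calculations these examples call for can be sketched with Python's standard-library statistics.NormalDist. The helper names are hypothetical, and the sketch assumes the conditions listed in these notes hold. For Examples 6 and 7 it assumes, as the questions appear to intend, that the quoted standard deviation is the standard deviation of the sample mean, which shrinks in proportion to 1/√n; the required sample size is therefore n times the square of the reduction factor.

```python
import math
from statistics import NormalDist


def sd_phat(p: float, n: int) -> float:
    """Standard deviation of the sampling distribution of p-hat: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def sd_xbar(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of x-bar: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def success_failure_ok(p: float, n: int) -> bool:
    """Success/Failure condition: at least 10 expected successes and 10 expected failures."""
    return n * p >= 10 and n * (1 - p) >= 10


def prob_between(center: float, sd: float, low: float, high: float) -> float:
    """P(low < statistic < high) under the normal model N(center, sd)."""
    model = NormalDist(mu=center, sigma=sd)
    return model.cdf(high) - model.cdf(low)


def n_for_target_sd(n: int, current_sd: float, target_sd: float) -> int:
    """Sample size that shrinks an SD proportional to 1/sqrt(n) from current_sd
    to target_sd, rounded up to a whole number of individuals."""
    needed = n * (current_sd / target_sd) ** 2
    return math.ceil(round(needed, 6))  # round() first absorbs floating-point noise


if __name__ == "__main__":
    # Example 1B: p = 0.096, n = 200; P(p-hat > 0.12)
    p, n = 0.096, 200
    print("Ex 1 Success/Failure OK:", success_failure_ok(p, n))  # np = 19.2, n(1 - p) = 180.8
    print("Ex 1B:", round(1 - NormalDist(p, sd_phat(p, n)).cdf(0.12), 4))

    # Example 3A: p = 0.05, n = 200; the shipment is returned if p-hat > 0.02
    p = 0.05
    print("Ex 3A Success/Failure OK:", success_failure_ok(p, n))  # np = 10 just meets it
    print("Ex 3A:", round(1 - NormalDist(p, sd_phat(p, n)).cdf(0.02), 4))

    # Example 4B: mu = 12 oz, sigma = 0.16 oz, n = 16, so SD(x-bar) = 0.04
    print("Ex 4B:", round(prob_between(12, sd_xbar(0.16, 16), 11.94, 12.06), 4))

    # Examples 6B and 7: the SD must shrink by a factor of 4 and of 8, respectively
    print("Ex 6B:", n_for_target_sd(25, 2.4, 0.6))
    print("Ex 7:", n_for_target_sd(60, 12.8, 12.8 / 8))
```

Working with the exact ratio of standard deviations, rather than a rounded decimal, keeps the sample-size calculation in Examples 6 and 7 free of rounding error before the final round-up.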