ST 311 - Introduction to Statistics
Instructor: Judith Canner
Learning Objectives E
Spring 2010

SAMPLING DISTRIBUTIONS

Sampling Distribution of the (Sample) Mean (Central Limit Theorem)

Let \( \mu \) = mean for the population of interest
Let \( \sigma \) = standard deviation for the population of interest
Let \( \bar{x} \) = mean for the sample (sample mean)

If numerous random samples of the same size n are taken and the n observations of each sample are independent, the distribution of the possible values of \( \bar{x} \) is approximately normal, with

Mean = \( \mu \)
Standard deviation = \( \mathrm{sd}(\bar{x}) = \sigma/\sqrt{n} \)

Sample Size (n)
1. Small (n < 30): only use the normal curve approximation if the sample comes from a bell-shaped population.
2. Large (n ≥ 30): the population does not have to be bell-shaped (it can have any shape) to use the normal curve approximation for sample means.
3. Extreme outliers: a much larger sample size (well above 30) may be needed.

Examples:
1. Recall the activity on NFL player weights. As n increased, what happened to the range of the distribution of sample means?
Law of Large Numbers:
2. What is the sampling distribution for the sample mean of 200 NFL players' weights?

3. Consider the ST 311 data on the number of cigarettes smoked a day. The distribution is skewed right with mean 0.586 cigarettes and standard deviation 2.26.
a. Suppose I select a sample of 15 people from ST 311. Can you describe the sampling distribution of the mean number of cigarettes smoked a day by 15 ST 311 students? Why or why not? If so, what is the sampling distribution?
b. Now suppose I select a sample of 200 ST 311 students and determine the mean number of cigarettes smoked a day. Can you describe the sampling distribution of the sample mean of 200 ST 311 students? Why or why not? If so, what is the sampling distribution?

4. Now consider the cost of textbooks. The distribution of the cost of textbooks is bell-shaped with mean $348 and standard deviation $143.7.
a.
Suppose I select a sample of 15 people from ST 311. Can you describe the sampling distribution of the mean cost of textbooks for 15 ST 311 students? Why or why not? If so, what is the sampling distribution?
b. Now suppose I select a sample of 200 ST 311 students and determine the mean cost of textbooks for them. Can you describe the sampling distribution of the sample mean of 200 ST 311 students' textbook costs? Why or why not? If so, what is the sampling distribution?

Standardized Z-score for the Sample Mean

Recall the z-score for an individual randomly drawn from a normally distributed population.

z = ____________

Now, instead of an individual, we consider a sample mean. Recall that the standard deviation of the sample mean is the population standard deviation divided by the square root of the sample size, i.e., \( \mathrm{sd}(\bar{x}) = \sigma/\sqrt{n} \). The z-score is

z = ____________ = ____________________________________

Example 1: Coca-Cola uses a filling machine to fill 12 oz cans. Each can is supposed to contain 355 milliliters of soda. In fact, the amount varies according to a normal distribution with mean 355.2 ml and standard deviation 0.5 ml.
1. What is the probability that an individual can contains less than 355 ml?
2. What is the probability that the mean content of a 6-pack of cans is less than 355 ml?
3. I got a six-pack of Coke from the store the other day. When I measured the amount of soda in the six cans, the mean amount was 353 ml. What is the probability that a six-pack has a mean amount of soda of 353 ml or less? What does this lead you to believe about the filling machines? What about the original distribution for our population?

Example 2: McDonald's claims that the average time required to fill an order is 2.5 minutes (150 seconds) and that the standard deviation is 15 seconds.
Wendy's wants to prove McDonald's wrong. Over a 1-week period, Wendy's timed a sample of 40 orders at McDonald's and obtained a sample mean of 160 seconds.
1. We are not told the distribution of the time required to fill an order at McDonald's. Can we still use the normal curve as an approximation for the sampling distribution of the mean time? Why or why not?
2. Do you think the restaurant's 2.5-minute claim is true?

Example 3: In baseball, a "no-hitter" is a regulation 9-inning game in which the pitcher yields no hits to the opposing batters. Chance (Summer 1994) reported on a study of no-hitters in Major League Baseball. The initial analysis focused on the total number of hits yielded per game per team for all 9-inning games played between 1989 and 1993. The distribution of hits per 9-inning game is approximately normal with mean 8.72 and standard deviation 1.10.
1. What percentage of 9-inning games result in fewer than 5 hits?
2. What is the probability that the average number of hits for ten 9-inning games is less than 5?
3. Demonstrate statistically why a no-hitter is considered an extremely rare occurrence.

Standard Error of the Mean

In practice the population standard deviation \( \sigma \) is rarely known, but the sample standard deviation s is. We can use the standard error to estimate the theoretical standard deviation of the sampling distribution of the sample mean, \( \mathrm{sd}(\bar{x}) = \sigma/\sqrt{n} \).

Standard Error: measures roughly how much, on average, the sample mean \( \bar{x} \) is in error as an estimate of the population mean \( \mu \):

\( \mathrm{se}(\bar{x}) = s/\sqrt{n} \)

where s is the standard deviation of the sample of size n.

Example: Consider the NFL player weights. The population has mean 245.25 lbs and standard deviation 45.68 lbs.
1. I select a sample of 50 players; the sample has mean 247.27 lbs and standard deviation 50.36 lbs.
What is the standard deviation of the sampling distribution of the sample mean? What is the standard error of the sample mean?
2. I select another sample of 50 players. The sample has mean 238 lbs and standard deviation 46.82 lbs. What is the standard deviation of the sampling distribution of the sample mean? What is the standard error of the sample mean?
3. Compare the standard deviations of the sampling distribution of the sample mean in 1 and 2. Are they the same? Different?
4. Now compare the standard errors of the sample mean in 1 and 2. Are they the same? Different? Why?

Generalizations about Sampling Distributions
1. As long as certain conditions are met, the sampling distribution is approximately normal:
a. _________________________________
b. _________________________________
2. The mean of the sampling distribution is the population parameter that corresponds to the sample statistic.
Example: __________________________________________
3. The standard deviation of the sampling distribution measures how much the values of the sample statistic might vary across different samples from the same population.
a. Sample size: As the sample size gets __________________, the variability among possible values of the statistic from different samples gets ___________________.
4. Central Limit Theorem: if n is sufficiently large, the sample means of random samples from a population with mean \( \mu \) and finite standard deviation \( \sigma \) are approximately normally distributed with mean \( \mu \) and standard deviation \( \sigma/\sqrt{n} \).

http://courses.ncsu.edu/st311/common/basic.html

Population distribution: \( (\mu, \sigma) \)
Sampling distribution: \( (n, \bar{x}, \mathrm{sd}(\bar{x}), \mathrm{se}(\bar{x})) \)

Sampling Distribution of Sample Means

Example: As reported by the U.S. National Center for Health Statistics, the mean high-density lipoprotein (HDL) cholesterol of females 20-29 years old is 53.
If HDL cholesterol is normally distributed with standard deviation 13.4:
1. What is the probability that a randomly selected female 20-29 years old will have an HDL cholesterol level above 60?
2. What is the probability that a random sample of 20 females 20-29 years old will have a mean cholesterol level above 60?
3. What might you conclude if a random sample of 20 females 20-29 years old had a mean cholesterol level above 60?

SAMPLING DISTRIBUTIONS

Sample Proportions

What other statistics can we make inferences about using the normal curve approximation for the sampling distribution? So far, we have focused on statistics for quantitative variables (mean, standard deviation, etc.) and have neglected categorical variables. We can use the foundations we have built for quantitative variables to make inferences about categorical variables.

In-Class Activity: Coin Toss
1. Get into groups of two; assign one person as the "flipper" and the other as the recorder.
2. The flipper will flip the coin 50 times while the recorder records the number of heads and tails that occur during the 50 flips.
3. Once finished, write the number of heads on the post-it note and write your names on the back. Go up to the chalkboard and place the post-it in the appropriate bin.

Sampling Distribution for Sample Proportions

Let \( p \) = population proportion of interest, or binomial probability of success
Let \( \hat{p} \) = corresponding sample proportion, or proportion of successes

If numerous random samples or repetitions of the same size n are taken, the distribution of the possible values of \( \hat{p} \) is approximately normal, with

Mean = \( p \)
Standard deviation = \( \mathrm{s.d.}(\hat{p}) = \sqrt{p(1-p)/n} \)

Sample size: the sample must be large enough to observe at least ten of each response (success/failure), i.e.,
1.
\( np \ge 10 \)
2. \( n(1-p) \ge 10 \)

Examples: In the coin toss example, we know that the population proportion of heads should be p = _______.
What is the sampling distribution for the sample proportion of heads in the 50 coin tosses?
What is the sampling distribution for the proportion of heads in 100 flips? Compare the standard deviations for 50 and 100 flips.
If I were to keep flipping the coin forever, what would you expect the sample proportion of heads to be? What about the standard deviation?
What would the sampling distribution be for the proportion of tails in 100 flips?

Standardized Score for Sample Proportions

Similar to the sample mean, if the correct conditions are met, we can use the normal distribution to describe the sampling distribution of the sample proportion. Therefore, we can make inferences about the sample proportion in the same way we make inferences about the sample mean, by using the z-score to calculate probabilities associated with different sample proportions.

\[ z = \frac{\text{sample proportion} - \text{population proportion}}{\text{standard deviation of the sample proportion}} \]

\[ z = \frac{\hat{p} - p}{\mathrm{s.d.}(\hat{p})} = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \]

Examples: What is the probability that we flip 40 heads in 50 coin tosses?
Let's consider one of your samples. What is the probability that someone flipped _____ heads in 50 coin tosses? What would be the probability that someone flipped the same number, _____, of tails?

Standard Error for Sample Proportions

Just as with sample means, we generally do not know the actual population proportion p, so we use the standard error to estimate the standard deviation of the sample proportion \( \hat{p} \):

\[ \mathrm{s.e.}(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n} \]
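The formulas in this handout can be checked numerically. The Python sketch below is an illustration added here, not part of the original course materials, and it rests on a few assumptions: Python 3.8 or later (for `statistics.NormalDist` and `statistics.fmean`), hypothetical helper names such as `sd_mean` and `z_prop`, a reading of "40 heads in 50 coin tosses" as 40 or more heads, and a hypothetical exponential population (mean 1, sd 1) used only to show the Central Limit Theorem at work on skewed data. It reuses the Coke, McDonald's, NFL, and coin-toss numbers given above.

```python
# A minimal sketch of the handout's sampling-distribution formulas.
# Assumes Python 3.8+ (statistics.NormalDist, statistics.fmean).
# All function names are hypothetical, chosen for illustration.
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

Z = NormalDist()  # standard normal distribution: mean 0, sd 1


def sd_mean(sigma: float, n: int) -> float:
    """sd(x-bar) = sigma / sqrt(n), from the population standard deviation."""
    return sigma / sqrt(n)


def se_mean(s: float, n: int) -> float:
    """se(x-bar) = s / sqrt(n), from the sample standard deviation."""
    return s / sqrt(n)


def z_mean(xbar: float, mu: float, sigma: float, n: int) -> float:
    """Standardized score of a sample mean: (x-bar - mu) / sd(x-bar)."""
    return (xbar - mu) / sd_mean(sigma, n)


def sd_prop(p: float, n: int) -> float:
    """s.d.(p-hat) = sqrt(p(1 - p) / n), from the population proportion."""
    return sqrt(p * (1 - p) / n)


def se_prop(phat: float, n: int) -> float:
    """s.e.(p-hat) = sqrt(p-hat(1 - p-hat) / n), from the sample proportion."""
    return sqrt(phat * (1 - phat) / n)


def z_prop(phat: float, p: float, n: int) -> float:
    """Standardized score of a sample proportion: (p-hat - p) / s.d.(p-hat)."""
    return (phat - p) / sd_prop(p, n)


def proportion_conditions_met(p: float, n: int) -> bool:
    """Large-sample rule from the handout: np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10


def simulated_sd_of_means(n: int, reps: int = 5000, seed: int = 311) -> float:
    """Empirical sd of `reps` sample means of size n, drawn from a right-skewed
    exponential population with mean 1 and sd 1 (a hypothetical population)."""
    rng = random.Random(seed)
    means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    return stdev(means)


if __name__ == "__main__":
    # Coke: one can vs. the mean of a 6-pack (mean 355.2 ml, sd 0.5 ml).
    print(f"P(one can < 355 ml)     = {Z.cdf((355 - 355.2) / 0.5):.4f}")
    print(f"P(6-pack mean < 355 ml) = {Z.cdf(z_mean(355, 355.2, 0.5, 6)):.4f}")

    # McDonald's: claimed mean 150 s, sd 15 s, n = 40, observed x-bar = 160 s.
    print(f"P(x-bar >= 160 s)       = {1 - Z.cdf(z_mean(160, 150, 15, 40)):.2e}")

    # NFL weights: sd(x-bar) uses sigma, so it is the same for both samples;
    # se(x-bar) uses each sample's own s, so it differs between samples.
    print(f"sd(x-bar), n = 50       = {sd_mean(45.68, 50):.3f}")
    print(f"se(x-bar), samples 1, 2 = {se_mean(50.36, 50):.3f}, {se_mean(46.82, 50):.3f}")

    # Coin: 40 or more heads in 50 fair tosses, i.e. p-hat >= 0.8.
    if proportion_conditions_met(0.5, 50):
        print(f"P(p-hat >= 0.8)         = {1 - Z.cdf(z_prop(0.8, 0.5, 50)):.2e}")

    # CLT: the population is skewed, yet the sd of the sample means is near sigma/sqrt(n).
    print(f"simulated vs. sigma/sqrt(n) = {simulated_sd_of_means(50):.4f} vs. {sd_mean(1.0, 50):.4f}")
```

Keeping `sd_mean` (which takes σ) separate from `se_mean` (which takes s) mirrors the handout's distinction: the first is a fixed property of the sampling distribution, while the second changes from sample to sample, which is the point of the NFL comparison questions.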