Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Statistic Business Week 7 Sampling and Sampling Distribution Agenda Time 30 minutes 60 minutes 60 minutes 50 minutes Activity Sampling Sampling Distribution of the Mean Sampling Distribution of the Proportion Exercise Learning Objectives In this chapter, you learn: • To distinguish between different sampling methods • The concept of the sampling distribution • To compute probabilities related to the sample mean and the sample proportion • The importance of the Central Limit Theorem SAMPLING Why Sample? • Selecting a sample is less time-consuming than selecting every item in the population (census). • Selecting a sample is less costly than selecting every item in the population. • An analysis of a sample is less cumbersome and more practical than an analysis of the entire population. A Sampling Process Begins With A Sampling Frame • The sampling frame is a listing of items that make up the population • Frames are data sources such as population lists, directories, or maps • Inaccurate or biased results can result if a frame excludes certain portions of the population • Using different frames to generate data can lead to dissimilar conclusions Types of Samples Samples Non-Probability Samples Judgment Convenience Probability Samples Simple Random Systematic Stratified Cluster Types of Samples: Nonprobability Sample • In a nonprobability sample, items included are chosen without regard to their probability of occurrence. – In convenience sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to sample. – In a judgment sample, you get the opinions of preselected experts in the subject matter. Types of Samples: Probability Sample • In a probability sample, items in the sample are chosen on the basis of known probabilities. Probability Samples Simple Random Systematic Stratified Cluster Probability Sample: Simple Random Sample • Every individual or item from the frame has an equal chance of being selected • Selection may be with replacement (selected individual is returned to frame for possible reselection) or without replacement (selected individual isn’t returned to the frame). • Samples obtained from table of random numbers or computer random number generators. Selecting a Simple Random Sample Using A Random Number Table Sampling Frame For Population With 850 Items Item Name Item # Bev R. Ulan X. . . . . Joann P. Paul F. 001 002 . . . . 849 850 Portion Of A Random Number Table 49280 88924 35779 00283 81163 07275 11100 02340 12860 74697 96644 89439 09893 23997 20048 49420 88872 08401 The First 5 Items in a simple random sample Item # 492 Item # 808 Item # 892 -- does not exist so ignore Item # 435 Item # 779 Item # 002 Probability Sample: Systematic Sample • Decide on sample size: n • Divide frame of N individuals into groups of k individuals: k=N/n • Randomly select one individual from the 1st group • Select every kth individual thereafter N = 40 n=4 k = 10 First Group Probability Sample: Stratified Sample • Divide population into two or more subgroups (called strata) according to some common characteristic • A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes • Samples from subgroups are combined into one • This is a common technique when sampling population of voters, stratifying across racial or socio-economic lines. Population Divided into 4 strata Probability Sample Cluster Sample • Population is divided into several “clusters,” each representative of the population • A simple random sample of clusters is selected • All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique • A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled. Population divided into 16 clusters. Randomly selected clusters for sample Probability Sample: Comparing Sampling Methods • Simple random sample and Systematic sample – Simple to use – May not be a good representation of the population’s underlying characteristics • Stratified sample – Ensures representation of individuals across the entire population • Cluster sample – More cost effective – Less efficient (need larger sample to acquire the same level of precision) Evaluating Survey Worthiness • • • • • What is the purpose of the survey? Is the survey based on a probability sample? Coverage error – appropriate frame? Nonresponse error – follow up Measurement error – good questions elicit good responses • Sampling error – always exists Types of Survey Errors • Coverage error or selection bias – Exists if some groups are excluded from the frame and have no chance of being selected • Non response error or bias – People who do not respond may be different from those who do respond • Sampling error – Variation from sample to sample will always exist • Measurement error – Due to weaknesses in question design, respondent error, and interviewer’s effects on the respondent (“Hawthorne effect”) Types of Survey Errors (continued) • Coverage error Excluded from frame • Non response error Follow up on nonresponses • Sampling error • Measurement error Random differences from sample to sample Bad or leading question SAMPLING DISTRIBUTION OF THE MEAN Sampling Distributions • A sampling distribution is a distribution of all of the possible values of a sample statistic for a given size sample selected from a population. • For example, suppose you sample 50 students from your college regarding their mean GPA. If you obtained many different samples of 50, you will compute a different mean for each sample. We are interested in the distribution of all potential mean GPA we might calculate for any given sample of 50 students. Developing a Sampling Distribution • Assume there is a population … • Population size N=4 • Random variable, X, is age of individuals • Values of X: 18, 20, 22, 24 (years) A B C D Developing a Sampling Distribution (continued) Summary Measures for the Population Distribution: X μ P(x) i N .3 18 20 22 24 21 4 σ (X i μ) N .2 .1 0 2 2.236 18 A B 20 C 22 D 24 Uniform Distribution x Developing a Sampling Distribution (continued) Now consider all possible samples of size n=2 1st Obs 2nd 16 Sample Means Observation 18 20 22 24 18 18,18 18,20 18,22 18,24 20 20,18 20,20 20,22 20,24 1st 2nd Observation Obs 18 20 22 24 22 22,18 22,20 22,22 22,24 18 18 19 20 21 24 24,18 24,20 24,22 24,24 20 19 20 21 22 16 possible samples (sampling with replacement) 22 20 21 22 23 24 21 22 23 24 Developing a Sampling Distribution (continued) Sampling Distribution of All Sample Means Sample Means Distribution 16 Sample Means 1st 2nd Observation Obs 18 20 22 24 18 18 19 20 21 _ P(X) .3 .2 20 19 20 21 22 .1 22 20 21 22 23 0 24 21 22 23 24 18 19 20 21 22 23 24 (no longer uniform) _ X Developing a Sampling Distribution (continued) Summary Measures of this Sampling Distribution: μX X i 18 19 19 24 21 σX N 16 2 ( X μ ) i X N (18 - 21)2 (19 - 21)2 (24 - 21)2 1.58 16 Comparing the Population Distribution to the Sample Means Distribution Population N=4 μ 21 Sample Means Distribution n=2 μX 21 σ 2.236 σ X 1.58 _ P(X) .3 P(X) .3 .2 .2 .1 .1 0 18 A 20 B 22 C 24 D X 0 18 19 20 21 22 23 24 _ X Sample Mean Sampling Distribution: Standard Error of the Mean • Different samples of the same size from the same population will yield different sample means • A measure of the variability in the mean from sample to sample is given by the Standard Error of the Mean: (This assumes that sampling is with replacement or sampling is without replacement from an infinite population) σ σX n • Note that the standard error of the mean decreases as the sample size increases Sample Mean Sampling Distribution: If the Population is Normal • If a population is normally distributed with mean μ and standard deviation σ, the sampling distribution of X is also normally distributed with μX μ and σ σX n Z-value for Sampling Distribution of the Mean • Z-value for the sampling distribution of X : Z where: ( X μX ) σX ( X μ) σ n X = sample mean μ = population mean σ = population standard deviation n = sample size Sampling Distribution Properties μx μ (i.e. x is unbiased ) Normal Population Distribution μ x μx x Normal Sampling Distribution (has the same mean) Sampling Distribution Properties (continued) – As n increases, – σ x decreases Larger sample size Smaller sample size μ x Example Oxford Cereals fills thousands of boxes of cereal during an eight-hour shift. As the plant operations manager, you are responsible for monitoring the amount of cereal placed in each box. To be consistent with package labeling, boxes should contain a mean of 368 grams of cereal. Because of the speed of the process, the cereal weight varies from box to box, causing some boxes to be underfilled and others overfilled. If the process is not working properly, the mean weight in the boxes could vary too much from the label weight of 368 grams to be acceptable. Example Because weighing every single box is too timeconsuming, costly, and inefficient, you must take a sample of boxes. For each sample you select, you plan to weigh the individual boxes and calculate a sample mean. You need to determine the probability that such a sample mean could have been randomly selected from a population whose mean is 368 grams. Based on your analysis, you will have to decide whether to maintain, alter, or shut down the cerealfilling process. Example a. If you randomly select a sample of 25 boxes without replacement from the thousands of boxes filled during a shift, the sample contains much less than 5% of the population. Given that the standard deviation of the cereal-filling process is 15 grams, compute the standard error of the mean. Example a. If you randomly select a sample of 25 boxes without replacement from the thousands of boxes filled during a shift, the sample contains much less than 5% of the population. Given that the standard deviation of the cereal-filling process is 15 grams, compute the standard error of the mean. σX σ n 15 25 3 Example b. How is the standard error of the mean affected by increasing the sample size from 25 to 100 boxes? Example b. How is the standard error of the mean affected by increasing the sample size from 25 to 100 boxes? σX σ n 15 100 1.5 Example c. If you select a sample of 100 boxes, what is the probability that the sample mean is below 365 grams? Example c. If you select a sample of 100 boxes, what is the probability that the sample mean is below 365 grams? 365 368 X μ 365 368 3 Z 2 σ 15 1.5 n 100 P( x 365) P( z 2) 0.0228 Example d. Find an interval symmetrically distributed around the population mean that will include 95% of the sample means, based on samples of 25 boxes. Example d. Find an interval symmetrically distributed around the population mean that will include 95% of the sample means, based on samples of 25 boxes. 95% XL Therefore: 368 XU P( X X L ) 0.025 P( X X U ) 0.975 Example P( X X L ) 0.025 P( X X U ) 0.975 Z X L 1.96 Z XU 1.96 X L 368 - 1.96 15 25 X L 368 1.96 15 25 X L 368 1.96.3 X L 368 1.96.3 X L 362.12 X L 373.88 EXERCISE Exercise 1 The U.S. Census Bureau announced that the median sales price of new houses sold in 2009 was $215,600, and the mean sales price was $270,100. Assume that the standard deviation of the prices is $90,000. a. If you select a random sample of n = 100, what is the probability that the sample mean will be less than $300,000? b. If you select a random sample of n = 100, what is the probability that the sample mean will be between $275,000 and $290,000? Exercise 2 Time spent using e-mail per session is normally distributed, with minutes and minutes. If you select a random sample of 25 sessions, a. what is the probability that the sample mean is between 7.8 and 8.2 minutes? b. what is the probability that the sample mean is between 7.5 and 8 minutes? c. If you select a random sample of 100 sessions, what is the probability that the sample mean is between 7.8 and 8.2 minutes? d. Explain the difference in the results of (a) and (c). Exercise 3 The amount of time a bank teller spends with each customer has a population mean, = 3.10 minutes and a standard deviation, = 0.40 minute. If you select a random sample of 16 customers, a. what is the probability that the mean time spent per customer is at least 3 minutes? b. there is an 85% chance that the sample mean is less than how many minutes? c. What assumption must you make in order to solve (a) and (b)? d. If you select a random sample of 64 customers, there is an 85% chance that the sample mean is less than how many minutes? ANSWER Exercise 1 a. Karena mean > median, distribusi populasi harga jual akan menceng ke kiri. Karena n=3 (n<30) maka distribusi sampelnya juga akan menceng ke kiri b. Karena n=100 maka distribusi sampelnya akan mendekati normal dengan rata-rata $274.300 dan simpangan baku $9.000 c. 0.9996 d. 0.2796 Exercise 3 a. P(X >3) = P(Z>-1.00) = 1.0 – 0.1587 = 0.8413 b. P(Z<1.04) = 0.85 X = 3.10 + 1.04 (0.1) = 3.204 a. Distribusi populasi paling tidak harus simetris b. P(Z<1.04) = 0.85 X = 3.10 + 1.04 (0.05) = 3.152 CENTRAL LIMIT THEOREM Central Limit Theorem As the sample size gets large enough… n↑ the sampling distribution becomes almost normal regardless of shape of population x Sample Mean Sampling Distribution: If the Population is not Normal How Large is Large Enough? • For most distributions, n > 30 will give a sampling distribution that is nearly normal • For fairly symmetric distributions, n > 15 will usually give a sampling distribution is almost normal • For normal population distributions, the sampling distribution of the mean is always normally distributed SAMPLING DISTRIBUTION OF THE PROPORTION Population Proportions π = the proportion of the population having some characteristic Sample proportion ( p ) provides an estimate of π: p X number of items in the sample having the characteristic of interest n sample size • 0≤ p≤1 • p is approximately distributed as a normal distribution when n is large • (assuming sampling with replacement from a finite population or without replacement from an infinite population) Sampling Distribution of p • Approximated by a normal distribution if: nπ 5 P( ps) Sampling Distribution .3 .2 .1 0 and 0 n(1 π ) 5 where μp π and .2 .4 .6 8 π(1 π ) σp n (where π = population proportion) 1 p Z-Value for Proportions Standardize p to a Z value with the formula: p Z σp p (1 ) n Example • Suppose that the manager of the local branch of a bank determines that 40% of all depositors have multiple accounts at the bank. • If you select a random sample of 200 depositors, because n = 200(0.40) = 80 ≥ 5 and n(1 – ) = 200(0.60) = 120 ≥ 5, the sample size is large enough to assume that the sampling distribution of the proportion is approximately normally distributed. • Calculate the probability that the sample proportion of depositors with multiple accounts is less than 0.30 Example Z p (1 ) n 0.30 0.40 (0.40)(0.60) 200 0.10 0.24 200 2.89 P(Z<-2.89) = 0.0019 Therefore, if the population proportion of items of interest is 0.40, only 0.19% of the samples of would be expected to have sample proportions less than 0.30. EXERCISE Exercise 4 A political pollster is conducting an analysis of sample results in order to make predictions on election night. Assuming a two-candidate election, if a specific candidate receives at least 55% of the vote in the sample, that candidate will be forecast as the winner of the election. If you select a random sample of 100 voters, what is the probability that a candidate will be forecast as the winner when a. the population percentage of her vote is 50.1%? b. the population percentage of her vote is 60%? c. the population percentage of her vote is 49% (and she will actually lose the election)? Exercise 5 In a recent survey of full-time female workers ages 22 to 35 years, 46% said that they would rather give up some of their salary for more personal time. Suppose you select a sample of 100 full-time female workers 22 to 35 years old. a. What is the probability that in the sample, fewer than 50% would rather give up some of their salary for more personal time? b. What is the probability that in the sample, between 40% and 50% would rather give up some of their salary for more personal time? c. What is the probability that in the sample, more than 40% would rather give up some of their salary for more personal time? ANSWER Exercise 4 a. = 0.501 (1 ) 0.501(1 0.501) P 0.05 n 100 P(p>0.55) = P(Z>0.98) = 1.0 – 0.8365 = 0.1635 b. = 0.60 (1 ) 0.6(1 0.6) P 0.04899 n 100 P(p>0.55) = P(Z>-1.021) = 1.0 – 0.1539 = 0.8461 Exercise 4 c. = 0.49 (1 ) 0.49(1 0.49) P 0.05 n 100 P(p>0.55) = P(Z>1.28) = 1.0 – 0.8849 = 0.1151 d. = 0.60 (1 ) 0.6(1 0.6) P 0.04899 n 100 P(p>0.55) = P(Z>-1.021) = 1.0 – 0.1539 = 0.8461 THANK YOU