Chapter 7: Sampling and Sampling Distributions

Statistical inference means making decisions based on data. We will study hypothesis testing, which is one of the tools of statistical inference. To do hypothesis testing, you need to know sampling distributions, which are the distributions of statistics computed from samples. For example, suppose you want to estimate the mean of a distribution. You estimate the mean from your sample. If you repeat this with new samples, you get many different values of the sample mean. The frequency distribution of these means approximates a sampling distribution. The limiting form of that distribution as the number of samples approaches infinity is a probability distribution known as a sampling distribution.

7.2 Sampling Distribution

Central limit theorem: The sampling distribution of sample means approximates a normal distribution as the sample size becomes large, no matter what the shape of the population distribution is.

[Figure: values X1, X2, X3, X4, X5, ..., Xn drawn from a population with mean $\mu$ and variance $\sigma^2$; standard error $\sigma_{\bar{X}} = \sigma/\sqrt{n}$]

As a rule of thumb, if the sample size is greater than 30, the distribution of the sample mean is approximately normal.

The mean of the sampling distribution of means is the population mean:

$\mu_{\bar{X}} = \mu$

The variance of the means is inversely proportional to the sample size:

$\sigma_{\bar{X}}^2 = \sigma^2 / n$ ($\sigma^2$ is the population variance) [7.2]

$\sigma_{\bar{X}} = \sigma / \sqrt{n}$ (standard error of the means) [7.3]

Standard error means the standard deviation of a sampling distribution.

The z-value for means is

$z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

Example

Assume that test scores are normally distributed with a mean of 800 and a standard deviation of 100.

a) What is the probability that one person selected at random from the population will have a score at or above 850?

$z = (X - \mu)/\sigma = (850 - 800)/100 = 0.5$

$P(z > 0.5) = 0.3085$

b) What is the probability that a sample of 20 people selected at random from the population will have a mean score above 850?

$z = (\bar{X} - \mu)/(\sigma/\sqrt{n}) = (850 - 800)/(100/\sqrt{20}) = 2.24$

$P(z > 2.24) = 0.0125$

Part a requires a normal population to be valid; part b is approximately valid without a normal population, although a sample size of 30 or more would be better.

Learning Activity 7.2-2 Sampling distribution calculations

Assume a machine produces parts with a mean diameter of 60.2 and a s.d. of 2.4.
What is the probability that a randomly selected part will have a diameter greater than 62?
What is the probability that a sample of 17 parts will have a mean diameter greater than 62?
See SamplDist.xls!Solution

Proportions (percentages)

A proportion is a special case of a mean where the data are 0 and 1. The central limit theorem applies to proportions. (See 7.C-2 Proportion as a mean.)

The mean of the proportion is $\pi$ (the population proportion):

$\mu_p = \pi$

The standard error of the proportion (s.d. of the proportion) is

$\sigma_p = \sqrt{\dfrac{\pi(1 - \pi)}{n}}$ [7.5]

where $p$ represents a sample proportion and $\pi$ a population proportion.

The mean of the proportion is $\pi$ and its variance is $\pi(1 - \pi)/n$. To see why, let $X$ be a random variable representing the number of 1s in a sample of $n$ 0s or 1s with the population proportion $\pi$. Then $X$ has a binomial distribution with $E(X) = n\pi$ and $V(X) = n\pi(1 - \pi)$. Now what are the mean and variance for the sample proportion $p = X/n$, which itself is a mean? The central limit theorem applies. That is, $p$ is approximately normally distributed if $n$ is large, and

$E(p) = E(X/n) = n\pi/n = \pi$

$V(p) = (1/n)^2 V(X) = (1/n^2)\, n\pi(1 - \pi) = \pi(1 - \pi)/n$

so the s.d. of $p$ (or standard error) is $\sqrt{\pi(1 - \pi)/n}$.

In practice, the population proportion $\pi$ is often unknown. In that case we use the sample proportion $p$ in place of $\pi$ in the above equation.

Target                Estimator $\hat{\theta}$           Mean $E(\hat{\theta})$    Standard error $\sigma_{\hat{\theta}}$
$\mu$ (mean)          $\bar{X}$ (sample mean)            $\mu$                     $\sigma/\sqrt{n}$
$\pi$ (proportion)    $p = X/n$ (sample proportion)      $\pi$                     $\sqrt{\pi(1 - \pi)/n}$

$\mu$: population mean, $\sigma^2$: population variance, $\pi$: population proportion, $n$: sample size

7.A Generating Random Numbers

In selecting random samples, it is necessary to generate random numbers. Random numbers are also used for simulations and can be used to create sample datasets.
In Excel you can generate random numbers with the following functions:

Random numbers                 Excel function
Between 0 and 1                =RAND()                  [note: never exactly 1]
Between 0 and 100              =RAND()*100              [note: never exactly 100]
Integers between 0 and 99      =INT(RAND()*100)
Integers between 1 and 100     =INT(RAND()*100) + 1
Between a and b                =RAND()*(b - a) + a

You can also use MegaStat | Generate Random Numbers to generate numbers with uniform, normal or exponential distributions.

Learning Activity 7.A-1 Generating Random Numbers

Open RandomNumbers.xls!Start
Create a sample of 20 random numbers between 1 and 100 by using the RAND() function. Use the ROUND function to round the values to 3 decimal places. See RandomNumbers.xls!RandInt.
Use MegaStat | Generate Random Numbers to generate 300 normal random numbers with a mean of 100 and a s.d. of 16, specifying 0 decimal places and live function.
Look at how the normal random numbers are generated by =ROUND(NORMINV(RAND(),100,16),0). [The values from RAND() are random probabilities that are input into the NORMINV() function to create normally distributed random numbers.]
Note: NORMINV(probability, mean, standard_dev)
Check the histogram of the random numbers you created.

Randomizing data (to select random samples)

You can rearrange (or shuffle) a column of existing values randomly.

Learning Activity 7.A-2 Randomizing Data (take n samples randomly from a pool of data)

Open RandomNumbers.xls!PriceData.
Type =RAND() in cell C2 and copy it down through C125.
Click anywhere in the random number range and then click Excel’s Sort Ascending or Sort Descending.
The first n values are now your random sample.
If you want to put the numbers back in their original order, sort on the No. column.

7.B Central Limit Theorem Simulation

Open CLT.xls. The Excel workbook contains 600 random samples (=RAND()*100,1), i.e. from a uniform population, and summarizes the distribution of the population and the distribution of the means, that is, the sampling distribution. See the distribution of the means.
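The same kind of simulation can be sketched outside Excel. The Python example below is only an illustration, not the contents of CLT.xls: the function name simulate_sample_means, the sample size of 30, and the seed are assumptions made here, since the notes state only that there are 600 samples from a uniform population between 0 and 100. It draws the samples, computes each sample mean, and compares the mean and standard deviation of those means with the values predicted by equation [7.3].

```python
import random
import statistics


def simulate_sample_means(num_samples=600, sample_size=30,
                          low=0.0, high=100.0, seed=1):
    """Draw num_samples samples from a uniform(low, high) population
    and return the list of their sample means.

    sample_size=30 and seed=1 are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.uniform(low, high) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return means


means = simulate_sample_means()

# Theory: a uniform(0, 100) population has mean 50 and s.d. 100/sqrt(12).
population_mean = 50.0
population_sd = 100.0 / 12 ** 0.5
# Standard error of the mean for samples of size 30: sigma / sqrt(n).
theoretical_se = population_sd / 30 ** 0.5

print("mean of sample means:", round(statistics.mean(means), 2))
print("population mean:     ", population_mean)
print("s.d. of sample means:", round(statistics.stdev(means), 2))
print("theoretical s.e.:    ", round(theoretical_se, 2))
```

A histogram of the values in means should look roughly bell-shaped even though the population itself is flat, which is the point of the central limit theorem.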
7.C-2 Proportion as a mean

A professor asks each of his students whether he or she owns or rents a home and codes the data 1 = own, 0 = rent. The proportion of his students who own their home is p = count/n = 11/25 = 0.44. You will get the same answer if you calculate the mean of the data.

Open Proportions.xls!Start. Check Solution1.
Open Proportions.xls!Practice. Check Solution2.
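The claim that a proportion is just the mean of 0/1 data can be checked with a short Python sketch. The list owns_home below is a made-up stand-in for the professor's data (11 ones and 14 zeros in an arbitrary order), not the contents of Proportions.xls. The last step uses the sample proportion p in place of the unknown population proportion in the standard error formula.

```python
import math
import statistics

# Hypothetical coding of the 25 answers: 1 = own, 0 = rent.
owns_home = [1] * 11 + [0] * 14

n = len(owns_home)

# Proportion computed as count / n.
p_from_count = sum(owns_home) / n

# The same proportion computed as the mean of the 0/1 data.
p_from_mean = statistics.mean(owns_home)

# Estimated standard error of the proportion, using p in place of
# the unknown population proportion: sqrt(p * (1 - p) / n).
standard_error = math.sqrt(p_from_count * (1 - p_from_count) / n)

print("p = count / n:  ", p_from_count)
print("p = mean(data): ", p_from_mean)
print("standard error: ", round(standard_error, 4))
```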