Download Sampling Distributions

Chapter 7 Sampling Distributions Introduction • In real life calculating parameters of populations is prohibitive (difficult to find) because populations are very large. • Rather than investigating the whole population, we take a sample, calculate a statistic related to the parameter of interest, and make an inference. • The sampling distribution of the statistic is the tool that tells us how close is the statistic to the parameter. population Sampling Techniques Parameters Statistics x, s2 ,  2 inference sample Statistical Procedures Sampling Example: • A pollster is sure that the responses to his “agree/disagree” question will follow a binomial distribution, but p, the proportion of those who “agree” in the population, is unknown. • An agronomist believes that the yield per acre of a variety of wheat is approximately normally distributed, but the mean  and the standard deviation  of the yields are unknown.  If you want the sample to provide reliable information about the population, you must select your sample in a certain way! Types of Sampling Methods/Techniques The sampling plan or experimental design determines the amount of information you can extract, and often allows you to measure the reliability of your inference. Sampling Non-Probability Samples Probability Samples Simple Random Judgement Quota Chunk Stratified Cluster Systematic Simple Random Sampling Sampling Plan: 1. Simple random sampling is a method of sampling that allows each possible sample of size n an equal probability of being selected. Example •There are 89 students in a statistics class. The instructor wants to choose 5 students to form a project group. How should he proceed? 1. Give each student a number from 01 to 89. 2. Choose 5 pairs of random digits from the random number table. 3. If a number between 90 and 00 is chosen, choose another number. 4. The five students with those numbers form the group. Other Sampling Techniques • There are several other sampling plans that still involve randomization: 2. Stratified random sample: Divide the population into subpopulations or strata and select a simple random sample from each strata. 3. Cluster sample: Divide the population into subgroups called clusters; select a simple random sample of clusters and take a census of every element in the cluster. 4. 1-in-k systematic sample: Randomly select one of the first k elements in an ordered population, and then select every k-th element thereafter. Examples • Divide West Malaysia into states and take a simple random sample within each state. • Divide West Malaysia into states and take a simple random sample of 5 states. • Divide a city into city blocks, choose a simple random sample of 10 city blocks, and interview all who live there. • Choose an entry at random from the phone book, and select every 50th number thereafter. Non-Random Sampling Plans • There are several other sampling plans that do not involve randomization. They should NOT be used for statistical inference! 1. Convenience sample: A sample that can be taken easily without random selection. • People walking by on the street 2. Judgment sample: The sampler decides who will and won’t be included in the sample. 3. Quota sample: The makeup of the sample must reflect the makeup of the population on some selected characteristic. • Race, ethnic origin, gender, etc. Types of Samples • Sampling can occur in two types of practical situations: 1. Observational studies: The data existed before you decided to study it. Watch out for  Nonresponse: Are the responses biased because only opinionated people responded?  Undercoverage: Are certain segments of the population systematically excluded?  Wording bias: The question may be too complicated or poorly worded. Types of Samples • Sampling can occur in two types of practical situations: 2. Experimentation: The data are generated by imposing an experimental condition or treatment on the experimental units.  Hypothetical populations can make random sampling difficult if not impossible.  Samples must sometimes be chosen so that the experimenter believes they are representative of the whole population.  Samples must behave like random samples! Sampling Distributions • Numerical descriptive measures calculated from the sample are called statistics. • Statistics vary from sample to sample and hence are random variables. • The probability distributions for statistics are called sampling distributions. • In repeated sampling, they tell us what values of the statistics can occur and how often each value occurs. Sampling Distributions Definition: The sampling distribution of a statistic is the probability distribution for the possible values of the statistic that results when random samples of size n are repeatedly drawn from the population. Population: 3, 5, 2, 1 Draw samples of size n = 3 without replacement p(x) Possible samples 3, 5, 2 3, 5, 1 3, 2, 1 5, 2, 1 1/4 2 3 x x 10 / 3  3.33 9/3  3 6/3  2 8 / 3  2.67 Each value of x-bar is equally likely, with probability 1/4 1. Sampling Distribution of the Mean… •A fair die is thrown infinitely many times, with the random variable X = # of spots on any throw. The probability distribution of X is: x P(x) 1 2 3 4 5 6 1/6 1/6 1/6 1/6 1/6 1/6 …and the mean and variance are:    xp( x) 1 1 1  1( )  2( )  ...  6( )  3.5 6 6 6   ( x   ) 2 p( x)  1.71 Sampling Distribution of Two Dice •A sampling distribution is created by looking at all samples of size n=2 (i.e. two dice) and their means… •While there are 36 possible samples of size 2, there are only 11 values for (e.g. =3.5) occur more frequently than others (e.g. =1). , and some Sampling Distribution of Two Dice… •The sampling distribution of 6/36 P( ) 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36 is shown below: 5/36 p x  4/36 3/36 2/36 1/36 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 Example Thrown two fair dice. Based on all possible samples, the calculation of mean and standard deviation can also be done as. Mean :   3.5 Std Dev : / 2  1.71 / 2  1.21 Sampling Distribution of the Mean n5  x  3.5 n  10  x  3.5 n  25  x  3.5 2x   .5833 (  ) 5 6 2x 2  x  .2917 (  ) 10 2x   .1167 (  ) 25 2 x 2 x 19 Sampling Distribution of the Mean n5  x  3.5 2x  .5833 (   ) 5 2 x n  10  x  3.5 n  25  x  3.5 2  2x  .2917 (  x ) 10 2  2x  .1167 (  x ) 25 Notice that  is smaller than  The larger the sample size the 2 smaller  x . Therefore, x tends to fall closer to , as the sample size increases. 2 x 2 x. Central Limit Theorem Central Limit Theorem: If random samples of n observations are drawn from a nonnormal population with finite  and standard deviation  , then, when n is large, the sampling distribution of the sample mean x is approximately normally distributed, with mean  and standard deviation  / n . The approximation becomes more accurate as n becomes large. How Large is Large? If the population is normal, then the sampling distribution of x will also be normal, no matter what the sample size. When the population is approximately symmetric, the distribution becomes approximately normal for relatively small values of n. When the population is skewed, the sample size must be at least 30 before the sampling distribution of x becomes approximately normal. The Sampling Distribution of the Sample Mean A random sample of size n is selected from a population with mean  and standard deviation . The sampling distribution of the sample mean have mean  and standard deviation  / n . x will If the original population is normal, the sampling distribution will be normal for any sample size. If the original population is nonnormal, the sampling distribution will be normal when n is large. The standard deviation of x-bar is sometimes called the STANDARD ERROR (SE). Sampling Distribution of the Mean Central Limit Theorem Given population with  and  2 the sampling distribution will have: Mean Variance Standard Deviation Standard Error (mean) x   2   x2  n  =x  n As n increases, the shape of the distribution becomes normal (whatever the shape of the population) Example Finding Probabilities for the Sample Mean If the sampling distribution of x is normal or approximately normal, standardize or rescale the interval of interest in terms of z x  / n Find the appropriate area using Z Table. Example: A random sample of size n = 16 from a normal distribution with  = 10 and  = 8. 12  10 P ( x  12)  P ( z  ) 8 / 16  P ( z  1)  1  .8413  .1587 Example A soda filling machine is supposed to fill cans of soda with 12 fluid ounces. Suppose that the fills are actually normally distributed with a mean of 12.1 oz and a standard deviation of 0.2 oz. What is the probability that the average fill for a 6-pack of soda is less than 12 oz? P (x  12)  x   12  12.1 P(  )  / n .2 / 6 P( z  1.22)  .1112 Exercise The time that the laptop’s battery pack can function before recharging is needed is normally distributed with a mean of 6 hours and standard deviation of 1.8 hours. A random sample of 25 laptops with a type of battery pack is selected and tested. What is the probability that the mean until recharging is needed is at least 7 hours? The characteristics of the sampling distribution of a statistic: • The distribution of values is obtained by means of repeated sampling • The samples are all of size n • The samples are drawn from the same population 2. Sampling Distribution of a Proportion… The proportion in the sample is denoted "p-hat" number of " successes" pˆ  n The proportion in the population (parameter) is denoted p pˆ is to p, as x is to  Types of response variables Quantitative Sums Averages Response type Categorical Counts Proportions Prior chapters have focused on quantitative response variables. We now focus on categorical response variables. The Sampling Distribution of the Sample Proportion The Central Limit Theorem can be used to conclude that the binomial random variable X with mean np and standard deviation np 1  p  is approximately normal when n is large. x ˆ The sample proportion, p  n is simply a rescaling of the binomial random variable X, dividing it by n. From the Central Limit Theorem, the sampling distribution of p̂ will also be approximately normal, with a rescaled mean and standard deviation. Approximating Normal from the Binomial  Under certain conditions, a binomial random variable has a distribution that is approximately normal.  When n is large, and p is not too close to zero or one, areas under the normal curve with mean np and standard deviation np 1  p  can be used to approximate binomial probabilities.  Make sure that np and n(1-p) are both greater than 5 to avoid inaccurate approximations! The Sampling Distribution of the Sample Proportion A random sample of size n is selected from a binomial population with parameter p. x The sampling distribution of the sample proportion, pˆ  n pq will have mean p and standard deviation n If n is large, and p is not too close to zero or one, the sampling distribution of p̂ will be approximately normal. The standard deviation of p-hat is sometimes called the STANDARD ERROR (SE) of p-hat. Finding Probabilities for the Sample Proportion If the sampling distribution of p̂ is normal or approximately normal, standardize or rescale the p̂  p interval of interest in terms of If both np > 5 and z pq n np(1-p) > 5 Find the appropriate area using Z Table. Example: A random sample of size n = 100 from a binomial population with p = 0.4. .5  .4 P ( pˆ  .5)  P ( z  ) .4(.6) 100  P ( z  2.04)  1  .9793  .0207 Example The soda bottler in the previous example claims that only 5% of the soda cans are underfilled. A quality control technician randomly samples 200 cans of soda. What is the probability that more than 10% of the cans are underfilled? n = 200 U: underfilled can p = P(U) = 0.05 q = 0.95 np = 10 nq = 190 OK to use the normal approximation P( pˆ  .10) .10  .05  P( z  )  P( z  3.24) .05(.95) 200  1 .9994  .0006 This would be very unusual, if indeed p = .05! Example Sampling Distribution of the Difference Between Two Averages • Theorem : If independent sample of size n1 and n2 are drawn at random from two populations,with means 1 and 2 and variances 12 and 22, respectively, then the sampling distribution of the differences of means, X 1  X 2 , is approximately normally distributed with mean and variance given by X 1X 2 Hence  1   2 and  X2 1  X 2  Z  12 n1   22 n2 . ( X 1  X 2 )  ( 1   2 ) ( 12 / n1 )  ( 22 / n2 ) is approximat ely a standard normal variable. 38 Example •Starting salaries for MBA grads at two universities are normally distributed with the following means and standard deviations. Samples from each school are taken… University 1 University 2 Mean RM 62,000 /yr RM 60,000 /yr Std. Dev. RM14,500 /yr RM18,300 /yr 50 60 sample size, n • What is the sampling distribution of • What is the probability that a sample mean of U1 students will exceed the sample mean of U2 students? Example mean:  x  x  1   2  x1  x2 = 62,000 – 60,000 =2000 and standard deviation: = 14,500 2  18,300 2 1 2 50 60 = 3128.3 P( x1  x2  0)  P( x1  x2  ( 1 -  2 )  2 1 n1   2 2 0  2000  ) 3128 n2  P( z  .64)  .5  .2389  .7389 Sampling Distribution of the Difference Between Two Sample Proportions • Theorem : If independent sample of size n1 and n2 are drawn at random from two populations, where the proportions of obs with the characteristic of interest in the two populations are p1 and p2 respectively, then the sampling distribution of the differences between sample proportions, pˆ1  pˆ 2 is approximately normally distributed with mean and variance given by  pˆ1  pˆ 2  p1  p2 Then Z and pˆ1  pˆ 2  ( p1  p2 )  pˆ1  pˆ 2 is approximately a standard normal variable 41 Sampling Distribution of the Difference Between Two Sample Proportions • Example : It is known that 16% of the households in Community A and 11% of the households in Community B have internets in their houses. If 200 households and 225 households are selected at random from Community A and Community B respectively, compute the probability of observing the difference between the two sample proportions at least 0.10? 42 Sampling Distribution of S2 • Theorem : If S2 is the variance of a random sample of size n taken from a normal population having the variance 2, then the statistic   2 (n  1) S 2  2 n  i 1 ( X i  X )2 2 • has a chi-squared distribution with v = n -1 degrees of freedom. For v  7  02.05  14.067  02.95  2.167 43

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download Sampling Distributions