Sampling Distributions
Central Limit Theorem

Sampling Distribution
A sampling distribution is the distribution of a statistic.

Recall:
Population: the collection of all data on a characteristic.
Parameter: a measure of a population, e.g., the mean, s.d., proportion, or median of a population. Notation: $\mu$, $\sigma$, $\pi$, etc.
Sample: any subset of a population.
Statistic: a measure of a sample, e.g., the mean, s.d., proportion, or median of sample data. Notation: $\bar{x}$, $s$, $p$.

Objective:
Learn the properties of statistics (sample measures), in particular the properties of the sample mean and the sample proportion.
• What are the average and s.d. of the sample mean? What is the distribution of the sample mean $\bar{x}$?
• What are the average and s.d. of the sample proportion? What is the distribution of the sample proportion $p$?

Population $X$: mean $\mu$, s.d. $\sigma$.
Choose random, independent samples, each of size $n$:
Sample 1: $x_{11}, x_{12}, x_{13}, \ldots, x_{1n}$; sample mean $= \bar{x}_1$
Sample 2: $x_{21}, x_{22}, x_{23}, \ldots, x_{2n}$; sample mean $= \bar{x}_2$
Sample 3: $x_{31}, x_{32}, x_{33}, \ldots, x_{3n}$; sample mean $= \bar{x}_3$, etc.

So we could have many sample means. The sample mean is a random variable: it is a statistic. What can we say about it?
1. On average, the sample means equal the population mean.
2. The sample means are closer together than the population values themselves.

Meaning:
The average of the sample means equals the population mean: $\mu_{\bar{x}} = \mu$.
The variance of the sample means is smaller than the variance of the population, $\sigma^2$. How much smaller?
$\sigma^2_{\bar{x}} = \dfrac{\sigma^2}{n} = \dfrac{\text{population variance}}{\text{sample size}}$, so $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.

Important Observations
There are three "averages" we are talking about:
1. Population mean, $\mu$: a constant.
2. Sample mean, $\bar{x}$: a random variable.
3. Mean of the sample means, $\mu_{\bar{x}}$: a constant.
Similarly, there are two variances:
1. Variance of the population, $\sigma^2$: a constant.
2. Variance of the sample means, $\sigma^2_{\bar{x}}$: a constant.
You need to use the right notation and interpret each quantity appropriately.

So we have the important relationship:
$\mu_{\bar{x}} = \mu, \qquad \sigma^2_{\bar{x}} = \dfrac{\sigma^2}{n}$
Note that the bigger the sample size, the less variable the values of the sample means.
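Where the relationship comes from can be shown in a few lines. This short derivation assumes, as the setup does, that the $n$ observations in a sample are independent draws from the same population, each with mean $\mu$ and variance $\sigma^2$; independence is what lets the variances add in the last line.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad E(x_i) = \mu, \qquad \operatorname{Var}(x_i) = \sigma^2

\mu_{\bar{x}} = E(\bar{x}) = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\cdot n\mu = \mu

\sigma^2_{\bar{x}} = \operatorname{Var}(\bar{x}) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(x_i) = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n}
```

Taking square roots of the last line gives $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.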
The above results hold for means of random, independent samples from any population and for samples of any size. What about the distribution of the sample mean $\bar{x}$? Recall that $\bar{x}$ is a random variable.

If the population is normal, then $\bar{x}$ ~ Normal.
If the population is not normal, or its distribution is unknown, then $\bar{x}$ ≈ Normal (approximately normal) for large samples.
How large? A sample size of at least 30.
This is the CENTRAL LIMIT THEOREM. It applies when the population is non-normal or when the population distribution is unknown.

Central Limit Theorem
The random variable $X$ is the number of units of a component sold by a company. The probability distribution of $X$ is:

Units sold     1000    3000    5000    10000
Probability     0.1     0.3     0.4      0.2

We can compute the mean $\mu = 5000$ units and the standard deviation $\sigma = 2792.8$ units.
Suppose we look at a sample of 16 sales and repeat this, say, 100 times. We can easily mimic (simulate) this experiment in Minitab. First, we enter the probability distribution in our worksheet.
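The values $\mu = 5000$ and $\sigma = 2792.8$ quoted above can be checked directly from the table. A minimal sketch in plain Python (standard library only), using $\mu = \sum x\,P(x)$ and $\sigma^2 = \sum (x-\mu)^2 P(x)$; the variable names are illustrative.

```python
import math

# Probability distribution of X = units sold (from the table above)
units = [1000, 3000, 5000, 10000]
probs = [0.1, 0.3, 0.4, 0.2]

# Population mean: mu = sum of x * P(x)
mu = sum(x * p for x, p in zip(units, probs))

# Population variance: sigma^2 = sum of (x - mu)^2 * P(x)
variance = sum((x - mu) ** 2 * p for x, p in zip(units, probs))
sigma = math.sqrt(variance)

print(f"mu = {mu:.1f}, sigma = {sigma:.1f}")  # mu = 5000.0, sigma = 2792.8
```

The variance works out to 7,800,000 square units, whose square root is about 2792.8 units.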
Column 1 has the values 1000, 3000, 5000, 10000; column 2 has the probabilities.
Now we tell Minitab to generate 100 samples, each of size 16, from our discrete distribution.
Take a look at some of the samples (each row is one sample):

Row (sample)   C3      C4      C5      C6      C7      C8      C9      C10     ...
1              10000   5000    5000    3000    3000    5000    1000    10000   ...
2              3000    1000    3000    3000    3000    10000   5000    10000   ...

Now compute the mean of each of the 100 samples and store the means in column C19. Look at the histogram of the sample means in C19 and compare it with the histogram of the "population distribution".

[Figure: left, probability histogram of the population distribution (prob vs. units); right, frequency histogram of the 100 sample means for n = 16 (sam16).]

Let us repeat the process with samples of size 64 and 81, and look at the histograms of the sample averages in all three cases.
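The slides run this experiment in Minitab. As an assumption for readers without Minitab, here is a rough Python analogue using only the standard library: each call to `random.Random.choices` draws one sample of 16 values weighted by the probabilities, and the list of means plays the role of column C19. The function name `simulate_sample_means` and the seed are hypothetical, chosen for illustration; the printed numbers change with the seed.

```python
import random
import statistics

# Population distribution of X = units sold (the table in the example)
units = [1000, 3000, 5000, 10000]
probs = [0.1, 0.3, 0.4, 0.2]


def simulate_sample_means(n, num_samples=100, seed=None):
    """Draw num_samples independent samples of size n (with replacement)
    from the discrete distribution; return the list of sample means."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        # One sample = one worksheet row: n draws weighted by P(x)
        sample = rng.choices(units, weights=probs, k=n)
        means.append(statistics.mean(sample))
    return means


# 100 samples of size 16; this list plays the role of column C19
sam16 = simulate_sample_means(16, seed=1)
print(f"mean of the 100 sample means: {statistics.mean(sam16):.1f}")
print(f"s.d. of the 100 sample means: {statistics.stdev(sam16):.1f}")
```

The mean of the sample means should land near 5000 and their s.d. near $\sigma/\sqrt{16} \approx 698$, with some run-to-run variation.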
[Figure: frequency histogram of the sample means for n = 16 (sam16), horizontal axis about 3500 to 7000.]
[Figure: frequency histogram of the sample means for n = 64 (sam64), horizontal axis about 4250 to 6000.]
[Figure: frequency histogram of the sample means for n = 81 (sam81), horizontal axis about 4400 to 5800.]

Note that:
• In all three examples, the middle value (the mean of the sample means) is about 5000, the mean of the population. This is what the property $\mu_{\bar{x}} = \mu$ says.
• As the sample size increases, the values of the sample mean get closer together. This is what the property $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ says.
• The distribution gets more symmetrical, i.e., as the sample size gets larger, the distribution of the sample averages looks like a normal distribution. This is what the CLT says.
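The second observation can be made quantitative. This sketch (standard library Python, an assumption in place of Minitab) repeats the 100-sample experiment for n = 16, 64 and 81 and compares the simulated s.d. of the sample means with $\sigma/\sqrt{n}$, taking $\sigma = 2792.8$ from the example. The seed and the `summary` dictionary are illustrative only.

```python
import math
import random
import statistics

# Population: X = units sold; sigma = 2792.8 as quoted in the example (mu = 5000)
units = [1000, 3000, 5000, 10000]
probs = [0.1, 0.3, 0.4, 0.2]
sigma = 2792.8

rng = random.Random(2024)  # arbitrary seed, for reproducibility only
summary = {}
for n in (16, 64, 81):
    # 100 samples of size n; keep each sample's mean
    means = [statistics.mean(rng.choices(units, weights=probs, k=n)) for _ in range(100)]
    summary[n] = {
        "mean_of_means": statistics.mean(means),  # should be close to mu = 5000
        "sd_of_means": statistics.stdev(means),   # simulated sigma_xbar
        "theory_sd": sigma / math.sqrt(n),        # sigma / sqrt(n)
    }

for n, s in summary.items():
    print(f"n = {n:2d}: mean of means = {s['mean_of_means']:7.1f}, "
          f"s.d. of means = {s['sd_of_means']:6.1f}, sigma/sqrt(n) = {s['theory_sd']:6.1f}")
```

The theoretical values are $2792.8/4 = 698.2$, $2792.8/8 = 349.1$ and $2792.8/9 \approx 310.3$, which matches the narrowing of the horizontal axes in the three histograms.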
Central Limit Theorem for Proportions
There is nothing special about averages: the CLT also applies to proportions. (A sample proportion is itself the average of 0/1 success indicators.)
Binomial population: each outcome is a success or a failure.
$\pi = P(\text{Success})$ = proportion of successes in the population.
Sample of size $n$: $x_1, x_2, \ldots, x_n$, each a success or a failure.

Examples
Shipment of components: each component is either good or defective.
$\pi$ = proportion of good items in the population (the shipment). It depends on the process and may or may not be known.
Random sample of 250 components: each component is either good or bad.
$p$ = proportion of good components in the sample. It changes from one sample to another.

Population: voters, say in our county, each with a favorable or unfavorable opinion about a candidate.
$\pi$ = proportion of voters who favor the candidate.
Random sample of 1057 voters:
$p$ = proportion (fraction) of voters in the sample who favor the candidate. The value of $p$ changes from one sample to another.

• $p$ = proportion of successes in the sample (the sample proportion).
• $p$ is a statistic, so it is random.
• What are the properties of the statistic $p$?
• (1) The average of the sample proportion is the population proportion: $\mu_p = \pi$.
• (2) The s.d. of the sample proportion is $\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$.
• CLT ⇒ $p$ ≈ Normal.
• Combining this with (1) and (2): for large samples, $p$ is approximately normal with mean $\pi$ and standard deviation $\sqrt{\pi(1-\pi)/n}$.

Things to remember...
• Use appropriate notation to identify the given information.
• Define the random variables.
• Write the question in terms of the random variable.
• Keep the information about the population and the information about the sample separate.
• Connect your steps.
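Returning to the sample proportion: properties (1) and (2) can be checked the same way as for the sample mean. The examples leave $\pi$ unknown, so this sketch assumes a hypothetical $\pi = 0.9$ (proportion of good components) with shipment-sized samples of $n = 250$. It simulates 1000 sample proportions, compares their mean and s.d. with $\pi$ and $\sqrt{\pi(1-\pi)/n}$, and then uses the normal approximation (via the standard library's `statistics.NormalDist`) for an illustrative probability $P(p < 0.88)$. The function name and seed are hypothetical.

```python
import math
import random
import statistics


def simulate_sample_proportions(pi, n, num_samples=1000, seed=None):
    """Return num_samples sample proportions p, each the fraction of
    successes in n independent success/failure trials with P(success) = pi."""
    rng = random.Random(seed)
    props = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < pi)
        props.append(successes / n)
    return props


# Hypothetical setting: pi = 0.9 good components, samples of n = 250
pi, n = 0.9, 250
props = simulate_sample_proportions(pi, n, seed=7)

sigma_p = math.sqrt(pi * (1 - pi) / n)  # theoretical s.d. of p, property (2)
print(f"mean of p = {statistics.mean(props):.4f}  (theory: pi = {pi})")
print(f"s.d. of p = {statistics.stdev(props):.4f}  (theory: {sigma_p:.4f})")

# CLT: p is approximately Normal(mean = pi, s.d. = sigma_p)
approx = statistics.NormalDist(mu=pi, sigma=sigma_p).cdf(0.88)
print(f"normal approximation to P(p < 0.88): {approx:.3f}")
```

Here $\sigma_p = \sqrt{0.9 \times 0.1 / 250} \approx 0.0190$, so a sample proportion of 0.88 is about 1.05 standard deviations below $\pi$, giving roughly 0.146 for the normal-approximation probability.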