The Concept of a Sampling Distribution (Predicting the behavior of a statistic)

Parameter & Statistic
A parameter is a numerical descriptive measure of a population. Because it is based on all the observations in the population, its value is almost always unknown.
A sample statistic is a numerical descriptive measure of a sample. It is calculated from the observations in the sample.

Common Statistics & Parameters
Measure               Sample Statistic   Population Parameter
Mean                  x̄                  µ
Standard Deviation    s                  σ
Variance              s²                 σ²
Binomial Proportion   p̂                  p

• The median household income in the U.S. is roughly $51,900.
• The mean household income is $70,900 (people like Bill Gates pull the mean to the right).
• Now, suppose we take a random sample of 2,000 U.S. households and gather information on their annual income.
• Assume we obtained a representative sample.
• So the drawn sample, on average, should look like the U.S.
• It should contain all types of people: homeless people, academics, very rich people, etc.
• So we expect the mean household income of the sample to be about $70,900.
• Will it be exactly $70,900?
• If we draw different samples of 2,000, we would expect some sample means to be higher and some to be lower.
• Might we get a sample of 2,000 with a mean household income of $500,000?
• That is possible but highly unlikely, unless our sample comes only from the very rich part of the population.
• It is also highly unlikely to get a sample mean annual income of $7,000.
• We cannot compare two statistics on the basis of their performance for a single sample.
• Sample statistics are themselves random variables, because different samples can lead to different values of the sample statistics.
• As random variables, sample statistics must be judged and compared on the basis of their probability distributions (i.e., the collection of values and associated probabilities of each statistic that would be obtained if the sampling experiment were repeated a very large number of times).
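To see that a sample mean really is a random variable, the minimal Python sketch below builds a hypothetical, synthetic income population and draws several random samples of 2,000 households from it. The population model is our own assumption, not real census data: a lognormal distribution tuned so that its median is about $51,900 and its mean about $70,900, which makes it right-skewed like real incomes. Each sample produces a different mean, but all of them land near the population mean.

```python
import math
import random
import statistics

# Hypothetical population model (an assumption, not real income data):
# a lognormal distribution with median exp(mu) = 51,900 and
# mean exp(mu + sigma**2 / 2) = 70,900.
MEDIAN, MEAN = 51_900, 70_900
mu = math.log(MEDIAN)
sigma = math.sqrt(2 * math.log(MEAN / MEDIAN))

rng = random.Random(315)  # fixed seed so the run is reproducible
population = [rng.lognormvariate(mu, sigma) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_median = statistics.median(population)

# Draw five independent random samples of 2,000 households (without replacement)
sample_means = [statistics.fmean(rng.sample(population, 2_000)) for _ in range(5)]

print(f"population mean   = {pop_mean:,.0f}")
print(f"population median = {pop_median:,.0f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean     = {m:,.0f}")
```

The exact numbers depend on the seed. The point is that the five sample means differ from one another and from the population mean, which is why a statistic has to be judged by its whole sampling distribution rather than by one sample.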
Sampling Distribution
The sampling distribution of a sample statistic calculated from a sample of n measurements is the probability distribution of the statistic.

Developing Sampling Distributions
Suppose there's a population ...
• Population size, N = 4
• Random variable, x
• Values of x: 1, 2, 3, 4
• Uniform distribution

Population Characteristics
Summary measure: $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{1 + 2 + 3 + 4}{4} = 2.5$
[Figure: population distribution, P(x) versus x = 1, 2, 3, 4 (uniform)]

All Possible Samples of Size n = 2 (sampling with replacement)
16 samples (rows: 1st observation, columns: 2nd observation)
      1     2     3     4
1    1,1   1,2   1,3   1,4
2    2,1   2,2   2,3   2,4
3    3,1   3,2   3,3   3,4
4    4,1   4,2   4,3   4,4

16 sample means (rows: 1st observation, columns: 2nd observation)
      1     2     3     4
1    1.0   1.5   2.0   2.5
2    1.5   2.0   2.5   3.0
3    2.0   2.5   3.0   3.5
4    2.5   3.0   3.5   4.0

Sampling Distribution of the Sample Mean
x̄:       1.0    1.5    2.0    2.5    3.0    3.5    4.0
P(x̄):    1/16   2/16   3/16   4/16   3/16   2/16   1/16
[Figure: sampling distribution of the sample mean, P(x̄) versus x̄ = 1.0 to 4.0]

Summary Measure of All Sample Means
$\mu_{\bar{x}} = \frac{\sum_{i=1}^{16} \bar{x}_i}{16} = \frac{1.0 + 1.5 + \cdots + 4.0}{16} = 2.5$

Comparison
[Figure: population distribution (P(x) versus x = 1, 2, 3, 4) beside the sampling distribution of the sample mean (P(x̄) versus x̄ = 1.0 to 4.0); both are centered at µ = µ_x̄ = 2.5]

Example 5.1
Consider the popular casino game of craps, in which a player throws two dice and bets on the outcome (the sum of the dots showing on the upper faces of the two dice). If the sum is 7 or 11, the roller wins $5; if the sum is 2, 3, or 12, the roller loses $5; and for any other sum (4, 5, 6, 8, 9, 10), no money is won or lost. Let x represent the result of the come-out roll wager. Of the 36 equally likely dice outcomes, 8 give 7 or 11, 4 give 2, 3, or 12, and the remaining 24 give any other sum, so:
Outcome of wager, x:   −5     0      5
p(x):                  1/9    6/9    2/9
Now consider a random sample of n = 3 come-out rolls. Find the sampling distribution of the sample mean. Find the sampling distribution of the sample median.
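The table of 16 sample means can be reproduced by brute-force enumeration. The minimal Python sketch below lists every ordered sample of size n = 2 drawn with replacement from the population {1, 2, 3, 4}, tabulates the distinct sample means with their probabilities, and checks that µ_x̄ equals µ = 2.5. Exact fractions are used so that no rounding creeps in.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]  # N = 4, uniform
n = 2                      # sample size, sampling with replacement

# Every ordered sample (1st obs, 2nd obs) is equally likely: 4**2 = 16 samples
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution: each distinct sample mean with its probability
counts = Counter(means)
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

mu = Fraction(sum(population), len(population))            # population mean
mu_xbar = sum(m * p for m, p in sampling_dist.items())     # mean of all sample means

for m, p in sampling_dist.items():
    print(f"x-bar = {float(m):.1f}   P = {p}")
print("mu =", float(mu), " mu_xbar =", float(mu_xbar))
```

The same enumeration works for any small discrete population; only `population` and `n` need to change, although the number of samples grows as N to the power n.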
Another Example
• Let's consider your class as the population.
• We are interested in your grades on the first midterm exam.
• The mean of your first exam grades is 76.34.
• Since we consider this class as the population, this value is a parameter, µ.
• We will take 20 samples of size 15.
• We will calculate the mean and the median of each sample.

Sampling Distributions for Mean and Median
• Don't confuse the sampling distribution with the distribution of the sample.
 – When you take a sample, you look at the distribution of the values, usually with a histogram, and you may calculate summary statistics.
 – The sampling distribution is an imaginary collection of the values that a statistic might have taken for all random samples: the one you got and the ones you didn't get.

5.2 Properties of Sampling Distributions: Unbiasedness and Minimum Variance

Point Estimator
A point estimator of a population parameter is a rule or formula that tells us how to use the sample data to calculate a single number that can be used as an estimate of the population parameter.

Estimates
• If the sampling distribution of a sample statistic has a mean equal to the population parameter the statistic is intended to estimate, the statistic is said to be an unbiased estimate of the parameter.
• If the mean of the sampling distribution is not equal to the parameter, the statistic is said to be a biased estimate of the parameter.
[Figure: sampling distribution of an unbiased estimator (centered at the parameter) versus that of a biased estimator (centered away from the parameter)]
stt315
• With only 20 samples, the average of the 20 sample means will generally not equal µ exactly, so 20 samples alone cannot demonstrate unbiasedness.
• Unbiasedness is a property of the whole sampling distribution: statistical theory tells us that E(x̄) = µ for every sample size, so if we repeated the sampling a very large number of times, the average of the sample means would approach µ.

Standard Error
The standard deviation of a sampling distribution measures another important property of statistics: the spread of the estimates generated by repeated sampling.
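Example 5.1 is small enough to work out exactly, which makes bias and standard error concrete. The Python sketch below enumerates all 3³ = 27 ordered samples of n = 3 come-out rolls, using p(−5) = 1/9, p(0) = 6/9, p(5) = 2/9 (7 or 11 occurs in 8 of the 36 dice outcomes, and 2, 3, or 12 in 4). It computes the center and the standard error of the sampling distributions of the sample mean and the sample median. The helper names `sampling_distribution` and `mean_and_se` are hypothetical, introduced here only for illustration.

```python
import math
import statistics
from fractions import Fraction
from itertools import product

# Come-out roll wager x and p(x), from counting the 36 dice outcomes:
# 7 or 11 -> 8/36, 2/3/12 -> 4/36, anything else -> 24/36
pmf = {-5: Fraction(1, 9), 0: Fraction(6, 9), 5: Fraction(2, 9)}
n = 3

def sampling_distribution(stat):
    """Exact distribution of stat(sample) over all ordered samples of size n."""
    dist = {}
    for sample in product(pmf, repeat=n):
        prob = math.prod(pmf[x] for x in sample)  # independent rolls
        value = Fraction(stat(sample))
        dist[value] = dist.get(value, 0) + prob
    return dist

def mean_and_se(dist):
    """Mean and standard deviation (standard error) of a discrete distribution."""
    center = sum(v * p for v, p in dist.items())
    var = sum((v - center) ** 2 * p for v, p in dist.items())
    return center, math.sqrt(var)

mu = sum(x * p for x, p in pmf.items())  # population mean, 5/9
mean_dist = sampling_distribution(lambda s: Fraction(sum(s), n))
median_dist = sampling_distribution(statistics.median)

for name, dist in [("mean", mean_dist), ("median", median_dist)]:
    center, se = mean_and_se(dist)
    print(f"{name:6s}: E = {float(center):.4f} (mu = {float(mu):.4f}), SE = {se:.4f}")
```

Under these assumptions the sketch reports E(x̄) = µ = 5/9 ≈ 0.556, while E(median) = 335/729 ≈ 0.460, so the sample median is biased here. The standard errors come out to about 1.64 for the mean and 1.95 for the median.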
• Even though both statistics A and B have sampling distributions centered at the parameter, the probability that A falls close to the parameter value is higher than the probability that B does, because A's sampling distribution has less spread.
• It is better to use a statistic that is centered at the parameter and has smaller variation, i.e., a smaller standard error.

Standard Error (cont.)
• To make an inference about a population parameter, we use the sample statistic whose sampling distribution is unbiased and has a smaller standard deviation than that of any other unbiased statistic.
• The standard deviation of the sampling distribution of a statistic is also called the standard error of the statistic.

Back to Example 5.1
With its smaller standard error, the sample mean appears to be the better estimator of the population mean.

Thinking Challenge

5.3 The Sampling Distribution of a Sample Mean and the Central Limit Theorem

Properties of the Sampling Distribution of x̄
1. The mean of the sampling distribution equals the mean of the sampled population, that is, $\mu_{\bar{x}} = E(\bar{x}) = \mu$.
2. The standard deviation of the sampling distribution equals the standard deviation of the sampled population divided by the square root of the sample size, that is, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. This quantity is the standard error of the sample mean.

Theorem 5.1
If a random sample of n observations is selected from a population with a normal distribution, the sampling distribution of x̄ will be a normal distribution.

Sampling from Normal Populations
• Central tendency: $\mu_{\bar{x}} = \mu$
• Dispersion: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (sampling with replacement)
[Figure: normal population distribution with µ = 50 and σ = 10; sampling distributions of x̄ for n = 4 (σ_x̄ = 5) and n = 16 (σ_x̄ = 2.5), both centered at µ_x̄ = 50]

Standardizing the Sampling Distribution of x̄
$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
[Figure: sampling distribution of x̄ (center µ_x̄, spread σ_x̄) mapped to the standardized normal distribution (center µ = 0, σ = 1)]

Thinking Challenge
You're an operations analyst for AT&T. Long-distance telephone calls are normally distributed with µ = 8 min. and σ = 2 min. If you select random samples of 25 calls, what percentage of the sample means would be between 7.8 and 8.2 minutes?
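The two properties of the sampling distribution of x̄ can be checked by simulation. The Python sketch below assumes the normal population from the slide (µ = 50, σ = 10), draws 20,000 samples for each sample size (a number we chose arbitrarily), and compares the mean and standard deviation of the simulated sample means with µ and σ/√n. The helper `simulate_sample_means` is a hypothetical name used only here.

```python
import math
import random
import statistics

MU, SIGMA = 50, 10          # normal population from the slide
REPS = 20_000               # number of repeated samples (an arbitrary choice)
rng = random.Random(2024)   # fixed seed for reproducibility

def simulate_sample_means(n):
    """Draw REPS samples of size n from N(MU, SIGMA) and return their means."""
    return [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n)) for _ in range(REPS)]

results = {}
for n in (4, 16):
    means = simulate_sample_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n = {n:2d}: mean of x-bar = {results[n][0]:.2f}, "
          f"SD of x-bar = {results[n][1]:.2f}, sigma/sqrt(n) = {SIGMA / math.sqrt(n):.2f}")
```

The simulated standard deviations should sit close to 5 and 2.5, the values shown for n = 4 and n = 16; quadrupling the sample size halves the standard error.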
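Sampling Distribution Solution*
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{7.8 - 8}{2/\sqrt{25}} = -.50$
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{8.2 - 8}{2/\sqrt{25}} = .50$
Here $\sigma_{\bar{x}} = 2/\sqrt{25} = .4$, so
$P(7.8 < \bar{x} < 8.2) = P(-.50 < z < .50) = .1915 + .1915 = .3830$
That is, about 38.3% of the sample means would fall between 7.8 and 8.2 minutes.
[Figure: sampling distribution (σ_x̄ = .4) over x̄ = 7.8, 8, 8.2 and standardized normal distribution (σ = 1) over z = −.50, 0, .50, with area .1915 on each side of the center]

Sampling from Non-Normal Populations
• Central tendency: $\mu_{\bar{x}} = \mu$
• Dispersion: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (sampling with replacement)
[Figure: non-normal population distribution with µ = 50 and σ = 10; sampling distributions of x̄ for n = 4 (σ_x̄ = 5) and n = 30 (σ_x̄ = 1.8), both centered at µ_x̄ = 50]

Central Limit Theorem
Consider a random sample of n observations selected from a population (with any probability distribution) with mean µ and standard deviation σ. Then, when n is sufficiently large, the sampling distribution of x̄ will be approximately a normal distribution with mean $\mu_{\bar{x}} = \mu$ and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$. The larger the sample size, the better the normal approximation to the sampling distribution of x̄.

Central Limit Theorem
As the sample size gets large enough (n ≥ 30), the sampling distribution of x̄ becomes almost normal, with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.

Central Limit Theorem Example
The amount of soda in cans of a particular brand has a mean of 12 oz and a standard deviation of .2 oz. If you select random samples of 50 cans, what percentage of the sample means would be less than 11.95 oz?

Central Limit Theorem Solution*
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{11.95 - 12}{.2/\sqrt{50}} = -1.77$
Here $\sigma_{\bar{x}} = .2/\sqrt{50} \approx .03$, so
$P(\bar{x} < 11.95) = P(z < -1.77) = .5 - .4616 = .0384$
That is, about 3.84% of the sample means would be less than 11.95 oz.
[Figure: sampling distribution (σ_x̄ = .03) with x̄ = 11.95 and 12 marked, and standardized normal distribution (σ = 1) with z = −1.77 and 0 marked; tail area .0384 shaded (shaded area exaggerated)]

• When the population standard deviation σ is unknown (and n is large), replace σ with the sample standard deviation s: $z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

Thinking Challenge
• Assume that the systolic blood pressure of 30-year-old males is normally distributed, with an average of 122 mmHg and a standard deviation of 10 mmHg. A random sample of 16 men from this age group is selected.
• Calculate the probability that the average blood pressure of the sample will be greater than 125 mmHg.
• Calculate the probability that the average blood pressure of this sample will be between 118 and 124 mmHg.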
• Calculate the probability that the blood pressure of an individual male from this population will be between 118 and 124 mmHg.

Thinking Challenge
• Assume that the average weight of an NFL player is 245.7 pounds with a standard deviation of 34.5 pounds, but the probability distribution of the population is unknown. If a random sample of 32 players is selected,
• what is the probability that the average weight of the sample will be less than 234 pounds?
• what is the probability that the average weight of the sample is between 248 and 254 pounds?

The Sampling Distribution of the Sample Proportion (Predicting the behavior of discrete random variables)

Sample Proportion
Just as the sample mean is a good estimator of the population mean, the sample proportion, denoted p̂, is a good estimator of the population proportion p. How good the estimator p̂ is will depend on the sampling distribution of the statistic. This sampling distribution has properties similar to those of the sampling distribution of x̄.

Z-score for the Sampling Distribution of the Proportion
$z = \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$
• When we do not know the population proportion, use p̂ in the standard error: $z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}}$

Thinking Challenge
• A report claims that 15% of women are left-handed.
a) Calculate the probability that more than 12% of a random sample of 100 women are left-handed.
np = 100 × 0.15 = 15 ≥ 15
n(1 − p) = 100 × 0.85 = 85 ≥ 15
So we can use the normal approximation to the binomial distribution.

Thinking Challenge (cont.)
b) Calculate the probability that between 11% and 16% of a random sample of 100 women are left-handed.
$P(0.11 < \hat{p} < 0.16) = P\left(\frac{0.11 - 0.15}{\sqrt{0.15 \times 0.85/100}} < z < \frac{0.16 - 0.15}{\sqrt{0.15 \times 0.85/100}}\right) = P(-1.12 < z < 0.28)$
$= P(-1.12 < z < 0) + P(0 < z < 0.28) = 0.3686 + 0.1103 = 0.4789$
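The normal approximation for p̂ can be evaluated directly with Python's standard library. The sketch below assumes the setting of the left-handedness challenge (p = 0.15, n = 100) and uses `statistics.NormalDist` as the approximate sampling distribution of p̂, with mean p and standard error √(p(1 − p)/n). It computes both parts (a) and (b).

```python
import math
from statistics import NormalDist

p, n = 0.15, 100                        # claimed proportion and sample size

# Normal-approximation check from the slide: np and n(1 - p) should both be >= 15
print(f"np = {n * p:g}, n(1 - p) = {n * (1 - p):g}")

se = math.sqrt(p * (1 - p) / n)         # standard error of p-hat
phat_dist = NormalDist(mu=p, sigma=se)  # approximate sampling distribution of p-hat

prob_a = 1 - phat_dist.cdf(0.12)                    # (a) P(p-hat > 0.12)
prob_b = phat_dist.cdf(0.16) - phat_dist.cdf(0.11)  # (b) P(0.11 < p-hat < 0.16)

print(f"SE = {se:.4f}")
print(f"(a) P(p-hat > 0.12)        = {prob_a:.4f}")
print(f"(b) P(0.11 < p-hat < 0.16) = {prob_b:.4f}")
```

Because `NormalDist.cdf` works with unrounded z-scores, its results can differ from a z-table lookup (which rounds z to two decimals) in the fourth decimal place; part (b) agrees with the table answer of about 0.479.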