Introduction to Inference: Sampling Distributions

Inference with a Single Observation
[Diagram: a population with an unknown parameter; sampling yields an observation $X_i$, which is used for inference about the parameter.]
• Each observation $X_i$ in a random sample is a representative of the unobserved variables in the population
• How different would this observation be if we took a different random sample?

Inference with Sample Mean
[Diagram: a population with unknown parameter $\mu$; sampling yields a sample, whose statistic $\bar{x}$ is used to estimate the parameter.]
• The sample mean is our estimate of the population mean
• How much would the sample mean change if we took a different sample?
• Key to this question: the sampling distribution of $\bar{x}$

Sampling Distribution of a Sample Statistic
• Sampling Distribution of a Sample Statistic: the distribution of values for a sample statistic obtained from repeated samples, all of the same size and all drawn from the same population

Example: Consider the set {1, 2, 3, 4}:
1) Make a list of all samples of size 2 that can be drawn from this set (sample with replacement)
2) Construct the sampling distribution for the sample mean for samples of size 2
3) Construct the sampling distribution for the minimum for samples of size 2

Table of All Possible Samples
This table lists all possible samples of size 2, the mean and the minimum for each sample, and the probability of each sample occurring (all equally likely).
Number of possible samples (with replacement) = $N^n = 4^2 = 16$

Sample    $\bar{x}$   Minimum   Probability
{1, 1}    1.0         1         1/16
{1, 2}    1.5         1         1/16
{1, 3}    2.0         1         1/16
{1, 4}    2.5         1         1/16
{2, 1}    1.5         1         1/16
{2, 2}    2.0         2         1/16
{2, 3}    2.5         2         1/16
{2, 4}    3.0         2         1/16
{3, 1}    2.0         1         1/16
{3, 2}    2.5         2         1/16
{3, 3}    3.0         3         1/16
{3, 4}    3.5         3         1/16
{4, 1}    2.5         1         1/16
{4, 2}    3.0         2         1/16
{4, 3}    3.5         3         1/16
{4, 4}    4.0         4         1/16

Sampling Distribution
• Summarize the information in the previous table to obtain the sampling distribution of the sample mean and the sample minimum:

Sampling Distribution of the Sample Mean
$\bar{x}$      1.0    1.5    2.0    2.5    3.0    3.5    4.0
$P(\bar{x})$   1/16   2/16   3/16   4/16   3/16   2/16   1/16

[Figure: Histogram of the sampling distribution of the sample mean. Horizontal axis: $\bar{x}$ from 1.0 to 4.0; vertical axis: $P(\bar{x})$ from 0.00 to 0.25.]

Sampling Distribution of Sample Mean
• Distribution of values taken by the statistic $\bar{x}$ in all possible samples of size n from the same population
• Model assumption: our observations $x_i$ are sampled from a population with mean $\mu$ and variance $\sigma^2$
[Diagram: a population with unknown parameter $\mu$; Sample 1, Sample 2, ..., Sample 8, ... each of size n, each giving a sample mean $\bar{x}$. What is the distribution of these values?]

Mean of Sample Mean
• First, we examine the center of the sampling distribution of the sample mean.
• The center of the sampling distribution of the sample mean is the unknown population mean: mean($\bar{X}$) = $\mu$
• Over repeated samples, the sample mean will, on average, be equal to the population mean. There are no guarantees for any one sample!

Variance of Sample Mean
• Next, we examine the spread of the sampling distribution of the sample mean
• The variance of the sampling distribution of the sample mean is variance($\bar{X}$) = $\sigma^2/n$
• As the sample size increases, the variance of the sample mean decreases!
• Averaging over many observations is more accurate than just looking at one or two observations
• Comparing the sampling distribution of the sample mean when n = 1 (parent population) vs. n = 10

Law of Large Numbers
• Remember the Law of Large Numbers:
• If one draws independent samples from a population with mean $\mu$, then as the number of observations increases, the sample mean $\bar{x}$ gets closer and closer to the population mean $\mu$
• This is easier to see now, since we know that mean($\bar{x}$) = $\mu$ and variance($\bar{x}$) = $\sigma^2/n \to 0$ as n gets large

Example
• Population: seasonal home-run totals for 7032 baseball players from 1901 to 1996
• Take different samples from this population and compare the sample mean we get each time
• In real life, we can't do this because we don't usually have the entire population!
Sample Size                     Mean    Variance
100 samples of size n = 1       3.69    46.8
100 samples of size n = 10      4.43    4.43
100 samples of size n = 100     4.42    0.43
100 samples of size n = 1000    4.42    0.06
Population parameter: $\mu$ = 4.42

Distribution of Sample Mean
• We now know the center and spread of the sampling distribution for the sample mean.
• What about the shape of the distribution?
• If our data $x_1, x_2, \ldots, x_n$ follow a Normal distribution, then the sample mean $\bar{x}$ will also follow a Normal distribution!

Example
• Mortality in US cities (deaths/100,000 people)
• This variable seems to approximately follow a Normal distribution, so the sample mean will also approximately follow a Normal distribution irrespective of the sample size drawn.

Central Limit Theorem
• What if the original data doesn't follow a Normal distribution?
• HR/Season for sample of baseball players
• If the sample is large enough, it doesn't matter!

Central Limit Theorem
• If the sample size is large enough (n ≥ 30), then the sample mean $\bar{x}$ has an approximately Normal distribution
• This is true no matter what the shape of the distribution of the original data!

Example: Home Runs per Season
• Take many different samples from the seasonal HR totals for a population of 7032 players
• Calculate the sample mean for each sample
[Figure: distributions of the sample mean for n = 1, n = 10, and n = 100.]

Important Definition & Theorem
Sampling Distribution of Sample Means
If all possible random samples, each of size n, are taken from any population with a mean $\mu$ and a standard deviation $\sigma$, the sampling distribution of sample means will:
1. have a mean $\mu_{\bar{x}}$ equal to $\mu$
2. have a standard deviation $\sigma_{\bar{x}}$ equal to $\sigma/\sqrt{n}$
Further, if the sampled population has a normal distribution, then the sampling distribution of $\bar{x}$ will also be normal for samples of all sizes.

Central Limit Theorem
The sampling distribution of sample means will become normal as the sample size increases.
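The {1, 2, 3, 4} example from earlier is small enough to check by brute force. The following is a minimal sketch in Python, not part of the original slides: it enumerates every ordered sample of size 2 drawn with replacement, builds the exact sampling distributions of the sample mean and the sample minimum, and compares the mean and variance of $\bar{x}$ with $\mu$ and $\sigma^2/n$. The helper names (`sampling_distribution`, `mean_of`, `expectation`, `variance`) are illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sampling_distribution(population, n, statistic):
    """Exact distribution of `statistic` over all ordered samples of size n
    drawn with replacement, each sample being equally likely."""
    samples = list(product(population, repeat=n))  # N**n samples
    weight = Fraction(1, len(samples))
    dist = Counter()
    for sample in samples:
        dist[statistic(sample)] += weight
    return dict(sorted(dist.items()))


def mean_of(sample):
    """Sample mean as an exact fraction."""
    return Fraction(sum(sample), len(sample))


def expectation(dist):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())


def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    center = expectation(dist)
    return sum((value - center) ** 2 * prob for value, prob in dist.items())


population = [1, 2, 3, 4]
n = 2

mean_dist = sampling_distribution(population, n, mean_of)
min_dist = sampling_distribution(population, n, min)

# Samples of size 1 reproduce the population itself, giving mu and sigma^2.
population_dist = sampling_distribution(population, 1, mean_of)
mu = expectation(population_dist)
sigma2 = variance(population_dist)

print("Sampling distribution of the mean:")
for value, prob in mean_dist.items():
    print(f"  x-bar = {float(value):.1f}  P = {prob}")

print("Sampling distribution of the minimum:")
for value, prob in min_dist.items():
    print(f"  min = {value}  P = {prob}")

print("mean of x-bar:", expectation(mean_dist), " mu:", mu)
print("variance of x-bar:", variance(mean_dist), " sigma^2 / n:", sigma2 / n)
```

Exact fractions are used instead of floats so that the comparison with $\sigma^2/n$ is an equality rather than an approximation. Note that `Fraction` prints probabilities in lowest terms, so 2/16 appears as 1/8.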
Summary
• The mean of the sampling distribution of $\bar{x}$ is equal to the mean of the original population: $\mu_{\bar{x}} = \mu$
• The standard deviation of the sampling distribution of $\bar{x}$ (also called the standard error of the mean) is equal to the standard deviation of the original population divided by the square root of the sample size: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
Notes:
- The distribution of $\bar{x}$ becomes more compact as n increases. (Why?)
- The variance of $\bar{x}$: $\sigma_{\bar{x}}^2 = \sigma^2/n$
• The distribution of $\bar{x}$ is (exactly) normal when the original population is normal
• The CLT says: the distribution of $\bar{x}$ is approximately normal regardless of the shape of the original distribution, when the sample size is large enough!

Standard Error of the Mean
Standard Error of the Mean: the standard deviation of the sampling distribution of sample means: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
Notes:
• The n in the formula for the standard error of the mean is the size of the sample
• The proof of the Central Limit Theorem is beyond the scope of this course
• The following example illustrates the results of the Central Limit Theorem

Graphical Illustration of the Central Limit Theorem
[Figure: four panels showing the original population and the distribution of $\bar{x}$ for n = 2, n = 10, and n = 30.]

7.3 Applications of the Central Limit Theorem
• When the sampling distribution of the sample mean is (exactly) normally distributed, or approximately normally distributed (by the CLT), we can answer probability questions using the standard normal distribution and the standard score
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Example 2
Example: Consider a normal population with $\mu = 50$ and $\sigma = 15$. Suppose a sample of size 9 is selected at random. Find:
1) $P(45 \le \bar{x} \le 60)$
2) $P(\bar{x} \le 47.5)$

Solutions: Since the original population is normal, the distribution of the sample mean is also (exactly) normal, with
1) $\mu_{\bar{x}} = \mu = 50$
2) $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 15/\sqrt{9} = 15/3 = 5$

Example 2, part 1
[Figure: normal curve for $\bar{x}$ centered at 50, with area 0.3413 between 45 and 50 and area 0.4772 between 50 and 60; the corresponding z values are -1.00, 0, and 2.00.]
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
$P(45 \le \bar{x} \le 60) = P\left(\dfrac{45 - 50}{5} \le z \le \dfrac{60 - 50}{5}\right) = P(-1.00 \le z \le 2.00) = 0.3413 + 0.4772 = 0.8185$

Example 2, part 2
[Figure: normal curve for $\bar{x}$ centered at 50, with area 0.3085 to the left of 47.5 and area 0.1915 between 47.5 and 50; the corresponding z values are -0.50 and 0.]
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
$P(\bar{x} \le 47.5) = P\left(\dfrac{\bar{x} - 50}{5} \le \dfrac{47.5 - 50}{5}\right) = P(z \le -0.50) = 0.5000 - 0.1915 = 0.3085$
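The table lookups in Example 2 can be cross-checked numerically. The sketch below is an illustration, not part of the original slides. It assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`, and models $\bar{x}$ as normal with mean 50 and standard error 5. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Population parameters and sample size from Example 2.
mu, sigma, n = 50, 15, 9

# Standard error of the mean: sigma / sqrt(n).
standard_error = sigma / sqrt(n)

# The population is normal, so x-bar is exactly normal with
# mean mu and standard deviation equal to the standard error.
xbar_dist = NormalDist(mu=mu, sigma=standard_error)

# 1) P(45 <= x-bar <= 60) as a difference of cumulative probabilities.
p_between = xbar_dist.cdf(60) - xbar_dist.cdf(45)

# 2) P(x-bar <= 47.5).
p_below = xbar_dist.cdf(47.5)

print(f"standard error       = {standard_error}")
print(f"P(45 <= x-bar <= 60) = {p_between:.4f}")
print(f"P(x-bar <= 47.5)     = {p_below:.4f}")
```

The first probability may differ from the table-based 0.8185 in the fourth decimal place, because each table entry is itself rounded before the two areas are added.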