MATH/STAT360 CH5: Statistics and their distributions
9/30/2015

Statistics
• A statistic is any quantity whose value can be calculated from sample data.
• A statistic can be thought of as a random variable.

Sampling Distribution
• Any statistic, being a random variable, has a probability distribution.
• The probability distribution of a statistic is sometimes referred to as its sampling distribution.

Note
• In this group of notes we will look at examples where we know the population and its parameters.
• This is to give us insight into how to proceed when we have large populations with unknown parameters (which is the more typical scenario).

The “Meta-Experiment”
• The “Meta-Experiment” consists of indefinitely many repetitions of the same experiment.
• If the experiment is taking a sample of 100 items from a population, the meta-experiment is to repeatedly take samples of 100 items from the population.
• This is a theoretical construct to help us understand the probabilities involved in our experiment.

Sample Statistics
[Diagram: in the experiment, one sample is drawn from the population and yields one statistic. In the meta-experiment, repeated samples of n are drawn from the same population, each yielding its own sample statistic, and so on.]

Distribution of the Sample Mean
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean value µ and standard deviation σ. Then
1. $E(\bar{X}) = \mu_{\bar{X}} = \mu$
2. $V(\bar{X}) = \sigma_{\bar{X}}^2 = \sigma^2 / n$ and $SD(\bar{X}) = \sigma_{\bar{X}} = \sigma / \sqrt{n}$

Example: Random Rectangles
100 Rectangles with µ=7.42 and σ=5.26.
[Figure: Histogram of Areas; x-axis: Areas, y-axis: frequency]

So, the distribution of the sample mean based on samples of size 5 should have
1. $E(\bar{X}) = \mu = 7.42$
2. $SD(\bar{X}) = \sigma / \sqrt{n} = 5.26 / \sqrt{5} \approx 2.35$

Based on 68 random samples of size 5:
Mean of the sample means = 7.33
SD of the sample means = 1.88
[Figure: Histogram of Sample Means; x-axis: Means, y-axis: frequency]

Normal Distributions
• Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with mean value µ and standard deviation σ.
• Then for any n, $\bar{X}$ is normally distributed (with mean µ and standard deviation $\sigma / \sqrt{n}$).

Example: Women’s Heights
• It is known that women’s heights are normally distributed with population mean 64.5 inches and population standard deviation 2.5 inches.
• We will look at the distribution of sample means for various sample sizes.
• Since the population follows a normal distribution, the sampling distribution of $\bar{X}$ is also normal regardless of sample size.

For n=9, the sample means will be normally distributed with mean = 64.5 and standard deviation = $2.5 / \sqrt{9} \approx 0.83$.
[Figure: Distribution of Sample Means (n=9); x-axis: Sample Mean, y-axis: Height of Curve]

For n=25, the sample means will be normally distributed with mean = 64.5 and standard deviation = $2.5 / \sqrt{25} = 0.5$.
[Figure: Distribution of Sample Means (n=25); x-axis: Sample Mean, y-axis: Height of Curve]

For n=100, the sample means will be normally distributed with mean = 64.5 and standard deviation = $2.5 / \sqrt{100} = 0.25$.
[Figure: Distribution of Sample Means (n=100); x-axis: Sample Mean, y-axis: Height of Curve]

The Central Limit Theorem
• Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean value µ and standard deviation σ.
• Then if n is sufficiently large, $\bar{X}$ has approximately a normal distribution with $E(\bar{X}) = \mu$ and $V(\bar{X}) = \sigma^2 / n$.

Rule of Thumb
• If n>30, the Central Limit Theorem can be used.
• For highly skewed data or data with extreme outliers, it may take samples of 40 or more before the CLT starts “working”.

Sampling Distribution Simulation
http://onlinestatbook.com/stat_sim/index.html
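
The meta-experiment can also be approximated on a computer. The following is a minimal Python sketch, not part of the original notes: it repeatedly draws samples from the women's heights population (normal, mean 64.5, standard deviation 2.5) and compares the standard deviation of the simulated sample means with σ/√n. The function names, the seed, the 5000 repetitions, and the exponential population used as a skewed example are all illustrative assumptions.

```python
import random
import statistics


def simulate_sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from n values drawn via draw(rng)."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


def heights(rng):
    # Population from the notes: women's heights, normal with mean 64.5 and SD 2.5.
    return rng.gauss(64.5, 2.5)


def skewed(rng):
    # Assumed skewed population (not from the notes): exponential with mean 1 and SD 1.
    return rng.expovariate(1.0)


rng = random.Random(360)  # arbitrary fixed seed so the run is repeatable
results = {}
for n in (9, 25, 100):
    means = simulate_sample_means(heights, n, reps=5000, rng=rng)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n={n:3d}  mean of means={results[n][0]:.2f}  "
          f"SD of means={results[n][1]:.3f}  sigma/sqrt(n)={2.5 / n ** 0.5:.3f}")

# Skewed population with n=40, the sample size suggested by the rule of thumb.
skewed_means = simulate_sample_means(skewed, 40, reps=5000, rng=rng)
skewed_sd = statistics.stdev(skewed_means)
print(f"skewed, n=40: SD of means={skewed_sd:.3f}  sigma/sqrt(n)={1 / 40 ** 0.5:.3f}")
```

With a finite number of repetitions the simulated values only approximate the theoretical ones, in the same way that the 68 rectangle samples above gave 7.33 rather than exactly 7.42.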

Dependence on Sample Size
The sampling distribution of $\bar{X}$ depends on the sample size (n) in two ways:
1. The standard deviation is $\sigma_{\bar{X}} = \sigma / \sqrt{n}$, which is inversely proportional to $\sqrt{n}$.
2. If the population distribution is not normal, then the shape of the sampling distribution of $\bar{X}$ depends on n, being more nearly normal for larger n.

Why is the Central Limit Theorem Important?
• Every different type of population needs a different set of procedures (i.e. probability tables) to answer questions about probability.
• The CLT shows us that we can use the same procedure (normal probability procedures) for questions about probability and the sample mean, regardless of the shape of the original population.
• The only requirement is a “large” sample.

Proportions as Means
• Recall that a binomial RV X is the number of successes in an experiment consisting of n independent success/failure trials.
• Let $X_i = 1$ if the ith trial results in a success and $X_i = 0$ if the ith trial results in a failure.
• Then the sample proportion can be expressed as $\hat{p} = \dfrac{\#\,\text{successes}}{n} = \dfrac{\sum X_i}{n} = \bar{X}$.

Distribution of Sample Proportion
• The CLT implies that if n is sufficiently large, then $\hat{p}$ has approximately a normal distribution with $E(\hat{p}) = p$ and $V(\hat{p}) = \dfrac{p(1-p)}{n}$, where p is the true population proportion.
• In order for the approximation to hold we need np≥10 and n(1-p)≥10.

Example: Suppose we took samples of size 20 from a population where p=0.5.
The sampling distribution of $\hat{p}$ is approximately Normal with mean = 0.5 and variance = $p(1-p)/n = 0.5 \times 0.5 / 20 = 0.0125$.
[Figure: sampling distribution of p-hat; x-axis: p-hat, y-axis: Probability]
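
The example above can be checked with a short simulation. This is a rough Python sketch rather than anything from the original notes: it treats each trial as a 0/1 indicator, averages n=20 of them to get one p-hat, and repeats this many times. The function name, the seed, and the 10000 repetitions are illustrative assumptions.

```python
import random
import statistics


def simulate_p_hat(n, p, reps, rng):
    """Return `reps` sample proportions, each from n independent success/failure trials."""
    # Each trial is an indicator X_i (True counts as 1), so p-hat is the mean of the X_i.
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


n, p = 20, 0.5
rng = random.Random(2015)  # arbitrary fixed seed so the run is repeatable
p_hats = simulate_p_hat(n, p, reps=10000, rng=rng)

theory_var = p * (1 - p) / n             # p(1-p)/n from the notes
sim_mean = statistics.fmean(p_hats)      # should be close to p
sim_var = statistics.pvariance(p_hats)   # should be close to theory_var
rule_ok = n * p >= 10 and n * (1 - p) >= 10  # the np and n(1-p) condition

print(f"theoretical mean={p}, simulated mean={sim_mean:.4f}")
print(f"theoretical variance={theory_var:.4f}, simulated variance={sim_var:.4f}")
print(f"np >= 10 and n(1-p) >= 10: {rule_ok}")
```

Here np and n(1-p) both equal 10, so this example sits right at the boundary of the condition for the normal approximation.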