STAT 101, Dr. Kari Lock Morgan, 10/18/12
Statistics: Unlocking the Power of Data Lock5

Normal Distribution (Chapter 5)
• Normal distribution
• Central limit theorem
• Normal distribution for confidence intervals
• Normal distribution for p-values
• Standard normal

Project 1
• Due Tuesday
• 5 pages, double spaced, including figures
• Hypotheses should not change based on data
• This is a research paper: there should be text and complete sentences.

Bootstrap and Randomization Distributions
[Figure: dot plots of six bootstrap and randomization distributions]
• Correlation: Malevolent uniforms
• Slope: Restaurant tips
• Mean: Body temperatures
• Difference in means: Finger taps
• Proportion: Owners/dogs
• Mean: Atlanta commutes
What do you notice? All bell-shaped distributions!

Normal Distribution
• The symmetric, bell-shaped curve we have seen for almost all of our bootstrap and randomization distributions is called a normal distribution
[Figure: histogram of a normal distribution; vertical axis: Frequency; horizontal axis from -3 to 3]

Central Limit Theorem!
For a sufficiently large sample size, the distribution of sample statistics for a mean or a proportion is approximately normal
www.lock5stat.com/StatKey

Central Limit Theorem
• The central limit theorem holds for ANY original distribution, although “sufficiently large sample size” varies
• The more skewed the original distribution is (the farther from normal), the larger the sample size has to be for the CLT to work

Central Limit Theorem
• For distributions of a quantitative variable that are not very skewed and without large outliers, n ≥ 30 is usually sufficient to use the CLT
• For distributions of a categorical variable, counts of at least 10 within each category are usually sufficient to use the CLT

Normal Distribution
• The normal distribution is fully characterized by its mean and standard deviation:
N(mean, standard deviation)

Normal Distribution
N(0.523, 0.048)

Bootstrap Distributions
If a bootstrap distribution is approximately normally distributed, we can write it as
a) N(parameter, sd)
b) N(statistic, sd)
c) N(parameter, se)
d) N(statistic, se)
sd = standard deviation of variable
se = standard error = standard deviation of statistic

Confidence Intervals
If the bootstrap distribution is normal:
To find a P% confidence interval, we just need to find the middle P% of the distribution N(statistic, SE)

Best Picture
What proportion of visitors to www.naplesnews.com thought The Artist should win best picture?
p̂ = 0.15, SE = ???
Best Picture
www.lock5stat.com/statkey

Area under a Curve
• The area under the curve of a normal distribution over a given range is equal to the proportion of the distribution falling within that range
• Knowing just the mean and standard deviation of a normal distribution allows you to calculate areas in the tails and percentiles
www.lock5stat.com/statkey

Best Picture
www.lock5stat.com/statkey

Confidence Intervals
For a normal sampling distribution, we can also use the formula
sample statistic ± 2 × SE
to give a 95% confidence interval.
0.156 ± 2 × 0.03 = (0.096, 0.216)

Confidence Intervals
For normal bootstrap distributions, the formula
sample statistic ± 2 × SE
gives a 95% confidence interval.
How would you use the N(0,1) normal distribution to find the appropriate multiplier for other levels of confidence?

Confidence Intervals
For a P% confidence interval, use
sample statistic ± z* × SE
where P% of a N(0,1) distribution is between -z* and z*

Confidence Intervals
[Figure: N(0,1) curve with the middle P% between -z* and z*]

Confidence Intervals
Find z* for a 99% confidence interval.
www.lock5stat.com/statkey
z* = 2.575

News Sources
“A new national survey shows that the majority (64%) of American adults use at least three different types of media every week to get news and information about their local community”
The standard error for this statistic is 1%.
Find a 99% confidence interval for the true proportion.
Source: http://pewresearch.org/databank/dailynumber/?NumberID=1331

News Sources
sample statistic ± z* × SE
= 0.64 ± 2.575 × 0.01
= 0.64 ± 0.026
= (0.614, 0.666)

Confidence Interval Formula
sample statistic ± z* × SE
• sample statistic: from original data
• z*: from N(0,1)
• SE: from bootstrap distribution

First Born Children
• Are first born children actually smarter?
• Based on data from last semester’s class survey, we’ll test whether first born children score significantly higher on the SAT
x̄(first born) - x̄(not first born) = 30.26
• From a randomization distribution, we find SE = 37

First Born Children
x̄(first born) - x̄(not first born) = 30.26, SE = 37
What normal distribution should we use to find the p-value?
a) N(30.26, 37)
b) N(37, 30.26)
c) N(0, 37)
d) N(0, 30.26)
Because this is a hypothesis test, we want to see what would happen if the null were true, so the distribution should be centered around the null. The variability is equal to the standard error.
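
The reasoning above can be sketched in a few lines of code. This is only an illustration, assuming Python's standard-library `statistics.NormalDist` as a stand-in for StatKey and a one-sided (upper-tail) alternative, since the question is whether first born children score higher; the variable names are hypothetical.

```python
from statistics import NormalDist

# Values from the First Born Children example.
observed_diff = 30.26   # x-bar(first born) minus x-bar(not first born)
standard_error = 37.0   # SE from the randomization distribution
null_value = 0.0        # no difference in means if the null is true

# Center the normal distribution at the null value; its spread is the SE.
null_dist = NormalDist(mu=null_value, sigma=standard_error)

# Assumption: upper-tail test, so the p-value is the area above the observed statistic.
p_value = 1 - null_dist.cdf(observed_diff)
print(f"p-value = {p_value:.3f}")
```

Centering at the observed statistic instead (as in option a) would describe a bootstrap-style distribution for a confidence interval, not the distribution of the statistic under the null.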
Hypothesis Testing
[Figure: distribution of the statistic assuming the null, with the observed statistic marked and the p-value as the tail area beyond it; horizontal axis: Statistic, from -3 to 3]

p-values
If the randomization distribution is normal:
To calculate a p-value, we just need to find the area in the appropriate tail(s) beyond the observed statistic of the distribution N(null value, SE)

First Born Children
N(0, 37)
www.lock5stat.com/statkey
p-value = 0.207

Standard Normal
• Sometimes, it is easier to just use one normal distribution to do inference
• The standard normal distribution is the normal distribution with mean 0 and standard deviation 1:
N(0, 1)
[Figure: distribution of the statistic assuming the null; horizontal axis: Statistic, from -3 to 3]

Standardized Test Statistic
• The standardized test statistic is the number of standard errors a statistic is from the null value:
z = (sample statistic - null value) / SE
• The standardized test statistic (also called a z-statistic) is compared to N(0,1)

p-value
1) Find the standardized test statistic:
z = (sample statistic - null value) / SE
2) The p-value is the area in the tail(s) beyond z for a standard normal distribution

First Born Children
1) Find the standardized test statistic
z = (sample statistic - null value) / SE = (30.26 - 0) / 37 = 0.818

First Born Children
2) Find the area in the tail(s) beyond z for a standard normal distribution
p-value = 0.207

z-statistic
z = (sample statistic - null value) / SE
If z = -3, using α = 0.05 we would
(a) Reject the null
(b) Not reject the null
(c) Impossible to tell
(d) I have no idea
About 95% of z-statistics are within -2 and +2, so anything beyond those values will be in the most extreme 5%, or equivalently will give a p-value less than 0.05.

z-statistic
• Calculating the number of standard errors a statistic is from the null value allows us to assess extremity on a common scale

Formula for p-values
z = (sample statistic - null value) / SE
• sample statistic: from original data
• null value: from H0
• SE: from randomization distribution
Compare z to N(0,1) for p-value

Standard Error
• Wouldn’t it be nice if we could compute the standard error without doing thousands of simulations?
• We can!!!
• Or rather, we’ll be able to next week!

To Do
• Do Project 1 (due 10/23)
• Read Chapter 5
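
As a recap, the two formulas from this lecture (sample statistic ± z* × SE for a confidence interval, and z = (sample statistic - null value) / SE for a test) can be sketched in code. This is a rough illustration, not part of the course materials: it assumes Python's standard-library `statistics.NormalDist` in place of StatKey, the function names are hypothetical, and the two-tailed p-value for the z = -3 question is an assumption.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)  # N(0, 1)

def z_star(confidence):
    """Multiplier z* such that the given share of N(0,1) lies between -z* and z*."""
    return STANDARD_NORMAL.inv_cdf(0.5 + confidence / 2)

def confidence_interval(statistic, se, confidence):
    """Interval: sample statistic plus or minus z* times SE."""
    margin = z_star(confidence) * se
    return statistic - margin, statistic + margin

def z_statistic(statistic, null_value, se):
    """Number of standard errors the statistic is from the null value."""
    return (statistic - null_value) / se

def two_tailed_p_value(z):
    """Area in both tails of N(0,1) beyond |z| (assumes a two-sided alternative)."""
    return 2 * STANDARD_NORMAL.cdf(-abs(z))

# News Sources example: 99% interval for a proportion of 0.64 with SE 0.01.
low, high = confidence_interval(0.64, 0.01, 0.99)
print(f"z* = {z_star(0.99):.3f}, interval = ({low:.3f}, {high:.3f})")

# First Born Children example: standardized test statistic.
z_first_born = z_statistic(30.26, 0.0, 37.0)
print(f"z = {z_first_born:.3f}")

# The z = -3 question with alpha = 0.05.
p_for_minus_3 = two_tailed_p_value(-3.0)
print("reject the null" if p_for_minus_3 < 0.05 else "do not reject the null")
```

The computed z* for 99% confidence is slightly above the 2.575 shown on the slide because the slide value is rounded; the resulting interval agrees to three decimal places.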