Stat 31, Section 1, Last Time
Sampling Distributions
• Binomial Distribution
• Binomial Probs
• Normal Approx. to Binomial
• Counts Scale vs. Proportion Scale

Important Announcement
2nd Midterm Date Changed, from: Tuesday, April 5, to Tuesday, April 12.

Section 5.2: Distrib’n of Sample Means
Idea: Study Probability Structure of $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
• Based on $X_1, \ldots, X_n$
• Drawn independently
• From same distribution,
• Having Expected Value: $EX = \mu_X$
• And Standard Deviation: $\sigma_X$

Expected Value of Sample Mean
How does $E\bar{X} = \mu_{\bar{X}}$ relate to $\mu_X$?
$$\mu_{\bar{X}} = \mu_{\frac{1}{n} X_1 + \cdots + \frac{1}{n} X_n} = \frac{1}{n}\mu_{X_1} + \cdots + \frac{1}{n}\mu_{X_n} = \frac{1}{n}\mu_X + \cdots + \frac{1}{n}\mu_X = n \cdot \frac{1}{n}\mu_X = \mu_X$$
Sample mean “has the same mean” as the original data.

Variance of Sample Mean
Study “spread” (i.e. quantify variation) of $\bar{X}$ (the second step uses independence):
$$\sigma^2_{\bar{X}} = \sigma^2_{\frac{1}{n} X_1 + \cdots + \frac{1}{n} X_n} = \left(\frac{1}{n}\right)^2 \sigma^2_{X_1} + \cdots + \left(\frac{1}{n}\right)^2 \sigma^2_{X_n} = n \cdot \frac{1}{n^2}\sigma^2_X = \frac{1}{n}\sigma^2_X$$
Variance of Sample mean “reduced by $\frac{1}{n}$”

S. D. of Sample Mean
Since Standard Deviation is square root of Variance, take square roots to get:
$$\sigma_{\bar{X}} = \frac{1}{\sqrt{n}}\sigma_X$$
S. D. of Sample mean “reduced by $\frac{1}{\sqrt{n}}$”

Mean & S. D. of Sample Mean
Summary: Averaging:
1. Gives same centerpoint
2. Reduces variation by factor of $\frac{1}{\sqrt{n}}$
Called “Law of Averages, Part I”

Law of Averages, Part I
Some consequences (worth noting):
• To “double accuracy”, need 4 times as much data.
• For “10 times accuracy”, need 100 times as much data.

HW: 5.28 (5.77, 4)

Distribution of Sample Mean
Now know center and spread, what about “shape of distribution”?

Case 1: If $X_1, \ldots, X_n$ are indep. $N(\mu, \sigma)$
CAN SHOW: $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
(knew these, news is “mound shape”)
Thus work with NORMDIST & NORMINV

Case 2: If $X_1, \ldots, X_n$ are “almost anything”
STILL HAVE: $\bar{X}$ “approximately” $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

Remarks:
• Mathematics: in terms of $\lim_{n \to \infty}$
• Called “Law of Averages, Part II”
• Also called “Central Limit Theorem”
• Gives sense in which Normal Distribution is in the center
• Hence name “Normal” (ostentatious?)
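Both Laws of Averages can be checked by simulation. The following is a minimal sketch in Python rather than the Excel used in class; the fair-die population, the function name simulate_sample_means, the repetition count and the seed are all illustrative assumptions. For each sample size n it prints the simulated mean of the averages, their simulated S. D., and the predicted S. D. $\sigma/\sqrt{n}$.

```python
import random
import statistics


def simulate_sample_means(n, reps=20000, seed=0):
    """Return `reps` simulated averages, each of n fair-die rolls."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [
        statistics.fmean(rng.randint(1, 6) for _ in range(n))
        for _ in range(reps)
    ]


# Population mean and S. D. for a single fair die (hypothetical population)
faces = range(1, 7)
mu = statistics.fmean(faces)
sigma = statistics.pstdev(faces)

for n in (1, 4, 16):
    means = simulate_sample_means(n)
    print(
        n,
        round(statistics.fmean(means), 3),   # should stay near mu
        round(statistics.pstdev(means), 3),  # should shrink like sigma / sqrt(n)
        round(sigma / n ** 0.5, 3),          # Law of Averages, Part I prediction
    )
```

Going from n = 4 to n = 16 uses 4 times as much data and should roughly halve the S. D. of the averages, which is the “double accuracy” consequence noted above.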
Law of Averages, Part II
More Remarks:
• Thus we will work with NORMDIST & NORMINV a lot, for averages
• This is why Normal Dist’n is good model for many different populations (any sum of small indep. random pieces)
• Also explains Normal Approximation to the Binomial

Normal Approx. to Binomial
Explained by Law of Averages, Part II, since:
For $X \sim \text{Binomial}(n, p)$
Can represent $X$ as: $X = \sum_{i=1}^{n} X_i$
Where:
$$X_i = \begin{cases} 0 & \text{F on trial } i \\ 1 & \text{S on trial } i \end{cases}$$
Thus $X$ is a rescaled average (a sum), so Law of Averages gives Normal Dist’n

Law of Averages, Part II
Nice Java Demo:
http://www.amstat.org/publications/jse/v6n3/applets/CLT.html
1 Die (think n = 1): Average Dist’n is flat
2 Dice (n = 2): Average Dist’n is triangle
…
5 Dice (n = 5): Looks quite “mound shaped”

Another cool one:
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html
• Create U shaped distribut’n with mouse
• Simul. samples of size 2: non-Normal
• Size n = 5: more normal
• Size n = 10 or 25: mound shaped

Class Example:
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg19.xls
Shows:
• Even starting from non-normal shape,
• Averages become normal
• More so for more averaging
• SD smaller with more averaging ($\frac{1}{\sqrt{n}}$)

HW: 5.31, 5.33, 5.35, 5.39

And now for something completely different….
A statistics professor was describing sampling theory to his class, explaining how a sample can be studied and used to generalize to a population. ???

Chapter 6: Statistical Inference
Main Idea: Form conclusions by quantifying uncertainty
(will study several approaches, first is…)

Section 6.1: Confidence Intervals
Background: The sample mean, $\bar{X}$, is an “estimate” of the population mean, $\mu$
How accurate? (there is “variability”, how much?)

Confidence Intervals
Recall the Sampling Distribution: $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ (maybe an approximation)

Thus understand error as: [Figure: $\bar{X}$ dist’n]
How to explain to untrained consumers?
(who don’t know randomness, distributions, normal curves)

Approach: present an interval
With endpoints: Estimate ± margin of error
I.e. $\bar{X} \pm m$, reflecting variability
How to choose $m$?

Choice of “Confidence Interval radius”, i.e. margin of error, $m$:
Notes:
• No Absolute Range (i.e. including “everything”) is available
• From infinite tail of normal dist’n
• So need to specify desired accuracy

Approach:
• Choose a Confidence Level
• Often 0.95 (e.g. FDA likes this number for approving new drugs, and it is a common standard for publication in many fields)
• And take margin of error to include that part of sampling distribution

E.g. For confidence level 0.95, want:
[Figure: $\bar{X}$ distribution, with Area = 0.95 within $m$ of the center, $m$ = margin of error]

Computation: Recall NORMINV takes areas (probs), and returns cutoffs
Issue: NORMINV works with lower areas
Note: lower tail included

So adapt needed probs to lower areas….
When inner area = 0.95,
Right tail = 0.025
Shaded Area = 0.975
So need to compute: $\text{NORMINV}\left(0.975, \mu, \frac{\sigma}{\sqrt{n}}\right)$

Major problem: $\mu$ is unknown
• But should answer depend on $\mu$?
• “Accuracy” is only about spread
• Not centerpoint
• Need another view of the problem

Approach to unknown $\mu$: Recenter, i.e. look at $\bar{X} - \mu$ dist’n
Key concept: Centered at 0
Now can calculate as: $m = \text{NORMINV}\left(0.975, 0, \frac{\sigma}{\sqrt{n}}\right)$

Smaller Problem: Don’t know $\sigma$
Approach 1: Estimate $\sigma$ with $s$
• Leads to complications
• Will study later
Approach 2: Sometimes know $\sigma$

E.g. Crop researchers plant 15 plots with a new variety of corn. The yields, in bushels per acre are:
138    139.1  113    132.5  140.7
109.7  118.9  134.8  109.6  127.3
115.6  130.4  130.2  111.7  105.5
Assume that $\sigma$ = 10 bushels / acre

E.g.
Find:
a) The 90% Confidence Interval for the mean value $\mu$, for this type of corn.
b) The 95% Confidence Interval.
c) The 99% Confidence Interval.
d) How do the CIs change as the confidence level increases?
Solution, part 1 of:
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg20.xls

An EXCEL shortcut: CONFIDENCE
Careful: parameter $\alpha$ is: 2 tailed outer area
So for level = 0.90, $\alpha$ = 0.10

HW: 6.1, 6.3, 6.5
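The corn yield example can also be worked outside Excel. The sketch below is a Python analogue, not the solution in the linked spreadsheet. It assumes that statistics.NormalDist(...).inv_cdf plays the role of NORMINV, and the helper name confidence_interval is hypothetical. It follows the recentering idea: the margin of error is the cutoff with lower area 1 - α/2 on a normal curve centered at 0 with S. D. $\sigma/\sqrt{n}$.

```python
from statistics import NormalDist, fmean

# Corn yields (bushels per acre) from the 15 plots in the example
yields = [138.0, 139.1, 113.0, 132.5, 140.7,
          109.7, 118.9, 134.8, 109.6, 127.3,
          115.6, 130.4, 130.2, 111.7, 105.5]
sigma = 10.0  # population S. D., assumed known as in the example


def confidence_interval(data, sigma, level):
    """Return (lower, upper, margin) for the mean, with known sigma."""
    n = len(data)
    xbar = fmean(data)
    se = sigma / n ** 0.5                 # S. D. of the sample mean
    alpha = 1 - level                     # 2 tailed outer area
    # Analogue of NORMINV(1 - alpha/2, 0, sigma/sqrt(n))
    m = NormalDist(0, se).inv_cdf(1 - alpha / 2)
    return xbar - m, xbar + m, m


for level in (0.90, 0.95, 0.99):
    lower, upper, m = confidence_interval(yields, sigma, level)
    print(f"{level:.0%} CI: {lower:.2f} to {upper:.2f} (margin of error {m:.2f})")
```

The margin m computed here should correspond to what the CONFIDENCE shortcut returns when given α = 1 - level, $\sigma$ and n. Comparing the three printed intervals answers part d): a higher confidence level gives a wider interval.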