Stat 31, Section 1, Last Time
• Sampling
• Experiments
  – Design
  – Controls
  – Randomization
  – Blind & Double Blind
• Pepsi Challenge

Midterm I
Coming up: Tuesday, Feb. 15
Material: HW Assignments 1 – 4
Extra Office Hours: Mon. Feb. 14, 8:30 – 12:00 and 2:00 – 3:30 (instead of a Review Session)
Bring along: one 8.5” × 11” sheet of paper with formulas

Sec. 3.4: Basics of “Inference”
Idea: Build a foundation for statistical inference, i.e. quantitative analysis (of uncertainty and variability)
Fundamental Concepts:
• Population, described by parameters, e.g. mean μ, SD σ
  – Unknown, but we can get information from…
• Sample (usually random), described by corresponding “statistics”, e.g. mean x̄, SD s
  – (It will become important to keep these apart)

Population vs. Sample
E.g. 1: Political Polls
• Population is “all voters”
• Parameter of interest is p = % in population for candidate A (bigger than 50% or not?)
• Sample is “voters asked by pollsters”
• Statistic is p̂ = % in sample for candidate A (careful to keep p and p̂ straight!)
• Notes:
  – p̂ is an “estimate” of p
  – Variability is critical
  – Will construct models of variability
  – Possible when the sample is random
  – Recall random sampling also reduces bias

E.g. 2: Measurement Error (seemingly quite different…)
• Population is “all possible measurements” (a thought experiment only)
• Parameters of interest are:
  μ = population mean
  σ = population SD
• Sample is “measurements actually made”
• Statistics are:
  x̄ = mean of measurements
  s = SD of measurements
• Notes:
  – x̄ estimates μ
  – s estimates σ
  – Again will model variability
  – “Randomness” is just a model for measurement error

HW: 3.59, 3.61

Basic Mathematical Model
Sampling Distribution
Idea: Model for the “possible values” of a statistic
E.g.
  1: Distribution of p̂ in “repeated samplings” (a thought experiment only)
  2: Distribution of x̄ in “repeated samplings” (again a thought experiment)
Sampling Distribution Tools
Can study these with:
• Histograms: “shape”, often Normal
• Mean: gives a measure of “bias”
• SD: gives a measure of “variation”

Bias and Variation
Graphical Illustration
[Figure: scanned from text, Fig. 3.9]

Class Example: Results from previous class on “Estimate % of males at UNC”
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg16.xls
Recall several approaches to estimation (3 bad, one sensible)

E.g. % Males at UNC
At top:
  – Counts
  – Corresponding proportions (on [0,1] scale)
  – Bin grid (for histograms of the [0,1] numbers)
Next part:
  – Summarize mean of each
  – Summarize SD (spread) of each
Histograms (appear next)

Recall 4 ways to collect data:
  Q1: Sample from class
    – Q1 “less spread and to the left”?
  Q2: Stand at door and tally
  Q3: Make up names in head
    – Q3 “more to the right”?
  Q4: Random sample
    – Supposed to be best; can we see it?

Better comparison: Q4 vs. each other one, using “interleaved histograms”
Q1 & Q4:
  – Q1 has smaller center: x̄1 = 0.39 < 0.42 = x̄4
  – i.e. “biased”, since Class ≠ Population
  – And less spread: s1 = 0.086 < 0.109 = s4
  – since “drawn from smaller pool”
Q2 & Q4:
  – Center for Q2 is bigger: x̄2 = 0.47 > 0.42 = x̄4
  – Reflects bias in door choice
  – And Q2 is “more spread”: s2 = 0.139 > 0.109 = s4
  – Reflects “spread in doors chosen” + “sampling spread”
Q3 & Q4:
  – Center for Q3 is bigger: x̄3 = 0.48 > 0.42 = x̄4
  – Reflects “more people think of males”?
  – And Q3 is “more spread”: s3 = 0.124 > 0.109 = s4
  – Reflects “more variation in human choice”

A look under the hood:
• Highlight an interleaved chart
• Click Chart Wizard
• Note Bar (and interleaved subtype)
• Different colors are in “series”
• Computed earlier on left
• Using Tools > Data Analysis > Histogram
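The sampling-distribution tools above (mean as a gauge of bias, SD as a gauge of variation) can be tried out with a small simulation. The Python sketch below is a hypothetical illustration, not the class spreadsheet: the population proportion `TRUE_P = 0.45`, the sub-pool proportion `BIASED_POOL_P = 0.35`, the sample size 25, and the 2000 repetitions are all made-up assumptions. It repeats the “thought experiment” of drawing many samples and summarizes the resulting p̂ values by their mean and SD.

```python
import random
import statistics

# Hypothetical population proportion: an assumption for illustration only,
# not the actual % of males at UNC.
TRUE_P = 0.45
BIASED_POOL_P = 0.35  # assumed proportion in a non-representative sub-pool


def sample_proportion(p, n, rng):
    """Draw one random sample of size n from a population in which a
    fraction p has the trait; return the sample proportion p-hat."""
    hits = sum(1 for _ in range(n) if rng.random() < p)
    return hits / n


def sampling_distribution(p, n, reps, rng):
    """Repeat the sampling 'thought experiment' reps times; return the p-hats."""
    return [sample_proportion(p, n, rng) for _ in range(reps)]


rng = random.Random(31)  # fixed seed so reruns give the same numbers

fair = sampling_distribution(TRUE_P, n=25, reps=2000, rng=rng)
biased = sampling_distribution(BIASED_POOL_P, n=25, reps=2000, rng=rng)

for name, phats in [("random sample", fair), ("biased pool", biased)]:
    center = statistics.mean(phats)   # compare with TRUE_P to gauge bias
    spread = statistics.stdev(phats)  # gauges variation
    print(f"{name}: mean = {center:.3f} (bias {center - TRUE_P:+.3f}), SD = {spread:.3f}")
```

In a run, the random-sampling p̂ values should center near 0.45 with SD near sqrt(0.45 × 0.55 / 25) ≈ 0.10 (the binomial formula studied later), while the biased scheme centers near 0.35: the main difference is a shift in center, i.e. bias.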
% Males at UNC
Interesting question: What is “natural variation”?
Will model this soon: this is the “binomial” part of this example, which we will study later.

Bias and Variation
HW: 3.62 (hi bias – hi var, lo bias – lo var, lo bias – hi var, hi bias – lo var), 3.65

Chapter 4: Probability
Goal: quantify (make numerical) uncertainty
• Key to answering questions above
  (e.g. what is “natural variation” in a random sample?)
  (e.g. which effects are “significant”?)
Idea: Represent “how likely” something is by a number

Simple Probability
E.g. (will use for a while, since simplicity gives easy insights): Roll a die (6-sided cube, faces 1, 2, …, 6)
• 1 of 6 faces is a “4”
• So say “chances of a 4” are “1 out of 6” = 1/6
• What does that number mean?
• How do we find such numbers for harder problems?

A way to make this precise: the “Frequentist Approach”
In many replications (repeats of the die roll), expect about 1/6 of the total to be 4s

Terminology (attach buzzwords to ideas):
Think about “outcomes” (e.g. the numbers on the die) from an “experiment” (e.g. roll the die, observe the number)

Quantify “how likely” by assigning “probabilities”, i.e. a number between 0 and 1 to each outcome, reflecting how likely it is.
Intuition:
• 0 means “can’t happen”
• ½ means “happens half the time”
• 1 means “must happen”

HW: C10: Match one of the probabilities 0, 0.01, 0.3, 0.6, 0.99, 1 with each statement about an event:
  a. Impossible, can’t occur.
  b. Certain, will happen on every trial.
  c. Very unlikely, but will occur once in a long while.
  d. Event will occur more often than not.

Main Rule: The sum of all probabilities (i.e. over all outcomes) is 1
E.g. for die rolling:
  P{1} + P{2} + P{3} + P{4} + P{5} + P{6} = 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1
HW: 4.13a, 4.15

Probability
General rules for assigning probabilities:
i. Frequentist view (what happens in many repetitions?)
ii. Equally likely: for n outcomes, P{one outcome} = 1/n (e.g. die rolling)
iii. Based on observed frequencies, e.g.
   life tables, which summarize when people die
   – Give the “probability of dying” at a given age
   – Give the “life expectancy”
iv. Personal choice:
   – Reflecting “your assessment”
   – E.g. oddsmakers
   – Must be used with care
HW: 4.16
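Rules i and ii can be checked against each other by simulation. This Python sketch is an illustration only (the seeds and roll counts are arbitrary choices): it assigns equally likely probabilities 1/6 to the die faces using exact fractions, confirms the main rule that they sum to 1, and then takes the frequentist view by rolling a simulated fair die many times and reporting the fraction of 4s.

```python
import random
from fractions import Fraction

# Rule ii (equally likely): each of the n = 6 faces gets probability 1/n.
faces = range(1, 7)
probs = {face: Fraction(1, len(faces)) for face in faces}

# Main rule: probabilities over all outcomes sum to 1.
assert sum(probs.values()) == 1


def relative_frequency(target, rolls, rng):
    """Rule i (frequentist view): roll a fair die `rolls` times and return
    the fraction of rolls that came up `target`."""
    hits = sum(1 for _ in range(rolls) if rng.randint(1, 6) == target)
    return hits / rolls


rng = random.Random(4)  # fixed seed for reproducibility
for rolls in (60, 600, 6000, 60000):
    freq = relative_frequency(4, rolls, rng)
    print(f"{rolls:>6} rolls: fraction of 4s = {freq:.4f}  (1/6 = {float(probs[4]):.4f})")
```

With only 60 rolls the fraction of 4s can wander noticeably from 1/6; by 60,000 rolls it typically sits close to 0.1667, though the exact digits depend on the seed.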