Statistical Foundations: Point Estimation and Sampling Distributions
Psychology 790
Lecture #6 – 9/12/2006

Today’s Lecture
• Homework questions?
  – My comments on homework #1…
  – Any questions on homework #2?
• Point estimation
  – How we come up with the mean, variance, …
• Sampling distributions
  – What the mean and variance give us over the long run.

Lecture #3 Psychology 790

Example for Today

We Begin With an Example
• To fully describe today’s topics, let me begin by describing an example “experiment” that will follow us throughout the lecture.
• For whatever reason, let’s say we are interested in measuring the body temperature of individuals on campus.

Let Me Introduce You to R
• R is a FREE statistics package that will help us run our example.
  – You can download R from http://www.r-project.org
• R has an even steeper learning curve than SAS, so we will not use it for analysis in this class.
  – We will use it only for examples where we can show statistical properties through simulation.

Sim Data
• Much like Sim City and The Sims, we will make use of simulation to generate the data for our example.
• Simulations are frequently used in applied statistics.
  – They test the properties of the statistics we commonly use.
  – Simulated data cooperates a whole lot better than real data.

Using R…
• Let me also note that the figures and data you see in these slides were created by me when I developed the slides.
• As you will see, when I use R for examples in class, the numbers and figures will change.
  – I will be drawing entirely different data.
• My “live” simulation will have different values from my “canned” simulation as long as I use a different random seed.
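The role of the random seed can be illustrated with a short sketch. It is written in Python rather than the R used in class, and it assumes a normal population with mean 98.6 and standard deviation 0.2. The function name `draw_temperatures` and the seed values are hypothetical, chosen only for illustration.

```python
import random


def draw_temperatures(seed, n=20, mu=98.6, sigma=0.2):
    """Simulate n body temperatures (Fahrenheit) from an assumed normal population."""
    rng = random.Random(seed)  # private generator: the seed fully determines the draw
    return [rng.gauss(mu, sigma) for _ in range(n)]


canned = draw_temperatures(seed=790)   # stands in for the data baked into the slides
replay = draw_temperatures(seed=790)   # same seed, so the same 20 values come back
live = draw_temperatures(seed=2006)    # different seed, so a different sample

print("same seed gives same sample:", canned == replay)
print("different seed gives same sample:", canned == live)
```

Reusing a seed is what makes a simulation reproducible. Changing it is what makes the in-class numbers differ from the ones printed on the slides.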
Point Estimation

Our First Sample
• Prior to discussing point estimation, we should talk some about what we are about to do.
• We need data:
  – So I will randomly sample the body temperatures (in Fahrenheit) of 20 subjects.
  – To make our life easy, the 20 subjects must be healthy to participate in our study.
  – None of the subjects were harmed in the “collection” of the data.

Some Terminology
• Knowing that I am running an “experiment” where I am recording the body temperature of 20 subjects, let me ask you:
  – What is the sample space?
    • What is our expected range of observations?
  – How will this distribution look?
  – How should I numerically characterize this distribution?

Our First Sample
• So off I go, collecting a sample of body temperatures.

Some More Terminology
• All of the numbers we could dream up to characterize this distribution are statistics.
  – “A statistic is simply a function on samples, such that any sample is paired with a value of that statistic (Hays, p. 205).”
  – A statistic is the result of the application of an estimator.
    • Its value is an estimate.
• A sample attempts to describe the nature of a distribution in the population at large.
  – Therefore statistics collected from a sample are, hopefully, characteristic of the population from which the sample was drawn.
• Where this goes bad is if:
  – The statistics do not have good properties in the long run.
  – The sample is not representative of the target population.
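The estimator/estimate distinction can be made concrete with a minimal sketch in Python (the lecture itself uses R). Each estimator is a plain function on samples, and each estimate is the value that function returns for one sample. The data are the 20 simulated temperatures this lecture reports as its first sample. The dictionary names `estimators` and `estimates` are illustrative only.

```python
import statistics

# The 20 simulated body temperatures reported as the lecture's first sample.
temperatures = [
    98.63007, 98.81505, 98.54399, 98.44912, 98.76482,
    98.75803, 98.52507, 98.55176, 98.62402, 98.83739,
    98.67826, 98.48538, 98.79620, 98.73230, 98.33478,
    98.54835, 98.43057, 98.78581, 98.77759, 98.41347,
]

# Estimators: functions that map any sample to a single number.
estimators = {
    "mean": statistics.mean,
    "median": statistics.median,
    "variance (n in denominator)": statistics.pvariance,
    "standard deviation (n)": statistics.pstdev,
}

# Estimates: the values those functions take on this particular sample.
estimates = {name: estimator(temperatures) for name, estimator in estimators.items()}

for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

Applying the same functions to a different sample would give different estimates, which is why the long-run behavior of an estimator matters.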
Sample Statistics
• So, we have our sample:
  98.63007 98.81505 98.54399 98.44912 98.76482
  98.75803 98.52507 98.55176 98.62402 98.83739
  98.67826 98.48538 98.79620 98.73230 98.33478
  98.54835 98.43057 98.78581 98.77759 98.41347
• And we have some sample statistics:
  – Mean: 98.62
  – Median: 98.63
  – Variance (“n” in denominator): 0.023
  – Standard Deviation (“n”): 0.15

Point Estimates
• A “point estimate” is the result of the use of some sample statistic to infer the value of a population parameter.
  – The word parameter will be used quite often in this course.
  – Next time we will talk about a theoretical distribution that has parameters we try to gather information about.
• All of the statistics presented on the previous page are examples of point estimates.
• The choice of a certain statistic for use as a point estimate is driven by the statistical properties the estimate has in the long run.

Desirable Properties of Estimators
• Consistency – in the long run, the value of the statistic comes close to that of the parameter (as N increases, the variance around a parameter decreases).
• Relative Efficiency – a good estimator will have less variability around the population parameter than other estimators.
• Sufficiency – the statistic contains all the information about the parameter available from the data.
• Unbiasedness – the long run expectation of the statistic is identical to the value of the parameter.

What Is This “Long Run”?
• Most of the desirable properties of estimators had the phrase “in the long run.”
• Other phrases talked about when speaking of “the long run” are:
  – Asymptotically
  – In the limit
  – As N approaches infinity.
• All this means is that if you had an infinite sample size, your statistic would accurately capture the population parameter value.

Maximum Likelihood
• In statistics, we talk about maximum likelihood quite often.
• There is a class of estimators formed by taking the value that maximizes a likelihood (called MLEs).
• It is difficult to describe MLEs without a distribution, so we will hold this discussion on Thursday.
  – Until then, take comfort in the fact that the mean is an MLE.

Sampling Distributions

Sampling Distributions
• Up to this point, we talked about taking a sample of size N.
  – Our body temperature example had N=20.
• Our sample consisted of the observations we collected.
• Now imagine we were only interested in taking the value of a statistic from our sample.
  – Take the sample mean, for instance.
• If we wanted to run an analogous experiment about means of body temperatures, we would need more samples.
  – This time a single observation would be the sample mean from a sample of N=20.

Sampling Distributions
• Definition:
  – “A sampling distribution is a theoretical probability distribution that shows the relation between possible values of a given statistic and the probability (density) associated with each value, for all possible samples of size N drawn from a particular population.” (p. 206)

Example of Sampling Distributions
• To demonstrate a sampling distribution, consider taking repeated versions of our previous experiment:
  – To start, let’s take 10 different replications: we go and get 10 different samples of 20 subjects each.
  – We then compute the mean of each of our samples.
  – What does the distribution look like?
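The replication experiment just described can be sketched in a few lines of Python (the lecture's own runs were done in R). The sketch assumes a normal population with mean 98.6 and standard deviation 0.2. The seed and the helper name `sample_mean` are arbitrary choices, so the printed values will not match any particular slide.

```python
import random
import statistics

MU, SIGMA = 98.6, 0.2        # assumed population mean and standard deviation
N, REPLICATIONS = 20, 10     # 10 samples of 20 subjects each

rng = random.Random(6)       # arbitrary seed; another seed gives other means


def sample_mean(rng, n):
    """Draw one sample of size n and return its mean (one 'observation' of the statistic)."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    return statistics.mean(sample)


# Each replication contributes a single number: the mean of its own sample.
means = [sample_mean(rng, N) for _ in range(REPLICATIONS)]

mean_of_means = statistics.mean(means)
sd_of_means = statistics.pstdev(means)

print("sample means:", [round(m, 5) for m in means])
print(f"mean of means: {mean_of_means:.3f}")
print(f"standard deviation of means: {sd_of_means:.3f}")
```

With only 10 replications the picture is rough. Raising `REPLICATIONS` fills in the shape of the sampling distribution.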
Sampling Distribution of the Mean (N=20, Replications=10)
• We have a total of 10 means:
  98.61965 98.62748 98.53955 98.58505 98.65607
  98.55390 98.60464 98.58298 98.60942 98.64139

More Means
• From those 10 means, we have:
  – The mean of the means = 98.602
  – The standard deviation of the means = 0.035
• It is convenient to know the population parameters I used to simulate the data:
  – μ = 98.6
  – σ = 0.2 (so σ² = 0.04)

Sampling Distribution of the Mean
• With expectations, we can show that the mean is unbiased:
  – The expected value of the sample mean is equal to μ.
  – The variance of the sample mean is σ²/N.
• We will learn next time that as N gets large (goes to infinity), the distribution of the sample means is Gaussian (or normal).
  – This is the central limit theorem, something we rely on quite often in statistics.

Larger Samples…
• What if we did 1000 replications of our experiment?
  – Mean of means = 98.60082
  – Standard deviation of means = 0.043

What About the Median?
• Sampling distributions are not only for the mean.
  – Any statistic has a sampling distribution.
• Let’s do 1000 replications of our experiment and see what the median looks like.
  – Mean of medians = 98.60045
  – Standard deviation of medians = 0.052

What About the Standard Deviation?
• Using the variance formula with “N” in the denominator, we get the following:
  – Mean of SDs = 0.192
  – SD of SDs = 0.0326

What About the Standard Deviation?
(Unbiased Version)
• Using the variance formula with “N-1” in the denominator, we get the following:
  – Mean of SDs = 0.197
  – SD of SDs = 0.0334

Finally, the Variance

Wrapping Up
• This lecture covered some pretty fundamental statistical concepts that are critical to understand.
  – The numbers people use as statistics must be verifiable.
• Sampling distributions are one method we use to obtain p-values for hypothesis tests.

Next Time
• The normal distribution in all of its glory.
• The central limit theorem.
• MLEs of the normal distribution.
• Why variance with “N” is biased.
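The 1000-replication comparisons from this lecture (mean, median, and the two variance formulas) can be reproduced in outline with the sketch below. It is written in Python rather than R and assumes a normal population with mean 98.6 and standard deviation 0.2, a choice consistent with the spread of the results reported above. The seed and the helper name `sampling_summary` are arbitrary, so the output will differ slightly from the slide values.

```python
import random
import statistics

MU, SIGMA = 98.6, 0.2          # assumed population mean and standard deviation
N, REPLICATIONS = 20, 1000

rng = random.Random(790)       # arbitrary seed
samples = [[rng.gauss(MU, SIGMA) for _ in range(N)] for _ in range(REPLICATIONS)]


def sampling_summary(statistic):
    """Apply a statistic to every sample; return the mean and SD of the resulting values."""
    values = [statistic(sample) for sample in samples]
    return statistics.mean(values), statistics.pstdev(values)


summaries = {
    "mean": sampling_summary(statistics.mean),
    "median": sampling_summary(statistics.median),
    "SD (N)": sampling_summary(statistics.pstdev),           # N in the denominator
    "SD (N-1)": sampling_summary(statistics.stdev),          # N-1 in the denominator
    "variance (N)": sampling_summary(statistics.pvariance),
    "variance (N-1)": sampling_summary(statistics.variance),
}

for name, (center, spread) in summaries.items():
    print(f"{name:>15}: mean of statistic = {center:.5f}, SD of statistic = {spread:.5f}")
```

Under these assumptions the "N" variance averages out near (N−1)/N · σ² = 0.038 rather than σ² = 0.04, while the "N−1" variance averages out near 0.04. That gap is the bias the next lecture takes up.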