Statistical Foundations:
Point Estimation and Sampling Distributions
Psychology 790
Lecture #6 – 9/12/2006
Today’s Lecture
• Homework questions?
– My comments on homework #1…
– Any questions on homework #2?
• Point estimation
– How we come up with the mean, variance,…
• Sampling Distributions
– What the mean and variance give us over the long run.
Lecture #3
Psychology 790
Example for Today
We Begin With An Example
• To fully describe today’s topics, let me begin by describing an example “experiment” that will follow us throughout the lecture.
• For whatever reason, let’s say we are interested in measuring the body temperatures of individuals on campus.
Let Me Introduce You to R
• R is a FREE statistics package that will help us run our example.
– You can download R from http://www.r-project.org
• R has an even steeper learning curve than SAS, so we will not use it for analysis in this class.
– We will use it only for examples where we can show statistical properties through simulation.
Sim Data
• Much like SimCity and The Sims, we will use simulation to generate the data for our example.
• Simulations are frequently used in applied statistics.
– They test the properties of the statistics we commonly use.
– Simulated data cooperate a whole lot better than real data.
Using R…
• Let me also note that the figures and data you see in these slides were created by me when I developed the slides.
• As you will see, when I use R for examples in class, the numbers and figures will change.
– I will be drawing entirely different data.
• My “live” simulation will have different values from my “canned” simulation as long as I use a different random seed.
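The lecture ran its simulations in R; as a stand-in, here is a minimal Python sketch of how a random seed controls whether a simulation reproduces. The normal population (mean 98.6 °F, standard deviation 0.2 °F), the helper `simulate_temps`, and the seed values are illustrative assumptions, not the author’s code.

```python
import random

def simulate_temps(seed, n=20, mu=98.6, sigma=0.2):
    """Draw n body temperatures from an assumed normal population."""
    rng = random.Random(seed)  # private generator, seeded for reproducibility
    return [rng.gauss(mu, sigma) for _ in range(n)]

canned = simulate_temps(seed=790)     # hypothetical seed for the "canned" run
rerun = simulate_temps(seed=790)      # same seed: identical data
live = simulate_temps(seed=20060912)  # different seed: different data

print(canned == rerun)  # True
print(canned == live)   # False
```

Using a dedicated `random.Random` instance keeps the seed local to the simulation instead of resetting the module-wide generator.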
Point Estimation
Our First Sample
• Prior to discussing point estimation, we should talk a bit about what we are about to do.
• We need data:
– So I will randomly sample the body temperatures (in degrees Fahrenheit) of 20 subjects.
– To make our life easy, the 20 subjects must be healthy to participate in our study.
– None of the subjects were harmed in the “collection” of the data.
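A minimal sketch of drawing one such sample, assuming (for the simulation only, not as a claim about real people) that healthy body temperatures are normal with mean 98.6 °F and standard deviation 0.2 °F. The seed is arbitrary, so the values will not match the ones on the slides.

```python
import random
import statistics

MU, SIGMA, N = 98.6, 0.2, 20  # assumed population mean and SD (°F), sample size
rng = random.Random(2006)     # arbitrary seed

# One simulated sample of 20 healthy subjects, rounded to 5 decimals like the slides
sample = [round(rng.gauss(MU, SIGMA), 5) for _ in range(N)]

print(sample)
print("sample mean:", round(statistics.fmean(sample), 2))
```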
Some Terminology
• Knowing that I am running an “experiment” where I am recording the body temperature of 20 subjects, let me ask you:
– What is the sample space?
• What is our expected range of observations?
– How will this distribution look?
– How should I numerically characterize this
distribution?
Our First Sample
• So off I go, collecting a sample of body temperatures.
Some More Terminology
• All of the numbers we could dream up to characterize this distribution are statistics.
– “A statistic is simply a function on samples, such that any sample is paired with a value of that statistic” (Hays, p. 205).
– A statistic is the result of the application of an estimator.
• Its value is an estimate.
• A sample attempts to describe the nature of a distribution in the population at large.
– Therefore statistics collected from a sample are, hopefully, characteristic of the population from which the sample was drawn.
• This goes wrong if:
– The statistics do not have good properties in the long run.
– The sample is not representative of the target population.
Sample Statistics
• So, we have our sample:
98.63007 98.81505 98.54399 98.44912 98.76482 98.75803 98.52507 98.55176 98.62402 98.83739
98.67826 98.48538 98.79620 98.73230 98.33478 98.54835 98.43057 98.78581 98.77759 98.41347
• And we have some sample statistics:
– Mean: 98.62
– Median: 98.63
– Variance (“n” in denominator): 0.023
– Standard Deviation (“n” in denominator): 0.15
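These four statistics can be recomputed from the 20 listed temperatures with Python’s standard `statistics` module (a sketch, not the R code used for the slides). Here `pvariance` and `pstdev` divide by n, matching the slide; `variance` and `stdev` would divide by n - 1 instead.

```python
import statistics

# The 20 body temperatures (°F) listed above
temps = [
    98.63007, 98.81505, 98.54399, 98.44912, 98.76482,
    98.75803, 98.52507, 98.55176, 98.62402, 98.83739,
    98.67826, 98.48538, 98.79620, 98.73230, 98.33478,
    98.54835, 98.43057, 98.78581, 98.77759, 98.41347,
]

mean = statistics.fmean(temps)
median = statistics.median(temps)    # average of the 10th and 11th sorted values
var_n = statistics.pvariance(temps)  # "n" in the denominator
sd_n = statistics.pstdev(temps)      # square root of var_n

print(f"Mean: {mean:.2f}")           # Mean: 98.62
print(f"Median: {median:.2f}")       # Median: 98.63
print(f"Variance (n): {var_n:.3f}")  # Variance (n): 0.023
print(f"SD (n): {sd_n:.2f}")         # SD (n): 0.15
```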
Point Estimates
• A “point estimate” is the result of using some sample statistic to infer the value of a population parameter.
– The word parameter will be used quite often in this course.
– Next time we will talk about a theoretical distribution that has parameters we try to gather information about.
• All of the statistics presented on the previous slide are examples of point estimates.
• The choice of a particular statistic for use as a point estimate is driven by the statistical properties the estimate has in the long run.
Desirable Properties of Estimators
• Consistency – in the long run, the value of the statistic comes close to that of the parameter (as N increases, the variability of the statistic around the parameter decreases).
• Relative Efficiency – a good estimator will have less variability around the population parameter than other estimators.
• Sufficiency – the statistic contains all the information about the parameter available from the data.
• Unbiasedness – the long-run expectation of the statistic is identical to the value of the parameter.
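Consistency is easy to check by simulation. The sketch below, under an assumed normal population (μ = 98.6, σ = 0.2), estimates the spread of the sample mean for increasing N; the spread should shrink roughly like σ/√N. The seed and the number of replications are arbitrary choices.

```python
import random
import statistics

MU, SIGMA = 98.6, 0.2   # assumed population mean and SD
REPS = 1000             # replications per sample size
rng = random.Random(1)  # arbitrary seed

def sd_of_sample_means(n):
    """SD of the sample mean across REPS simulated samples of size n."""
    means = [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))
             for _ in range(REPS)]
    return statistics.stdev(means)

results = {n: sd_of_sample_means(n) for n in (5, 20, 80, 320)}
for n, sd in results.items():
    print(f"N = {n:3d}: SD of sample means = {sd:.4f} "
          f"(sigma/sqrt(N) = {SIGMA / n ** 0.5:.4f})")
```

Each fourfold increase in N should roughly halve the spread.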
What is this “long run”?
• Most of the desirable properties of estimators included the phrase “in the long run.”
• Other phrases used when speaking of “the long run” are:
– Asymptotically
– In the limit
– As N approaches infinity.
• All this means is that if you had an infinite sample size, your statistic would come to accurately capture the population parameter value.
Maximum Likelihood
• In statistics, we talk about maximum likelihood quite often.
• There is a class of estimators formed by taking the value that maximizes a likelihood (called MLEs).
• It is difficult to describe MLEs without a distribution, so we will hold this discussion until Thursday.
– Until then, take comfort in the fact that the sample mean is an MLE.
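Ahead of Thursday’s discussion, a small numerical check of that claim: assuming the temperatures are independent draws from a normal distribution with a known standard deviation (0.2 here, an assumption), a brute-force grid search for the μ that maximizes the log-likelihood lands on the sample mean. Grid search is used only for transparency; it is not how MLEs are usually computed.

```python
import math
import statistics

# The 20 body temperatures (°F) from the Sample Statistics slide
temps = [
    98.63007, 98.81505, 98.54399, 98.44912, 98.76482,
    98.75803, 98.52507, 98.55176, 98.62402, 98.83739,
    98.67826, 98.48538, 98.79620, 98.73230, 98.33478,
    98.54835, 98.43057, 98.78581, 98.77759, 98.41347,
]
SIGMA = 0.2  # assumed known SD; it changes the curve's shape, not where it peaks

def normal_log_likelihood(mu, data, sigma=SIGMA):
    """Log-likelihood of mean mu for independent normal data with known sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in data)

# Brute-force grid over candidate means 98.0, 98.0001, ..., 99.2
grid = [98.0 + i * 0.0001 for i in range(12001)]
mle = max(grid, key=lambda mu: normal_log_likelihood(mu, temps))

print(f"grid-search MLE of mu: {mle:.4f}")
print(f"sample mean:           {statistics.fmean(temps):.4f}")
```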
Sampling Distributions
Sampling Distributions
• Up to this point, we talked about taking a sample of size N.
– Our body temperature example had N=20.
• Our sample consisted of the observations we collected.
• Now imagine we were only interested in taking the value of a statistic from our sample.
– Take the sample mean, for instance.
• If we wanted to run an analogous experiment about means of body temperatures, we would need more samples.
– This time a single observation would be the sample mean from a sample of N=20.
Sampling Distributions
• Definition:
– “A sampling distribution is a theoretical probability distribution that shows the relation between possible values of a given statistic and the probability (density) associated with each value, for all possible samples of size N drawn from a particular population.” (p. 206)
Example of Sampling Distributions
• To demonstrate a sampling distribution, consider taking repeated versions of our previous experiment:
– To start, let’s take 10 different replications: we go and get 10 different samples of 20 subjects each.
– We then compute the mean of each of our samples.
– What does the distribution look like?
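A minimal sketch of those 10 replications, assuming a normal population with mean 98.6 and standard deviation 0.2 (the standard deviation consistent with the simulated results reported later in this lecture). The helper `one_replication` and the seed are illustrative; the values will differ from the slide’s.

```python
import random
import statistics

# Assumed population (normal, mean 98.6, SD 0.2), sample size, and replications
MU, SIGMA, N, REPS = 98.6, 0.2, 20, 10
rng = random.Random(10)  # arbitrary seed

def one_replication():
    """Run the 'experiment' once: sample N temperatures and return their mean."""
    return statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))

means = [one_replication() for _ in range(REPS)]

print([round(m, 5) for m in means])
print("mean of means:", round(statistics.fmean(means), 3))
print("SD of means:", round(statistics.stdev(means), 3))  # n - 1 denominator, like R's sd()
```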
Sampling Distribution of the Mean
(N=20, Replications=10)
• We have a total of 10 means:
98.61965 98.62748 98.53955 98.58505 98.65607
98.55390 98.60464 98.58298 98.60942 98.64139
More Means
• From those 10 means, we have:
– The mean of the means = 98.602
– The standard deviation of the means = 0.035
• It helps to know the population parameters I used to simulate the data:
– μ = 98.6
– σ = 0.2 (i.e., σ² = 0.04)
Sampling Distribution of the Mean
• With expectations, we can show that the sample mean is an unbiased estimator of μ:
– The expected value of the sample mean is equal to μ.
– The variance of the sample mean is σ²/N.
• We will learn next time that as N gets large (goes to infinity), the distribution of the sample mean approaches a Gaussian (or normal) distribution.
– This is the central limit theorem, something we rely on quite often in statistics.
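For reference, the expectation argument behind the two results above, assuming the N observations are independent draws with mean μ and variance σ². The last line plugs in N = 20 and σ = 0.2, the population standard deviation consistent with the simulated results on these slides.

```latex
\begin{aligned}
E(\bar{X}) &= E\!\left(\frac{1}{N}\sum_{i=1}^{N} X_i\right)
            = \frac{1}{N}\sum_{i=1}^{N} E(X_i) = \frac{1}{N}\,N\mu = \mu, \\
\operatorname{Var}(\bar{X}) &= \frac{1}{N^{2}}\sum_{i=1}^{N} \operatorname{Var}(X_i)
            = \frac{1}{N^{2}}\,N\sigma^{2} = \frac{\sigma^{2}}{N}
            \qquad (\text{independent } X_i), \\
\operatorname{SD}(\bar{X}) &= \frac{\sigma}{\sqrt{N}} = \frac{0.2}{\sqrt{20}} \approx 0.045 .
\end{aligned}
```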
Larger Samples…
• What if we did 1000 replications of our experiment?
– Mean of means = 98.60082
– Standard Deviation of means = 0.043
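A sketch of the 1000-replication version, again assuming a normal population with mean 98.6 and standard deviation 0.2. It compares the simulated standard deviation of the means with the theoretical value σ/√N ≈ 0.0447; the seed is arbitrary.

```python
import math
import random
import statistics

MU, SIGMA, N, REPS = 98.6, 0.2, 20, 1000  # assumed population; 1000 replications
rng = random.Random(1000)                 # arbitrary seed

# One sample mean per replication of the N = 20 experiment
means = [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
         for _ in range(REPS)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
theoretical_se = SIGMA / math.sqrt(N)  # sigma / sqrt(N), about 0.0447

print(f"mean of means = {mean_of_means:.5f}")
print(f"SD of means   = {sd_of_means:.4f} (theory: {theoretical_se:.4f})")
```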
What About the Median?
• Sampling distributions are not only for the mean.
– Any statistic has a sampling distribution.
• Let’s do 1000 replications of our experiment and see what the median looks like.
– Mean of medians = 98.60045
– Standard Deviation of medians = 0.052
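A sketch that computes both statistics from the same simulated samples (assumed normal population, μ = 98.6, σ = 0.2, arbitrary seed). Both center on μ, but the median’s larger spread illustrates relative efficiency: for normal data the mean is the more efficient estimator of the center.

```python
import random
import statistics

MU, SIGMA, N, REPS = 98.6, 0.2, 20, 1000  # assumed population and design
rng = random.Random(42)                   # arbitrary seed

means, medians = [], []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))  # same sample, different statistic

print(f"mean of medians = {statistics.fmean(medians):.5f}")
print(f"SD of medians   = {statistics.stdev(medians):.4f}")
print(f"SD of means     = {statistics.stdev(means):.4f}")
```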
What About the Standard Deviation?
• Using the variance formula with “N” in the denominator, we get the following:
– Mean of SDs = 0.192
– SD of SDs = 0.0326
What About the Standard Deviation?
(UNBIASED VARIANCE FORMULA)
• Using the variance formula with “N-1” in the denominator, we get the following:
– Mean of SDs = 0.197
– SD of SDs = 0.0334
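A sketch of both standard-deviation versions computed on the same simulated samples (assumed normal population, μ = 98.6, σ = 0.2, arbitrary seed). The “N-1” version is always the larger of the two, yet its average still falls slightly below σ: the square root of an unbiased variance estimate is not itself an unbiased estimate of σ.

```python
import random
import statistics

MU, SIGMA, N, REPS = 98.6, 0.2, 20, 1000  # assumed population and design
rng = random.Random(7)                    # arbitrary seed

sd_n, sd_n1 = [], []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    sd_n.append(statistics.pstdev(sample))  # "N" in the denominator
    sd_n1.append(statistics.stdev(sample))  # "N-1" in the denominator

for label, sds in (("N", sd_n), ("N-1", sd_n1)):
    print(f'"{label}": mean of SDs = {statistics.fmean(sds):.3f}, '
          f"SD of SDs = {statistics.stdev(sds):.4f}")
```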
Finally, the Variance
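A sketch of the analogous simulation for the variance itself, under the same assumed normal population (σ = 0.2, so σ² = 0.04) and an arbitrary seed. On average the “N” version comes out near (N-1)/N · σ² = 0.038, while the “N-1” version comes out near σ² = 0.04.

```python
import random
import statistics

MU, SIGMA, N, REPS = 98.6, 0.2, 20, 5000  # assumed population and design
SIGMA2 = SIGMA ** 2                       # population variance, 0.04
rng = random.Random(11)                   # arbitrary seed

var_n, var_n1 = [], []
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    var_n.append(statistics.pvariance(sample))  # divides by N
    var_n1.append(statistics.variance(sample))  # divides by N - 1

mean_biased = statistics.fmean(var_n)
mean_unbiased = statistics.fmean(var_n1)
print(f"mean of variances (N):   {mean_biased:.4f}  theory: {(N - 1) / N * SIGMA2:.4f}")
print(f"mean of variances (N-1): {mean_unbiased:.4f}  theory: {SIGMA2:.4f}")
```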
Wrapping Up
• This lecture covered some pretty fundamental statistical concepts that are critical to understand.
– The numbers people use as statistics must be verifiable.
• Sampling distributions are one method we use to obtain p-values for hypothesis tests.
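As an illustration of that last point, here is a sketch of a simulation-based p-value for the first sample’s mean (98.6241, from the Sample Statistics slide). It tests the hypothetical null hypothesis μ = 98.6 with the population standard deviation assumed known at 0.2, builds the null sampling distribution of the mean by simulation, and reports the share of simulated means at least as far from 98.6 as the observed one.

```python
import random
import statistics

MU0, SIGMA, N, REPS = 98.6, 0.2, 20, 10000  # hypothetical null mean, assumed known SD
rng = random.Random(99)                     # arbitrary seed

observed_mean = 98.6241  # mean of the 20 temperatures on the Sample Statistics slide

# Simulated sampling distribution of the mean under H0: mu = 98.6
null_means = [statistics.fmean(rng.gauss(MU0, SIGMA) for _ in range(N))
              for _ in range(REPS)]

# Two-sided p-value: share of null means at least as far from MU0 as the observed mean
distance = abs(observed_mean - MU0)
p_value = sum(abs(m - MU0) >= distance for m in null_means) / REPS
print(f"simulated two-sided p-value = {p_value:.3f}")
```

With these assumptions the observed mean sits only about half a standard error above 98.6, so the p-value should come out large (roughly 0.6).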
Next Time
• The normal distribution in all of its glory.
• The central limit theorem.
• MLEs of the normal distribution.
• Why variance with “N” is biased.