Distribution of Sample Means
Review from last time:
Example 2: What proportion of scores falls between -.2 standard deviations and -.6 standard deviations?
1. Convert each score to a z-score (-.2 and -.6).
2. Draw a graph of the normal distribution and shade the area to be identified.
3. Identify the area below the highest z-score using the unit normal table: for z = -.2, the proportion to the left = 1 - .5793 = .4207.
4. Identify the area below the lowest z-score using the unit normal table: for z = -.6, the proportion to the left = 1 - .7257 = .2743.
5. Subtract step 4 from step 3: .4207 - .2743 = .1464.
About 15% of the observations fall between -.2 and -.6 SD.
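The same table lookups can be checked in code. A minimal sketch, assuming Python with scipy is available; norm.cdf plays the role of the unit normal table (area to the left of z):

```python
from scipy.stats import norm

# Proportion of a standard normal distribution between z = -.6 and z = -.2.
lower = norm.cdf(-0.6)   # ~ .2743, area to the left of -.6
upper = norm.cdf(-0.2)   # ~ .4207, area to the left of -.2
print(upper - lower)     # ~ .1464, i.e., about 15% of observations
```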
Probability & Samples: Distribution of Sample Means
Last time…
We learned how to convert a distribution of raw scores into a distribution of z-scores, and vice versa. We reviewed some basic probability concepts and observed how these apply to scores and distributions.
Next we will learn about how to apply probability concepts to the binomial distribution (chapter 6), and to the distribution of sample means (chapter 7).
Questions before we move on?
Why do we need to know about probability in this class?
Inferential statistics
– Focused on making inferences about a population based on sample data
– Probability helps us connect a sample to its population
– If we know (or can estimate) population parameters, we can use probability to tell us how likely (or unlikely) it is that a given sample came from the population of interest
Check your understanding
• Next we will see how the shape of the binomial distribution is similar to that of the normal distribution.
• Before we move on, any questions about use of the unit normal table?
Flipping a coin example
n = 3 flips, so 2 × 2 × 2 = 8 total outcomes:

Outcome   Number of heads
HHH       3
HHT       2
HTH       2
HTT       1
THH       2
THT       1
TTH       1
TTT       0
Flipping a coin example
[Histogram: distribution of possible outcomes (n = 3 flips); probability (0 to .4) by number of heads (0 to 3); bar heights .125, .375, .375, .125]

X (number of heads)   f   p
3                     1   .125
2                     3   .375
1                     3   .375
0                     1   .125
Flipping a coin example
[Histogram: distribution of possible outcomes (n = 3 flips), as above]
Can make predictions about likelihood of outcomes based on this distribution.
What's the probability of flipping three heads in a row?
p = 0.125
Flipping a coin example
[Histogram: distribution of possible outcomes (n = 3 flips), as above]
Can make predictions about likelihood of outcomes based on this distribution.
What's the probability of flipping at least two heads in three tosses?
p = 0.375 + 0.125 = 0.50
Flipping a coin example
[Histogram: distribution of possible outcomes (n = 3 flips), as above]
Can make predictions about likelihood of outcomes based on this distribution.
What's the probability of flipping all heads or all tails in three tosses?
p = 0.125 + 0.125 = 0.25
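All three answers can be confirmed by brute force. A sketch, assuming Python: enumerate the 2^3 equally likely outcomes and count heads.

```python
from itertools import product

# All 2**3 = 8 equally likely sequences of three fair-coin tosses.
outcomes = list(product("HT", repeat=3))
heads = [seq.count("H") for seq in outcomes]

p_three_heads = sum(h == 3 for h in heads) / len(outcomes)    # 0.125
p_at_least_two = sum(h >= 2 for h in heads) / len(outcomes)   # 0.375 + 0.125 = 0.50
p_all_same = sum(h in (0, 3) for h in heads) / len(outcomes)  # 0.125 + 0.125 = 0.25
print(p_three_heads, p_at_least_two, p_all_same)
```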
Binomial Distribution
• Two categories of outcomes (A, B) (e.g., coin toss)
• p = p(A) = probability of A (e.g., heads)
• q = p(B) = probability of B (e.g., tails)
• p + q = 1.0 (e.g., .5 + .5; the two values could be different)
• n = number of observations (e.g., coin tosses)
• X = number of times category A occurs in a sample
• If pn > 10 and qn > 10, X follows a nearly normal distribution with μ = pn and σ = √(npq)
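A sketch of that approximation, assuming Python with scipy; the values n = 100 and p = .5 are illustrative assumptions, not from the slides:

```python
import math
from scipy.stats import binom, norm

n, p = 100, 0.5               # e.g., 100 fair-coin tosses; pn = qn = 50 > 10
q = 1 - p
mu, sigma = n * p, math.sqrt(n * p * q)   # mu = pn, sigma = sqrt(npq)

# Exact binomial probability of 60 or fewer heads vs. the normal approximation.
exact = binom.cdf(60, n, p)
approx = norm.cdf(60.5, loc=mu, scale=sigma)   # +.5 is a continuity correction
print(exact, approx)                           # the two values are close
```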
New Topic
Sample Distributions & The Central
Limit Theorem
Central Limit Theorem (p. 205)
For any population with mean μ and standard deviation σ, the distribution of sample means for sample size n will approach a normal distribution with a mean of μ and a standard deviation of σ/√n as n approaches infinity.
This theorem provides the conceptual foundation of most of the inferential statistics covered in this class. Today we will learn about what it means and why it makes sense. In the next class we will see how the Central Limit Theorem makes inferential statistics possible.
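A minimal simulation sketch, assuming Python with numpy (the exponential population is an illustrative assumption): even for a skewed population, the means of repeated samples center on μ with spread close to σ/√n.

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)   # skewed, non-normal population
mu, sigma = population.mean(), population.std()

n = 30                                                   # sample size
sample_means = rng.choice(population, size=(10_000, n)).mean(axis=1)

print(sample_means.mean(), mu)             # close to the population mean
print(sample_means.std(), sigma / n**0.5)  # close to sigma / sqrt(n)
```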
Flipping a coin example (revisited)
n = 3 flips, 2 × 2 × 2 = 8 total outcomes; the distribution of possible outcomes is the same as above:

X (number of heads)   f   p
3                     1   .125
2                     3   .375
1                     3   .375
0                     1   .125
Hypothesis testing
Distribution of possible outcomes (of a particular sample size, n): we can make predictions about the likelihood of outcomes based on this distribution.
• In hypothesis testing, we compare our observed samples with the distribution of possible samples (transformed into standardized distributions).
• This distribution of possible outcomes is often normally distributed.
Distribution of sample means
• So far, when we have used the unit normal table to decide how "unlikely" a particular score is, our "comparison distribution" has been a distribution of individual scores.
• In social science research, we are usually interested in making inferences about a mean of a group of scores (not just one score).
  – The comparison distribution is the distribution of all possible sample means of a given sample size ("distribution of sample means" for short).
Distribution of sample means
• A simple case
  – Population: 2, 4, 6, 8
  – All possible samples of size n = 2
  – Assumption: sampling with replacement
Distribution of sample means
• A simple case
  – Population: 2, 4, 6, 8
  – All possible samples of size n = 2 (there are 16 of them):

Sample   Mean      Sample   Mean      Sample   Mean      Sample   Mean
2, 2     2         4, 2     3         6, 2     4         8, 2     5
2, 4     3         4, 4     4         6, 4     5         8, 4     6
2, 6     4         4, 6     5         6, 6     6         8, 6     7
2, 8     5         4, 8     6         6, 8     7         8, 8     8
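The table above can be generated directly. A sketch, assuming Python: enumerate all 16 samples of size n = 2 drawn with replacement and compute each mean.

```python
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]

# All 4 * 4 = 16 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))
means = [mean(s) for s in samples]

for s, m in zip(samples, means):
    print(s, m)
print(len(samples))   # 16
```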
Distribution of sample means
In the long run, the random selection of tiles leads to a predictable pattern:
[Histogram of the 16 sample means: x-axis shows the means 2 through 8; frequencies are 1, 2, 3, 4, 3, 2, 1]
Distribution of sample means
• Sample problem: What is the probability of getting a sample with a mean of 6 or more?

[Histogram of the 16 sample means, as above]

M (sample mean)   f   p
8                 1   .0625
7                 2   .1250
6                 3   .1875
5                 4   .2500
4                 3   .1875
3                 2   .1250
2                 1   .0625

P(M ≥ 6) = .1875 + .1250 + .0625 = 0.375
• Same as before, except now we're asking about sample means rather than single scores.
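The sample problem also falls out of the enumeration. A sketch, assuming Python, mirroring the enumeration shown earlier:

```python
from fractions import Fraction
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]
means = [mean(s) for s in product(population, repeat=2)]

# Probability of getting a sample mean of 6 or more.
p = Fraction(sum(m >= 6 for m in means), len(means))
print(p, float(p))   # 3/8 = 0.375
```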
Distribution of sample means
• The distribution of sample means is a "virtual" distribution between the sample and the population:
  Population → Distribution of sample means → Sample
Properties of the distribution of sample means
• Shape
  – If the population is normal, then the distribution of sample means will be normal.
  – If the sample size is large (n > 30), the distribution of sample means will be normal regardless of the shape of the population.
Properties of the distribution of sample means
• Center
  – The mean of the distribution of sample means (μ_M) is equal to the mean of the population (μ): same numeric value, different conceptual values.
Properties of the distribution of sample means
• Center
  – The mean of the distribution of sample means is equal to the mean of the population.
  – Consider our earlier example (population: 2, 4, 6, 8):

    μ = (2 + 4 + 6 + 8) / 4 = 5

    μ_M = (2 + 3 + 4 + 5 + 3 + 4 + 5 + 6 + 4 + 5 + 6 + 7 + 5 + 6 + 7 + 8) / 16 = 5
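The same arithmetic in a couple of lines, as a sketch in Python reusing the enumeration idea from above:

```python
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]
sample_means = [mean(s) for s in product(population, repeat=2)]

print(mean(population))     # μ = 5
print(mean(sample_means))   # μ_M = 5, the same value as the population mean
```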
Properties of the distribution of sample means
• Spread
  – The standard deviation of the distribution of sample means depends on two things:
    • Standard deviation of the population (as the standard deviation of the population gets larger, the standard deviation of the distribution of sample means also gets larger)
    • Sample size (as the sample size gets larger, the standard deviation of the distribution of sample means gets smaller – law of large numbers)
Properties of the distribution of sample means
• Spread
  – Standard deviation of the population: the smaller the population variability, the closer the sample means are to the population mean.
  [Figure: sample means X1, X2, X3 scatter widely around μ when the population standard deviation is large, and cluster tightly around μ when it is small]
Properties of the distribution of sample means
• Spread
  – Sample size: the larger the sample size, the smaller the spread.
  [Figures: distributions of M around μ for n = 1, n = 10, and n = 100, narrowing as n increases]
Properties of the distribution of sample means
• Spread
  – Standard deviation of the population
  – Sample size
  – Putting them together, we get the standard deviation of the distribution of sample means:

    σ_M = σ / √n

  – Commonly called the standard error (= SE = SEM = σ_M)
  – Can be thought of as the reliability of sample means (that is, the consistency expected between different measurements of the mean)
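A tiny sketch of the formula, assuming Python with numpy, plus an empirical check by simulation (the population values 65 and 20 are taken from the worked example later in these notes):

```python
import numpy as np

def standard_error(sigma, n):
    """Standard deviation of the distribution of sample means: sigma / sqrt(n)."""
    return sigma / np.sqrt(n)

sigma, n = 20.0, 16
print(standard_error(sigma, n))   # 5.0

# Empirical check: standard deviation of many simulated sample means.
rng = np.random.default_rng(1)
sample_means = rng.normal(loc=65, scale=sigma, size=(100_000, n)).mean(axis=1)
print(sample_means.std())          # close to 5.0
```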
Standard error
• The standard error is the average amount by which you'd expect a sample mean (for a sample of size n) to deviate from the population mean.
  – In other words, it is an estimate of the error that you'd expect by chance (or by sampling).
• The standard error is similar to the standard deviation, but it is important to know the difference between the two, both conceptually and mathematically!!!
Distribution of sample means
• Keep your distributions straight by taking care with your notation:
  – Population: mean μ, standard deviation σ
  – Distribution of sample means: mean μ_M, standard error σ_M
  – Sample: mean M, standard deviation s
Properties of the distribution of sample means
• All three of these properties of the distribution of sample means (shape, center, and spread) are combined to form the Central Limit Theorem:
  – For any population with mean μ and standard deviation σ, the distribution of sample means for sample size n will approach a normal distribution with a mean of μ and a standard deviation of σ/√n as n approaches infinity (good approximation if n > 30).
Who came up with the CLT & why?
• Developed over more than a century and attributed to several different mathematicians.
  – Abraham DeMoivre (early-mid 1700s): while studying "games of chance," discovered that "coin toss" probabilities follow the normal distribution.
  – Pierre-Simon Laplace (late 1700s-early 1800s): expanded on DeMoivre's work while trying to estimate (via probability distributions) sums of meteor inclination angles.
The Central Limit Theorem is Your Friend
Do yourself a favor and
MEMORIZE IT!!
The Central Limit Theorem is Your Friend
• It helps us make inferences about sample statistics (e.g., means).
• For example, it can help us determine how likely or unlikely a particular sample mean is, given what we know about the population parameters.
Probability & the Distribution of Sample Means
• We can use the Central Limit Theorem to calculate z-scores associated with individual sample means (the z-scores are based on the distribution of all possible sample means).
• Each z-score describes the exact location of its respective sample mean, relative to the distribution of sample means.
• Since the distribution of sample means is normal, we can then use the unit normal table to determine the likelihood of obtaining a sample mean greater/less than a specific sample mean.
Probability & the Distribution of Sample Means
When using z-scores to represent sample means, the correct formula to use is:

  z_M = (M - μ) / σ_M
Probability & the Distribution of Sample Means
EXAMPLE: What is the probability of obtaining a sample mean greater than M = 60 for a random sample of n = 16 scores selected from a normal population with a mean of μ = 65 and a standard deviation of σ = 20?

M = 60; μ = 65; σ = 20; n = 16

σ_M = σ / √n = 20 / √16 = 20 / 4 = 5

z_M = (M - μ) / σ_M = (60 - 65) / 5 = -1

p(z_M > -1) = .8413
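The same example as a quick sketch, assuming Python with scipy:

```python
from math import sqrt
from scipy.stats import norm

M, mu, sigma, n = 60, 65, 20, 16

sigma_M = sigma / sqrt(n)        # 20 / 4 = 5
z_M = (M - mu) / sigma_M         # (60 - 65) / 5 = -1

# P(Z > -1): area to the right of z = -1 under the standard normal curve.
p = 1 - norm.cdf(z_M)
print(sigma_M, z_M, p)           # 5.0, -1.0, ~.8413
```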
Last time we reviewed
• Z-scores
• Probability
• The connection between probability and distributions of individual scores
• How to use the unit normal table to find probabilities associated with z-scores

Today we reviewed
• The Central Limit Theorem & the distribution of sample means
• The connection between probability and the distribution of sample means
Last topic before the exam:
• Hypothesis testing (pulls together everything we've learned so far and applies it to testing hypotheses about sample means).
• Before we move on, questions about the CLT, distributions of samples, the standard error of the mean and how to calculate it?
Hypothesis testing
• Example: Testing the effectiveness of a new memory treatment for patients with memory problems
  – Our pharmaceutical company develops a new drug treatment that is designed to help patients with impaired memories.
  – Before we market the drug, we want to see if it works.
  – The drug is designed to work on all memory patients, but we can't test them all (the population).
  – So we decide to use a sample and conduct an experiment.
  – Based on the results from the sample, we will make conclusions about the population.
  – Next time we'll find out exactly how to do this!