Sociology 601, Class 4: September 10, 2009
Chapter 4: Distributions
• Probability distributions (4.1)
• The normal probability distribution (4.2)
• Sampling distributions (4.3, 4.4)
4.1: probability distributions
• We study probability to get an idea of how well
sample statistics match up to their population
parameters
• probability: the proportion of times that a
particular outcome would occur in a long run of
repeated observations
– example: you go to Monte Carlo and watch people
play roulette. What is the probability of observing the
number “23” in a single spin of a roulette wheel with
38 slots?
• probability distribution: a listing of possible
outcomes for a variable, together with their
probabilities
Probability distributions for discrete variables: formulas
• let y denote a possible outcome for variable Y, and let
P(y) denote the probability of that outcome.
– then 0 ≤ P(y) ≤ 1 and Σ(all y) P(y) = 1
• the mean of a probability distribution: μ = Σ(y*P(y))
– why do we use μ instead of Ybar?
– Is this equation compatible with our formula for a
sample mean?
• variance of a probability distribution: σ² = Σ((y - μ)² * P(y))
Probability distributions for discrete variables: 3 flips of a
coin
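The chart for this slide did not survive in the transcript. As a minimal sketch, assuming a fair coin and taking Y to be the number of heads in 3 flips (both are assumptions, not stated on the slide), the probability distribution can be listed by enumerating every equally likely sequence:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: a fair coin, so all 2**3 sequences of flips are equally likely.
# Each flip is coded 1 = heads, 0 = tails.
outcomes = list(product([0, 1], repeat=3))

# Count how many sequences give each possible number of heads.
head_counts = Counter(sum(flips) for flips in outcomes)

# P(y) = (number of sequences with y heads) / (total number of sequences)
distribution = {y: Fraction(count, len(outcomes))
                for y, count in sorted(head_counts.items())}

for y, p in distribution.items():
    print(f"P({y} heads) = {p}")
```

The probabilities are kept as exact fractions so it is easy to confirm that they sum to 1.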
Probability distributions for discrete variables: example
(p. 83)
we will estimate parameters from this chart:
[Bar chart: Answers to "How many people did you know that were victims of homicide in the last 12 months?"; x-axis: number of victims; y-axis: probability]
number of victims: 0, 1, 2, 3
probability: 0.91, 0.06, 0.02, 0.01
Calculating the mean, variance, and standard deviation
of a probability distribution
based on the previous chart:
Worksheet columns: y, P(y), y*P(y), µ, y - µ, (y - µ)², (y - µ)² * P(y)
Rows (one per outcome): y = 0, 1, 2, 3
Totals to fill in: µ, σ², σ
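One way to check the worksheet, sketched in Python (the variable names are illustrative). It applies the formulas for the mean and variance of a discrete probability distribution to the probabilities shown on the chart (0.91, 0.06, 0.02, 0.01 for 0 to 3 victims):

```python
import math

# Probabilities read from the chart: number of victims known -> P(y)
distribution = {0: 0.91, 1: 0.06, 2: 0.02, 3: 0.01}

# Mean: sum of y * P(y) over all outcomes
mu = sum(y * p for y, p in distribution.items())

# Variance: sum of (y - mu)^2 * P(y) over all outcomes
variance = sum((y - mu) ** 2 * p for y, p in distribution.items())

# Standard deviation: square root of the variance
sigma = math.sqrt(variance)

print(f"mu = {mu:.4f}, variance = {variance:.4f}, sigma = {sigma:.4f}")
```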
Probability distributions for continuous variables
• So far we have described discrete probability distributions
where the variable can take on only a finite number of values.
• As the number of possible values for the variable increases, the
probability distribution becomes a continuous function.
• In such cases, we must calculate areas under curves to find:
o Population mean or standard deviation
o Probability for a certain range of the x-variable.
4.2: The normal probability distribution
Many social and natural variables have a
distinctive continuous probability distribution
when we measure them: a roughly ‘bell-shaped’
curve, or a normal distribution.
Examples of normal probability distributions
Graph on board:
• Normal distribution for adult women’s heights:
μ = 64.3 inches, σ = 2.8 inches
• Normal distribution for adult men’s heights:
μ = 69.9 inches, σ = 3.0 inches
Standardizing scores
Standardizing a score is taking a raw score, a mean, and
a standard deviation, and translating the score into a
number of standard deviations from the mean.
• formula: z = (y - μ) / σ
• examples:
if y = μ then z = 0
if y = μ + σ then z = 1
if y = μ + 2σ then z = 2
if y = μ - 2σ then z = -2
Standardizing scores: Examples
Calculate a z-score for each example
1. SAT score: y = 350, μ = 500, σ = 100
2. SAT score: y = 520, μ = 500, σ = 100
3. IQ score: y = 88, μ = 100, σ = 15
4. Woman’s height: y = 71, μ = 65, σ = 3.5
5. Psychological test: y = -2.58, μ = 0, σ = 1
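A short sketch for checking these by machine. The helper `z_score` is an illustrative name, not something from the course materials; it simply applies z = (y - mean) / (standard deviation) to each example:

```python
def z_score(y, mu, sigma):
    """Number of standard deviations that y lies from the mean mu."""
    return (y - mu) / sigma

# (label, raw score y, mean, standard deviation) for the five examples
examples = [
    ("SAT score", 350, 500, 100),
    ("SAT score", 520, 500, 100),
    ("IQ score", 88, 100, 15),
    ("Woman's height", 71, 65, 3.5),
    ("Psychological test", -2.58, 0, 1),
]

results = [z_score(y, mu, sigma) for _, y, mu, sigma in examples]

for (label, y, mu, sigma), z in zip(examples, results):
    print(f"{label}: y = {y}, z = {z:.2f}")
```

A negative z means the score is below the mean; the last example is already in standard units, so its z-score equals the raw score.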
General properties of the normal curve
• The normal curve is symmetric about the mean
• The normal curve is bell-shaped, with the
highest probability occurring at the mean
• for z from –1 to +1, the probability is about 0.68
• for z from –2 to +2, the probability is about 0.95
• for z from –3 to +3, the probability is about 0.997
If a curve is not symmetrical, or if a z-score is
inconsistent with the above probabilities, then it is not
a normal curve.
• any z-score is conceptually possible, because
the normal curve never quite converges to a
probability of zero.
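The three "about" probabilities can be checked directly. This is a sketch using `NormalDist` from Python's standard `statistics` module (not the course's Stata); `prob_within` is an illustrative helper name:

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def prob_within(k):
    """P(-k <= z <= k) for a standard normal variable."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"P(-{k} <= z <= {k}) = {prob_within(k):.4f}")
```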
Formula for a normal probability distribution
A normal probability distribution (e.g. approximately, the
probability distribution for the total of a roll of 100
dice) is based on the formula:
P(y) per unit of y = (1 / (σ√(2π))) * e^(-(1/2)((y - μ)/σ)²)
• Note that μ and σ are both parameters of the formula.
• This formula has no closed-form integral, so the probability
that an observation will be between y1 and y2 has to be
approximated numerically.
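To make the area-under-the-curve idea concrete, here is a rough sketch that adds up thin rectangles under the normal curve (the midpoint rule) and compares the result with Python's built-in `NormalDist`. The function names and the choice of 10,000 rectangles are illustrative assumptions; the example uses the women's height distribution from the earlier slide (mean 64.3, standard deviation 2.8) and the range from one standard deviation below the mean to one above:

```python
import math
from statistics import NormalDist

def normal_density(y, mu, sigma):
    """Height of the normal curve at y (probability per unit of y)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_between(y1, y2, mu, sigma, steps=10_000):
    """Approximate P(y1 <= Y <= y2) by summing thin rectangles (midpoint rule)."""
    width = (y2 - y1) / steps
    heights = [normal_density(y1 + (i + 0.5) * width, mu, sigma)
               for i in range(steps)]
    return sum(heights) * width

# From 61.5 to 67.1 inches, i.e. one standard deviation either side of the mean.
approx = area_between(61.5, 67.1, mu=64.3, sigma=2.8)

# Same probability from the library's cumulative distribution function.
women = NormalDist(mu=64.3, sigma=2.8)
exact = women.cdf(67.1) - women.cdf(61.5)

print(f"midpoint rule: {approx:.4f}, NormalDist: {exact:.4f}")
```

Both numbers should land near 0.68, the probability of falling within one standard deviation of the mean.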
A dilemma and a solution
• The dilemma: the universe is filled with
phenomena that have a probability
distribution we can’t calculate!
• The solution: since this distribution recurs so
often, it is worth the effort to painstakingly
estimate the probabilities associated with
each part of the normal distribution, list them
by z-scores, then put all the results in a table
for everybody to use. (see Appendix A, page
668)
– This is an important purpose of standardization.
Using Table A (page 668) to estimate areas under the
normal curve
• You are given a z-score and asked to find a p-value
Example: z = 1.53, p(z > 1.53) = ?
• 1.) Move down to the row with the first decimal (1.5)
• 2.) Move across to the column with the second decimal (.03)
• 3.) Write the corresponding p-value in an inequality
(P(z > 1.53) = .063, by chance alone)
• For negative z-scores, use the same procedure but reverse
the inequality. (p(z < -1.53) = .063, by chance alone)
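The table lookup can be cross-checked in software. A minimal sketch with Python's standard `statistics.NormalDist`, where `upper_tail` and `lower_tail` are illustrative helper names:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def upper_tail(z):
    """P(Z > z): the area to the right of z under the standard normal curve."""
    return 1 - standard_normal.cdf(z)

def lower_tail(z):
    """P(Z < z): the area to the left of z."""
    return standard_normal.cdf(z)

p_right = upper_tail(1.53)
p_left = lower_tail(-1.53)

print(f"P(z > 1.53) = {p_right:.3f}")
print(f"P(z < -1.53) = {p_left:.3f}")
```

Because the curve is symmetric, the two tail areas are equal, which is why the table only needs to list positive z-scores.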
Using Table A (page 668) to estimate areas under the
normal curve
Practice these examples:
• what is p(z ≥ 1.19) by chance alone?
• what is p(z ≤ -.04) by chance alone?
• what is p(-1 ≤ z ≤ 1) by chance alone?
• what is p(z ≤ -1.96) or p(z ≥ 1.96) by chance
alone?
• what is p(|z| ≥ 1.96) by chance alone?
reading Stata computer outputs #1
going between z-statistics and p-values using
DISPLAY NORMPROB and
DISPLAY INVNORM
note differences between these results and Page 668!
display invnorm(.025)
-1.959964
display invnorm(.975)
1.959964
* to verify that +/-1.96 are the z-scores you want
display normprob(-1.96)
.0249979
display normprob(1.96)
.9750021
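For readers without Stata, the same round trip can be sketched in Python. This is an assumption-labelled counterpart, not part of the course materials: `inv_cdf` plays the role of `invnorm` and `cdf` plays the role of `normprob`.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Counterpart of invnorm(p): the z-score whose cumulative probability is p.
z_low = standard_normal.inv_cdf(0.025)
z_high = standard_normal.inv_cdf(0.975)

# Counterpart of normprob(z): the cumulative probability P(Z < z).
p_low = standard_normal.cdf(-1.96)
p_high = standard_normal.cdf(1.96)

print(f"{z_low:.6f} {z_high:.6f}")
print(f"{p_low:.7f} {p_high:.7f}")
```

Note that both functions work with the area to the left of z, whereas a printed table may list the area to the right.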
Notes about working with the normal curve
• The table for deriving probabilities only works for
normal distributions.
• If you have some other distribution, you can still calculate σ and z,
but you can’t match z to a p-value.
• Axis references are often confusing in statistics
books:
• the x-axis often lists values for what we call the y-variable
• the y-axis often has no scale listed at all. It probably should have
values for probability per unit of the y-variable.
• Tables are also confusing:
• some texts provide tables for P(Z < z), while
some texts provide tables for P(Z > z).
• To save space, texts don’t provide information for z < 0; it is assumed
that you understand that the distribution is symmetrical.
4.3: Sampling distributions
Why would we care about a distribution of samples?
• We can’t study a population, but we can study a sample.
• We can’t know how well this sample reflects the
population, but we can use probability theory to study
how samples would tend to come out if we did know the
characteristics of the population.
Definitions:
• Sampling distribution: a probability distribution that
gives the probabilities of the possible values of a
sample statistic (e.g. a relative frequency distribution
of many sample means).
• Standard error of a sampling distribution: a
measure of the typical distance between a sample
mean and a population mean
• Standard deviation of a population: a measure of
the typical distance between an observation and
the population mean.
Equations:
• Mean of a sampling distribution: μ_Ybar = μ
• Standard error of a sampling distribution: σ_Ybar = σ / √n
– Example: estimate the standard error of this sample:
– 1, 3, 5, 5, 5, 7, 9
– Is this estimate the true standard error of the population?
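A sketch of the estimate for this sample. Since the population standard deviation is unknown, the sample standard deviation s (with the n - 1 denominator) stands in for it, so the result is an estimated standard error, s / √n:

```python
import math
import statistics

sample = [1, 3, 5, 5, 5, 7, 9]

n = len(sample)
sample_mean = statistics.mean(sample)

# statistics.stdev uses the n - 1 denominator (sample standard deviation).
s = statistics.stdev(sample)

# Estimated standard error of the sample mean
standard_error = s / math.sqrt(n)

print(f"n = {n}, mean = {sample_mean}, s = {s:.3f}, SE = {standard_error:.3f}")
```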
An advantage of large samples:
The central limit theorem. As the sample size
n grows, the sampling distribution of Y(bar)
approaches a normal distribution.
• This is true even for variables that are not
normally distributed in the population, such as
age or income!
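A small simulation can illustrate the theorem. This is only a sketch under stated assumptions: the population is exponential with mean 1 (a hypothetical stand-in for a right-skewed variable such as income), the seed and the 5,000 repetitions are arbitrary, and skewness is used as a rough gauge of how far the sampling distribution is from the symmetric normal shape:

```python
import random
import statistics

random.seed(601)  # arbitrary seed so the run is repeatable

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

def sample_means(n, repetitions=5_000):
    """Means of many random samples of size n from a right-skewed population."""
    return [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(repetitions)]

# Skewness of the simulated sampling distribution for three sample sizes
skew_by_n = {n: skewness(sample_means(n)) for n in (1, 4, 30)}

for n, skew in skew_by_n.items():
    print(f"n = {n:>2}: skewness of sample means = {skew:.2f}")
```

The skewness should shrink toward 0 as n grows, even though the population itself stays strongly skewed.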
[Figures: four histograms; x-axis: household income (x1,000), from 0 to 9.9 up to 80 to 89.9; y-axis: proportion of cases per $10,000 income range]
• Probability distribution of incomes for 160 households
• Sampling distribution of incomes for 20 samples, sample size = 4
• Sampling distribution of incomes for 20 samples, sample size = 8
• Probability distribution of incomes for 1 sample of 8 households
Why is the central limit theorem a big deal?
• When you use a sample statistic to guess a parameter, you will
want to know how good your guess is.
• If the distribution of sample means about the population mean
is normal, you can estimate how far off a given sample mean
might be.
• With a moderate sample size, the sampling distribution is
approximately normal, even if the underlying distribution is not!
• However, you still may not have a large enough sample to
estimate the parameter with the precision you want.
Another advantage of large samples:
• The law of large numbers. The bigger the
sample, the closer (on average) the sample
statistic to the parameter.
• In other words, as samples become larger, the
variation between samples becomes smaller.
σ_Ybar = σ / √n
• Note: the law of large numbers does not
involve any sort of telos. (Example of 4th coin
toss)
The law of large numbers in action.
Here is the complete sampling distribution of possible
sample means for up to four coin tosses
• (score variable “heads” = “1” if heads, “0” if tails)
Each entry is (sequence of tosses): sample mean
n = 1: (0): 0; (1): 1
n = 2: (0,0): 0; (0,1): .5; (1,0): .5; (1,1): 1
n = 3: (0,0,0): 0; (0,0,1): .33; (0,1,0): .33; (0,1,1): .67;
(1,0,0): .33; (1,0,1): .67; (1,1,0): .67; (1,1,1): 1
n = 4: (0,0,0,0): 0; (0,0,0,1): .25; (0,0,1,0): .25; (0,0,1,1): .5;
(0,1,0,0): .25; (0,1,0,1): .5; (0,1,1,0): .5; (0,1,1,1): .75;
(1,0,0,0): .25; (1,0,0,1): .5; (1,0,1,0): .5; (1,0,1,1): .75;
(1,1,0,0): .5; (1,1,0,1): .75; (1,1,1,0): .75; (1,1,1,1): 1
The law of large numbers: the standard error of a
sample mean shrinks as n increases
• Recall the formula for a variance of a probability distribution:
σ² = Σ((y - μ)² * P(y))
• For n = 1, σ² = ((0 - .5)² * .5) + ((1 - .5)² * .5) = .25, so σ = .5
• For n = 2, the variance of the sample mean is .125, so its standard deviation is .35
• For n = 4, the variance of the sample mean is .0625, so its standard deviation is .25
• The standard error is the standard deviation of a
distribution of samples.
• This is not the same thing as a standard deviation of a single sample, or
the standard deviation of a population.
• The sample standard deviation does not shrink as n increases.
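The coin-toss standard errors can be reproduced by brute force. A sketch, assuming a fair coin so that every sequence of tosses is equally likely (`sampling_sd` is an illustrative name): it lists all possible samples of size n, takes each sample's mean, and computes the standard deviation of those means.

```python
import math
from itertools import product

def sampling_sd(n):
    """Standard deviation of the sample mean over all 2**n equally likely
    sequences of n fair coin tosses (1 = heads, 0 = tails)."""
    means = [sum(tosses) / n for tosses in product([0, 1], repeat=n)]
    mu = sum(means) / len(means)
    variance = sum((m - mu) ** 2 for m in means) / len(means)
    return math.sqrt(variance)

for n in (1, 2, 4):
    # Compare the enumerated value with sigma / sqrt(n), where sigma = 0.5
    print(f"n = {n}: enumerated = {sampling_sd(n):.4f}, "
          f"sigma / sqrt(n) = {0.5 / math.sqrt(n):.4f}")
```

The two columns agree, which is the σ / √n formula at work: quadrupling the sample size halves the standard error.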
Summary: Why we work with samples
• On average, a statistic from a good random sample
will have the same value as the corresponding
population parameter.
• With a larger sample, the sample statistic will be
closer to the population parameter on average.
• If the distribution of sample means is normal, one
can make additional guesses about how close the
sample statistic might be to the population
parameter.
• We assume the distribution of sample means is
normal …
- If n > 30 (by the central limit theorem), or
- If the population is normally distributed