Lecture 3: The Normal Distribution and
Statistical Inference
Sandy Eckel
[email protected]
24 April 2008
1 / 36
A Review and Some Connections
The Normal Distribution
The Central Limit Theorem
Estimates of means and proportions: uses and properties
Confidence intervals and Hypothesis tests
2 / 36
The Normal Distribution
A probability distribution for continuous data
Characterized by a symmetric bell-shaped curve
(Gaussian curve)
Symmetric about its mean µ
Under certain conditions (np > 5 and n(1 − p) > 5), can be used
to approximate the Binomial(n, p) distribution
3 / 36
Normal Distribution
[Figure: normal density f(x), symmetric about µ, with x running from −∞ to +∞]
Takes on values between −∞ and +∞
Mean = Median = Mode
Area under curve equals 1
Notation for Normal random variable: X ∼ N(µ, σ²)
Parameters:
µ = mean
σ = standard deviation
4 / 36
Formula: Normal Probability Density Function (pdf)
[Figure: normal density curve, symmetric about µ]
The normal probability density function for X ∼ N(µ, σ²) is:
f(x) = (1 / √(2πσ²)) · e^(−(x−µ)²/(2σ²)),  −∞ < x < +∞
Note: π ≈ 3.14 and e ≈ 2.72 are mathematical constants
5 / 36
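The pdf can be sanity-checked numerically. Below is a minimal Python sketch (Python standing in for the R used later in the lecture); the helper name normal_pdf is illustrative, and the µ = 3000, σ = 1000 values are borrowed from the birthweight example that follows:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x) = 1/sqrt(2*pi*sigma^2) * exp(-(x - mu)^2 / (2*sigma^2))
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

# The density peaks at the mean:
peak = normal_pdf(3000, mu=3000, sigma=1000)   # = 1/(1000 * sqrt(2*pi))

# Left Riemann sum over mu +/- 8 sigma: total area under the curve is ~1
step = 1.0
area = sum(normal_pdf(x, 3000, 1000) * step for x in range(-5000, 11000))
```

The Riemann sum is a crude integrator, but the density decays so quickly that µ ± 8σ already captures essentially all of the area.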
Standard Normal
Definition: a Normal distribution N(µ, σ²) with parameters µ = 0 and σ = 1
Its density function is written as:
f(x) = (1 / √(2π)) · e^(−x²/2),  −∞ < x < +∞
We typically use the letter Z to denote a standard normal
random variable (Z ∼ N(0, 1))
Important! We use the standard normal all the time because
if X ∼ N(µ, σ²), then (X − µ)/σ ∼ N(0, 1)
This process is called “standardizing” a normal random
variable
6 / 36
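Standardizing is a one-line transformation. A small Python sketch (the birthweight parameters µ = 3000, σ = 1000 are taken from the example later in the lecture):

```python
def standardize(x, mu, sigma):
    # Z = (X - mu) / sigma  maps  X ~ N(mu, sigma^2)  to  Z ~ N(0, 1)
    return (x - mu) / sigma

mu, sigma = 3000.0, 1000.0   # birthweight example parameters
z = standardize(5000.0, mu, sigma)   # 5000 g is 2 standard deviations above the mean
```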
68-95-99.7 Rule I
68% of the density is within one standard deviation of the mean
[Figure: normal density with area 0.68 between µ − 1σ and µ + 1σ, and 0.16 in each tail]
7 / 36
68-95-99.7 Rule II
95% of the density is within two standard deviations of the mean
[Figure: normal density with area 0.95 between µ − 2σ and µ + 2σ, and 0.025 in each tail]
8 / 36
68-95-99.7 Rule III
99.7% of the density is within three standard deviations of the mean
[Figure: normal density with area 0.997 between µ − 3σ and µ + 3σ, and 0.0015 in each tail]
9 / 36
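All three rules can be verified from the standard normal CDF, which is expressible through the error function as Φ(z) = (1 + erf(z/√2))/2. A short Python check (Python standing in for R here):

```python
import math

def Phi(z):
    # Standard normal CDF via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probability mass within k standard deviations of the mean, for k = 1, 2, 3:
within = {k: Phi(k) - Phi(-k) for k in (1, 2, 3)}
# within[1] ~ 0.68, within[2] ~ 0.95, within[3] ~ 0.997
```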
Different Means
[Figure: three normal density curves centered at µ1, µ2, µ3]
Three normal distributions with different means, µ1 < µ2 < µ3
10 / 36
Different Standard Deviations
[Figure: three normal density curves with spreads σ1, σ2, σ3]
Three normal distributions with different standard deviations, σ1 < σ2 < σ3
11 / 36
Standard Normal N(0,1)
[Figure: standard normal density with µ = 0 and σ = 1; x-axis from −4 to 4]
12 / 36
Example: Birthweights (in grams) of infants in a population
[Figure: normal density of birthweights, x-axis from 0 to 6000 g]
Continuous data
Mean = Median = Mode = 3000 = µ
Standard deviation = 1000 = σ
The area under the curve represents the probability
(proportion) of infants with birthweights between certain
values
13 / 36
Normal Probabilities
We are often interested in the probability that Z takes on values
between z0 and z1:
P(z0 ≤ Z ≤ z1) = ∫ from z0 to z1 of (1/√(2π)) · e^(−z²/2) dz
How do we calculate this probability?
Equivalent to finding area under the curve
Continuous distribution, so we cannot use sums to find
probabilities
Performing the integration is not necessary since tables and
computers are available
14 / 36
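As a sketch of what a table or computer is doing behind the scenes, here is a midpoint-rule integration of the standard normal pdf in Python (the function names are illustrative, not from the lecture):

```python
import math

def phi(z):
    # Standard normal pdf
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def prob_between(z0, z1, steps=100_000):
    # Midpoint-rule approximation of the integral of phi from z0 to z1
    h = (z1 - z0) / steps
    return h * sum(phi(z0 + (i + 0.5) * h) for i in range(steps))

p = prob_between(-1.96, 1.96)   # close to 0.95
```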
Z Tables
[Slide shows a standard normal (Z) table]
15 / 36
But...we’ll use R
For standard normal random variables Z ∼ N(0, 1) we’ll use:
1. pnorm(?) to find P(Z ≤ ?)
2. pnorm(?, lower.tail=F) to find P(Z ≥ ?)
For any normal random variable X ∼ N(µ, σ²)
(but taking X ∼ N(2, 3²) as an example) we’ll use:
1. pnorm(?, mean=2, sd=3) to find P(X ≤ ?)
2. pnorm(?, mean=2, sd=3, lower.tail=F) to find P(X ≥ ?)
16 / 36
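For readers without R, an analogue of pnorm can be built from Python's math.erf. The function below mimics R's argument names, but it is an assumption of this writeup, not part of the lecture:

```python
import math

def pnorm(q, mean=0.0, sd=1.0, lower_tail=True):
    # Python analogue of R's pnorm: P(X <= q) for X ~ N(mean, sd^2),
    # or the upper tail P(X >= q) when lower_tail is False
    p = 0.5 * (1.0 + math.erf((q - mean) / (sd * math.sqrt(2.0))))
    return p if lower_tail else 1.0 - p

upper = pnorm(2.0, lower_tail=False)   # P(Z >= 2), about 0.0228
half = pnorm(2.0, mean=2.0, sd=3.0)    # P(X <= 2) = 0.5 when X ~ N(2, 3^2)
```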
Example: Birthweights (in grams)
[Figure: normal density of birthweights, x-axis from 0 to 6000 g]
µ = 3000
σ = 1000
X = birthweight
Z = (X − µ)/σ
17 / 36
Question I
What is the probability of an infant weighing more than 5000g?
P(X > 5000) = P((X − µ)/σ > (5000 − 3000)/1000)
            = P(Z > 2)
            = 0.0228
Get this using pnorm(2, lower.tail=F) (since we standardized)
18 / 36
Question II
What is the probability of an infant weighing less than 3500g?
P(X < 3500) = P((X − µ)/σ < (3500 − 3000)/1000)
            = P(Z < 0.5)
            = 0.6915
19 / 36
Question III
What is the probability of an infant weighing between 2500 and
4000g?
P(2500 < X < 4000) = P((2500 − 3000)/1000 < (X − µ)/σ < (4000 − 3000)/1000)
                   = P(−0.5 < Z < 1)
                   = 1 − P(Z > 1) − P(Z < −0.5)
                   = 1 − 0.1587 − 0.3085
                   = 0.5328
20 / 36
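All three worked answers can be reproduced from the standard normal CDF; the Python below is a stand-in for the R pnorm calls the lecture uses:

```python
import math

def Phi(z):
    # Standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 3000.0, 1000.0

q1 = 1.0 - Phi((5000 - mu) / sigma)                       # Question I:  P(X > 5000)
q2 = Phi((3500 - mu) / sigma)                             # Question II: P(X < 3500)
q3 = Phi((4000 - mu) / sigma) - Phi((2500 - mu) / sigma)  # Question III: P(2500 < X < 4000)
```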
Statistical Inference
Populations and samples
Sampling distributions
21 / 36
Definitions
Statistical inference is “the attempt to reach a conclusion
concerning all members of a class from observations of only
some of them.” (Runes 1959)
A population is a collection of observations
A parameter is a numerical descriptor of a population
A sample is a part or subset of a population
A statistic is a numerical descriptor of the sample
22 / 36
Population vs. Sample
Population:
population size = N
µ = mean, a measure of center
σ² = variance, a measure of dispersion
σ = standard deviation
Sample from the population is used to calculate sample estimates
(statistics) that approximate population parameters:
sample size = n
X̄ = sample mean
s² = sample variance
s = sample standard deviation
Population: parameters
Sample: statistics
23 / 36
Estimating the population mean, µ
Usually µ is unknown and we would like to estimate it
We use X̄ to estimate µ
We know the sampling distribution of X̄
Definition: Sampling distribution
The distribution of all possible values of some statistic, computed
from samples of the same size randomly drawn from the same
population, is called the sampling distribution of that statistic
24 / 36
Sampling Distribution of X̄
[Figure: left panel shows the population distribution of X ∼ N(µ, σ²); right panel shows the distribution of the sample mean X̄ ∼ N(µ, σ²/n) for n = 10, 30, 100, narrowing as n grows]
When sampling from a normally distributed population:
X̄ will be normally distributed
The mean of the distribution of X̄ is equal to the true mean µ
of the population from which the samples were drawn
The variance of the distribution is σ²/n, where σ² is the
variance of the population and n is the sample size
We can write: X̄ ∼ N(µ, σ²/n)
When sampling from a population whose distribution is not normal
and the sample size is large, use the Central Limit Theorem
25 / 36
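The claim that X̄ ∼ N(µ, σ²/n) can be checked by simulation. A Python sketch (the sample size n = 100 and the 20,000-replicate count are arbitrary illustrative choices):

```python
import random
import statistics

random.seed(7)
mu, sigma, n = 3000.0, 1000.0, 100

# Draw 20,000 samples of size n from N(mu, sigma^2) and keep each sample mean.
means = [statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(20_000)]

mean_of_means = statistics.fmean(means)      # close to mu
var_of_means = statistics.pvariance(means)   # close to sigma^2 / n = 10_000
```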
The Central Limit Theorem (CLT)
Given a population of any distribution with mean µ and variance
σ², the sampling distribution of X̄, computed from samples of size
n from this population, will be approximately N(µ, σ²/n) when
the sample size is large
In general, this applies when n ≥ 25
The approximation of normality becomes better as n increases
26 / 36
What if a random variable has a Binomial distribution?
First, recall that a Binomial variable is just the sum of n
Bernoulli variables: Sn = X1 + X2 + · · · + Xn
Notation:
Sn ∼ Binomial(n, p)
Xi ∼ Bernoulli(p) = Binomial(1, p) for i = 1, . . . , n
In this case, we want to estimate p by p̂, where
p̂ = Sn/n = (X1 + X2 + · · · + Xn)/n = X̄
p̂ is just a sample mean!
So we can use the central limit theorem when n is large
27 / 36
Binomial CLT
For a Bernoulli variable:
µ = mean = p
σ² = variance = p(1 − p)
X̄ ≈ N(µ, σ²/n) as before
Equivalently, p̂ ≈ N(p, p(1 − p)/n)
28 / 36
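A quick simulation supports this: sample proportions behave like sample means, with mean p and variance p(1 − p)/n. (The values p = 0.3 and n = 50 below are illustrative.)

```python
import random
import statistics

random.seed(42)
p, n = 0.3, 50

# Each p_hat is the mean of n Bernoulli(p) draws, i.e. a sample proportion.
p_hats = [statistics.fmean(1 if random.random() < p else 0 for _ in range(n))
          for _ in range(20_000)]

mean_p_hat = statistics.fmean(p_hats)      # close to p = 0.3
var_p_hat = statistics.pvariance(p_hats)   # close to p*(1 - p)/n = 0.0042
```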
Distribution of Differences
Often we are interested in detecting a difference between two
populations
Differences in average income by neighborhood
Differences in disease cure rates by age
29 / 36
Distribution of Differences: Notation
Population 1:
Size = N1
Mean = µ1
Standard deviation = σ1
Samples of size n1 from Population 1:
Mean = µX̄1 = µ1
Standard deviation = σX̄1 = σ1/√n1
Population 2:
Size = N2
Mean = µ2
Standard deviation = σ2
Samples of size n2 from Population 2:
Mean = µX̄2 = µ2
Standard deviation = σX̄2 = σ2/√n2
30 / 36
Distribution of Differences: CLT result
Now by CLT, for large n:
X̄1 ∼ N(µ1, σ1²/n1)
X̄2 ∼ N(µ2, σ2²/n2)
and X̄1 − X̄2 ≈ N(µ1 − µ2, σ1²/n1 + σ2²/n2)
31 / 36
Difference in proportions?
We’re done if the underlying variable is continuous. What if
the underlying variable is Binomial?
Then X̄1 − X̄2 ≈ N(µ1 − µ2, σ1²/n1 + σ2²/n2)
is replaced by:
p̂1 − p̂2 ≈ N(p1 − p2, p1(1 − p1)/n1 + p2(1 − p2)/n2)
32 / 36
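The standard error of p̂1 − p̂2 follows directly from the variance in this approximation. A small Python helper; the cure-rate numbers below are hypothetical, echoing the earlier mention of disease cure rates by age:

```python
import math

def se_diff_proportions(p1, n1, p2, n2):
    # Standard error of p1_hat - p2_hat under the CLT approximation:
    # sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Hypothetical cure rates: 60% among 200 younger patients, 50% among 150 older patients
se = se_diff_proportions(0.60, 200, 0.50, 150)   # about 0.054
```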
Summary of Sampling Distributions
Statistic     Mean       Variance
X̄             µ          σ²/n
X̄1 − X̄2       µ1 − µ2    σ1²/n1 + σ2²/n2
p̂             p          pq/n
np̂            np         npq
p̂1 − p̂2       p1 − p2    p1q1/n1 + p2q2/n2
(where q = 1 − p)
33 / 36
Statistical inference
Two methods
Estimation (Confidence intervals)
Hypothesis testing
Both make use of sampling distributions
Remember to use CLT
34 / 36
Rest of material moved to lecture 4
We didn’t get a chance to cover the rest of the material, so it has
been moved to lecture 4.
35 / 36
Lecture 3 Summary
The Normal Distribution
The Central Limit Theorem
Sampling distributions
Next time, we’ll discuss
Confidence intervals for population parameters
The t-distribution
Hypothesis testing (p-values)
36 / 36