Lecture 3: The Normal Distribution and Statistical Inference
A Review and Some Connections

Sandy Eckel
[email protected]
24 April 2008

Outline
The Normal Distribution
The Central Limit Theorem
Estimates of means and proportions: uses and properties
Confidence intervals and Hypothesis tests
The Normal Distribution
Normal Distribution
A probability distribution for continuous data
Characterized by a symmetric bell-shaped curve (Gaussian curve)
[Figure: normal density f(x), symmetric about its mean µ, over −∞ < x < +∞]
Takes on values between −∞ and +∞
Mean = Median = Mode
Area under curve equals 1
Symmetric about its mean µ
Notation for a Normal random variable: X ∼ N(µ, σ²)
Parameters:
µ = mean
σ = standard deviation
Under certain conditions, can be used to approximate the Binomial(n, p) distribution:
np > 5 and n(1 − p) > 5
Standard Normal
Definition: a Normal distribution N(µ, σ²) with parameters µ = 0 and σ = 1
Its density function is written as:
f(x) = (1/√(2π)) · e^(−x²/2), −∞ < x < +∞
We typically use the letter Z to denote a standard normal random variable (Z ∼ N(0, 1))
Important! We use the standard normal all the time because if X ∼ N(µ, σ²), then (X − µ)/σ ∼ N(0, 1)
This process is called "standardizing" a normal random variable

Formula: Normal Probability Density Function (pdf)
The normal probability density function for X ∼ N(µ, σ²) is:
f(x) = (1/(σ√(2π))) · e^(−(x−µ)²/(2σ²)), −∞ < x < +∞
Note: π ≈ 3.14 and e ≈ 2.72 are mathematical constants
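As a quick numerical check (an addition, not part of the original slides), the standard normal density can be evaluated in Python with the standard library's statistics.NormalDist; its peak at x = 0 should equal 1/√(2π):

```python
import math
from statistics import NormalDist

# Evaluate the N(0, 1) density at its peak, x = 0
z = NormalDist()                 # mean 0, sd 1
peak = z.pdf(0)

# The formula says f(0) = 1/sqrt(2*pi)
print(round(peak, 4))                          # → 0.3989
print(round(1 / math.sqrt(2 * math.pi), 4))    # → 0.3989
```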
68-95-99.7 Rule I
68% of the density is within one standard deviation of the mean
[Figure: normal density with area 0.68 between µ − 1σ and µ + 1σ, and 0.16 in each tail]

68-95-99.7 Rule II
95% of the density is within two standard deviations of the mean
[Figure: normal density with area 0.95 between µ − 2σ and µ + 2σ, and 0.025 in each tail]
68-95-99.7 Rule III
99.7% of the density is within three standard deviations of the mean
[Figure: normal density with area 0.997 between µ − 3σ and µ + 3σ, and 0.0015 in each tail]

Different Means
[Figure: three normal densities centered at µ1, µ2, µ3]
Three normal distributions with different means, µ1 < µ2 < µ3
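The 68-95-99.7 rule is easy to verify numerically; a minimal Python sketch (an addition, not from the slides) using the standard library's NormalDist:

```python
from statistics import NormalDist

z = NormalDist()  # N(0, 1); the within-k-sigma areas are the same for any N(mu, sigma^2)

for k in (1, 2, 3):
    area = z.cdf(k) - z.cdf(-k)   # P(mu - k*sigma < X < mu + k*sigma)
    print(round(area, 3))         # → 0.683, 0.954, 0.997
```

The rule's "68-95-99.7" is a rounding of these exact areas.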
Different Standard Deviations
[Figure: three normal densities with the same mean and spreads σ1, σ2, σ3]
Three normal distributions with different standard deviations, σ1 < σ2 < σ3

Standard Normal N(0,1)
[Figure: standard normal density with µ = 0 and σ = 1, shown for x from −4 to 4]
Example: Birthweights (in grams) of infants in a population
[Figure: density of birthweights, ranging from 0 to 6000 grams]
Continuous data
Mean = Median = Mode = 3000 = µ
Standard deviation = 1000 = σ
The area under the curve represents the probability (proportion) of infants with birthweights between certain values

Normal Probabilities
We are often interested in the probability that Z takes on values between z0 and z1:
P(z0 ≤ Z ≤ z1) = ∫ from z0 to z1 of (1/√(2π)) · e^(−z²/2) dz
Z Tables
How do we calculate this probability?
It is equivalent to finding the area under the curve
The distribution is continuous, so we cannot use sums to find probabilities
Performing the integration is not necessary, since tables and computers are available

But... we'll use R
For standard normal random variables Z ∼ N(0, 1) we'll use
1. pnorm(?) to find P(Z ≤ ?)
2. pnorm(?, lower.tail=F) to find P(Z ≥ ?)
For any normal random variable X ∼ N(µ, σ²) (but taking X ∼ N(2, 3²) as an example) we'll use
1. pnorm(?, mean=2, sd=3) to find P(X ≤ ?)
2. pnorm(?, mean=2, sd=3, lower.tail=F) to find P(X ≥ ?)
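The slides use R's pnorm; for readers following along without R, here is a rough Python equivalent (an addition, not from the slides) using the standard library's statistics.NormalDist:

```python
from statistics import NormalDist

# Standard normal Z ~ N(0, 1)
z = NormalDist()
print(round(z.cdf(2), 4))      # P(Z <= 2), like pnorm(2)                 → 0.9772
print(round(1 - z.cdf(2), 4))  # P(Z >= 2), like pnorm(2, lower.tail=F)   → 0.0228

# General normal X ~ N(2, 3^2), matching the slide's example
x = NormalDist(mu=2, sigma=3)
print(round(x.cdf(5), 4))      # P(X <= 5), like pnorm(5, mean=2, sd=3)   → 0.8413
```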
Example: Birthweights (in grams)
[Figure: birthweight density over 0 to 6000 grams]
µ = 3000
σ = 1000
X = birthweight
Z = (X − µ)/σ

Question I
What is the probability of an infant weighing more than 5000g?
P(X > 5000) = P((X − µ)/σ > (5000 − 3000)/1000)
            = P(Z > 2)
            = 0.0228
Get this using pnorm(2, lower.tail=F) (since we standardized)
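The same answer can be reproduced without standardizing; a Python sketch (an addition, using the slide's µ = 3000 and σ = 1000):

```python
from statistics import NormalDist

bw = NormalDist(mu=3000, sigma=1000)   # birthweight model from the slide
p = 1 - bw.cdf(5000)                   # P(X > 5000) = P(Z > 2)
print(round(p, 4))                     # → 0.0228
```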
Question II
What is the probability of an infant weighing between 2500 and 4000g?
P(2500 < X < 4000) = P((2500 − 3000)/1000 < (X − µ)/σ < (4000 − 3000)/1000)
                   = P(−0.5 < Z < 1)
                   = 1 − P(Z > 1) − P(Z < −0.5)
                   = 1 − 0.1587 − 0.3085
                   = 0.5328

Question III
What is the probability of an infant weighing less than 3500g?
P(X < 3500) = P((X − µ)/σ < (3500 − 3000)/1000)
            = P(Z < 0.5)
            = 0.6915
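Questions II and III can be checked the same way (a Python sketch, an addition to the slides):

```python
from statistics import NormalDist

bw = NormalDist(mu=3000, sigma=1000)   # birthweight model from the slides

# Question II: P(2500 < X < 4000)
p2 = bw.cdf(4000) - bw.cdf(2500)
print(round(p2, 4))   # → 0.5328

# Question III: P(X < 3500)
p3 = bw.cdf(3500)
print(round(p3, 4))   # → 0.6915
```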
Statistical Inference
Statistical inference is "the attempt to reach a conclusion concerning all members of a class from observations of only some of them." (Runes 1959)
Populations and samples
Sampling distributions

Definitions
A population is a collection of observations
A parameter is a numerical descriptor of a population
A sample is a part or subset of a population
A statistic is a numerical descriptor of the sample
Population vs. Sample
Population (size = N):
µ = mean, a measure of center
σ² = variance, a measure of dispersion
σ = standard deviation
A sample from the population (size = n) is used to calculate sample estimates (statistics) that approximate the population parameters:
X̄ = sample mean
s² = sample variance
s = sample standard deviation
Population: parameters; Sample: statistics

Estimating the population mean, µ
Usually µ is unknown and we would like to estimate it
We use X̄ to estimate µ
We know the sampling distribution of X̄

Definition: Sampling distribution
The distribution of all possible values of some statistic, computed from samples of the same size randomly drawn from the same population, is called the sampling distribution of that statistic
Sampling Distribution of X̄
[Figure: population distribution of X ∼ N(µ, σ²) alongside sampling distributions of X̄ ∼ N(µ, σ²/n) for n = 10, 30, 100]
When sampling from a normally distributed population:
X̄ will be normally distributed
The mean of the distribution of X̄ is equal to the true mean µ of the population from which the samples were drawn
The variance of the distribution is σ²/n, where σ² is the variance of the population and n is the sample size
We can write: X̄ ∼ N(µ, σ²/n)

The Central Limit Theorem (CLT)
When sampling from a population whose distribution is not normal and the sample size is large, use the Central Limit Theorem:
Given a population of any distribution with mean µ and variance σ², the sampling distribution of X̄, computed from samples of size n from this population, will be approximately N(µ, σ²/n) when the sample size is large
In general, this applies when n ≥ 25
The approximation of normality becomes better as n increases
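A small simulation (an addition, not from the slides) illustrates the CLT: sample means from a clearly non-normal population, here exponential with mean 1 and variance 1, have mean ≈ µ and standard deviation ≈ σ/√n:

```python
import random
from statistics import mean, stdev

random.seed(1)

n, reps = 100, 2000   # sample size and number of repeated samples

# Each entry is the mean of one sample from an exponential(rate=1) population
xbars = [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

# CLT prediction: xbar ~ approximately N(1, 1/100), i.e. sd = 0.1
print(abs(mean(xbars) - 1.0) < 0.05)    # → True
print(abs(stdev(xbars) - 0.1) < 0.02)   # → True
```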
What if a random variable has a Binomial distribution?
First, recall that a Binomial variable is just the sum of n Bernoulli variables: Sn = Σ(i=1 to n) Xi
Notation:
Sn ∼ Binomial(n, p)
Xi ∼ Bernoulli(p) = Binomial(1, p) for i = 1, ..., n
For a Bernoulli variable:
µ = mean = p
σ² = variance = p(1 − p)

Binomial CLT
In this case, we want to estimate p by p̂, where
p̂ = Sn/n = (Σ(i=1 to n) Xi)/n = X̄
p̂ is just a sample mean! So we can use the central limit theorem when n is large
X̄ ≈ N(µ, σ²/n) as before
Equivalently, p̂ ≈ N(p, p(1 − p)/n)
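To make the p̂ approximation concrete, a Python sketch (an addition; n = 100 and p = 0.3 are made-up numbers, not from the slides):

```python
import math
from statistics import NormalDist

n, p = 100, 0.3                          # hypothetical sample size and true proportion
se = math.sqrt(p * (1 - p) / n)          # sd of p-hat under the CLT

phat_dist = NormalDist(mu=p, sigma=se)   # p-hat ~ approx N(p, p(1-p)/n)

# e.g. the approximate probability that p-hat falls within 0.05 of p
prob_near = phat_dist.cdf(0.35) - phat_dist.cdf(0.25)
print(round(prob_near, 3))
```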
Distribution of Differences
Often we are interested in detecting a difference between two populations:
Differences in average income by neighborhood
Differences in disease cure rates by age

Distribution of Differences: Notation
Population 1: size = N1, mean = µ1, standard deviation = σ1
Samples of size n1 from Population 1: mean = µX̄1 = µ1, standard deviation = σX̄1 = σ1/√n1
Population 2: size = N2, mean = µ2, standard deviation = σ2
Samples of size n2 from Population 2: mean = µX̄2 = µ2, standard deviation = σX̄2 = σ2/√n2
Distribution of Differences: CLT result
Now by the CLT, for large n:
X̄1 ∼ N(µ1, σ1²/n1)
X̄2 ∼ N(µ2, σ2²/n2)
and X̄1 − X̄2 ≈ N(µ1 − µ2, σ1²/n1 + σ2²/n2)

Difference in proportions?
We're done if the underlying variable is continuous. What if the underlying variable is Binomial?
Then X̄1 − X̄2 ≈ N(µ1 − µ2, σ1²/n1 + σ2²/n2) is replaced by:
p̂1 − p̂2 ≈ N(p1 − p2, p1(1 − p1)/n1 + p2(1 − p2)/n2)
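As a worked example of the two-proportion result (an addition; the proportions and sample sizes are made up for illustration):

```python
import math

# Hypothetical groups: p1 = 0.6 from n1 = 200, p2 = 0.5 from n2 = 150
p1, n1 = 0.6, 200
p2, n2 = 0.5, 150

# Variance of p1-hat − p2-hat adds the two per-group variances
var_diff = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
se_diff = math.sqrt(var_diff)
print(round(se_diff, 4))   # → 0.0535
```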
Summary of Sampling Distributions

Statistic     Mean        Variance
X̄             µ           σ²/n
X̄1 − X̄2       µ1 − µ2     σ1²/n1 + σ2²/n2
p̂             p           pq/n
np̂            np          npq
p̂1 − p̂2       p1 − p2     p1q1/n1 + p2q2/n2

(here q = 1 − p)

Statistical inference
Two methods:
Estimation (Confidence intervals)
Hypothesis testing
Both make use of sampling distributions
Remember to use the CLT
Rest of material moved to lecture 4
We didn't get a chance to cover the rest of the material, so it has been moved to lecture 4.

Lecture 3 Summary
The Normal Distribution
The Central Limit Theorem
Sampling distributions
Next time, we'll discuss:
Confidence intervals for population parameters
The t-distribution
Hypothesis testing (p-values)