STAT 101
Dr. Kari Lock Morgan
10/18/12
Normal Distribution
Chapter 5
• Normal distribution
• Central limit theorem
• Normal distribution for confidence intervals
• Normal distribution for p-values
• Standard normal
Statistics: Unlocking the Power of Data
Lock5
Project 1
• Due Tuesday
• 5 pages, double spaced, including figures
• Hypotheses should not change based on data
• This is a research paper – there should be
text and complete sentences.
Bootstrap and Randomization Distributions
[Figure: six dot plots of bootstrap and randomization distributions]
• Correlation: Malevolent uniforms
• Slope: Restaurant tips
• Mean: Body temperatures
• Diff means: Finger taps
• Proportion: Owners/dogs
• Mean: Atlanta commutes
What do you notice?
All bell-shaped distributions!
Normal Distribution
• The symmetric, bell-shaped curve we have
seen for almost all of our bootstrap and
randomization distributions is called a
normal distribution
[Figure: histogram of a bell-shaped distribution, frequency against values from -3 to 3]
Central Limit Theorem!
For a sufficiently large sample
size, the distribution of sample
statistics for a mean or a
proportion is approximately normal
www.lock5stat.com/StatKey
Central Limit Theorem
• The central limit theorem holds for ANY
original distribution, although “sufficiently large
sample size” varies
• The more skewed the original distribution is
(the farther from normal), the larger the sample
size has to be for the CLT to work
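The effect of sample size on a skewed original distribution can be illustrated with a small simulation. This is only a sketch under assumptions that are not from the slides: the original distribution is taken to be exponential (strongly right-skewed), the sample sizes 5 and 100 and the 2000 repetitions are arbitrary, and the helper names `skewness` and `sample_means` are hypothetical. A skewness near 0 suggests a roughly symmetric distribution of sample means.

```python
import random
from statistics import mean, stdev

def skewness(values):
    """Average cubed standardized deviation; about 0 for a symmetric distribution."""
    m, s = mean(values), stdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential distribution (an assumption)."""
    return [mean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

rng = random.Random(0)  # fixed seed so the run is repeatable
skew_small = skewness(sample_means(5, 2000, rng))
skew_large = skewness(sample_means(100, 2000, rng))

print(f"skewness of sample means, n=5:   {skew_small:.2f}")
print(f"skewness of sample means, n=100: {skew_large:.2f}")
```

With the larger sample size the sample means should be visibly less skewed, which is the sense in which a more skewed original distribution needs a larger n.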
Central Limit Theorem
• For distributions of a quantitative variable that
are not very skewed and without large outliers,
n ≥ 30 is usually sufficient to use the CLT
• For distributions of a categorical variable,
counts of at least 10 within each category are
usually sufficient to use the CLT
Normal Distribution
• The normal distribution is fully
characterized by its mean and standard
deviation
N(mean, standard deviation)
Normal Distribution
N(0.523, 0.048)
Bootstrap Distributions
If a bootstrap distribution is
approximately normally distributed, we
can write it as
a) N(parameter, sd)
b) N(statistic, sd)
c) N(parameter, se)
d) N(statistic, se)
sd = standard deviation of variable
se = standard error = standard deviation of statistic
Confidence Intervals
If the bootstrap distribution is normal:
To find a P% confidence interval, we just
need to find the middle P% of the
distribution
N(statistic, SE)
Best Picture
What proportion of
visitors to
www.naplesnews.com
thought The Artist
should win best
picture?
p̂ = 0.15
SE = ???
Best Picture
www.lock5stat.com/statkey
Area under a Curve
• The area under the curve of a normal
distribution is equal to the proportion of the
distribution falling within that range
• Knowing just the mean and standard
deviation of a normal distribution allows
you to calculate areas in the tails and
percentiles
www.lock5stat.com/statkey
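The same kind of area and percentile calculation can be sketched in code. This assumes Python's standard-library `statistics.NormalDist` in place of StatKey; the N(0.523, 0.048) values are reused from the earlier slide purely for illustration, and the cutoffs 0.45 and 0.60 are hypothetical.

```python
from statistics import NormalDist

# A normal distribution described only by its mean and standard deviation.
dist = NormalDist(mu=0.523, sigma=0.048)

# Area between two values = proportion of the distribution in that range.
# The cutoffs 0.45 and 0.60 are hypothetical.
area_between = dist.cdf(0.60) - dist.cdf(0.45)

# Area in the upper tail beyond 0.60.
upper_tail = 1 - dist.cdf(0.60)

# 97.5th percentile: the value with 97.5% of the area below it.
p975 = dist.inv_cdf(0.975)

print(f"area between 0.45 and 0.60: {area_between:.3f}")
print(f"area above 0.60:            {upper_tail:.3f}")
print(f"97.5th percentile:          {p975:.3f}")
```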
Best Picture
www.lock5stat.com/statkey
Best Picture
Confidence Intervals
For a normal sampling distribution, we
can also use the formula
sample statistic ± 2 × SE
to give a 95% confidence interval.
0.156 ± 2 × 0.03
(0.096, 0.216)
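The Best Picture interval, 0.156 ± 2 × 0.03, can be checked with a few lines of code. This is a minimal sketch; the function name `ci_95` is hypothetical, and the multiplier 2 is the rounded value used on the slide.

```python
def ci_95(statistic, se):
    """Approximate 95% confidence interval: statistic ± 2 × SE."""
    margin = 2 * se
    return statistic - margin, statistic + margin

# Values from the Best Picture slide.
low, high = ci_95(0.156, 0.03)
print(f"({low:.3f}, {high:.3f})")  # (0.096, 0.216)
```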
Confidence Intervals
For normal bootstrap distributions, the
formula
sample statistic ± 2 × SE
gives a 95% confidence interval.
How would you use the N(0,1) normal
distribution to find the appropriate
multiplier for other levels of confidence?
Confidence Intervals
For a P% confidence interval, use
sample statistic ± z* × SE
where P% of a N(0,1) distribution is
between –z* and z*
Confidence Intervals
[Figure: normal curve with P% of the area between -z* and z*]
Confidence Intervals
Find z* for a 99% confidence interval.
www.lock5stat.com/statkey
z* = 2.575
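As an alternative to StatKey, z* can be found from the inverse of the standard normal CDF. This sketch assumes Python's standard-library `statistics.NormalDist`; the helper name `z_star` is hypothetical. It returns a value slightly more precise than the 2.575 quoted above, and for 95% it gives about 1.96, which the earlier slides round to 2.

```python
from statistics import NormalDist

def z_star(confidence):
    """Return z* such that `confidence` of N(0,1) lies between -z* and z*."""
    tail = (1 - confidence) / 2  # area left over in each tail
    return NormalDist().inv_cdf(1 - tail)

print(round(z_star(0.99), 3))  # 2.576
print(round(z_star(0.95), 3))  # 1.96
```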
News Sources
“A new national survey shows that the
majority (64%) of American adults use at
least three different types of media every
week to get news and information about
their local community”
The standard error for this statistic is 1%
Find a 99% confidence interval for the
true proportion.
Source: http://pewresearch.org/databank/dailynumber/?NumberID=1331
News Sources
sample statistic ± z* × SE
0.64 ± 2.575 × 0.01
0.64 ± 0.026
(0.614, 0.666)
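The News Sources calculation generalizes to any confidence level. The sketch below assumes Python's standard-library `statistics.NormalDist`; the function name `normal_ci` is hypothetical and not part of the course materials.

```python
from statistics import NormalDist

def normal_ci(statistic, se, confidence):
    """P% interval: statistic ± z* × SE, with z* taken from N(0,1)."""
    tail = (1 - confidence) / 2
    z = NormalDist().inv_cdf(1 - tail)
    return statistic - z * se, statistic + z * se

# News Sources: sample proportion 0.64, SE 0.01, 99% confidence.
low, high = normal_ci(0.64, 0.01, 0.99)
print(f"({low:.3f}, {high:.3f})")  # (0.614, 0.666)
```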
Confidence Interval Formula
sample statistic ± z* × SE
• sample statistic: from original data
• z*: from N(0,1)
• SE: from bootstrap distribution
First Born Children
• Are first born children actually smarter?
• Based on data from last semester’s class
survey, we’ll test whether first born children
score significantly higher on the SAT
X̄(first born) − X̄(not first born) = 30.26
• From a randomization distribution, we find
SE = 37
First Born Children
X̄(first born) − X̄(not first born) = 30.26, SE = 37
What normal distribution should we use
to find the p-value?
a) N(30.26, 37)
b) N(37, 30.26)
c) N(0, 37)
d) N(0, 30.26)
Because this is a
hypothesis test, we want
to see what would happen
if the null were true, so
the distribution should be
centered around the null.
The variability is equal to
the standard error.
Hypothesis Testing
[Figure: distribution of the statistic assuming the null, with the observed statistic marked and the p-value shown as the tail area beyond it]
p-values
If the randomization distribution is
normal:
To calculate a p-value, we just need to
find the area in the appropriate tail(s)
beyond the observed statistic of the
distribution
N(null value, SE)
First Born Children
N(0, 37)
www.lock5stat.com/statkey
p-value = 0.207
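This tail area can be reproduced without StatKey. The sketch assumes Python's standard-library `statistics.NormalDist` and an upper-tail test, since the question is whether first born children score higher.

```python
from statistics import NormalDist

# Null distribution: centered at the null value 0, spread equal to the SE.
null_dist = NormalDist(mu=0, sigma=37)
observed = 30.26

# Upper-tail area beyond the observed difference in means.
p_value = 1 - null_dist.cdf(observed)
print(round(p_value, 3))  # 0.207
```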
First Born Children
Standard Normal
• Sometimes, it is easier to just use one
normal distribution to do inference
• The standard normal distribution is
the normal distribution with mean 0 and
standard deviation 1
N(0, 1)
[Figure: standard normal curve, with the statistic axis running from -3 to 3]
Standardized Test Statistic
• The standardized test statistic is the
number of standard errors a statistic is
from the null value
z = (sample statistic − null value) / SE
• The standardized test statistic (also
called a z-statistic) is compared to N(0,1)
p-value
1) Find the standardized test statistic:
z = (sample statistic − null value) / SE
2) The p-value is the area in the tail(s)
beyond z for a standard normal
distribution
First Born Children
1) Find the standardized test statistic
z = (sample statistic − null value) / SE
  = (30.26 − 0) / 37
  = 0.818
First Born Children
2) Find the area in the tail(s) beyond z
for a standard normal distribution
p-value = 0.207
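The two steps, standardize and then find the tail area under N(0,1), can be sketched as follows. This assumes Python's standard-library `statistics.NormalDist` and an upper-tail test; the function name `z_statistic` is hypothetical. The result matches the p-value obtained directly from N(0, 37).

```python
from statistics import NormalDist

def z_statistic(sample_statistic, null_value, se):
    """Number of standard errors the statistic is from the null value."""
    return (sample_statistic - null_value) / se

# Step 1: standardized test statistic for the first born example.
z = z_statistic(30.26, 0, 37)

# Step 2: upper-tail area beyond z under the standard normal.
p_value = 1 - NormalDist().cdf(z)

print(round(z, 3), round(p_value, 3))  # 0.818 0.207
```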
z-statistic
z = (sample statistic − null value) / SE
If z = –3, using α = 0.05 we would
(a) Reject the null
(b) Not reject the null
(c) Impossible to tell
(d) I have no idea
About 95% of z-statistics are
between -2 and +2, so
anything beyond those values
will be in the most extreme
5%, or equivalently will give a
p-value less than 0.05.
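This reasoning can be checked numerically. The sketch assumes Python's standard-library `statistics.NormalDist` and a two-tailed test, which the slide does not state explicitly.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

# Proportion of z-statistics between -2 and +2 when the null is true.
within_2 = std_normal.cdf(2) - std_normal.cdf(-2)

# Two-tailed p-value for z = -3 (two tails is an assumption here).
z = -3
p_two_tailed = 2 * std_normal.cdf(-abs(z))

print(round(within_2, 4))      # 0.9545
print(round(p_two_tailed, 4))  # 0.0027
print(p_two_tailed < 0.05)     # True, so reject the null at the 0.05 level
```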
z-statistic
• Calculating the number of standard
errors a statistic is from the null value
allows us to assess extremity on a
common scale
Formula for p-values
z = (sample statistic − null value) / SE
• sample statistic: from original data
• null value: from H0
• SE: from randomization distribution
Compare z to N(0,1) for p-value
Standard Error
• Wouldn’t it be nice if we could compute
the standard error without doing
thousands of simulations?
• We can!!!
• Or rather, we’ll be able to next week!
To Do
• Do Project 1 (due 10/23)
• Read Chapter 5