Section 5.1
Normal Distributions
Statistics: Unlocking the Power of Data
Lock5
Outline
• Density curves
• Normal distribution
• Finding normal probabilities (technology)
• Finding normal endpoints (technology)
• Standard normal
Some Bootstrap and Randomization Distributions
[Figure: six dot plots. Correlation: Malevolent uniforms (r); Slope: Restaurant tips (slope, thousandths); Diff means: Finger taps (Diff); Mean: Body temperatures (xbar); Proportion: Owners/dogs (phat); Mean: Atlanta commutes (xbar)]
What do you notice?
All bell-shaped distributions
Density Curve
A density curve is a theoretical model to describe a variable’s distribution.
Think of a density curve as an idealized histogram, where:
(1) The total area under the curve is one.
(2) The proportion of the population in any interval is the area over that interval.
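To make the two properties concrete, here is a minimal Python sketch that numerically integrates a simple density curve. The triangular density f(x) = 2x on [0, 1] and the midpoint-rule helper `area_under` are illustrative assumptions, not from the slides; any valid density would behave the same way.

```python
from collections.abc import Callable


def density(x: float) -> float:
    """Hypothetical triangular density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


def area_under(f: Callable[[float], float], a: float, b: float, n: int = 100_000) -> float:
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


# Property (1): the total area under the curve is one.
total = area_under(density, 0, 1)
# Property (2): the proportion in an interval is the area over that interval.
upper_half = area_under(density, 0.5, 1)
print(f"total area = {total:.4f}")                   # total area = 1.0000
print(f"proportion in [0.5, 1] = {upper_half:.4f}")  # proportion in [0.5, 1] = 0.7500
```

The midpoint rule is exact for a straight-line density, so the printed values match the exact areas 1 and 0.75.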
Density for Bootstrap Means for Atlanta Commutes
What proportion are between 30 and 31?
Area is about 0.15
Normal Distribution
A normal distribution has a symmetric bell-shaped density curve.
Parameters of a Normal
Two features distinguish one normal density from another:
• The mean is its center of symmetry (μ).
• The standard deviation controls its spread (σ).
Notation:
X~N(μ,σ)
[Figure: N(µ,σ) density curve centered at µ with spread σ; axis marked µ−2σ, µ−σ, µ, µ+σ, µ+2σ]
Example: A Population
Verbal SAT ~ N(580, 70)
Example: Bootstrap Distribution
x̄’s for Atlanta commutes ≈ N(29.11, 0.93), where 29.11 is the original sample mean and 0.93 is the bootstrap std. dev. (SE)
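As a quick check, here is a sketch that uses this normal approximation to estimate the proportion of bootstrap means between 30 and 31, the same question asked of the density plot earlier. It assumes Python 3.8+ for `statistics.NormalDist`, which stands in for whatever technology the course uses; the result is the normal model's answer, not a reading of the actual bootstrap dot plot.

```python
from statistics import NormalDist

# Normal approximation to the bootstrap distribution of Atlanta commute means
boot_means = NormalDist(mu=29.11, sigma=0.93)

# Area between 30 and 31 = area below 31 minus area below 30
area = boot_means.cdf(31) - boot_means.cdf(30)
print(round(area, 3))  # 0.148
```

The normal model gives about 0.148, in line with the "about 0.15" read from the bootstrap density.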
Ex: Randomization Distribution
p̂’s for dog/owner matches ≈ N(0.5, 0.10), where 0.5 is the proportion specified by H0 and 0.10 is the randomization std. dev. (SE)
How can we find areas under a normal density N(μ,σ)?
Calculus! The area between a and b is
$$\text{Area} = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$$
This integral has no closed-form antiderivative, so we need technology!
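To show what "technology" does with this integral, here is a minimal Python sketch that approximates it numerically with Simpson's rule and compares the result with the error-function formula Φ(z) = (1 + erf(z/√2))/2. The helper names and the interval 500 to 650 for N(580, 70) are illustrative assumptions, not from the slides.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def simpson_area(mu: float, sigma: float, a: float, b: float, n: int = 1000) -> float:
    """Approximate the integral of the N(mu, sigma) density from a to b (n must be even)."""
    h = (b - a) / n
    total = normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # Simpson weights alternate 4, 2, 4, ...
        total += weight * normal_pdf(a + i * h, mu, sigma)
    return total * h / 3


def erf_area(mu: float, sigma: float, a: float, b: float) -> float:
    """Same area via the error function, which libraries compute to high precision."""
    def phi(x: float) -> float:
        return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
    return phi(b) - phi(a)


# Both approaches should agree to many decimal places
print(simpson_area(580, 70, 500, 650))
print(erf_area(580, 70, 500, 650))
```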
StatKey
[Figure: StatKey normal-distribution screenshot with callouts: Pick the tail; Adjust μ, σ; Probability; Endpoint]
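StatKey is a web tool, so as an assumed stand-in here is a Python sketch built on the standard library's `statistics.NormalDist` (Python 3.8+). Its `cdf` method plays the role of the Probability box and `inv_cdf` the role of the Endpoint box; the helper names `tail_probability` and `tail_endpoint` are hypothetical and only mimic StatKey's "pick the tail" choice.

```python
from statistics import NormalDist


def tail_probability(mu: float, sigma: float, x: float, tail: str = "left") -> float:
    """Area in the chosen tail beyond x for N(mu, sigma), like StatKey's Probability box."""
    dist = NormalDist(mu, sigma)
    if tail == "left":
        return dist.cdf(x)
    if tail == "right":
        return 1 - dist.cdf(x)
    raise ValueError("tail must be 'left' or 'right'")


def tail_endpoint(mu: float, sigma: float, area: float, tail: str = "left") -> float:
    """Endpoint x cutting off the given tail area for N(mu, sigma), like StatKey's Endpoint box."""
    dist = NormalDist(mu, sigma)
    if tail == "left":
        return dist.inv_cdf(area)
    if tail == "right":
        return dist.inv_cdf(1 - area)
    raise ValueError("tail must be 'left' or 'right'")


# Round trip on N(0, 1): the endpoint for a left-tail area of 0.10 gives back 0.10
x = tail_endpoint(0, 1, 0.10, "left")
print(round(x, 3), round(tail_probability(0, 1, x, "left"), 3))  # -1.282 0.1
```

Probability and endpoint are inverse questions, which is why the round trip returns the starting area.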
Example: Verbal SAT scores
Suppose that verbal SAT scores for applicants at a college follow a normal distribution with mean µ = 580 and std. dev. σ = 70.
What proportion of applicants have SAT scores above 700?
Example: Verbal SAT scores
About 4.3% of applicants will have verbal SAT scores above 700.
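Assuming the same N(580, 70) model, the right-tail area can be checked in Python with `statistics.NormalDist` (a stand-in for StatKey, not the slides' own tool):

```python
from statistics import NormalDist

verbal_sat = NormalDist(mu=580, sigma=70)

# Right-tail area: proportion of applicants scoring above 700
above_700 = 1 - verbal_sat.cdf(700)
print(f"{above_700:.1%}")  # 4.3%
```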
Example: Bootstrap Means
Suppose that the bootstrap distribution of means for samples of 500 Atlanta commute times is N(29.11, 0.93).
Find an endpoint (percentile) so that just 5% of the bootstrap means are smaller.
Example: Bootstrap Means
About 5% of the bootstrap means will be less than 27.58 minutes.
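Finding an endpoint is the inverse of finding a probability. A minimal sketch, assuming Python's `statistics.NormalDist` as the technology, where `inv_cdf` takes a left-tail area and returns the endpoint:

```python
from statistics import NormalDist

boot_means = NormalDist(mu=29.11, sigma=0.93)

# Endpoint with 5% of the bootstrap means below it (left-tail area 0.05)
cutoff = boot_means.inv_cdf(0.05)
print(round(cutoff, 2))  # 27.58
```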
Finding Probabilities for N(μ,σ)
Note: All that really matters is the number of standard deviations from the mean.
About what proportion should be within one std. dev. of the mean?
≈68%
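The note that only the number of standard deviations matters can be checked directly. This sketch (Python's `statistics.NormalDist` assumed as the technology) computes the proportion within k standard deviations for the standard normal, then repeats the one-SD case for the Verbal SAT model N(580, 70) to show it does not change.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal, mu = 0, sigma = 1

# Proportion within k standard deviations of the mean
within = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, proportion in within.items():
    print(f"within {k} SD: {proportion:.1%}")

# Same one-SD proportion for a different normal, N(580, 70)
sat = NormalDist(580, 70)
sat_within_1 = sat.cdf(580 + 70) - sat.cdf(580 - 70)
print(f"N(580, 70) within 1 SD: {sat_within_1:.1%}")
```

The one-SD result is about 68.3%, matching the ≈68% above, and it is the same for both distributions.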
Standard Normal
=0, =1  Z~N(0,1)
To convert any X~N(μ,σ) to Z~N(0,1):
𝑋−𝜇
𝑍=
𝜎
(z-score)
“Standardize” the endpoint(s), then use Z~N(0,1)
Ex: Dog/Owner randomization proportions, p̂ ≈ N(0.5, 0.1)
Original p̂ = 0.64. What proportion of the randomization p̂’s are above 0.64?
X-area above 0.64 = Z-area above $z = \frac{0.64-0.5}{0.1} = 1.40$
area = 0.081
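Standardizing does not change the area, only the scale. A minimal sketch, assuming Python's `statistics.NormalDist`, computes the area above 0.64 both on the original p̂ scale and above the z-score on the standard normal scale:

```python
from statistics import NormalDist

null_dist = NormalDist(mu=0.5, sigma=0.1)  # randomization distribution of p-hat
observed = 0.64

# Area above 0.64 on the original scale ...
x_area = 1 - null_dist.cdf(observed)
# ... equals the area above the z-score on the standard normal scale
z = (observed - 0.5) / 0.1
z_area = 1 - NormalDist().cdf(z)
print(round(z, 2), round(x_area, 3), round(z_area, 3))  # 1.4 0.081 0.081
```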
Converting Normals
X~N(μ,σ) to Z~N(0,1): $Z = \frac{X-\mu}{\sigma}$
Z~N(0,1) to X~N(μ,σ): $X = \mu + Z\sigma$
Example: Percentile for Verbal SAT
The 25%-tile (Q1) for Z~N(0,1) is -0.674.
Find the 25%-tile for the Verbal SAT~N(580,70) distribution.
X = μ + Zσ = 580 + (-0.674)(70) ≈ 533
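The conversion X = μ + Zσ can be checked against asking for the percentile of N(580, 70) directly. A sketch assuming Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

mu, sigma = 580, 70
z_q1 = NormalDist().inv_cdf(0.25)             # 25%-tile of Z, about -0.674
x_q1 = mu + z_q1 * sigma                      # convert back: X = mu + Z * sigma
direct = NormalDist(mu, sigma).inv_cdf(0.25)  # same percentile, computed directly
print(round(z_q1, 3), round(x_q1, 1), round(direct, 1))  # -0.674 532.8 532.8
```

Using the unrounded z gives 532.8, which rounds to the 533 on the slide.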
GOALS
If we can approximate a bootstrap distribution with a normal … construct a confidence interval.
If we can approximate a randomization distribution with a normal … compute a p-value.
IF we can find an easy way to estimate SE, we can even do this without generating the distribution!
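As a preview of the first goal, here is a hedged sketch that takes the middle 95% of the N(29.11, 0.93) approximation to the bootstrap distribution, i.e. the percentile idea applied to a normal curve instead of the bootstrap dot plot. The helper `normal_ci` and the 95% level are illustrative assumptions; the course's own confidence-interval method may be presented differently in later sections. Requires Python 3.9+ for the `tuple[...]` annotation.

```python
from statistics import NormalDist


def normal_ci(statistic: float, se: float, level: float = 0.95) -> tuple[float, float]:
    """Middle `level` of N(statistic, se): an interval from a normal approximation."""
    dist = NormalDist(statistic, se)
    tail = (1 - level) / 2  # equal area cut from each tail
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


low, high = normal_ci(29.11, 0.93)
print(f"95% interval for mean Atlanta commute: ({low:.2f}, {high:.2f})")  # (27.29, 30.93)
```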
What is the area below -1.50 in a standard normal distribution?
A. 0.067
B. 0.933
C. 0.241
D. 0.759
E. 0.500
What is the area above 2.20 in a standard normal distribution?
A. 0.003
B. 0.997
C. 0.014
D. 0.986
E. 0.037
What is the area between 0.8 and 1.4 in a standard normal distribution?
A. 0.247
B. 0.185
C. 0.028
D. 0.476
E. 0.131
What is the endpoint z in a standard normal distribution if the area to the right of z is 0.03?
A. 0.247
B. 1.881
C. 0.897
D. 2.158
E. 1.751
What is the endpoint z in a standard normal distribution if the area to the left of z is 0.18?
A. 0.915
B. -1.254
C. 1.762
D. -2.158
E. -0.915
What is the endpoint z in a standard normal distribution if the area between z and -z is 0.60?
A. 0.200
B. 0.842
C. 1.168
D. -2.158
E. 0.400
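After trying the questions above, you can check your answers with technology. This sketch assumes Python's `statistics.NormalDist` in place of StatKey; it prints each computed value to three decimals so you can match it to a choice. For the last question, the middle area 0.60 leaves 0.20 in each tail, so z is the point with 0.80 to its left.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal

answers = {
    "area below -1.50": z.cdf(-1.50),
    "area above 2.20": 1 - z.cdf(2.20),
    "area between 0.8 and 1.4": z.cdf(1.4) - z.cdf(0.8),
    "z with 0.03 to the right": z.inv_cdf(1 - 0.03),
    "z with 0.18 to the left": z.inv_cdf(0.18),
    "z with 0.60 between -z and z": z.inv_cdf(0.5 + 0.60 / 2),
}
for question, value in answers.items():
    print(f"{question}: {value:.3f}")
```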
Summary
• Statistical inference is drawing conclusions about a population based on a sample
• We use a sample statistic to estimate a population parameter
• To assess the uncertainty of a statistic, we need to know how much it varies from sample to sample
• To create a sampling distribution, take many samples of the same size from the population, and compute the statistic for each
• Standard error is the standard deviation of a statistic