Stat 31, Section 1, Last Time
Sampling Distributions
• Binomial Distribution
• Binomial Probs
• Normal Approx. to Binomial
• Counts Scale vs. Proportion Scale
Important Announcement
2nd Midterm Date Changed,
from: Tuesday, April 5,
to: Tuesday, April 12.
Section 5.2: Distrib’n of Sample Means
Idea: Study Probability Structure of $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
• Based on $X_1, \ldots, X_n$
• Drawn independently
• From same distribution,
• Having Expected Value: $E X = \mu_X$
• And Standard Deviation: $\sigma_X$
Expected Value of Sample Mean
How does $E(\bar{X})$ relate to $\mu_X$?
$E(\bar{X}) = E\left(\frac{1}{n}(X_1 + \cdots + X_n)\right) = \frac{1}{n} E(X_1 + \cdots + X_n)$
$= \frac{1}{n}(\mu_X + \cdots + \mu_X) = \frac{1}{n} \cdot n \cdot \mu_X = \mu_X$
Sample mean “has the same mean” as the
original data.
Variance of Sample Mean
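Study “spread” (i.e. quantify variation) of $\bar{X}$:
$\sigma_{\bar{X}}^2 = \mathrm{Var}\left(\frac{1}{n}(X_1 + \cdots + X_n)\right) = \frac{1}{n^2} \mathrm{Var}(X_1 + \cdots + X_n)$
$= \frac{1}{n^2}(\sigma_X^2 + \cdots + \sigma_X^2) = \frac{1}{n^2} \cdot n \cdot \sigma_X^2 = \frac{1}{n} \sigma_X^2$
Variance of Sample mean “reduced by $\frac{1}{n}$”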
S. D. of Sample Mean
Since Standard Deviation is square root of Variance,
Take square roots to get: $\sigma_{\bar{X}} = \frac{1}{\sqrt{n}} \sigma_X$
S. D. of Sample mean “reduced by $\frac{1}{\sqrt{n}}$”
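A quick check of the mean, variance and S. D. rules, sketched in Python. The population here (one fair six-sided die) is an assumption chosen for illustration, and the helper names are hypothetical. Every possible roll of n dice is enumerated, so the results are exact rather than simulated.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

FACES = range(1, 7)  # assumed population: one fair six-sided die


def mean_and_variance(values):
    """Exact mean and variance of a list of equally likely values."""
    values = list(values)
    count = len(values)
    mean = sum(values, Fraction(0)) / count
    variance = sum(((v - mean) ** 2 for v in values), Fraction(0)) / count
    return mean, variance


def sample_mean_moments(n):
    """Exact mean and variance of the average of n independent dice."""
    averages = (Fraction(sum(roll), n) for roll in product(FACES, repeat=n))
    return mean_and_variance(averages)


mu_x, var_x = mean_and_variance(Fraction(face) for face in FACES)

for n in (1, 2, 4):
    mu_bar, var_bar = sample_mean_moments(n)
    # Columns: n, mean of X-bar, variance of X-bar, S. D. of X-bar
    print(n, mu_bar, var_bar, sqrt(var_bar))
```

The mean column stays at the population mean, while the variance column is the population variance divided by n.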
Mean & S. D. of Sample Mean
Summary:
Averaging:
1. Gives same centerpoint
2. Reduces variation by factor of $\frac{1}{\sqrt{n}}$
Called “Law of Averages, Part I”
Law of Averages, Part I
Some consequences (worth noting):
• To “double accuracy”, need 4 times as much data.
• For “10 times accuracy”, need 100 times as much data.
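These consequences follow from the square root in $\sigma_X / \sqrt{n}$. A small Python sketch, where the values of sigma and n are hypothetical and the function names are made up for illustration:

```python
from math import sqrt


def sd_of_sample_mean(sigma, n):
    """S. D. of the average of n independent draws with S. D. sigma."""
    return sigma / sqrt(n)


def data_multiplier(accuracy_factor):
    """Times more data needed to shrink the S. D. by accuracy_factor."""
    return accuracy_factor ** 2


sigma, n = 10.0, 25  # hypothetical values

base = sd_of_sample_mean(sigma, n)
doubled = sd_of_sample_mean(sigma, data_multiplier(2) * n)    # 4 times the data
tenfold = sd_of_sample_mean(sigma, data_multiplier(10) * n)   # 100 times the data

print(base, doubled, tenfold)
```

Here "accuracy" is read as the S. D. of the sample mean, so halving it takes 4 times the data.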
Law of Averages, Part I
HW:
5.28
(5.77, 4)
Distribution of Sample Mean
Now know center and spread, what about
“shape of distribution”?
Case 1: If $X_1, \ldots, X_n$ are indep. $N(\mu, \sigma)$
CAN SHOW: $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
(knew these, news is “mound shape”)
Thus work with NORMDIST & NORMINV
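For readers without EXCEL, here is a rough Python analogue of the two functions, using the standard library's `statistics.NormalDist`. The wrapper names and the numbers (mu = 100, sigma = 15, n = 9) are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def normdist_cumulative(x, mean, sd):
    """Lower area P(X <= x), like NORMDIST(x, mean, sd, TRUE)."""
    return NormalDist(mean, sd).cdf(x)


def norminv(prob, mean, sd):
    """Cutoff whose lower area is prob, like NORMINV(prob, mean, sd)."""
    return NormalDist(mean, sd).inv_cdf(prob)


# Hypothetical population and sample size.
mu, sigma, n = 100.0, 15.0, 9
sd_mean = sigma / sqrt(n)  # S. D. of the sample mean

p = normdist_cumulative(105.0, mu, sd_mean)  # P(X-bar <= 105)
cutoff = norminv(p, mu, sd_mean)             # round trip back to the cutoff

print(round(p, 4), round(cutoff, 4))
```

Both functions take the S. D. of the sample mean, $\sigma / \sqrt{n}$, not the S. D. of a single observation.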
Distribution of Sample Mean
Case 2: If $X_1, \ldots, X_n$ are “almost anything”
STILL HAVE: $\bar{X}$ “approximately” $N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
Distribution of Sample Mean
Remarks:
• Mathematics: in terms of $\lim_{n \to \infty}$
• Called “Law of Averages, Part II”
• Also called “Central Limit Theorem”
• Gives sense in which Normal Distribution is in the center
• Hence name “Normal” (ostentatious?)
Law of Averages, Part II
More Remarks:
• Thus we will work with NORMDIST & NORMINV a lot, for averages
• This is why Normal Dist’n is good model for many different populations
  (any sum of small indep. random pieces)
• Also explains Normal Approximation to the Binomial
Normal Approx. to Binomial
Explained by Law of Averages, Part II, since:
For $X \sim \text{Binomial}(n, p)$
Can represent X as: $X = \sum_{i=1}^{n} X_i$
Where: $X_i = \begin{cases} 0 & \text{F on trial } i \\ 1 & \text{S on trial } i \end{cases}$
Thus X is an average (rescaled sum), so
Law of Averages gives Normal Dist’n
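A sketch comparing an exact Binomial probability with its Normal approximation, in Python. The values n = 100, p = 0.5, k = 55 are hypothetical. The approximating Normal uses mean $np$ and S. D. $\sqrt{np(1-p)}$, and no continuity correction is applied here (an assumption, made to keep the sketch simple).

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_approx_cdf(k, n, p):
    """P(X <= k) from the Normal curve with the same mean and S. D."""
    return NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(k)


n, p, k = 100, 0.5, 55  # hypothetical values

exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)

print(round(exact, 4), round(approx, 4))
```

The two areas agree to within a few hundredths for these values.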
Law of Averages, Part II
Nice Java Demo:
http://www.amstat.org/publications/jse/v6n3/applets/CLT.html
1 Dice (think n = 1): Average Dist’n is flat
2 Dice (n = 2): Average Dist’n is triangle
…
5 Dice (n = 5): Looks quite “mound shaped”
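The same shapes can be reproduced without the applet. This Python sketch is not the applet's code; it simply enumerates every roll of n fair dice and prints a text histogram of the average. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def average_distribution(n):
    """Exact distribution of the average of n fair dice: {average: prob}."""
    counts = Counter(sum(roll) for roll in product(range(1, 7), repeat=n))
    total = 6 ** n
    return {Fraction(s, n): Fraction(c, total) for s, c in sorted(counts.items())}


for n in (1, 2, 5):
    print(f"n = {n}")
    for avg, prob in average_distribution(n).items():
        bar = "#" * round(float(prob) * 100)  # one '#' per percentage point
        print(f"  {float(avg):4.2f} {bar}")
```

For n = 1 the bars are all equal, for n = 2 they rise and fall in straight lines, and for n = 5 they pile up around 3.5.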
Law of Averages, Part II
Another cool one:
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html
• Create U shaped distribut’n with mouse
• Simul. samples of size 2: non-Normal
• Size n = 5: more normal
• Size n = 10 or 25: mound shaped
Law of Averages, Part II
Class Example:
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg19.xls
Shows:
• Even starting from non-normal shape,
• Averages become normal
• More so for more averaging
• SD smaller with more averaging ($\frac{1}{\sqrt{n}}$)
Law of Averages, Part II
HW:
5.31, 5.33, 5.35, 5.39
And now for something
completely different….
A statistics professor was describing
sampling theory to his class, explaining
how a sample can be studied and used
to generalize to a population.
???
Chapter 6: Statistical Inference
Main Idea:
Form conclusions by
quantifying uncertainty
(will study several approaches,
first is…)
Section 6.1: Confidence Intervals
Background:
The sample mean, $\bar{X}$,
is an “estimate”
of the population mean, $\mu$
How accurate?
(there is “variability”, how much?)
Confidence Intervals
Recall the Sampling Distribution:
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
(maybe an approximation)
Confidence Intervals
Thus understand error as:
[Figure: $\bar{X}$ dist’n, a normal curve centered at $\mu$ with S. D. $\frac{\sigma}{\sqrt{n}}$]
How to explain to untrained consumers?
(who don’t know randomness,
distributions, normal curves)
Confidence Intervals
Approach:
present an interval
With endpoints:
Estimate ± margin of error
I.e. $\bar{X} \pm m$
reflecting variability
How to choose m ?
Confidence Intervals
Choice of “Confidence Interval radius”,
i.e. margin of error, m:
Notes:
• No Absolute Range (i.e. including “everything”) is available
• From infinite tail of normal dist’n
• So need to specify desired accuracy
Confidence Intervals
Choice of “Confidence Interval radius”, m:
Approach:
• Choose a Confidence Level
• Often 0.95
  (e.g. FDA likes this number for approving new drugs, and it is a common standard for publication in many fields)
• And take margin of error to include that part of sampling distribution
Confidence Intervals
E.g. For confidence level 0.95, want
[Figure: $\bar{X}$ distribution, with 0.95 = Area within m = margin of error on either side of the center]
Confidence Intervals
Computation: Recall NORMINV takes
areas (probs), and returns cutoffs
Issue: NORMINV works with lower areas
Note: lower tail included
Confidence Intervals
So adapt needed probs to lower areas….
When inner area = 0.95,
Right tail = 0.025
Shaded Area = 0.975
So need to compute:
$\text{NORMINV}\left(0.975, \mu, \frac{\sigma}{\sqrt{n}}\right)$
Confidence Intervals
Need to compute: $\text{NORMINV}\left(0.975, \mu, \frac{\sigma}{\sqrt{n}}\right)$
Major problem: $\mu$ is unknown
• But should answer depend on $\mu$?
• “Accuracy” is only about spread
• Not centerpoint
• Need another view of the problem
Confidence Intervals
Approach to unknown $\mu$:
Recenter, i.e. look at $\bar{X} - \mu$ dist’n
Key concept: Centered at 0
Now can calculate as: $m = \text{NORMINV}\left(0.975, 0, \frac{\sigma}{\sqrt{n}}\right)$
Confidence Intervals
Computation of: $m = \text{NORMINV}\left(0.975, 0, \frac{\sigma}{\sqrt{n}}\right)$
Smaller Problem: Don’t know $\sigma$
Approach 1: Estimate with $s$
• Leads to complications
• Will study later
Approach 2: Sometimes know $\sigma$
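When $\sigma$ is known, the margin of error computation can be sketched in Python with `statistics.NormalDist`, whose `inv_cdf` also works with lower areas. The function name and the values sigma = 10, n = 25 are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(level, sigma, n):
    """Margin of error m for a given confidence level, sigma known."""
    lower_area = 1 - (1 - level) / 2  # inner area 0.95 -> lower area 0.975
    # Recentered distribution of X-bar minus mu: mean 0, S. D. sigma / sqrt(n)
    return NormalDist(0.0, sigma / sqrt(n)).inv_cdf(lower_area)


sigma, n = 10.0, 25  # hypothetical values
m = margin_of_error(0.95, sigma, n)

# Check: the area between -m and +m should be the confidence level.
recentered = NormalDist(0.0, sigma / sqrt(n))
inner_area = recentered.cdf(m) - recentered.cdf(-m)

print(round(m, 3), round(inner_area, 3))
```

Because the distribution is recentered at 0, the unknown $\mu$ never enters the calculation.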
Confidence Intervals
E.g. Crop researchers plant 15 plots
with a new variety of corn. The
yields, in bushels per acre are:
138, 139.1, 113, 132.5, 140.7,
109.7, 118.9, 134.8, 109.6, 127.3,
115.6, 130.4, 130.2, 111.7, 105.5
Assume that $\sigma$ = 10 bushels / acre
Confidence Intervals
E.g. Find:
a) The 90% Confidence Interval for the
mean value $\mu$, for this type of corn.
b) The 95% Confidence Interval.
c) The 99% Confidence Interval.
d) How do the CIs change as the
confidence level increases?
Solution, part 1 of:
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg20.xls
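For comparison with the spreadsheet, one possible way to work parts a) to c) in Python, using the 15 yields and $\sigma$ = 10 from the example. This is a sketch and may not mirror the spreadsheet's layout; the function name `z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

# Corn yields in bushels per acre, from the example.
yields = [138.0, 139.1, 113.0, 132.5, 140.7, 109.7, 118.9, 134.8,
          109.6, 127.3, 115.6, 130.4, 130.2, 111.7, 105.5]
sigma = 10.0  # assumed known, as stated in the example


def z_interval(data, sigma, level):
    """Confidence interval X-bar +- m for the mean, sigma known."""
    n = len(data)
    center = fmean(data)
    m = NormalDist(0.0, sigma / sqrt(n)).inv_cdf(1 - (1 - level) / 2)
    return center - m, center + m


for level in (0.90, 0.95, 0.99):
    low, high = z_interval(yields, sigma, level)
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f})")
```

For part d), the printed intervals share the same center and get wider as the confidence level increases.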
Confidence Intervals
An EXCEL shortcut:
CONFIDENCE
Careful: parameter $\alpha$ is:
2 tailed outer area
So for level = 0.90, $\alpha$ = 0.10
Confidence Intervals
HW: 6.1, 6.3, 6.5