```
Stat 31, Section 1, Last Time
Sampling Distributions
• Binomial Distribution
• Binomial Probs
• Normal Approx. to Binomial
• Counts Scale vs. Proportion Scale
Important Announcement
2nd Midterm Date Changed,
from:
Tuesday, April 5,
to
Tuesday, April 12.
Section 5.2: Distrib’n of Sample Means
Idea: Study Probability Structure of \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
• Based on X_1, \ldots, X_n
• Drawn independently
• From same distribution,
• Having Expected Value: E X = \mu_X
• And Standard Deviation: \sigma_X
Expected Value of Sample Mean
How does E(\bar{X}) relate to \mu_X?
\mu_{\bar{X}} = \mu_{\frac{1}{n}(X_1 + \cdots + X_n)}
  = \frac{1}{n} \mu_{X_1 + \cdots + X_n}
  = \frac{1}{n} (\mu_X + \cdots + \mu_X) = \frac{1}{n} \cdot n \cdot \mu_X = \mu_X
Sample mean “has the same mean” as the
original data.
Variance of Sample Mean
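Study “spread” (i.e. quantify variation) of \bar{X}
\sigma^2_{\bar{X}} = \sigma^2_{\frac{1}{n}(X_1 + \cdots + X_n)}
  = \frac{1}{n^2} \sigma^2_{X_1 + \cdots + X_n}
  = \frac{1}{n^2} (\sigma^2_X + \cdots + \sigma^2_X) = \frac{1}{n^2} \cdot n \cdot \sigma^2_X = \frac{1}{n} \sigma^2_X
Variance of Sample mean “reduced by \frac{1}{n}”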
S. D. of Sample Mean
Since Standard Deviation is square root of Variance,
Take square roots to get:
\sigma_{\bar{X}} = \frac{1}{\sqrt{n}} \sigma_X
S. D. of Sample mean “reduced by \frac{1}{\sqrt{n}}”
Mean & S. D. of Sample Mean
Summary:
Averaging:
1. Gives same centerpoint
2. Reduces variation by factor of \frac{1}{\sqrt{n}}
Called “Law of Averages, Part I”
Law of Averages, Part I
Some consequences (worth noting):
• To “double accuracy”, need 4 times as much data.
• For “10 times accuracy”, need 100 times as much data.
Law of Averages, Part I
HW:
5.28
(5.77, 4)
Distribution of Sample Mean
“shape of distribution”?
Case 1: If X_1, \ldots, X_n are indep. N(\mu, \sigma)
CAN SHOW: \bar{X} ~ N(\mu, \sigma / \sqrt{n})
(knew these, news is “mound shape”)
Thus work with NORMDIST & NORMINV
Distribution of Sample Mean
Case 2: If X_1, \ldots, X_n are “almost anything”
STILL HAVE: \bar{X} “approximately” N(\mu, \sigma / \sqrt{n})
Distribution of Sample Mean
Remarks:
• Mathematics: in terms of \lim_{n \to \infty}
• Called “Law of Averages, Part II”
• Also called “Central Limit Theorem”
• Gives sense in which Normal Distribution is in the center
• Hence name “Normal” (ostentatious?)
Law of Averages, Part II
More Remarks:
• Thus we will work with NORMDIST & NORMINV a lot, for averages
• This is why Normal Dist’n is good model for many different populations
  (any sum of small indep. random pieces)
• Also explains Normal Approximation to the Binomial
Normal Approx. to Binomial
Explained by Law of Averages, Part II, since:
For X ~ Binomial(n, p)
Can represent X as: X = \sum_{i=1}^{n} X_i
Where: X_i = \begin{cases} 0 & \text{F on trial } i \\ 1 & \text{S on trial } i \end{cases}
Thus X is an average (rescaled sum), so
Law of Averages gives Normal Dist’n
Law of Averages, Part II
Nice Java Demo:
http://www.amstat.org/publications/jse/v6n3/applets/CLT.html
1 Dice (think n = 1): Average Dist’n is flat
2 Dice (n = 2): Average Dist’n is triangle
…
5 Dice (n = 5): Looks quite “mound shaped”
Law of Averages, Part II
Another cool one:
http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html
• Create U shaped distribut’n with mouse
• Simul. samples of size 2: non-Normal
• Size n = 5: more normal
• Size n = 10 or 25: mound shaped
Law of Averages, Part II
Class Example:
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg19.xls
Shows:
• Even starting from non-normal shape,
• Averages become normal
• More so for more averaging
• SD smaller with more averaging (\frac{1}{\sqrt{n}})
Law of Averages, Part II
HW:
5.31, 5.33, 5.35, 5.39
And now for something
completely different….
A statistics professor was describing
sampling theory to his class, explaining
how a sample can be studied and used
to generalize to a population.
???
Chapter 6: Statistical Inference
Main Idea:
Form conclusions by
quantifying uncertainty
(will study several approaches,
first is…)
Section 6.1: Confidence Intervals
Background:
The sample mean, \bar{X}, is an “estimate” of the population mean, \mu
How accurate?
(there is “variability”, how much?)
Confidence Intervals
Recall the Sampling Distribution:
\bar{X} ~ N(\mu, \sigma / \sqrt{n})
(maybe an approximation)
Confidence Intervals
Thus understand error as:
[Figure: \bar{X} dist’n, centered at \mu with spread \sigma / \sqrt{n}]
How to explain to untrained consumers?
(who don’t know randomness, distributions, normal curves)
Confidence Intervals
Approach: present an interval
With endpoints: Estimate +- margin of error
I.e. \bar{X} \pm m, reflecting variability
How to choose m?
Confidence Intervals
i.e. margin of error, m:
Notes:
• No Absolute Range (i.e. including “everything”) is available
• From infinite tail of normal dist’n
• So need to specify desired accuracy
Confidence Intervals
Choice of “Confidence Interval radius”, m:
Approach:
• Choose a Confidence Level
• Often 0.95
  (e.g. FDA likes this number for approving new drugs, and it is a common standard for publication in many fields)
• And take margin of error to include that part of sampling distribution
Confidence Intervals
E.g. For confidence level 0.95, want
[Figure: \bar{X} distribution centered at \mu, with 0.95 = Area between \mu - m and \mu + m, where m = margin of error]
Confidence Intervals
Computation: Recall NORMINV takes
areas (probs), and returns cutoffs
Issue: NORMINV works with lower areas
Note: lower tail included
Confidence Intervals
So adapt needed probs to lower areas….
When inner area = 0.95,
Right tail = 0.025
So need to compute: NORMINV(0.975, \mu, \sigma / \sqrt{n})
Confidence Intervals
Need to compute: NORMINV(0.975, \mu, \sigma / \sqrt{n})
Major problem: \mu is unknown
• But should answer depend on \mu?
• Not centerpoint
• Need another view of the problem
Confidence Intervals
Approach to unknown \mu:
Recenter, i.e. look at \bar{X} - \mu dist’n
Key concept: Centered at 0
Now can calculate as: m = NORMINV(0.975, 0, \sigma / \sqrt{n})
Confidence Intervals
Computation of: m = NORMINV(0.975, 0, \sigma / \sqrt{n})
Smaller Problem: Don’t know \sigma
Approach 1: Estimate \sigma with s
• Will study later
Approach 2: Sometimes know \sigma
Confidence Intervals
E.g. Crop researchers plant 15 plots with a new variety of corn. The
yields, in bushels per acre are:
138.0  139.1  113.0  132.5  140.7
109.7  118.9  134.8  109.6  127.3
115.6  130.4  130.2  111.7  105.5
Assume that \sigma = 10 bushels / acre
Confidence Intervals
E.g. Find:
a) The 90% Confidence Interval for the mean value, \mu, for this type of corn.
b) The 95% Confidence Interval.
c) The 99% Confidence Interval.
d) How do the CIs change as the
confidence level increases?
Solution, part 1 of:
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg20.xls
Confidence Intervals
An EXCEL shortcut:
CONFIDENCE
Careful: parameter \alpha is: 2 tailed outer area
So for level = 0.90, \alpha = 0.10
Confidence Intervals
HW: 6.1, 6.3, 6.5
```
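The margin-of-error recipe from the Confidence Intervals slides can be sketched in Python instead of Excel. This is a minimal sketch, not the class spreadsheet: it assumes Python 3.8+ and uses `statistics.NormalDist(...).inv_cdf` as a stand-in for NORMINV. The helper names `margin_of_error` and `confidence_interval` are hypothetical. The data and the assumed sigma of 10 bushels / acre come from the corn example.

```python
from math import sqrt
from statistics import NormalDist, mean

# Corn yields (bushels per acre) from the class example; sigma is assumed known.
yields = [138, 139.1, 113, 132.5, 140.7, 109.7, 118.9, 134.8,
          109.6, 127.3, 115.6, 130.4, 130.2, 111.7, 105.5]
sigma = 10.0


def margin_of_error(level, sigma, n):
    """Return m such that the centered X-bar distribution has area `level` in [-m, m].

    Mirrors m = NORMINV(upper_area, 0, sigma / sqrt(n)), where the lower-tail
    area passed in is the inner area plus the left tail (0.975 for level 0.95).
    """
    upper_area = 1 - (1 - level) / 2
    return NormalDist(0, sigma / sqrt(n)).inv_cdf(upper_area)


def confidence_interval(data, sigma, level):
    """Return (lower, upper) endpoints: sample mean -/+ margin of error."""
    xbar = mean(data)
    m = margin_of_error(level, sigma, len(data))
    return xbar - m, xbar + m


for level in (0.90, 0.95, 0.99):
    lower, upper = confidence_interval(yields, sigma, level)
    print(f"{level:.0%} CI: [{lower:.2f}, {upper:.2f}]")
```

Here `margin_of_error(0.90, sigma, 15)` plays the role of Excel's CONFIDENCE with alpha = 0.10: the function takes the confidence level, while CONFIDENCE takes the two-tailed outer area. The intervals widen as the confidence level increases, because a larger inner area pushes the cutoff further out into the tail.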