Stat 31, Section 1, Last Time
• Time series plots
• Numerical Summaries of Data:
– Center: Mean, Median
– Spread: Range, Variance, S.D., IQR
• 5 Number Summary & Outlier Rule
• Course Organization & Website
https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31sec1Home.html
Comments From Grader
I encountered some problems in the grading. These
problems are:
1. the homework pages are not stapled together.
2. the answers are not in the same order as the questions.
3. the results, especially in Excel tables, are not highlighted.
Could you please emphasize the above problems in your
class? If the students follow the rules, the grading will be
much easier. In the grading of homework #2, I also hope
that you can allow me to enforce the rules by giving zero
points.
Linear Transformations
Idea: What happens to data & summaries,
when data are:
“shifted and scaled”
i.e. “panned and zoomed”
Math:
Data: $x_1, \ldots, x_n$
Scaled by $a$
Shifted by $b$
$\to \; a x_1 + b, \ldots, a x_n + b$
Linear Transformations
Effect on linear summaries:
• Centerpoints, $\bar{x}$ and $M$, “follow data”: $a\bar{x} + b$, $aM + b$.
• Spreads, $s$ and IQR, “feel scale, not shift”: $as$, $a\,\mathrm{IQR}$ (for $a > 0$; in general $|a|s$ and $|a|\,\mathrm{IQR}$).
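The effect of a linear transformation on the summaries can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the data values, the constants `a` and `b`, and the helper `iqr` are made up for illustration, and `a` is assumed positive.

```python
import math
import statistics


def iqr(values):
    """Interquartile range, using the statistics module's default quartile rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]  # hypothetical data
a, b = 3.0, 10.0                             # hypothetical scale (a > 0) and shift
transformed = [a * x + b for x in data]

# Centerpoints "follow data": they are scaled and shifted.
mean_ok = math.isclose(statistics.mean(transformed), a * statistics.mean(data) + b)
median_ok = math.isclose(statistics.median(transformed), a * statistics.median(data) + b)

# Spreads "feel scale, not shift": they are only scaled.
sd_ok = math.isclose(statistics.stdev(transformed), a * statistics.stdev(data))
iqr_ok = math.isclose(iqr(transformed), a * iqr(data))

print(mean_ok, median_ok, sd_ok, iqr_ok)
```

Note that different software uses slightly different quartile rules, so the IQR value itself may differ from EXCEL's, but the scaling behavior is the same.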
Most Useful Linear Transfo.
“Standardization”
Goal: put data sets on “common scale”
Approach:
1. Subtract Mean $\bar{x}$, to “center at 0”
2. Divide by S.D. $s$, to “give common SD = 1”
Standardization
Result is called “z-score”: $z_i = \frac{x_i - \bar{x}}{s}$
Note that $s z_i = x_i - \bar{x}$, so $\bar{x} + s z_i = x_i$
Thus $z_i$ is interpreted as:
“number of SDs from the mean”
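Standardization can also be sketched outside EXCEL. The Python below is an illustrative sketch only: the list `snow` holds made-up numbers (not the Buffalo snowfall data), and `standardize` is a hypothetical helper name. It uses the sample standard deviation (divisor n - 1), which is also what EXCEL's STDEV function computes.

```python
import math
import statistics


def standardize(values):
    """Return z-scores: subtract the mean, then divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


snow = [25.0, 39.8, 53.5, 71.5, 80.7, 110.5]  # hypothetical data
z = standardize(snow)

z_mean = statistics.mean(z)   # equals 0 up to floating-point rounding error
z_sd = statistics.stdev(z)    # equals 1 up to floating-point rounding error
print(z_mean, z_sd)
```

The mean of the z-scores may print as a tiny nonzero number; that is rounding error in the arithmetic, not a real departure from 0.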
Standardization Example
Buffalo Snowfall Data:
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg7Done.xls
• Standardized data have same (EXCEL default) histogram shape as raw data.
  (Since axes and bin edges just follow the transformation)
• i.e. “shape” doesn’t depend on “scaling”
Standardization Example
A look under the hood:
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg7Raw.xls
1. Compute AVERAGE and SD
2. Standardize by:
   a. Create Formula in cell B2
   b. Drag downwards
   c. Keep Mean and SD cells fixed using $s
3. Check stand’d data have mean 0 & SD 1
   (note that “8.247E-16 = 0”, up to rounding error)
Standardization HW
C6: For the 18 female scores in 1.49, use EXCEL to:
a. Give the list of standardized scores
b. Give the Z-score for:
   (i) the mean (0)
   (ii) the median (-0.0967)
   (iii) the smallest (-1.52)
   (iv) the largest (2.23)
1.79
Modelling Distributions
Text: Section 1.3
Idea: Approximate histograms by an “idealized curve”,
i.e. a “density curve”, that represents the population
Idealized Curve Example
Recall Hidalgo Stamps Data,
Shifting Bin Movie (made # modes change):
https://www.unc.edu/~marron/UNCstat31-2005/StampsHistLoc.mpg
Add idealized curve:
https://www.unc.edu/~marron/UNCstat31-2005/StampsHistLocKDE.mpg
Note: “population curve” shows why
histogram modes appear and disappear
Interpretation of Density
Areas under density curve give “relative frequency”:
Proportion of data between $a$ & $b$
= Area under $f(x)$ between $a$ and $b$
= $\int_a^b f(x)\,dx$
Interpretation of Density
Note:
Total Area under density = 1
(since relative freq. of everything is 1)
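The "area = relative frequency" idea can be illustrated with a small numerical integration. This is a sketch under stated assumptions: the triangular density f(x) = 2x on [0, 1] is a made-up example (not from the text), and `area` is a hypothetical helper that uses a simple midpoint rule.

```python
def density(x):
    """A made-up density curve: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area(f, a, b, steps=10000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


total = area(density, 0.0, 1.0)     # total area under the density
middle = area(density, 0.25, 0.75)  # proportion of data between 0.25 and 0.75

print(total, middle)
```

For this density the exact answers are 1 for the total area and 0.75^2 - 0.25^2 = 0.5 for the middle interval, so half of the population lies between 0.25 and 0.75.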
HW: 1.78 (b: 0.8), 1.79
• Work with pencil and paper, not EXCEL
Most Useful Density
“Normal Curve” = “Gaussian Density”
• Shape: “like a mound”
• E.g. of “sand dumped from a truck”
• Older, worse, description: “bell shaped”
Normal Density Example
Winter Daily Maximum Temperatures in
Melbourne, Australia
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg9Done.xls
Notes:
• Top Histogram is “mound shaped”
• Plus “small scale random variation”
• So model with “Normal Density”?
Normal Density Curves
Note: there is a family of normal curves, indexed by:
i. “Center”, i.e. Mean = $\mu$ (Greek “mu”)
ii. “Spread”, i.e. Stand. Deviation = $\sigma$ (Greek “sigma” ~ s)
Terminology: $\mu$ & $\sigma$ are called “parameters”
Family of Normal Curves
Think about:
• “Shifts” (pans) indexed by $\mu$
• “Scales” (zooms) indexed by $\sigma$
Nice interactive graphical example:
http://www.stat.sc.edu/~west/applets/normaldemo1.html
(note area under curve is always 1)
Normal Curve Mathematics
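The “normal $\mu, \sigma$ density curve” is:
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
where:
• $f(x)$ is a usual “function” of $x$
• $\pi$ = circle constant = 3.14…
• $e$ = natural number = 2.7…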
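The normal density formula can be written directly as a short function. The Python sketch below is illustrative only: `normal_density` is a hypothetical name, and the parameter values are made up. It compares the hand-written formula against `statistics.NormalDist.pdf` from the Python standard library and checks numerically that the total area is 1.

```python
import math
from statistics import NormalDist


def normal_density(x, mu, sigma):
    """Normal density: exp(-((x - mu) / sigma)^2 / 2) / (sqrt(2 pi) * sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


mu, sigma = 10.0, 2.0  # hypothetical parameter values

value = normal_density(11.0, mu, sigma)
reference = NormalDist(mu, sigma).pdf(11.0)  # standard-library version

# Midpoint-rule approximation of the total area, over mu +/- 10 sigma.
steps = 20000
low, high = mu - 10.0 * sigma, mu + 10.0 * sigma
width = (high - low) / steps
total_area = sum(
    normal_density(low + (i + 0.5) * width, mu, sigma) for i in range(steps)
) * width

print(value, reference, total_area)
```

The integration range stops at 10 standard deviations because the curve, although never exactly 0, is negligibly small beyond that point.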
Normal Curve Mathematics
Main Ideas:
• Basic shape is: $e^{-\frac{1}{2}x^2}$
• “Shifted to mu”: $e^{-\frac{1}{2}(x-\mu)^2}$
• “Scaled by sigma”: $e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
• Make Total Area = 1: divide by $\sqrt{2\pi}\,\sigma$
• $f(x) \to 0$ as $x \to \pm\infty$, but never $= 0$
Normal Model Fitting
Idea: Choose $\mu, \sigma$ to give “good” fit to data $x_1, \ldots, x_n$.
Approach:
IF the distribution is “mound shaped”
& outliers are negligible
THEN a “good” choice of normal model is:
$\mu = \bar{x}$, $\sigma = s$
Normal Fitting Example
Revisit Melbourne Daily Max Temps
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg9Done.xls
• Fit curve, using $\mu = \bar{x}$, $\sigma = s$
• “Visually good” approximation
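The fitting rule (set the normal mean to the sample mean and the normal SD to the sample SD) can be sketched in a few lines of Python. This is an illustration under assumptions: the list `temps` is made-up data, not the Melbourne temperatures, and the grid of x values is an arbitrary choice.

```python
import statistics
from statistics import NormalDist

# Hypothetical, roughly mound-shaped data (not the Melbourne data set).
temps = [11.2, 12.5, 13.1, 13.8, 14.0, 14.3, 14.9, 15.4, 16.0, 17.3]

mu_hat = statistics.mean(temps)      # fitted center: sample mean
sigma_hat = statistics.stdev(temps)  # fitted spread: sample SD (divisor n - 1)
fitted = NormalDist(mu_hat, sigma_hat)

# Heights of the fitted density on a grid, ready to overlay on a histogram.
grid = [8.0 + 0.5 * i for i in range(25)]  # 8.0, 8.5, ..., 20.0
curve = [(x, fitted.pdf(x)) for x in grid]

for x, height in curve[::6]:
    print(f"x = {x:5.1f}   density = {height:.4f}")
```

To compare the curve with a relative-frequency histogram, the density heights would need to be multiplied by the bin width, since a histogram bar shows area rather than height.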
Normal Fitting Example
A look under the hood
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg9Done.xls
• Use chosen (not default) histogram bins
  for nice comparison
• Use longer range to avoid the “More” bin
• Can compute with density formula
  (Two steps, in cols F and G)
• Or use NORMDIST function
  (col J, check same as col G)
Normal Curve HW
C7: A study of distance runners found a
mean weight of 63.1 kg, with a standard
deviation of 4.8 kg. Assuming that the
distribution of weights is normal, use
EXCEL to draw the density curve of the
weight distribution.
2 Views of Normal Fitting
1. “Fit Model to Data”
   Choose $\mu = \bar{x}$ & $\sigma = s$.
2. “Fit Data to Model”
   First Standardize Data
   Then use Normal $\mu = 0$, $\sigma = 1$.
Note: same thing, just different rescalings
(choose scale depending on need)
Normal Distribution Notation
The “normal distribution,
with mean $\mu$ & standard deviation $\sigma$”
is abbreviated as:
$N(\mu, \sigma)$
Interpretation of Z-scores
Idea: Z-scores are on $N(0, 1)$ scale,
so use areas to interpret
Important Areas:
1. Within 1 sd of mean
   ≈ 68% “the majority”
Interpretation of Z-scores
2. Within 2 sd of mean
   ≈ 95% “really most”
3. Within 3 sd of mean
   ≈ 99.7% “almost all”
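These three areas can be computed from the standard normal curve rather than read off a picture. The Python sketch below is illustrative: `area_within` is a hypothetical helper name, and it relies on the cumulative distribution function provided by `statistics.NormalDist` in the Python standard library.

```python
from statistics import NormalDist

std_normal = NormalDist(0.0, 1.0)  # the N(0, 1) distribution


def area_within(k):
    """Area under the N(0, 1) curve between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


areas = {k: area_within(k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} sd of mean: {100 * area:.1f}%")
```

The computed areas are about 68.3%, 95.4% and 99.7%, so the round numbers 68 and 95 are convenient approximations.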
Interpretation of Z-scores
Interactive Version (used for above pics)
From Webster West’s Website:
http://www.stat.sc.edu/~west/applets/empiricalrule.html
Interpretation of Z-scores
Summary:
These relations are called the
“68 - 95 - 99.7 % Rule”
HW: 1.82 (a: 234-298, b: 234, 298), 1.83