Percentiles
– Quartiles
» Order data
» median is 50% value


50th percentile
can have other percentiles
» 25% and 75% are lower and upper quartiles



1st and 3rd quartiles
median is the 2nd (middle) quartile
0th and 4th quartiles are min and max
» interquartile range = 75th percentile - 25th percentile

IQR = Q3 - Q1
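
The quartile definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the course text: the helper names `percentile` and `quartile_summary` are hypothetical, the sample values are made up, and linear interpolation between ordered values is only one of several percentile conventions in use (software packages differ).

```python
def percentile(data, q):
    """q-th percentile (0-100), interpolating linearly between ordered values.

    Assumption: this is one common convention; others give slightly
    different quartiles for small samples.
    """
    xs = sorted(data)                      # step 1: order the data
    pos = (len(xs) - 1) * q / 100.0        # fractional index of the percentile
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def quartile_summary(data):
    """0th to 4th quartiles plus IQR = Q3 - Q1."""
    q0, q1, q2, q3, q4 = (percentile(data, q) for q in (0, 25, 50, 75, 100))
    return {"min": q0, "Q1": q1, "median": q2, "Q3": q3, "max": q4,
            "IQR": q3 - q1}


sample = [2, 4, 4, 5, 7, 9, 10, 12, 15]    # made-up values
print(quartile_summary(sample))
```

For this sample the 25th, 50th, and 75th percentiles fall exactly on data values (4, 7, 10), so IQR = 10 - 4 = 6.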

Use of percentiles
– box plots

visual summary of: center of data, variation,
spread, skewness, unusual values
» simple boxplot - “box and whisker”


median, quartiles, “whiskers” to range
text and PHStat
» standard


whiskers to 1 step (1.5 x IQR) beyond box
values between 1 and 2 steps beyond box plotted as *, values beyond 2 steps as o
» truncated

whiskers to 90th and 10th percentiles
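
As a rough sketch of the standard and truncated box plot rules, the code below computes the box, the whisker ends, and the points that would get the * and o symbols. The function name `boxplot_stats` and the sample values are hypothetical. It assumes whiskers are drawn to the most extreme observation within one step of the box, which is a common reading of "whiskers to 1 step beyond box", and it uses linear-interpolation percentiles.

```python
def percentile(data, q):
    """q-th percentile (0-100) by linear interpolation (one convention of several)."""
    xs = sorted(data)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def boxplot_stats(data, step_factor=1.5):
    """Box, whiskers, and outside values for standard and truncated box plots."""
    xs = sorted(data)
    q1, q3 = percentile(xs, 25), percentile(xs, 75)
    step = step_factor * (q3 - q1)         # 1 step = 1.5 x IQR

    def distance(x):
        # How far x lies outside the box (0 if inside it).
        return max(q1 - x, x - q3, 0.0)

    inside = [x for x in xs if distance(x) <= step]
    return {
        "box": (q1, q3),
        # Assumption: whiskers stop at the last observation within 1 step.
        "whiskers": (inside[0], inside[-1]),
        "near_outliers": [x for x in xs if step < distance(x) <= 2 * step],  # "*"
        "far_outliers": [x for x in xs if distance(x) > 2 * step],           # "o"
        "truncated_whiskers": (percentile(xs, 10), percentile(xs, 90)),
    }


sample = [2, 4, 4, 5, 7, 9, 10, 12, 15, 21, 30]   # made-up values
print(boxplot_stats(sample))
```

Here Q1 = 4.5 and Q3 = 13.5, so one step is 13.5; the value 30 lies 16.5 beyond the box and is flagged as a * point, while 21 is the upper whisker end.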
[Figure: box plots; axes: sampling interval (0 to 75) and a label garbled in the source as “relativsdual”]

Use of percentiles
– trimmed mean
» 25% trimmed mean

trim off top 25% and bottom 25%
– Median Absolute Deviation (MAD)
» MAD = median |Xi - median(X)|
– quantile plots (cumulative frequency)


order data (rank i = 1 for the smallest value to n for the largest)
compute plotting position
– Weibull: p = i/(n+1)
– Cunnane: p = (i-0.4)/(n+0.2)
– other
– PHStat: i/n
– Excel: ??? (implications of 0, 100%)
can compare to normal probability plot
can compare data sets
can calculate frequencies of exceedance
advantages
– arbitrary categories not required
– all of data are displayed
– every point has a position, without overlap
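
The three uses of percentiles listed above (trimmed mean, MAD, and plotting positions for a quantile plot) can be sketched as follows. All function names and the sample values are hypothetical. The plotting-position helper assumes the general form p = (i - a)/(n + 1 - 2a), which gives the Weibull formula for a = 0 and the Cunnane formula for a = 0.4; ranks i run from 1 (smallest) to n (largest).

```python
from statistics import mean, median


def trimmed_mean(data, fraction=0.25):
    """Mean after trimming `fraction` of the values from each end."""
    xs = sorted(data)
    k = int(len(xs) * fraction)            # number dropped from each tail
    return mean(xs[k:len(xs) - k])


def mad(data):
    """Median absolute deviation: median |x_i - median(x)|."""
    m = median(data)
    return median([abs(x - m) for x in data])


def plotting_positions(n, a=0.0):
    """p_i = (i - a) / (n + 1 - 2a) for ranks i = 1..n.

    a = 0.0 -> Weibull, i / (n + 1)
    a = 0.4 -> Cunnane, (i - 0.4) / (n + 0.2)
    """
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


sample = [3, 5, 6, 7, 8, 9, 12, 40]        # made-up values with one high value
print("25% trimmed mean:", trimmed_mean(sample))
print("MAD:", mad(sample))

# Quantile plot coordinates: (plotting position, ordered value)
for p, x in zip(plotting_positions(len(sample)), sorted(sample)):
    print(f"{p:.3f}  {x}")
```

Neither formula returns 0 or 1, so every ordered value gets a finite position on a probability scale; i/n, by contrast, assigns p = 1 to the largest value.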

notes on outliers
– rule of thumb: values more than 3 standard deviations from the mean
– outliers can have one of three causes:
» measurement or recording error
» observation from a population not similar to that
of most of the data
» a rare event from a single skewed population
– mean is sensitive to outliers
» influence of any one observation can be seen by
looking at the mean with and without that value
– outlier may be most important point in data set
» ozone hole
» don’t automatically trim
– use transforms to normalize, make symmetric
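
To illustrate the two checks mentioned above, the sketch below flags values more than 3 standard deviations from the mean and reports how much the mean shifts when each observation is left out. The function names and the data are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption; the slides do not say which form is intended.

```python
from statistics import mean, stdev


def beyond_k_sd(data, k=3.0):
    """Values lying more than k sample standard deviations from the mean."""
    m, s = mean(data), stdev(data)
    return [x for x in data if abs(x - m) > k * s]


def influence_on_mean(data):
    """For each value: (value, mean with it minus mean without it)."""
    full = mean(data)
    return [(x, full - mean(data[:i] + data[i + 1:]))
            for i, x in enumerate(data)]


# Made-up record: nineteen values near 10 and one value of 60.
data = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10,
        10, 11, 9, 10, 12, 8, 10, 11, 9, 60]

print("flagged:", beyond_k_sd(data))
value, shift = max(influence_on_mean(data), key=lambda pair: abs(pair[1]))
print(f"most influential value: {value}, shifts the mean by {shift:+.2f}")
```

The mean is 12.5 with the value 60 and 10 without it. Flagging a value this way is only a screening step; whether to keep it depends on which of the three causes applies.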

transforms
– three purposes - make data more symmetric,
more linear, more constant in variance
– e.g., ln(X); pH; raise to an exponent > 1 for negative
skewness, an exponent < 1 (including negative exponents) for positive skewness
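
A small sketch of the symmetry purpose: compute a skewness coefficient before and after taking ln(X). The `skewness` helper is hypothetical and uses the moment coefficient g1 = m3 / m2^1.5 as an assumed measure; the sample values are made up and strictly positive, as ln requires.

```python
import math
from statistics import mean


def skewness(data):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5 (assumed measure)."""
    m = mean(data)
    m2 = mean([(x - m) ** 2 for x in data])   # second central moment
    m3 = mean([(x - m) ** 3 for x in data])   # third central moment
    return m3 / m2 ** 1.5


sample = [1, 2, 2, 3, 4, 6, 9, 15, 40]        # made-up, positively skewed
logged = [math.log(x) for x in sample]

print(f"skewness of X:     {skewness(sample):.2f}")
print(f"skewness of ln(X): {skewness(logged):.2f}")
```

For these values the log transform pulls in the long upper tail, so the coefficient moves much closer to zero, though not all the way.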

Bias, precision, accuracy
– Bias
» measurement bias is a consistent under- or
overestimation of the true values
– Precision
» precision is a measure of the closeness of
agreement among individual measurements
» random measurement uncertainties are random
(unpredictable) deviations from the true value
– Accuracy
» accuracy is a measure of the closeness of
measurements to the true values
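
The three terms can be put into numbers for repeated measurements of a known reference value. The sketch below is one common way to do so, not something the slides specify: bias as the mean error, precision as the standard deviation of the measurements about their own mean, and accuracy as the root-mean-square error about the true value. The function name and the readings are hypothetical.

```python
import math
from statistics import mean, pstdev


def measurement_quality(measurements, true_value):
    """Bias, precision (spread), and accuracy (RMSE) of repeated measurements."""
    bias = mean(measurements) - true_value           # consistent over/under-estimate
    spread = pstdev(measurements)                    # agreement among measurements
    rmse = math.sqrt(mean([(x - true_value) ** 2 for x in measurements]))
    return {"bias": bias, "precision_sd": spread, "accuracy_rmse": rmse}


readings = [10.2, 10.4, 10.3, 10.5, 10.1]            # made-up readings
print(measurement_quality(readings, true_value=10.0))
```

With these definitions RMSE² = bias² + sd², so a set of measurements can be precise (small sd) and still inaccurate if the bias is large.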

Flood recurrence interval
– Magnitude (m)
» put floods in order of decreasing “highest daily
mean” with the highest = 1 and lowest = n
– Recurrence interval (T)
» T = (n+1)/m



T = recurrence interval (years)
n = number of years of record
m = magnitude (rank) of the flood
» note:



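m = reverse of rank i (m counts down from the largest flood, i counts up from the smallest)
T = 1/[m/(n+1)] = 1/p, where p = m/(n+1) is the Weibull exceedance probability
T = inverse of probability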
– i.e., a storm which has a probability 0.01 of
happening in a single year will happen about
1/0.01 = once every 100 years.
– Plot
» plot (semilogarithmic) highest daily mean
discharge (linear) against recurrence interval (T)
in years (logarithmic) using a range of 1 to 100
years.
» plot gage height (ft) against instantaneous peak
discharge (ft³/s).
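
The ranking and recurrence-interval steps can be sketched as below. The function name `recurrence_intervals` and the discharge values are hypothetical; the input is assumed to be one "highest daily mean" discharge per year of record, and tied values are simply ranked in the order the sort returns them.

```python
def recurrence_intervals(annual_peaks):
    """Rank annual peaks (m = 1 for the largest) and compute T = (n + 1) / m.

    annual_peaks: dict mapping year -> highest daily mean discharge.
    """
    n = len(annual_peaks)                              # years of record
    ranked = sorted(annual_peaks.items(), key=lambda kv: kv[1], reverse=True)
    rows = []
    for m, (year, discharge) in enumerate(ranked, start=1):
        T = (n + 1) / m                                # recurrence interval, years
        rows.append({"year": year, "discharge": discharge,
                     "m": m, "T": T, "p": 1.0 / T})   # p = m / (n + 1)
    return rows


# Made-up record of nine years.
peaks = {1991: 410, 1992: 980, 1993: 650, 1994: 300, 1995: 1250,
         1996: 720, 1997: 540, 1998: 860, 1999: 470}

for row in recurrence_intervals(peaks):
    print(f'{row["year"]}  Q={row["discharge"]:>5}  m={row["m"]}  '
          f'T={row["T"]:.2f} yr  p={row["p"]:.2f}')
```

With nine years of record the largest flood gets T = 10 years and the smallest T = 10/9 years. Plotting `discharge` against `T` on a logarithmic T axis gives the semilogarithmic plot described above.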