Percentiles
– Quartiles
  » order the data
  » the median is the 50% value (50th percentile); other percentiles can be defined the same way
  » the 25% and 75% values are the lower and upper quartiles (1st and 3rd quartiles); the median is the middle quartile
  » the 0th and 4th quartiles are the minimum and maximum
  » interquartile range = 75th percentile - 25th percentile: IQR = Q3 - Q1
Use of percentiles
– box plots
  » a visual summary of the center of the data, variation (spread), skewness, and unusual values
  » simple box plot (“box and whisker”): median, quartiles, and “whiskers” extending to the minimum and maximum (as in the text and PHStat)
  » standard box plot: whiskers extend to the most extreme values within 1 step (1.5 × IQR) beyond the box; values between 1 and 2 steps beyond the box are plotted as *, values beyond 2 steps as o
  » truncated box plot: whiskers extend to the 10th and 90th percentiles
  [Figure: example box plots; y-axis “relative residual”, x-axis “sampling interval”]
– trimmed mean
  » 25% trimmed mean: trim off the top 25% and bottom 25% of the ordered data and average the rest
– median absolute deviation (MAD)
  » MAD = median(|Xi - median(Xi)|)
– quantile plots (cumulative frequency)
  » order the data
  » compute a plotting position p for each rank i (i = 1 for the smallest value):
    – Weibull: p = i/(n+1)
    – Cunnane: p = (i - 0.4)/(n + 0.2)
    – other: PHStat uses p = i/n; Excel: ???
    (consider the implications of p = 0 or 100%: with i/n the largest value plots at 100%, implying it can never be exceeded, whereas the Weibull and Cunnane positions always fall strictly between 0 and 100%)
  » can be compared with a normal probability plot
  » can be used to compare data sets
  » can be used to calculate frequencies of exceedance
  » advantages
    – arbitrary categories (such as histogram classes) are not required
    – all of the data are displayed
    – every point has its own position, without overlap
notes on outliers
– rule of thumb: values more than ±3 standard deviations from the mean
– outliers can have one of three causes:
  » a measurement or recording error
  » an observation from a population not similar to that of most of the data
  » a rare event from a single skewed population
– the mean is sensitive to outliers
  » the influence of any one observation can be seen by comparing the mean with and without that value
– an outlier may be the most important point in the data set
  » e.g., the ozone hole
  » don't automatically trim outliers
– use transforms to normalize the data and make them more symmetric
transforms
– three purposes: make data more symmetric, more linear, and more constant in variance
– e.g., ln(X); pH (itself a logarithmic scale); for negative skewness, raise to a power greater than 1; for positive skewness, use a power less than 1 (such as a square root, ln(X), or a negative exponent)
Bias, precision, accuracy
– Bias
  » measurement bias is a consistent under- or overestimation of the true values
– Precision
  » precision is a measure of the closeness of agreement among individual measurements
  » random measurement uncertainties are unpredictable deviations from the true value
– Accuracy
  » accuracy is a measure of the closeness of measurements to the true values
Flood recurrence interval
– Magnitude (m)
  » rank the floods in order of decreasing “highest daily mean” discharge, with the highest = 1 and the lowest = n
– Recurrence interval (T)
  » T = (n+1)/m, where T = recurrence interval (years), n = number of years of record, and m = magnitude (rank) of the flood
  » note: m is the reverse of the ascending rank i (m = n + 1 - i), so T = 1/[m/(n+1)] = 1/p, where p = m/(n+1) is the Weibull exceedance probability
  » T is the inverse of the probability; i.e., a storm that has a probability of 0.01 of happening in a single year will happen on average about once every 1/0.01 = 100 years
– Plot
  » plot (semilogarithmic) highest daily mean discharge (linear axis) against recurrence interval T in years (logarithmic axis), using a range of 1 to 100 years
  » plot gage height (ft) against instantaneous peak discharge (ft³/s)
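The percentile-based summaries in these notes (quartiles, IQR, standard box-plot steps, the 25% trimmed mean, the MAD, and the ±3 standard deviation rule of thumb) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions. Quartiles come from Python's `statistics.quantiles(..., method="exclusive")`, which interpolates at position (n + 1)p; other software, including PHStat and Excel, may use different conventions. The trimmed mean drops int(0.25 × n) values from each end. The function names and the sample data are hypothetical.

```python
import statistics


def quartiles(data):
    """Q1, median, Q3 via statistics.quantiles(method="exclusive"), which
    interpolates at position (n + 1) * p, the same idea as Weibull p = i/(n+1)."""
    return tuple(statistics.quantiles(data, n=4, method="exclusive"))


def boxplot_summary(data):
    """Standard box plot: whiskers reach the most extreme values within 1 step
    (1.5 * IQR) of the box; values 1 to 2 steps out get '*', beyond 2 steps 'o'."""
    q1, med, q3 = quartiles(data)
    step = 1.5 * (q3 - q1)
    inside = [x for x in data if q1 - step <= x <= q3 + step]
    flagged = []
    for x in sorted(data):
        beyond = max(q1 - x, x - q3)  # distance outside the box; <= 0 inside it
        if beyond > 2 * step:
            flagged.append((x, "o"))
        elif beyond > step:
            flagged.append((x, "*"))
    return {"q1": q1, "median": med, "q3": q3, "iqr": q3 - q1,
            "whiskers": (min(inside), max(inside)), "flagged": flagged}


def trimmed_mean(data, proportion=0.25):
    """Drop int(proportion * n) values from each end of the ordered data, average the rest."""
    xs = sorted(data)
    k = int(proportion * len(xs))
    return statistics.fmean(xs[k:len(xs) - k])


def mad(data):
    """Median absolute deviation, median(|Xi - median(Xi)|), with no scale factor."""
    med = statistics.median(data)
    return statistics.median(abs(x - med) for x in data)


def three_sd_outliers(data):
    """Rule of thumb: values more than 3 sample standard deviations from the mean."""
    m, s = statistics.fmean(data), statistics.stdev(data)
    return [x for x in data if abs(x - m) > 3 * s]


if __name__ == "__main__":
    sample = [2, 3, 3, 4, 5, 5, 6, 7, 8, 18, 30]  # made-up data with two large values
    print(boxplot_summary(sample))
    print("mean", round(statistics.fmean(sample), 2), "trimmed", round(trimmed_mean(sample), 2))
    print("MAD", mad(sample), "3-SD flags", three_sd_outliers(sample))
```

For this made-up sample, the median (5) and the 25% trimmed mean (about 5.43) barely react to the two large values, while the mean is about 8.27. The box-plot rule marks 18 as * and 30 as o. The ±3 standard deviation rule flags nothing, because the outlier itself inflates the standard deviation (to about 8.4).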
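The effect of the power transforms described under "transforms" can be checked numerically by computing a skewness coefficient before and after transforming. This is a small sketch with assumptions. Skewness here is the moment coefficient g1 = m3 / m2^1.5 from population moments, with no small-sample correction. The helper names and the doubling test series are hypothetical.

```python
import math
import statistics


def skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    m = statistics.fmean(data)
    m2 = statistics.fmean([(x - m) ** 2 for x in data])
    m3 = statistics.fmean([(x - m) ** 3 for x in data])
    return m3 / m2 ** 1.5


def power_transform(data, p):
    """Ladder of powers for positive data: p = 0 stands for ln(X), otherwise X**p."""
    if any(x <= 0 for x in data):
        raise ValueError("these power transforms assume strictly positive data")
    if p == 0:
        return [math.log(x) for x in data]
    return [x ** p for x in data]


if __name__ == "__main__":
    values = [1, 2, 4, 8, 16, 32, 64]  # hypothetical, positively skewed
    for p in (1, 0.5, 0):
        print(f"p = {p}: skewness = {skewness(power_transform(values, p)):.3f}")
```

On that made-up series the skewness falls from about 1.28 (raw) to roughly 0.7 (square root) to essentially 0 (ln). The logs of a doubling series are equally spaced, so the ln transform makes this particular sample exactly symmetric.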
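One way to put numbers on bias, precision, and accuracy is to take replicate measurements of a sample whose true value is known, such as a reference standard. The sketch below computes bias as the mean error and precision as the sample standard deviation of the replicates. For a single accuracy figure it uses the root-mean-square error (RMSE), which is an assumption of this sketch, not something the notes specify. The function name and the readings are hypothetical.

```python
import math
import statistics


def measurement_quality(measurements, true_value):
    """Return (bias, precision, rmse) for replicate measurements of a known true value.

    bias      = mean(X) - true_value            consistent under- or overestimation
    precision = sample standard deviation       scatter of the replicates among themselves
    rmse      = sqrt(mean((X - true_value)^2))  overall closeness to the true value
    """
    bias = statistics.fmean(measurements) - true_value
    precision = statistics.stdev(measurements)
    rmse = math.sqrt(statistics.fmean([(x - true_value) ** 2 for x in measurements]))
    return bias, precision, rmse


if __name__ == "__main__":
    true_value = 10.0  # hypothetical reference standard
    precise_but_biased = [10.2, 10.4, 10.3, 10.5, 10.1]
    unbiased_but_imprecise = [9.5, 10.5, 9.8, 10.2, 10.0]
    for name, xs in [("biased", precise_but_biased), ("imprecise", unbiased_but_imprecise)]:
        b, s, r = measurement_quality(xs, true_value)
        print(f"{name}: bias = {b:.2f}, SD = {s:.3f}, RMSE = {r:.3f}")
```

Because mean((X - true)²) = bias² + (population variance), RMSE grows with either a consistent offset or random scatter. The first made-up set is biased but precise; the second is unbiased but imprecise. Both end up with a similar RMSE (about 0.33 and 0.34).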
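The plotting positions and the flood recurrence-interval calculation can be sketched as below. This is a minimal sketch with assumptions. The annual peak values are made up, and tied discharges simply receive consecutive ranks, which is a simplification. The function names are hypothetical, and the Cunnane line uses the usual (i - 0.4)/(n + 0.2) form.

```python
def plotting_positions(n, formula="weibull"):
    """Plotting position p for ascending ranks i = 1..n (i = 1 is the smallest value)."""
    ranks = range(1, n + 1)
    if formula == "weibull":
        return [i / (n + 1) for i in ranks]
    if formula == "cunnane":
        return [(i - 0.4) / (n + 0.2) for i in ranks]
    if formula == "i/n":  # the PHStat convention in the notes; reaches 100% at i = n
        return [i / n for i in ranks]
    raise ValueError(f"unknown formula: {formula}")


def recurrence_intervals(annual_peaks):
    """(discharge, m, T) per year: m = 1 for the largest flood, T = (n + 1) / m."""
    n = len(annual_peaks)
    ranked = sorted(annual_peaks, reverse=True)
    return [(q, m, (n + 1) / m) for m, q in enumerate(ranked, start=1)]


if __name__ == "__main__":
    peaks = [120, 340, 95, 210, 180]  # hypothetical annual highest daily mean discharges, ft3/s
    for q, m, t in recurrence_intervals(peaks):
        print(f"Q = {q:4d}  m = {m}  T = {t:.1f} yr  p = {1 / t:.3f}")
```

With n = 5 years, the largest flood gets T = 6 years and the smallest gets T = 1.2 years. Here 1/T = m/(n+1) is the Weibull exceedance probability, so a 100-year flood corresponds to p = 0.01. The (discharge, T) pairs are what would go on the semilogarithmic plot, with T on the logarithmic axis.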