• Study Resource
• Explore

Survey

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia, lookup

Student's t-test wikipedia, lookup

Bootstrapping (statistics) wikipedia, lookup

Time series wikipedia, lookup

Misuse of statistics wikipedia, lookup

Transcript
```CIS 2033
Based on Textbook: A Modern Introduction to
Probability and Statistics. 2007
Slides: QUINCY R WALKER
Modified by the instructor: Dr. Longin Jan Latecki
Chapter 16
Exploratory data analysis: numerical
summaries
16.1 The Center of the Data Set
Center of the Data= sample mean:
n = the sample size
Example:
Sample mean of the following data is 44.7
43, 43, 41, 41, 41, 42, 43, 58, 58, 41, 41
Outliers
an outlier is an observation that is
numerically distant from the rest of the data
Sample median is more robust in the presence of outliers.
Variability in A Data Set
Variance:
Standard Deviation:
where n is the number samples
Why we choose the factor 1/(n−1) instead of 1/n will be
explained later (in Chapter 19).
Variability cont.
The Median of the Absolute Deviations of a Sample.
Medn= median of sample
Absolute Deviation:
Absolute Deviation:
The absolute value of the distance
Of a point xi in a data set from
the median
Empirical quantiles
The order statistics consist of the same elements as the original
dataset x1, x2 x3,…, xk , but in ascending order. Denote by the kth
element in the ordered list. Then:
The pth quartile corresponds to
pth quartile of a cdf:
Finv(p) where F(p) is the
cumulative distribution
function of the data
Quartiles
• Lower quartile: qn(.25)
• Upper quartile: qn(.75)
• Interquartile Range (IQR)
• IQR = qn(0.75) − qn(0.25)
• Median(Middle Quartile): qn(.50)
The box-and-whisker plot