Sep.10

Review
• Range: the difference between the largest and the smallest
observations
The range is not resistant to outliers, and it ignores the numerical
values of nearly all the data.
• deviation from the mean: the difference between the
observation and the sample mean
d = x − x̄
• variance: the average of the squared deviations
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\text{sum of squared deviations}}{\text{sample size} - 1}$$
• standard deviation: square root of the variance
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
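The definitions above (range, deviation, variance, standard deviation) can be checked by hand on a small data set. The sketch below is one way to do that in Python; the data values and the function names are made up for illustration and are not part of the original slides.

```python
import math

def deviations(data):
    """Deviation of each observation from the sample mean: d = x - x_bar."""
    mean = sum(data) / len(data)
    return [x - mean for x in data]

def sample_variance(data):
    """Sum of squared deviations divided by (sample size - 1)."""
    return sum(d ** 2 for d in deviations(data)) / (len(data) - 1)

def sample_sd(data):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(data))

# Made-up example data (an assumption, not from the slides).
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)  # largest minus smallest observation
print("range:", data_range)
print("deviations:", deviations(data))
print("variance:", sample_variance(data))
print("standard deviation:", sample_sd(data))
```

For this data the mean is 5, the squared deviations sum to 32, and the variance is 32/7, so the divisor n − 1 = 7 is visible directly in the result.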
Properties of the Standard Deviation, s:
• The greater the spread of the data, the larger is the value of s.
• s = 0 only when all observations take the same value.
• s can be influenced by outliers.
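These three properties can be seen on a few tiny data sets. The following is a minimal sketch using Python's standard `statistics.stdev`, which divides by n − 1; all four data sets are invented for illustration.

```python
import statistics

# Made-up data sets (assumptions for illustration only).
same = [5, 5, 5, 5]           # all observations equal
tight = [4, 5, 5, 6]          # small spread around 5
wide = [1, 3, 7, 9]           # larger spread around 5
with_outlier = tight + [30]   # the tight data plus one extreme value

print("all equal:   ", statistics.stdev(same))
print("small spread:", statistics.stdev(tight))
print("large spread:", statistics.stdev(wide))
print("with outlier:", statistics.stdev(with_outlier))
```

The single value 30 pulls the mean away from the bulk of the data and contributes a very large squared deviation, which is why s grows so sharply in the last case.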
Empirical Rule for bell-shaped distributions:
• About 68% of the observations fall within 1 standard deviation of the
mean, that is, between x̄ − s and x̄ + s (denoted x̄ ± s).
• About 95% of the observations fall within 2 standard deviations of the
mean (x̄ ± 2s).
• All or nearly all observations fall within 3 standard deviations
of the mean (x̄ ± 3s).
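The Empirical Rule can be checked roughly by simulation. The sketch below draws simulated bell-shaped data and counts the share of observations within 1, 2, and 3 standard deviations of the mean. The variable name `heights`, the mean of 170, the standard deviation of 8, and the seed are all arbitrary assumptions; the proportions should come out close to, but not exactly, 68%, 95%, and nearly 100%.

```python
import random
import statistics

def fraction_within(data, k):
    """Fraction of observations between x_bar - k*s and x_bar + k*s."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    inside = [x for x in data if mean - k * s <= x <= mean + k * s]
    return len(inside) / len(data)

# Simulated bell-shaped data; all parameters here are illustrative assumptions.
random.seed(0)
heights = [random.gauss(170, 8) for _ in range(10_000)]

for k in (1, 2, 3):
    print(f"within {k} s.d.: {fraction_within(heights, k):.3f}")
```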
sample −→ population
statistics −→ parameters
sample mean x̄ −→ population mean µ
sample s.d. s −→ population s.d. σ
• percentile: the pth percentile is a value such that p percent
of the observations fall below or at that value.
The 50th percentile is usually referred to as the median.
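One simple way to turn this definition into a computation is the nearest-rank convention: take the smallest observation that has at least p percent of the data at or below it. This is only one of several conventions in use, and it is an assumption here rather than the method of the slides. For an even number of observations it returns the lower of the two middle values at p = 50, whereas the usual median averages them.

```python
import math

def percentile_nearest_rank(data, p):
    """Smallest observation with at least p percent of the data at or below it."""
    ordered = sorted(data)
    # Rank (1-based) of the required observation; at least 1 so that p = 0 works.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Made-up example data (an assumption, not from the slides).
scores = [15, 20, 35, 40, 50]

print(percentile_nearest_rank(scores, 25))
print(percentile_nearest_rank(scores, 50))
print(percentile_nearest_rank(scores, 100))
```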
• quartiles:
first quartile (Q1): the 25th percentile
second quartile (Q2): the 50th percentile, which is the
median
third quartile (Q3): the 75th percentile
The quartiles split the distribution into four parts, each
containing one quarter (25%) of the observations.
Finding Quartiles:
• Arrange the data in order
• Find the median of the data. This is the second quartile, Q2.
• For the lower half of the observations, find the median. This
is the first quartile, Q1.
• For the upper half of the observations, find the median. This
is the third quartile, Q3.
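The four steps above can be sketched in Python as follows. The steps do not say what to do with the middle observation when the sample size is odd; this sketch leaves it out of both halves, which is a common convention but an assumption here. The function names and data are invented for illustration.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)                 # step 1: arrange the data in order
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                 # lower half of the observations
    # For odd n the middle observation is excluded from both halves (assumption).
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

# Made-up example data (an assumption, not from the slides).
data = [12, 1, 40, 7, 3, 15, 8, 4, 10]
print(quartiles(data))
```

Here the ordered data are 1, 3, 4, 7, 8, 10, 12, 15, 40, so Q2 = 8, the lower half is 1, 3, 4, 7 with median 3.5, and the upper half is 10, 12, 15, 40 with median 13.5.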
• interquartile range (IQR): the distance between the third
and first quartiles.
IQR = Q3 − Q1
• The 1.5× IQR Criterion for Identifying Potential Outliers
An observation is a potential outlier if it falls more than 1.5×
IQR below the first quartile or more than 1.5× IQR above the
third quartile.
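As a small illustration of the IQR and the 1.5 × IQR criterion, the sketch below computes the two cut-off values and flags the observations beyond them. The quartile values are passed in as arguments. The data, the quartiles 3.5 and 13.5, and the function names are assumptions made up for this example.

```python
def outlier_limits(q1, q3):
    """Cut-offs 1.5 x IQR below Q1 and 1.5 x IQR above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(data, q1, q3):
    """Observations falling beyond either cut-off."""
    low, high = outlier_limits(q1, q3)
    return [x for x in data if x < low or x > high]

# Made-up data with assumed quartiles Q1 = 3.5 and Q3 = 13.5.
data = [1, 3, 4, 7, 8, 10, 12, 15, 40]
q1, q3 = 3.5, 13.5

print("IQR:", q3 - q1)
print("limits:", outlier_limits(q1, q3))
print("potential outliers:", potential_outliers(data, q1, q3))
```

With IQR = 10 the cut-offs are 3.5 − 15 = −11.5 and 13.5 + 15 = 28.5, so only the observation 40 is flagged.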
• The Five-Number Summary of Positions
minimum value, first quartile Q1, median, third quartile Q3,
and the maximum value.
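A compact sketch of the five-number summary is given below. It uses `statistics.median` from the Python standard library and takes Q1 and Q3 as the medians of the lower and upper halves, leaving out the middle observation when the sample size is odd (an assumed convention). It needs at least two observations. The data are made up.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum); needs at least 2 observations."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]     # first half of the ordered data
    upper = ordered[-half:]    # last half; skips the middle value when n is odd
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Made-up example data (an assumption, not from the slides).
data = [1, 3, 4, 7, 8, 10, 12, 15, 40]
print(five_number_summary(data))
```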
• box plot:
Constructing A Box Plot
• A box goes from the lower quartile Q1 to the upper quartile
Q3.
• A line is drawn inside the box at the median.
• A line goes from the lower end of the box to the smallest
observation that is not a potential outlier. A separate line
goes from the upper end of the box to the largest observation
that is not a potential outlier. These lines are called whiskers.
The potential outliers are shown separately.
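The construction can be sketched numerically without drawing anything: given the quartiles, the code below works out where the box, the median line, and the whiskers would go, and which points would be plotted separately. The function name, the dictionary keys, the data, and the quartile values passed in are all illustrative assumptions.

```python
def box_plot_parts(data, q1, med, q3):
    """Numbers needed to draw a box plot from the data and its quartiles."""
    iqr = q3 - q1
    low_limit = q1 - 1.5 * iqr
    high_limit = q3 + 1.5 * iqr
    # Observations that are not potential outliers.
    inside = [x for x in data if low_limit <= x <= high_limit]
    return {
        "box": (q1, q3),                          # box runs from Q1 to Q3
        "median": med,                            # line drawn inside the box
        "whiskers": (min(inside), max(inside)),   # ends of the two whiskers
        "outliers": sorted(x for x in data if x < low_limit or x > high_limit),
    }

# Made-up data with assumed quartiles Q1 = 3.5, median = 8, Q3 = 13.5.
data = [1, 3, 4, 7, 8, 10, 12, 15, 40]
parts = box_plot_parts(data, 3.5, 8, 13.5)
print(parts)
```

In this example the upper whisker stops at 15, the largest observation that is not a potential outlier, and 40 is shown as a separate point.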