Chapter Two: Describing Distributions with Numbers
Besides the mean, median, variance, and standard deviation introduced in the last chapter, this chapter introduces more numbers for describing a distribution.
Definition: The first quartile Q1 is the median of the
observations whose position in the ordered list is to the
left of the location of the overall median. The second
quartile Q2 is just the overall median and the third
quartile Q3 is the median of the observations whose
position in the ordered list is to the right of the location of the overall median. For the ordered list,
Y1 ≤ Y2 ≤ · · · ≤ Yn, Y1 is the minimum and Yn is the
maximum. In the ordered list, the five numbers appear as
$$Y_1, \ldots, Q_1, \ldots, Q_2, \ldots, Q_3, \ldots, Y_n.$$
The five-number summary of a distribution consists of the minimum, Q1, Q2 = M, Q3, and the maximum.
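The quartile definition above translates directly into code. This is a minimal Python sketch, assuming a plain list of at least two numbers; the function name `five_number_summary` is our own, not from the notes. Statistical packages often use other quartile conventions (for example, interpolating between ordered observations), so their Q1 and Q3 can differ slightly from this rule.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median M, Q3, maximum).

    Q1 and Q3 are the medians of the observations strictly left and strictly
    right of the overall median's position in the ordered list.
    """
    ys = sorted(data)
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2          # size of each half; the middle value is excluded when n is odd
    lower = ys[:half]      # positions left of the overall median
    upper = ys[n - half:]  # positions right of the overall median
    return ys[0], median(lower), median(ys), median(upper), ys[-1]

print(five_number_summary([13, 1, 9, 5, 11, 3, 7]))        # (1, 3, 7, 11, 13)
print(five_number_summary([2, 4, 6, 8, 10, 12, 14, 16]))   # (2, 5.0, 9.0, 13.0, 16)
```

With an odd count (7) the median 7 sits at a single position and belongs to neither half; with an even count (8) the median falls between two positions, so each half gets exactly four observations.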
The interquartile range IQR is the distance between
the first and third quartiles:
IQR = Q3 − Q1.
The interquartile range is a measure of spread which
is mainly used as the basis for identifying suspected
outliers.
The 1.5 × IQR rule for outliers. An observation x is called a suspected outlier if
x < Q1 − (1.5 × IQR)
or
x > Q3 + (1.5 × IQR).
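To apply the 1.5 × IQR rule, compute the quartiles as defined above and flag every observation beyond the two fences. This is a minimal Python sketch; the helper names `quartiles` and `suspected_outliers` and the sample data are illustrative assumptions.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3): medians of the observations left and right of the overall median."""
    ys = sorted(data)
    if len(ys) < 2:
        raise ValueError("need at least two observations")
    half = len(ys) // 2  # the middle observation is excluded when n is odd
    return median(ys[:half]), median(ys[len(ys) - half:])

def suspected_outliers(data):
    """Return the observations flagged by the 1.5 × IQR rule."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
print(quartiles(data))           # (3, 8)
print(suspected_outliers(data))  # [40]
```

Here IQR = 8 − 3 = 5, so the fences are 3 − 7.5 = −4.5 and 8 + 7.5 = 15.5, and only 40 lies outside them.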
Normal Distribution:
Definition: A density curve is a curve that is always
on or above the horizontal axis and has area exactly
one underneath it. A density curve describes the overall pattern of a distribution. The area under the curve
and above any range of values is the proportion of all
observations that fall in that range. The following figure shows an example.
[Figure: a density curve plotted for x from −3 to 3, vertical axis from 0 to 0.4; caption: The area between 0.5 and 2]
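Areas under a density curve can be approximated numerically. The sketch below assumes the pictured curve is the standard normal density (its peak near 0.4 on the interval from −3 to 3 is consistent with that, but the notes do not say so) and uses the trapezoid rule; the helper names `density` and `area` are hypothetical. It checks that the total area is one and estimates the area between 0.5 and 2, cross-checked against Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def density(x):
    """Standard normal density, assumed here to be the curve in the figure."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def area(f, a, b, n=10_000):
    """Trapezoid-rule approximation of the area under f between a and b."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

total = area(density, -10, 10)    # should be (very nearly) 1
between = area(density, 0.5, 2)   # proportion of observations between 0.5 and 2
exact = NormalDist().cdf(2) - NormalDist().cdf(0.5)
print(f"total area {total:.6f}, area between 0.5 and 2: {between:.4f} (library: {exact:.4f})")
```

Under this assumption, roughly 28.6% of all observations fall between 0.5 and 2.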
The median of a density curve is the equal-areas
point, the point that divides the area under the curve
in half. The mean of a density curve is the balance
point, at which the curve would balance if made of
solid material.
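To see that the equal-areas point and the balance point can differ, the sketch below computes both for a right-skewed density. The exponential density e^(−x) on x ≥ 0 is our illustrative choice, not one from the notes; for it the mean is exactly 1 and the median is ln 2 ≈ 0.693, so the long right tail pulls the mean past the median. The helper names are hypothetical.

```python
import math

def density(x):
    """Exponential density e^(-x) on x >= 0 (an illustrative right-skewed curve)."""
    return math.exp(-x) if x >= 0 else 0.0

def area(f, a, b, n=10_000):
    """Trapezoid-rule approximation of the area under f between a and b."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

UPPER = 40.0  # the area beyond x = 40 is about e^(-40), negligible here

# Mean = balance point: the integral of x * f(x).
mean = area(lambda x: x * density(x), 0.0, UPPER)

# Median = equal-areas point: bisect for m with area from 0 to m equal to 1/2.
lo, hi = 0.0, UPPER
for _ in range(50):
    mid = (lo + hi) / 2
    if area(density, 0.0, mid) < 0.5:
        lo = mid
    else:
        hi = mid
median = (lo + hi) / 2

print(f"mean ≈ {mean:.4f}, median ≈ {median:.4f}")  # exact values: 1 and ln 2 ≈ 0.6931
```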
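Where does a density curve come from?
Usually it arises as the limit of a sequence of histograms, with bar heights scaled so that the total bar area is one, as the number of observations and the number of bins go to infinity (so the bin width shrinks toward zero). Look at the following figures.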
[Figure: three histograms, vertical axis: relative frequency; panels: bin size = 0.2, sample size = 100; bin size = 0.1, sample size = 1000; bin size = 0.05, sample size = 10000]
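The limiting process can be simulated. In the sketch below the sample size grows and the bin size shrinks in the same steps as the figures. The source does not say which distribution produced them, so we assume, purely for illustration, a normal distribution with mean 1 and standard deviation 0.3 (the kind of curve defined next). Bar heights are relative frequency divided by bin size, an assumption about how the figures were drawn, so that the bars' total area is one and can be compared with a density curve. All names and constants are hypothetical.

```python
import math
import random

# Hypothetical generating distribution, chosen only for illustration.
MU, SIGMA = 1.0, 0.3
LO, HI = -0.5, 2.5  # histogram range: MU ± 5 * SIGMA

def normal_pdf(x):
    """Density of the assumed N(MU, SIGMA) distribution."""
    return math.exp(-((x - MU) ** 2) / (2 * SIGMA ** 2)) / (SIGMA * math.sqrt(2 * math.pi))

def density_histogram(sample, width):
    """Bar heights = count / (n * width), so total bar area = share of sample in [LO, HI)."""
    nbins = round((HI - LO) / width)
    counts = [0] * nbins
    for x in sample:
        i = int((x - LO) // width)
        if 0 <= i < nbins:
            counts[i] += 1
    return [c / (len(sample) * width) for c in counts]

rng = random.Random(0)  # fixed seed so the run is reproducible
errors, areas = [], []
for width, n in [(0.2, 100), (0.1, 1000), (0.05, 10000)]:
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    heights = density_histogram(sample, width)
    mids = [LO + (i + 0.5) * width for i in range(len(heights))]
    # Largest vertical gap between a bar top and the curve at the bar's midpoint.
    err = max(abs(h - normal_pdf(m)) for h, m in zip(heights, mids))
    errors.append(err)
    areas.append(sum(h * width for h in heights))
    print(f"bin size = {width}, sample size = {n}: largest gap to the curve = {err:.3f}")
```

With these settings the largest gap typically shrinks from one panel to the next, although any single random run can fluctuate.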
Definition: A normal density curve is a symmetric, single-peaked, bell-shaped density curve. More precisely, it is the graph of the function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty,$$
where µ is the mean and σ is the standard deviation.
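The formula can be coded directly. This is a minimal Python sketch (the name `normal_density` is ours), cross-checked against `statistics.NormalDist.pdf` from Python's standard library. It also shows that the peak height at x = µ is 1/(σ√(2π)), which is why a narrow curve such as N(−1, 0.3) rises much higher than N(0, 1); the curves are written N(µ, σ) as in the figure that follows.

```python
import math
from statistics import NormalDist

def normal_density(x, mu, sigma):
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

for mu, sigma in [(0.0, 1.0), (-1.0, 0.3)]:
    peak = normal_density(mu, mu, sigma)  # the curve is highest at x = mu
    # Cross-check the formula against the standard library at a few points.
    gap = max(abs(normal_density(x, mu, sigma) - NormalDist(mu, sigma).pdf(x))
              for x in (-3.0, -1.0, 0.0, 0.5, 2.0))
    print(f"N({mu:g}, {sigma:g}): peak height {peak:.4f}, max gap to NormalDist {gap:.1e}")
```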
The following figure shows normal curves with different means (µ) and standard deviations (σ).
[Figure: density curves N(0,1) and N(−1,0.3), plotted for x from −3 to 3; caption: N(−1,0.3) and N(0,1)]
Definition: A normal distribution is a distribution
described by a normal density curve. A normal distribution is completely specified by two numbers, µ and
σ.
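Because µ and σ pin down the whole curve, a (µ, σ) pair is all that is needed to compute any proportion. The sketch below uses Python's `statistics.NormalDist` for the two curves from the figure; the one-standard-deviation interval is only an example range. The median equals the mean, as it must for a symmetric curve, and the proportion within one σ of µ comes out the same for both curves.

```python
from statistics import NormalDist

# Once mu and sigma are fixed, every area under the curve is determined.
for mu, sigma in [(0.0, 1.0), (-1.0, 0.3)]:
    dist = NormalDist(mu, sigma)
    within_one_sd = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)
    print(f"N({mu:g}, {sigma:g}): mean {dist.mean:g}, median {dist.median:g}, "
          f"sd {dist.stdev:g}, proportion within one sd = {within_one_sd:.4f}")
```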