Chapter Two: Describing Distributions with Numbers

In the last chapter we introduced the mean, median, variance, and standard deviation. In this chapter we introduce more numbers that describe a distribution.

Definition: The first quartile Q1 is the median of the observations whose positions in the ordered list are to the left of the location of the overall median. The second quartile Q2 is just the overall median. The third quartile Q3 is the median of the observations whose positions in the ordered list are to the right of the location of the overall median. For example, in the ordered list 1, 2, 3, 4, 5, 6, 7 the overall median is 4, so Q1 is the median of 1, 2, 3 (namely 2) and Q3 is the median of 5, 6, 7 (namely 6).

For the ordered list Y1 ≤ Y2 ≤ ··· ≤ Yn, Y1 is the minimum and Yn is the maximum. The five numbers sit in the ordered list as follows:

Y1, ···, Q1, ···, Q2, ···, Q3, ···, Yn.

The five-number summary of a distribution consists of the minimum, Q1, Q2 = M, Q3, and the maximum.

The interquartile range IQR is the distance between the first and third quartiles: IQR = Q3 − Q1. The interquartile range is a measure of spread that is used mainly as the basis for identifying suspected outliers.

The 1.5 × IQR rule for outliers: An observation x is called a suspected outlier if x < Q1 − (1.5 × IQR) or x > Q3 + (1.5 × IQR).

Normal Distribution:

Definition: A density curve is a curve that is always on or above the horizontal axis and has area exactly one underneath it. A density curve describes the overall pattern of a distribution. The area under the curve and above any range of values is the proportion of all observations that fall in that range, as the following figure illustrates.

[Figure: a density curve plotted for x from −3 to 3, with density values from 0 to 0.4. Caption: The area between 0.5 and 2.]

The median of a density curve is the equal-areas point, the point that divides the area under the curve in half. The mean of a density curve is the balance point, at which the curve would balance if made of solid material.

Where does a density curve come from?
Usually it comes from the limit of a sequence of histograms as the number of observations and the number of bins go to infinity. Look at the following figures.

[Figures: three relative-frequency histograms, with bin size = 0.2 and sample size = 100; bin size = 0.1 and sample size = 1000; bin size = 0.05 and sample size = 10000.]

Definition: A normal density curve is a symmetric, single-peaked, bell-shaped density curve. More precisely, it is the graph of the function

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty, \]

where µ is the mean and σ is the standard deviation. The following figure shows normal curves with different means (µ) and standard deviations (σ).

[Figure: density curves of N(0,1) and N(−1,0.3), plotted for x from −3 to 3. Caption: N(−1,0.3) and N(0,1).]

Definition: A normal distribution is a distribution described by a normal density curve. A normal distribution is completely specified by two numbers, µ and σ.
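The definitions in this chapter can be tried out numerically. The following is a minimal Python sketch, not part of the original notes. The function names and the sample data are hypothetical and chosen only for illustration. The quartiles follow the rule stated in this chapter (the median of the observations to the left and to the right of the overall median's location), which may differ from the quartile conventions built into some software. The area under the normal curve is computed from the standard error-function formula for the normal cumulative distribution, an outside fact that the notes do not state.

```python
import math
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum) using the chapter's quartile rule.

    Assumes at least two observations.
    """
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    lower = ys[:half]            # observations left of the median's location
    upper = ys[half + n % 2:]    # observations right of the median's location
    return ys[0], median(lower), median(ys), median(upper), ys[-1]


def suspected_outliers(data):
    """Return the observations flagged by the 1.5 x IQR rule."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


def normal_pdf(x, mu, sigma):
    """Height of the normal density curve with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def normal_area(a, b, mu=0.0, sigma=1.0):
    """Area under the normal density curve between a and b.

    Uses the error function (an assumption not taken from the notes).
    """
    def cdf(x):
        return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
    return cdf(b) - cdf(a)


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 30]          # hypothetical data
    print(five_number_summary(sample))           # (1, 2.5, 4.5, 6.5, 30)
    print(suspected_outliers(sample))            # [30]
    print(round(normal_area(0.5, 2), 4))         # 0.2858
```

In the sample, IQR = 6.5 − 2.5 = 4, so the cutoffs are 2.5 − 6 = −3.5 and 6.5 + 6 = 12.5, and only 30 is a suspected outlier. The last line reports the proportion of observations from N(0,1) that fall between 0.5 and 2, that is, the area under the standard normal curve over that range.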