Normal curve and the notion of symmetric and skewed data
The normal curve
The normal curve was discovered around 1720 by Abraham de Moivre while he was
developing the mathematics of chance. Around 1870, Adolph Quetelet had the idea of using
the normal curve as an ideal histogram, with which other histograms could be compared.
The normal curve is the plot of the following function:
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty.$$
The area under this curve is 1. About 99.73 percent of the area lies between -3
and 3, and only about 0.006 percent of the area lies outside the interval [-4, 4].
The curve is symmetric around the origin, in the sense that 𝑓(𝑥) = 𝑓(−𝑥).
The curve never touches the horizontal axis, and is always positive valued.
Exercise 1. Show that the normal curve has only one peak, and that the peak is at x = 0.
Some useful facts.
The area under the normal curve between -1 and 1 is approximately 68 percent.
The area under the normal curve between -2 and 2 is approximately 95 percent.
The area under the normal curve between -3 and 3 is about 99.73 percent.
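The areas quoted above can be checked numerically. The following is a minimal sketch, assuming the standard identity that the area to the left of x under the normal curve is (1 + erf(x/√2))/2; the helper name `normal_area` is introduced here for illustration only.

```python
import math

def normal_area(a, b):
    """Area under the standard normal curve between a and b (a <= b)."""
    # Area to the left of x, written with the error function.
    def phi(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return phi(b) - phi(a)

# Print the percentage of area within k units of zero for k = 1, 2, 3, 4.
for k in (1, 2, 3, 4):
    print(f"area between -{k} and {k}: {100 * normal_area(-k, k):.4f} percent")
```

The printed values should agree with the rounded figures above (roughly 68, 95 and 99.73 percent), and the area left outside [-4, 4] should be about 0.006 percent.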
Since the normal curve is nonnegative valued and the total area under the curve is 1, the
area under the curve between any two numbers a and b is nonnegative and cannot exceed 1.
Therefore the area under the normal curve represents a probability distribution, called the
standard normal distribution.
The normal curve is symmetric about zero. Therefore the area under the normal curve over
[a, b] equals the area over [-b, -a], where a < b are two positive numbers.
(Convince yourself that $\int_{-b}^{-a} f(x)\,dx = \int_{a}^{b} f(x)\,dx$, where $f$ is the normal curve.)
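One way to see the equality of the two areas is the substitution u = -x, sketched below. It uses only the symmetry f(-u) = f(u) stated earlier; the variable u is introduced here for the argument.

```latex
\int_{-b}^{-a} f(x)\,dx
  \;=\; \int_{b}^{a} f(-u)\,(-du)      % x = -u, dx = -du; x = -b gives u = b, x = -a gives u = a
  \;=\; \int_{a}^{b} f(-u)\,du         % swapping the limits absorbs the minus sign
  \;=\; \int_{a}^{b} f(u)\,du .        % symmetry: f(-u) = f(u)
```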
The normal distribution is said to be a good fit, or a good model, for a data set
if the percentage of the data in any interval is approximately equal to the percentage of the
area under the normal curve within that interval.
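The comparison described above can be sketched in code. This is only an illustration under stated assumptions: the sample is simulated with `random.gauss` rather than taken from real data, and the helper names `normal_area` and `fraction_in_interval` are hypothetical.

```python
import math
import random

def normal_area(a, b):
    """Area under the standard normal curve between a and b."""
    def phi(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return phi(b) - phi(a)

def fraction_in_interval(data, a, b):
    """Fraction of the data values lying in the closed interval [a, b]."""
    return sum(1 for x in data if a <= x <= b) / len(data)

# Simulated (hypothetical) sample drawn from a standard normal distribution.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Compare the data percentage with the normal-curve percentage on a few intervals.
for a, b in [(-1, 1), (0, 2), (-3, -1)]:
    data_pct = 100 * fraction_in_interval(sample, a, b)
    curve_pct = 100 * normal_area(a, b)
    print(f"[{a}, {b}]: data {data_pct:.1f} percent, normal curve {curve_pct:.1f} percent")
```

For a real data set, the same comparison would be applied to the data after standardization; large discrepancies on some interval suggest the normal distribution is not a good model.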
Symmetric and Skewed data
In order to compare the shape of the histogram of a data set with the standard normal
curve, the data are standardized as follows.
Let $x_1, \ldots, x_n$ be the given data with mean $\bar{x}$ and standard deviation $S$. Define
$$y_i = \frac{x_i - \bar{x}}{S}, \qquad i = 1, \ldots, n.$$
Verify that the new data set $y_1, \ldots, y_n$ has mean zero and standard deviation equal to 1.
The values $y_1, \ldots, y_n$ are the standardized data.
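The standardization step can be sketched as follows. The data values are made up for illustration, and the standard deviation is assumed to use the divisor n (the notes do not say whether n or n - 1 is intended; the standardized data have standard deviation 1 either way, provided the same convention is used throughout).

```python
import math

def standardize(xs):
    """Return the standardized values (x - mean) / S for each x in xs."""
    n = len(xs)
    mean = sum(xs) / n
    # Assumption: S is computed with divisor n.
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / s for x in xs]

# Hypothetical data set with mean 5 and standard deviation 2.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
ys = standardize(data)

# Check the claim: the standardized data have mean 0 and standard deviation 1.
y_mean = sum(ys) / len(ys)
y_sd = math.sqrt(sum((y - y_mean) ** 2 for y in ys) / len(ys))
print(ys)
print(y_mean, y_sd)
```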
Plot the histogram, using frequency density, based on the standardized data
$y_1, \ldots, y_n$.
If the shape of the histogram of the standardized data $y_1, \ldots, y_n$ resembles the
standard normal curve (e.g. the class interval containing zero is the modal class,
mean = median = mode, and the frequency densities on either side of the modal class
decrease equally), the data are said to be symmetric.
For symmetric data, the frequency curve of the standardized data has its peak at
zero, is symmetric, and has right and left tails of equal length.
A data set is said to be skewed if the shape of the histogram, based on frequency
density, of the standardized data is not symmetric, i.e. the right or the left tail is
longer.
Exercise. Is the income distribution symmetric? Justify.
Note. For symmetric data, mean=median=mode. (why?)
If the right tail of the histogram or frequency curve is longer, the data set is said to be
positively skewed. If the left tail is longer, the data set is said to be negatively skewed.
$$\text{Coefficient of Skewness} = \frac{m_3}{S^3}, \qquad \text{where } m_3 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3,$$
$\bar{x}$ is the sample mean and $S$ is the standard deviation.
For symmetric data the Coefficient of Skewness is equal to zero.
For positively skewed data the Coefficient of Skewness is positive.
For negatively skewed data the Coefficient of Skewness is negative.
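The coefficient can be computed directly from its definition. The sketch below assumes that S uses the divisor n, matching the 1/n in the third moment; the three small data sets are made up for illustration.

```python
import math

def skewness(xs):
    """Coefficient of skewness m3 / S**3, with divisor n in both moments."""
    n = len(xs)
    mean = sum(xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    # Assumption: S is computed with divisor n.
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return m3 / s ** 3

symmetric = [1, 2, 3, 4, 5]               # mirror image about its mean 3
right_tailed = [1, 1, 2, 2, 3, 10]        # one large value stretches the right tail
left_tailed = [-10, -3, -2, -2, -1, -1]   # the same values negated

for name, xs in [("symmetric", symmetric),
                 ("right_tailed", right_tailed),
                 ("left_tailed", left_tailed)]:
    print(name, skewness(xs))
```

The sign of the result follows the three statements above: zero for the symmetric set, positive for the right-tailed set, negative for the left-tailed set.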
Is the converse true? That is, if the Coefficient of Skewness = 0, is the data necessarily
symmetric? Not necessarily.
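A small constructed data set illustrates why zero skewness does not force symmetry. The data below are hypothetical, chosen so that the deviations and their cubes both sum to zero; the symmetry check (equal counts at equal distances on either side of the mean) is a simple criterion assumed here for illustration.

```python
import math
from collections import Counter

def skewness(xs):
    """Coefficient of skewness m3 / S**3, with divisor n in both moments."""
    n = len(xs)
    mean = sum(xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return m3 / s ** 3

def is_symmetric_about_mean(xs):
    """True if every deviation d from the mean occurs as often as -d."""
    mean = sum(xs) / len(xs)
    counts = Counter(x - mean for x in xs)
    return all(counts[d] == counts[-d] for d in list(counts))

# Constructed example: one -3, five -1's and four 2's.
# Sum of values: -3 - 5 + 8 = 0; sum of cubes: -27 - 5 + 32 = 0.
data = [-3] + [-1] * 5 + [2] * 4

print(skewness(data))                 # zero third moment
print(is_symmetric_about_mean(data))  # yet the values are not mirrored about the mean
```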
If the data is symmetric then mean=median=mode. But the converse is not
necessarily true.
Ex 1. Consider the numbers -2, -2, -2, 0, 0, 0, 0, 1, 5.
Mean = Mode = 0 (easy to verify), and also median = 0.
But is this data set symmetric? (Draw the histogram.)
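The example can be checked with Python's standard `statistics` module. This is a minimal sketch: the frequency table stands in for the histogram, and comparing the count at each value v with the count at -v is a simple symmetry criterion assumed here (the mean is 0, so symmetry would mean mirror-image counts about 0).

```python
import statistics
from collections import Counter

data = [-2, -2, -2, 0, 0, 0, 0, 1, 5]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
print(mean, median, mode)

# Frequency table: how many times each value occurs.
counts = Counter(data)
for value in sorted(counts):
    print(value, "#" * counts[value])

# Symmetry about 0 would require equal counts at v and -v for every value v.
mirrored = all(counts[v] == counts[-v] for v in list(counts))
print("mirror-image counts about 0:", mirrored)
```

The three averages coincide at 0, yet the counts on the two sides of 0 do not mirror each other: three values sit at -2, while the right side has single values at 1 and 5.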