The Gaussian (Normal) Distribution: Briefly, More Details & Some Applications

The Gaussian (Normal) Distribution
• The Gaussian distribution is one of the most widely used distributions in all of science. It is also called the "bell curve" or the normal distribution. If this is the "Normal Distribution", logically, shouldn't there also be an "Abnormal Distribution"?

Johann Carl Friedrich Gauss (1777–1855, Germany)
• Mathematician, astronomer & physicist.
• Sometimes called the "Prince of Mathematics" (?)
• A child prodigy in math. (Do you have trouble believing some of the following? I do!)
• Age 3: He informed his father of a mistake in a payroll calculation & gave the correct answer!!
• Age 7: His teacher gave his class the problem of summing all integers from 1 to 100 to keep them busy. Gauss quickly wrote the correct answer, 5050, on his slate!!
• Whether or not you believe all of this, it is 100% true that he made a HUGE number of contributions to mathematics, physics, & astronomy!!

Johann Carl Friedrich Gauss: Genius! Made a HUGE number of contributions to Math, Physics, & Astronomy
1. Proved the Fundamental Theorem of Algebra: every non-constant polynomial has at least one root of the form a + bi (a complex number).
2. Proved the Fundamental Theorem of Arithmetic: every natural number greater than 1 can be written as a product of primes in only one way (apart from the order of the factors).
3. Proved that every positive integer is the sum of at most 3 triangular numbers.
4. Developed the method of least-squares fitting & many other methods in statistics & probability.
5. Proved many theorems of integral calculus, including the divergence theorem (applied to the electric field E, it gives what is called Gauss's Law).
6. Proved many theorems of number theory.
7. Made many contributions to the orbital mechanics of the solar system.
8. Made many contributions to non-Euclidean geometry.
9. Was one of the first to rigorously study the Earth's magnetic field.

Characteristics of a Normal or Gaussian Distribution
[Figure: normal distribution with μ = 0, σ² = 1; f(x) plotted for x from −5 to 5]
• It is symmetric.
• Its mean, median, & mode are equal.
[Figure: a 2-dimensional Gaussian]

Gaussian or Normal Distribution
• It is a symmetrical, bell-shaped curve.
• It has a point of inflection 1 standard deviation on either side of the mean (at x = μ ± σ).
• Formula:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

The Normal Distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$
Note the constants: π = 3.14159, e = 2.71828.
This is a bell-shaped curve whose center & spread depend on μ & σ.
• Only 2 parameters determine the curve: the mean μ & the standard deviation σ (equivalently, the variance σ²). Everything else is a constant.
• For "z scores" (μ = 0, σ = 1), the equation becomes:
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$
• The negative exponent means that large |z| values give small function values in the tails.

Normal Distribution
• It's a probability density function, so no matter what the values of μ and σ, it must integrate to 1:
$$\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx = 1$$

The Normal Distribution Is Defined by Its Mean & Standard Deviation
$$\mu = \int_{-\infty}^{\infty} x\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx$$
$$\sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx$$
Standard deviation = √(σ²) = σ

Normal Distribution
• Can take on an infinite number of possible values.
• The probability of any single exact value occurring is zero; only intervals of values have nonzero probability.
• The total area under the curve (the total probability) is 1.
• A normal distribution with a mean μ = 0 & a standard deviation σ = 1 is called the standard normal distribution.
• Z value: the distance between a selected value, designated X, and the population mean μ, divided by the population standard deviation σ:
$$Z = \frac{X - \mu}{\sigma}$$

Example 1
• The monthly incomes of recent MBA graduates in a large corporation are normally distributed with a mean of $2000 and a standard deviation of $200. What is the Z value for an income of $2200? For an income of $1700?
• For X = $2200, Z = (2200 − 2000)/200 = 1.
• For X = $1700, Z = (1700 − 2000)/200 = −1.5.
• A Z value of 1 indicates that $2200 is 1 standard deviation above the mean of $2000, while a Z value of −1.5 indicates that $1700 is 1.5 standard deviations below the mean of $2000.
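As a numerical sanity check of the density formula and the three integrals above, the sketch below codes f(x) directly and integrates it with a simple trapezoidal rule (a quadrature method chosen here for illustration, not taken from the slides). It assumes Example 1's μ = 2000 and σ = 200 and truncates every integral at μ ± 10σ, where the remaining tail area is negligible; `normal_pdf` and `integrate` are hypothetical helper names.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Gaussian density f(x) = exp(-((x - mu)/sigma)^2 / 2) / (sigma * sqrt(2*pi))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, a: float, b: float, n: int = 20000) -> float:
    """Composite trapezoidal rule for f on [a, b] with n sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Assumed parameters, borrowed from Example 1 for illustration.
mu, sigma = 2000.0, 200.0
# Truncate the infinite limits at +/- 10 sigma; the tails beyond are negligible.
lo, hi = mu - 10 * sigma, mu + 10 * sigma

area = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)             # should be ~1
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)         # should be ~mu
var = integrate(lambda x: (x - mean) ** 2 * normal_pdf(x, mu, sigma), lo, hi)  # ~sigma^2

print(f"area    = {area:.6f}")
print(f"mean    = {mean:.2f}")
print(f"std dev = {math.sqrt(var):.2f}")
```

Swapping in other values of `mu` and `sigma` should leave the area at 1 and recover the chosen mean and standard deviation, which is exactly the claim that only μ and σ shape the curve.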
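The Z-value arithmetic in Example 1 fits in a one-line helper. This is a minimal sketch; `z_score` is a hypothetical name, and the mean and standard deviation are the example's $2000 and $200.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Example 1: mean and standard deviation of monthly income, in dollars.
MU, SIGMA = 2000.0, 200.0

for income in (2200.0, 1700.0):
    z = z_score(income, MU, SIGMA)
    side = "above" if z >= 0 else "below"
    print(f"X = ${income:.0f}: Z = {z:+.1f} ({abs(z):.1f} SD {side} the mean)")
```

The sign of Z carries the direction and its magnitude carries the distance in standard deviations, matching the interpretation given in the example.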
Probabilities Depicted by Areas Under the Curve
• The total area under the curve is 1.
• The area in red is equal to p(z > 1).
• The area in blue is equal to p(−1 < z < 0).
• Since the properties of the normal distribution are known, areas can be looked up in tables or calculated on a computer.

Probability of an Interval
$$p(1 \le X \le 2) = F(2) - F(1)$$

Cumulative Probability
$$F(a) = p(X \le a)$$
$$1 - F(a) = p(a \le X)$$
[Figure: normal curve (probability density) with the area up to z = a shaded; z axis from −3 to 3]

• Given any positive value of z, the corresponding probability can be looked up in standard tables. Tables come in two forms: some give the probability that a standard normal variable lies between 0 and the given positive z; others give the cumulative probability F(z) = p(Z ≤ z). For positive z the two differ by exactly .5.

Areas Under the Standard Normal Curve: Areas and Probabilities
• The table below shows cumulative normal probabilities. Some selected entries:

z:     0     .1    .2    .3    .4    .5    1     2     3
F(z):  .50   .54   .58   .62   .66   .69   .84   .98   .999

• About 54% of scores fall below a z of .1. About 46% of scores fall below a z of −.1 (by symmetry, 1 − .54 = .46). About 14% of scores fall between a z of 1 and a z of 2 (.98 − .84).

Areas Under the Normal Curve
• About 68 percent of the area under the normal curve is within one standard deviation of the mean: μ − σ < X < μ + σ.
• About 95 percent is within two standard deviations of the mean: μ − 2σ < X < μ + 2σ.
• About 99.7 percent is within three standard deviations of the mean: μ − 3σ < X < μ + 3σ.
[Figure: standard normal curve (μ = 0, σ² = 1) with the intervals μ ± 1σ, μ ± 2σ, μ ± 3σ marked. Irwin/McGraw-Hill, © The McGraw-Hill Companies, Inc., 1999]

Key Areas Under the Curve
For normal distributions:
• μ ± 1σ: ~68% of the data
• μ ± 2σ: ~95% of the data
• μ ± 3σ: ~99.7% of the data
This is the "68-95-99.7 Rule".

68.27-95.45-99.73 Rule
For a normally distributed variable:
1. About 68.27% of all possible observations lie within one standard deviation on either side of the mean (between μ − σ and μ + σ).
2. About 95.45% of all possible observations lie within two standard deviations on either side of the mean (between μ − 2σ and μ + 2σ).
3. About 99.73% of all possible observations lie within three standard deviations on either side of the mean (between μ − 3σ and μ + 3σ).

• Using the unit normal (z), we can find areas and probabilities for any normal distribution.
• Suppose X = 120, μ = 100, σ = 10.
• Then z = (120 − 100)/10 = 2.
• About 98% of cases fall below a score of 120 if the distribution is normal (F(2) ≈ .977). In the normal, most (95%) are within 2σ of the mean. Nearly everybody (99.7%) is within 3σ of the mean.

68-95-99.7 Rule in Math Terms
$$\int_{\mu-\sigma}^{\mu+\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx \approx .68$$
$$\int_{\mu-2\sigma}^{\mu+2\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx \approx .95$$
$$\int_{\mu-3\sigma}^{\mu+3\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx \approx .997$$

Example 2
• The daily water usage per person in New Providence, New Jersey is normally distributed with a mean of 20 gallons and a standard deviation of 5 gallons.
• About 68% of the daily water usage per person in New Providence lies between what two values?
• μ ± 1σ = 20 ± 1(5). That is, about 68% of the daily water usage will lie between 15 and 25 gallons.

Normal Approximation to the Binomial
• Using the normal distribution (a continuous distribution) as a substitute for a binomial distribution (a discrete distribution) for large values of n seems reasonable because, as n increases, a binomial distribution gets closer and closer to a normal distribution.
• The normal probability distribution is generally deemed a good approximation to the binomial probability distribution when np and n(1 − p) are both greater than 5, where p is the probability of success on each trial.
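Instead of a printed table, F(z) can be computed from the error function through Φ(z) = (1 + erf(z/√2))/2, using Python's standard `math.erf`. This sketch (the helper names `normal_cdf` and `prob_between` are hypothetical) reproduces the selected table entries, the 68-95-99.7 percentages, the X = 120 example, and Example 2.

```python
import math

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """F(x) = P(X <= x) for X ~ N(mu, sigma^2), via Phi(z) = (1 + erf(z / sqrt 2)) / 2."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_between(a: float, b: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(a <= X <= b) = F(b) - F(a)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Selected cumulative table entries F(z).
for z in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 2.0, 3.0):
    print(f"F({z:.1f}) = {normal_cdf(z):.4f}")

# Area within k standard deviations of the mean (the 68-95-99.7 rule).
for k in (1, 2, 3):
    print(f"within {k} SD: {100 * prob_between(-k, k):.2f}%")

# X = 120 with mu = 100 and sigma = 10, i.e. z = 2.
print(f"P(X <= 120) = {normal_cdf(120, 100, 10):.4f}")

# Example 2: daily water usage, mu = 20 gallons, sigma = 5 gallons.
lo, hi = 20 - 5, 20 + 5
print(f"P({lo} <= X <= {hi}) = {prob_between(lo, hi, 20, 5):.4f}")
```

Because `prob_between` standardizes internally, the same two functions cover any μ and σ, which is the point of working with the unit normal.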
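To see how close the normal approximation to the binomial gets, this sketch compares the exact binomial P(X ≤ k) with a normal curve of mean np and standard deviation √(np(1 − p)). It assumes n = 20 and p = 0.5 purely for illustration, and it evaluates the normal CDF at k + 0.5, a continuity correction that the slides do not mention. The helper names are hypothetical, and the guard enforces the np > 5 and n(1 − p) > 5 rule of thumb.

```python
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(Y <= x) for Y ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_approx_binom_cdf(k: int, n: int, p: float) -> float:
    """Approximate P(X <= k) with N(np, np(1-p)), evaluated at k + 0.5
    (continuity correction, an assumption added for this sketch)."""
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("rule of thumb np > 5 and n(1 - p) > 5 not met")
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return normal_cdf(k + 0.5, mu, sigma)

n, p = 20, 0.5  # assumed illustrative values
for k in (5, 8, 10, 12, 15):
    exact = binom_cdf(k, n, p)
    approx = normal_approx_binom_cdf(k, n, p)
    print(f"P(X <= {k:2d}): exact {exact:.4f}  normal {approx:.4f}")
```

With n = 3 the guard rejects the call, which mirrors why the n = 3 binomial is too lumpy to treat as normal while n = 20 is already close.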
[Figure: binomial distributions, P(x) versus number of occurrences, for n = 3 and n = 20]

Central Limit Theorem
• Flip a coin N times.
• Each outcome has an associated random variable Xᵢ (= 1 if heads, otherwise 0).
• Number of heads: N_H = X₁ + X₂ + … + X_N
• N_H is a random variable.

Central Limit Theorem
• Coin-flip problem.
• Probability function of N_H, with P(Head) = 0.5 (fair coin).
[Figure: distribution of N_H for N = 5, N = 10, and N = 40]

Central Limit Theorem
The distribution of the sum of N independent random variables (each with finite variance) becomes increasingly Gaussian as N grows.
Example: N uniform [0,1] random variables.
[Figure: histogram of weights in pounds (percent per bin, 80 to 160 lb) with a fitted normal curve; marked values 112.3, 127.8, 143.3]

Normal Distribution: Why are normal distributions so important?
• Many dependent variables are commonly assumed to be normally distributed in the population.
• If a variable is approximately normally distributed, we can make inferences about values of that variable.
• Example: the sampling distribution of the mean.
• So what?
• Remember the binomial distribution: with a few trials we were able to calculate the possible outcomes and the probabilities of those outcomes.
• Now try it for a continuous distribution with an infinite number of possible outcomes. Yikes!
• The normal distribution and its properties are well known, and if our variable of interest is normally distributed, we can apply what we know about the normal distribution to our situation and find the probabilities associated with particular outcomes.
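A quick simulation illustrates the central limit theorem with the uniform example: it draws sums of N independent Uniform[0,1] variables and counts how often a sum lands within one standard deviation of its mean (each term has mean 1/2 and variance 1/12, so the sum has mean N/2 and variance N/12). For a normal distribution that fraction is about 68.27%. The trial count, seed, and helper names are arbitrary assumptions for this sketch.

```python
import math
import random

def sum_of_uniforms(n: int, rng: random.Random) -> float:
    """One draw of S_N = U_1 + ... + U_N with U_i ~ Uniform[0, 1]."""
    return sum(rng.random() for _ in range(n))

def fraction_within_one_sd(n: int, trials: int = 20000, seed: int = 0) -> float:
    """Simulate S_N `trials` times; return the fraction of draws within one
    theoretical standard deviation sqrt(N/12) of the theoretical mean N/2."""
    rng = random.Random(seed)
    mu = n / 2
    sigma = math.sqrt(n / 12)
    hits = 0
    for _ in range(trials):
        if abs(sum_of_uniforms(n, rng) - mu) <= sigma:
            hits += 1
    return hits / trials

for n in (1, 2, 5, 30):
    print(f"N = {n:2d}: {100 * fraction_within_one_sd(n):.1f}% within 1 SD (normal: 68.3%)")
```

For N = 1 the fraction is about 57.7% (a flat distribution, exactly 1/√3), while by N = 30 it sits close to the normal value.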
• Since we know the shape of the normal curve, we can calculate the area under the curve.
• The percentage of that area can be used to determine the probability that a given value could be pulled from a given distribution.
• The area under the curve tells us about probability; in other words, by treating our data as normally distributed, we can obtain a p-value for our result.
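Turning a result into a p-value amounts to measuring a tail area. A minimal sketch, assuming μ and σ are known and using the complementary error function `math.erfc` for the upper tail: the hypothetical `p_value` helper returns the one-sided or two-sided area beyond the observed value, illustrated with X = 120, μ = 100, σ = 10.

```python
import math

def p_value(x: float, mu: float, sigma: float, two_sided: bool = True) -> float:
    """Tail area of N(mu, sigma^2) at least as far from mu as x.

    One-sided: area beyond x on x's side of the mean.
    Two-sided: area in both tails, P(|Z| >= |z|).
    """
    z = (x - mu) / sigma
    # Upper-tail area 1 - Phi(|z|), written with erfc to avoid cancellation for large |z|.
    tail = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
    return 2.0 * tail if two_sided else tail

# Illustration: X = 120, mu = 100, sigma = 10 (z = 2).
print(f"one-sided p = {p_value(120, 100, 10, two_sided=False):.4f}")
print(f"two-sided p = {p_value(120, 100, 10):.4f}")
```

The one-sided value is the roughly 2% of the area above z = 2; doubling it gives the two-tailed area outside ±2σ, the complement of the 95% band.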