Normal Distribution
Introduction

Compare to Discrete Variables
• No. of Doctor’s Visits During the Year

No. of Visits    No. of Patients    P
0                400                0.14
1                950                0.34
2                850                0.30
3+               600                0.21
Total            2800               0.99 (rounded values do not sum to exactly 1)

Histograms
• The height of each bar represents the probability of that event.
• If each bar is one unit in width, then the area also equals the probability.
• The total area under all the bars has to add to 1.

[Figure: histogram "Doctor's Visits"; x-axis: No. of Visits (0, 1, 2, 3+)]

Continuous Variables

Patient’s Weight    Frequency
300                 2
290                 3
280                 7

But… Can Take On Any Value
• Can make the weight intervals as small as we want: every 10 lbs, or 5, or 1, or … 0.5, 0.1, 0.001.
• Histogram: as the intervals get smaller, the bars decrease in width.

Line Graph
• Completely continuous, no width at all. Just connected points.

[Figure: line graph]

Infinitesimally Small Intervals
• Then really just points on a smooth curve.
• We can also have n, the number of cases, increase to infinity.
• The total probability is still one.

[Figure: infinitesimally small intervals give a smoother curve. Area under the curve = 1.]

Probabilities
• Can no longer read the probability of a single event.
• In a continuous distribution, can only measure the probability of a value falling within some range.

Probability Within a Range
• The probability of a value falling within a range is equal to the area under the curve over that range.

Bad News
• To calculate the area under the curve we would need to use calculus.
• But not-so-bad news: others have done the calculations and set up tables for us.
• Applause

Diversity of Continuous Distributions
• Lots of different distributions
• Lots of different shaped curves
• Would need lots of different tables, however…

The Most Important Distribution
Introducing the Normal Distribution: the “Bell-Shaped Curve”
What are its characteristics?

Normal Distribution
• First described in 1733, by Abraham de Moivre.
• A lot of the relevant math was done by Carl Gauss, hence “Gaussian Curve”.

Properties
• Symmetrical about the mean
• Mean, median & mode are all equal
• Asymptotic: the height never reaches zero.
• What’s the total area under the curve?

Ranges & Probabilities
• 50% of all values fall above the mean & 50% below it.
• All probabilities depend on how far the values lie from the mean.
• Distance is measured in number of standard deviations from the mean.

Probabilities Related to S.D.
• One S.D. on either side of the mean: Area = ?

Other Distances
• 1 S.D. on either side of the mean includes 68% of the cases
• 2 S.D. on either side of the mean includes 95% of the cases
• 3 S.D. on either side of the mean includes 99.7% of the cases

Many Different Normal Distributions
• Determined by their mean and standard deviation
• Mean gives location.
• Standard deviation gives shape: more or less dispersed.

Proportions Remain the Same
• Relationships between probability and standard deviation are the same in all Normal Distributions.
• However, in order to use the tables provided, we have to convert to the “Standard Normal Distribution”.

The Standard Normal Distribution
• Mean = 0. Standard Deviation = 1.

Z-values
• Convert values in any normal distribution to the standard normal distribution.
• A way to express the distance from the mean in units of S.D.
• $Z = (X - \bar{X}) / \text{S.D.}$
• Compare this to 18 eggs. How many dozen?

From Z Find Probabilities
• Use Table A-3. It gives areas in the upper tail of the S.N.D. (mean = 0, S.D. = 1).
• What is the area above Z = 1.28?
• Go to the table. Go to 1.2 in the left-hand column & across to 0.08.
• A = 0.10. The probability that a value will fall above Z = 1.28 is 10%.

Test It
• Let’s look up the ones we already know.
• Range = 1 S.D. on either side of the mean
• Z = 1. Find 1.0 in the left-hand column.
• Go across to 0.00.
• Reads 0.159. So the area in the tail is 0.159.
• What’s the area between Z = 1 and the mean?
Always draw the N.D.
A = 0.159
• If the area above Z = 1 is 0.159, what is the area between Z and the mean?
• A = 0.500 – 0.159 = 0.341
• We need to add an equal area on the other side of the mean.
• Total shaded area = 0.682

You Try It
• What is the probability that a value will fall within 2 S.D. of the mean?
• Draw the N.D.
• Look up the area that corresponds to Z = 2.
• A = 0.023
• Find the area between the mean & Z = 2.
• 0.500 – 0.023 = 0.477
• Double it. A = 0.954

Try the Reverse
• I want to find the value above which 10% of the population falls.
• This time, area = 0.100.
• Look in the body of the table for 0.100.
• Read across and up. Z = 1.28
• Have to use the formula for Z in reverse in order to get the value for X.

Finding X
• $Z = (X - \bar{X}) / \text{S.D.}$
• $1.28 = (X - \bar{X}) / \text{S.D.}$
• $X = \text{S.D.} \times 1.28 + \bar{X}$
• To convert to X, have to know the mean & S.D.

Example
• Weights of 40-yr-old women are normally distributed with a mean of 150 and an S.D. of 10.
• What is the value above which the highest 10% of weights fall?
• X = 1.28 * 10 + 150 = 162.8

Application
• Studying a progressive neurological disorder. At autopsy, we weigh the brains and find the weights are normally distributed with a mean of 1100 g and an S.D. of 100 g.
• Find the probability that one of the brains weighs less than 800 g.
• Draw the N.D. (mark 800 and 1100).
• Z = (800 – 1100)/100 = -3
• P(X < 800) = Area = 0.001 (about 0.0013 to four decimal places)

The End For Now

More Ranges
• The cholesterol levels for a certain population are approximately normally distributed with a mean of 200 mg/100 ml & an S.D. of 20 mg/100 ml.
• Find the probabilities for an individual picked at random to have cholesterol levels in the following ranges:

Mean = 200 mg/100 ml, S.D. = 20 mg/100 ml
A. Between 180 & 200
B. Greater than 225
C. Between 190 & 210

A. Between 180 & 200
• Z1 = 0. Z2 = (180 – 200)/20 = -1
• So the area is from the mean to one S.D. below it. If it were both sides, it would be 0.68. Since it is only one side, it is half of that: 0.34.
• P = 0.34

B.
Greater than 225
• Z = (225 – 200)/20 = 1.25
• Look it up. Area = 0.106
• P(X > 225) = 0.106

C. Between 190 & 210
• Z1 = (190 – 200)/20 = -0.5. Look up 0.5: area = 0.309. But that is the tail.
• What is the area from Z = 0.5 to the mean? 0.500 – 0.309 = 0.191
• Z2 = 0.5. Symmetrical, so Z2 to the mean is also 0.191.
• P = 2 times 0.191 = 0.382
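
Checking the Table Lookups
The cholesterol ranges above can be cross-checked without a printed table. The following is a minimal sketch, not part of the original slides: it assumes the upper-tail area can be computed from the complementary error function in Python's standard library, and the helper names (upper_tail, z_value, prob_between) are made up for illustration.

```python
import math

def upper_tail(z):
    """Area under the standard normal curve above z (what an upper-tail table lists)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def z_value(x, mean, sd):
    """Distance of x from the mean, in units of S.D."""
    return (x - mean) / sd

def prob_between(lo, hi, mean, sd):
    """P(lo < X < hi): tail area above lo minus tail area above hi."""
    return upper_tail(z_value(lo, mean, sd)) - upper_tail(z_value(hi, mean, sd))

# Cholesterol example from the slides: mean 200 mg/100 ml, S.D. 20 mg/100 ml
MEAN, SD = 200.0, 20.0

p_a = prob_between(180, 200, MEAN, SD)      # A. between 180 & 200
p_b = upper_tail(z_value(225, MEAN, SD))    # B. greater than 225
p_c = prob_between(190, 210, MEAN, SD)      # C. between 190 & 210

print(f"A. P(180 < X < 200) = {p_a:.3f}")
print(f"B. P(X > 225)       = {p_b:.3f}")
print(f"C. P(190 < X < 210) = {p_c:.3f}")
```

This prints 0.341, 0.106 and 0.383. The last figure differs from the table-based 0.382 only because the table values are rounded to three decimals before being subtracted and doubled.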