NORMAL DISTRIBUTION AND ITS APPLICATION

INTRODUCTION
Statistically, a population is the set of all possible values of a variable. Random selection of objects from the population makes the variable a random variable (it involves a chance mechanism).

Example: Let x be the weight of a newly born baby. Then x is a random variable representing the weight of the baby. The weight of a particular baby is not known until he/she is born.

Discrete random variable: If a random variable can only take values that are whole numbers, it is called a discrete random variable.
Examples: number of daily admissions; number of boys in a family of 5; number of smokers in a group of 100 persons.

Continuous random variable: If a random variable can take any value within an interval, it is called a continuous random variable.
Examples: weight, height, age and blood pressure (BP).

Continuous Probability Distributions
A continuous distribution has an infinite number of values between any two values assumed by the continuous variable.
As with other probability distributions, the total area under the curve equals 1.
The relative frequency (probability) of occurrence of values between any two points on the x-axis is equal to the area bounded by the curve, the x-axis, and perpendicular lines erected at the two points on the x-axis.

The Normal or Gaussian distribution is the most important continuous probability distribution in statistics. The term "Gaussian" refers to Carl Friedrich Gauss, who developed this distribution. The word "normal" here does not mean "ordinary" or "common", nor does it mean "disease-free". It simply means that the distribution conforms to a certain formula and shape.
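The idea that probability equals area under the curve can be checked numerically. The sketch below is illustrative only: it assumes the normal density with mean 0 and standard deviation 1 as the example curve, and the helper names `density` and `area_between` are hypothetical, not part of these notes. It approximates the area with the trapezoidal rule.

```python
import math

def density(x, mu=0.0, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma (assumed example curve)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def area_between(a, b, steps=10_000):
    """Approximate the area under density() between a and b with the trapezoidal rule."""
    width = (b - a) / steps
    total = 0.5 * (density(a) + density(b))
    for i in range(1, steps):
        total += density(a + i * width)
    return total * width

# Probability of a value falling between two points = area between those points.
p_between = area_between(-1.0, 1.0)

# The total area should be very close to 1 (the tails beyond +/-8 are negligible).
p_total = area_between(-8.0, 8.0)

print(round(p_between, 4), round(p_total, 4))
```

The integration limits of -8 and 8 stand in for the whole x-axis, which is an approximation chosen for this sketch.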
Histograms
A kind of bar or line chart
Values on the x-axis (horizontal)
Counts (frequencies) on the y-axis (vertical)
A normal distribution is defined by a particular shape: symmetrical and bell-shaped

[Figure 1: Histogram of ages of 60 subjects. x-axis: Age; y-axis: Frequency.]

[Figure: a perfect normal distribution]

Gaussian Distribution
Many biologic variables follow this pattern: hemoglobin, cholesterol, serum electrolytes, blood pressure, age, weight, height
One can use this information to define what is normal and what is extreme
In clinical medicine, the central 95% of values (about 2 standard deviations around the mean) is taken as normal
Consequently, 5% of "normal" individuals are labeled as extreme/abnormal
We just accept this and move on

Normal Distribution
Most important distribution in statistics
Also called the Gaussian distribution
Density given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$

where μ is the mean and σ the standard deviation

[Figure: Gaussian or normal distribution curve]

Characteristics of the Normal Distribution
Symmetrical about the mean
Mean, median, and mode are equal
Total area under the curve above the x-axis is one square unit
1 standard deviation on both sides of the mean includes approximately 68% of the total area
2 standard deviations include approximately 95%
3 standard deviations include approximately 99.7%

Characteristics of the Normal Curve
Values on the horizontal axis are z values (numbers of standard deviations from the mean); areas under the curve are probabilities between 0 and 1
The mean is the center, and distances from it in standard deviations account for proportions of the population
Within 1 SD of the mean = 68% of the sample
Within 2 SD = 95% of the sample
Within 3 SD = 99.7% of the sample

Characteristics of the Normal Distribution
The normal distribution is completely determined by the parameters μ and σ
Different values of μ shift the distribution along the x-axis
Different values of σ determine the degree of flatness or peakedness of the graph

Applications of the Normal Distribution
Frequently, data are normally distributed
Essential for some statistical procedures
If the data are not normal, it may be possible to transform them to a more normal form
Approximations for other distributions
Because of the frequent occurrence of the normal distribution in nature, much statistical theory has been developed for it.

What's So Great about the Normal Distribution?
If you know two things, you know everything about the distribution: the mean and the standard deviation
You know the probability of any value arising

Standardised Scores
My diastolic blood pressure is 100. So what?
Normal is 90 (for my age and sex), so mine is high
But how high? Express it in standardised scores: how many SDs above the mean is that?
Mean = 90, SD = 4 (for my age and sex)

$$z = \frac{\text{my score} - \text{mean score}}{\text{SD}} = \frac{100 - 90}{4} = 2.5$$

This is a standardised score, or z-score
Can consult tables (or a computer) to see how often a score this high (or higher) occurs
99.38% of people have lower scores

A Z-score Table

Z-score   Proportion scoring lower   % (rounded to whole number)
3.5       0.9998                     100%
3.0       0.9987                     100%
2.5       0.9938                     99%
2.0       0.9772                     98%
1.5       0.9332                     93%
1.0       0.8413                     84%
0.5       0.6915                     69%
0.0       0.5000                     50%

Standard Normal Distribution
The normal distribution is really a family of curves determined by μ and σ
The standard normal distribution is the one with μ = 0 and σ = 1
Standard normal density given by

$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}, \quad -\infty < z < \infty$$

where z = (x - μ) / σ

To find the probability that z takes on a value between any two points on the z-axis, find the area bounded by perpendiculars erected at these points, the curve, and the z-axis
Values are tabled
The standard normal distribution is symmetric

Examples of Standard Normal Distribution
Height and weight
Calculate z-statistics: Pr(X < x), Pr(X > x), Pr(x1 < X < x2)
Why? To determine percentiles and to make comparisons between different distributions

Normal Distributions Go Wrong
Wrong shape
  Non-symmetrical: skew
  Too fat or too narrow: kurtosis
Aberrant values: outliers

Effects of Non-Normality
Skew: biases parameter estimates, e.g. the mean
Kurtosis: doesn't affect parameter estimates; does affect standard errors
Outliers: depends

Distributions
Bell-shaped (also known as "symmetric" or "normal")
Skewed:
  positively (skewed to the right): it tails off toward larger values
  negatively (skewed to the left): it tails off toward smaller values

Kurtosis
[Figure]

Outliers
[Figure: histogram; x-axis: Value]

Dealing with Outliers
Error: data entry error; correct it
Real value: difficult; delete it

Any questions?
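Worked example: the standardised-score calculation from these notes (diastolic pressure 100, mean 90, SD 4) and the "proportion scoring lower" column of a z-score table can be reproduced on a computer. This is a minimal sketch, not the method used to build any particular table; the function names `z_score` and `proportion_lower` are hypothetical, and the cumulative probability is computed from Python's standard `math.erf`.

```python
import math

def z_score(value, mean, sd):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / sd

def proportion_lower(z):
    """Standard normal cumulative probability Pr(Z < z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Blood pressure example: score 100, mean 90, SD 4.
z = z_score(100, 90, 4)
print(z, round(proportion_lower(z), 4))

# Proportion scoring lower for a range of z-scores.
for table_z in (3.5, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.0):
    print(table_z, round(proportion_lower(table_z), 4))

# Area within 1, 2 and 3 standard deviations of the mean.
for k in (1, 2, 3):
    within = proportion_lower(k) - proportion_lower(-k)
    print(k, round(within, 4))
```

The same functions cover the three probabilities listed for z-statistics: Pr(X < x) is `proportion_lower(z)`, Pr(X > x) is one minus that value, and Pr(x1 < X < x2) is the difference between the two cumulative values.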