Transcript

Normal Distributions

Continuous Distribution
• For a discrete distribution, for example the binomial distribution with n = 5 and p = 0.4, the probability distribution is

x      0        1       2       3       4       5
f(x)   0.07776  0.2592  0.3456  0.2304  0.0768  0.01024

A probability histogram
[Figure: probability histogram of P(x) against x = 0, 1, ..., 5]

How to describe the distribution of a continuous random variable?
• For a continuous random variable, we also represent probabilities by areas: not areas of rectangles, but areas under continuous curves.
• For continuous random variables, the place of histograms is taken by continuous curves.
• Imagine a histogram with narrower and narrower classes. Joining the tops of the rectangles then gives a curve. This continuous curve is called a probability density (or probability distribution).

Continuous distributions
• For any x, P(X = x) = 0. (For a continuous distribution, the area under the curve at a single point is 0.)
• We can't use P(X = x) to describe the probability distribution of X.
• Instead, consider P(a ≤ X ≤ b).

Density function
• A curve f(x) with f(x) ≥ 0
• The area under the curve is 1
• P(a ≤ X ≤ b) is the area under the curve between a and b
[Figure: density curve, y from 0.00 to 0.25 against x from 0 to 10]

• Because a single point has probability 0, including or excluding the endpoints does not change the probability: P(2 ≤ X ≤ 4) = P(2 ≤ X < 4) = P(2 < X < 4)
[Figure: density curve, y from 0.00 to 0.25 against x from 0 to 10]

Properties Of Normal Curve
• Normal curves are symmetrical.
• Normal curves are unimodal.
• Normal curves have a bell-shaped form.
• Mean, median, and mode all have the same value.
• Total area = 1
• Defined by mean and standard deviation (centered at the mean)

Percent of Values Within One Standard Deviation
• 68.27% of cases

Percent of Values Within Two Standard Deviations
• 95.45% of cases

Percent of Values Within Three Standard Deviations
• 99.73% of cases

Percent of Values Greater than 1 Standard Deviation

Data in Normal Distribution
• $\bar{X} \pm 1S$ contains about 68% of the scores
• $\bar{X} \pm 2S$ contains about 95% of the scores
• $\bar{X} \pm 3S$ contains about 99.7% of the scores

Standard Scores (Z-scores)
• Expressed in standard deviations from the mean
• There are many kinds of standard scores. The most common standard score is the z-score.
• A z-score states the number of standard deviations by which the original score lies above or below the mean of a normal curve.
$z = \frac{x - \mu}{\sigma}$, where $\mu$ is the mean and $\sigma$ is the standard deviation

Commonly used probabilities and z-scores:
• Middle 90%: between -1.645 and +1.645
• Middle 95%: between -1.96 and +1.96
• Middle 99%: between -2.576 and +2.576
• 90th percentile: 1.28
• 95th percentile: 1.645
• 99th percentile: 2.33

Area When Score is Known
• For a normal distribution with a mean of 100 and a standard deviation of 20, what proportion of cases fall below 80?
• z = (80 - 100)/20 = -1, so the proportion is about 16%

Calculator:
• Normal cumulative distribution function: normalcdf(left bound, right bound, mean, st. dev.) (mean and st. dev. default to 0 and 1)
• To find the score (or the z-score, with the default mean and st. dev.) given the area in the left tail of a normal distribution: invNorm(area, mean, standard deviation)

Score When Area Is Known
• For a normal distribution with a mean of 100 and a standard deviation of 20, find the score that separates the upper 20% of the cases from the lower 80%
• The 80th percentile has z ≈ 0.84, so the score is 100 + 0.84(20)
• Answer = 116.8

Ways to Assess Normality
• Use graphs (dotplots, boxplots, or histograms)
• Normal probability (quantile) plot

Normal Probability (Quantile) Plots
• Each observation (x) is plotted against the z-score expected for it under a normal distribution
• If the points on the quantile plot lie close to a straight line, then the data are approximately normally distributed
• Systematic deviations from a straight line indicate nonnormal data
• Points far away from the line indicate outliers
• Vertical stacks of points (repeated observations of the same value) are called granularity

Are these approximately normally distributed?
50 48 54 47 51 52 52 51 48 48 54 55 53 50 47 49 50 56 46 53 57 45 53 52

What is this called?

• Both the histogram and the boxplot are approximately symmetrical, so these data are approximately normal.
• The normal probability plot is approximately linear, so these data are approximately normal.
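
For readers without the calculator, the two calculator functions from the slides can be sketched in Python with the standard library's statistics.NormalDist. The helper names normalcdf and inv_norm below are hypothetical stand-ins chosen to mirror the calculator keys; this is an illustration of the same calculations, not the calculator's own implementation.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper (hypothetical helper)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


def inv_norm(area, mean=0.0, sd=1.0):
    """Score that has the given area to its left (hypothetical helper)."""
    return NormalDist(mean, sd).inv_cdf(area)


# Area when score is known: mean 100, st. dev. 20, proportion below 80.
below_80 = normalcdf(float("-inf"), 80, 100, 20)   # about 0.16

# Score when area is known: cutoff between the lower 80% and the upper 20%.
cutoff = inv_norm(0.80, 100, 20)                   # about 116.8

# Percent of values within 1, 2, and 3 standard deviations of the mean.
within = {k: normalcdf(-k, k) for k in (1, 2, 3)}

print(f"P(X < 80)        = {below_80:.4f}")
print(f"80th percentile  = {cutoff:.1f}")
for k, p in within.items():
    print(f"within {k} st. dev. = {100 * p:.2f}%")
```

Leaving out the mean and standard deviation gives the standard normal curve, so inv_norm(0.95) returns the 95th percentile z-score from the table of commonly used values.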