Chapter 6
Continuous Distributions

Continuous random variables
• Are numerical variables whose values fall within a range or interval
• Are measurements
• Can be described by density curves

Density curves
• Are always on or above the horizontal axis
• Have an area exactly equal to one underneath them
• Often describe an overall distribution
• Describe what proportions of the observations fall within each range of values

Unusual density curves
• Can be any shape
• Are generic continuous distributions
• Probabilities are calculated by finding the area under the curve

[Figure: density curve over x = 1 to 5; vertical axis marked at .25 and .5]

How do you find the area of a triangle?
P(X < 2) = (2)(.25)/2 = .25

What is the area of a line segment?
P(X = 2) = 0
P(X < 2) = .25

In continuous distributions, P(X < 2) & P(X ≤ 2) are the same answer.
Hmmmm… Is this different from discrete distributions?

Shape is a trapezoid. How long are the bases?
b1 = .5, b2 = .375, h = 1
$\text{Area} = \frac{(b_1 + b_2)\,h}{2}$
P(X > 3) = .5(.375 + .5)(1) = .4375
P(1 < X < 3) = .5(.125 + .375)(2) = .5

[Figure: second density curve over x = 1 to 4; vertical axis marked at 0.25 and 0.50]
P(X > 1) = .75
• Triangle: .5(2)(.25) = .25
• Rectangle: (2)(.25) = .5
P(0.5 < X < 1.5) = .28125
• Trapezoid: .5(.25 + .375)(.5) = .15625
• Rectangle: (.5)(.25) = .125

Special Continuous Distributions

Normal Distributions
• Symmetrical bell-shaped (unimodal) density curve
• Above the horizontal axis
• N(μ, σ)
• The transition points occur at μ ± σ
• Probability is calculated by finding the area under the curve (How is this done mathematically?)
• As σ increases, the curve flattens & spreads out
• As σ decreases, the curve gets taller and thinner

Normal distributions occur frequently.
• Length of newborn child
• Height
• Weight
• ACT or SAT scores
• Intelligence
• Number of typing errors
• Chemical processes

[Figure: two normal curves, A and B, with the horizontal axis marked at 6]
Do these two normal curves have the same mean? If so, what is it? YES
Which normal curve has a standard deviation of 3? B
Which normal curve has a standard deviation of 1?
A

Empirical Rule
• Approximately 68% of the observations fall within σ of μ
• Approximately 95% of the observations fall within 2σ of μ
• Approximately 99.7% of the observations fall within 3σ of μ

Suppose that the height of male students at PWSH is normally distributed with a mean of 71 inches and standard deviation of 2.5 inches. What is the probability that the height of a randomly selected male student is more than 73.5 inches?
[Figure: normal curve centered at 71 with the middle 68% shaded]
73.5 is one standard deviation above the mean, so 68% of heights lie between 68.5 and 73.5. The remaining area is split equally between the two tails:
1 - .68 = .32
P(X > 73.5) = .32/2 = 0.16

Standard Normal Density Curves
Always has μ = 0 & σ = 1
To standardize: $z = \frac{x - \mu}{\sigma}$
Must have this memorized!

Strategies for finding probabilities or proportions in normal distributions
1. State the probability statement
2. Draw a picture
3. Calculate the z-score
4. Look up the probability (proportion) in the table

The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. What proportion of these batteries can be expected to last less than 220 hours?
• Write the probability statement: P(X < 220)
• Draw & shade the curve
• Calculate z-score: $z = \frac{220 - 200}{15} = 1.33$
• Look up z-score in table: P(X < 220) = .9082

The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. What proportion of these batteries can be expected to last more than 220 hours?
$z = \frac{220 - 200}{15} = 1.33$
P(X > 220) = 1 - .9082 = .0918

The lifetime of a certain type of battery is normally distributed with a mean of 200 hours and a standard deviation of 15 hours. How long must a battery last to be in the top 5%?
P(X > ?) = .05
Look up 0.95 in the table to find the z-score: z = 1.645
$\frac{x - 200}{15} = 1.645$
x = 224.675

The heights of the female students at PWSH are normally distributed with a mean of 65 inches. What is the standard deviation of this distribution if 18.5% of the female students are shorter than 63 inches?
(What is the z-score for 63?)
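The table lookups in the battery questions, and the z-score needed for the female-height question, can be cross-checked numerically. The following is a minimal sketch, assuming Python 3.8+ and its standard-library `statistics.NormalDist`; the helper names (`z_score`, `proportion_below`, `cutoff_for_top`, `sd_from_proportion`) are made up for illustration and are not part of the slides.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_score(x, mean, sd):
    """Standardize x: z = (x - mean) / sd."""
    return (x - mean) / sd


def proportion_below(x, mean, sd):
    """Area under the N(mean, sd) curve to the left of x."""
    return STANDARD_NORMAL.cdf(z_score(x, mean, sd))


def cutoff_for_top(fraction, mean, sd):
    """Value exceeded by only the top `fraction` of the distribution."""
    z = STANDARD_NORMAL.inv_cdf(1.0 - fraction)
    return mean + z * sd


def sd_from_proportion(x, mean, proportion_below_x):
    """Solve z = (x - mean) / sd for sd, given P(X < x)."""
    z = STANDARD_NORMAL.inv_cdf(proportion_below_x)
    return (x - mean) / z


# Battery lifetimes: mean 200 hours, standard deviation 15 hours.
p_less_220 = proportion_below(220, 200, 15)
p_more_220 = 1.0 - p_less_220
top_5_cutoff = cutoff_for_top(0.05, 200, 15)

# Female heights: mean 65 inches, 18.5% shorter than 63 inches.
female_sd = sd_from_proportion(63, 65, 0.185)

print(f"P(X < 220) is about {p_less_220:.4f}")
print(f"P(X > 220) is about {p_more_220:.4f}")
print(f"Top 5% cutoff is about {top_5_cutoff:.3f} hours")
print(f"Female height sd is about {female_sd:.2f} inches")
```

Because the sketch does not round z to two decimal places before looking up the area, its results can differ from the table-based answers in the third or fourth decimal place.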
P(X < 63) = .185
From the table, z = -0.9
$\frac{63 - 65}{\sigma} = -.9$
$-2 = -.9\,\sigma$
σ = 2.22

The heights of female teachers at PWSH are normally distributed with mean of 65.5 inches and standard deviation of 2.25 inches. The heights of male teachers are normally distributed with mean of 70 inches and standard deviation of 2.5 inches.
• Describe the distribution of differences of heights (male – female) of teachers.
Normal distribution with μ = 4.5 & σ = 3.3634
• What is the probability that a randomly selected male teacher is shorter than a randomly selected female teacher?
$z = \frac{0 - 4.5}{3.3634} = -1.34$
P(X < 0) = .0901

Will my calculator do any of this normal stuff?
• Normalpdf: use for graphing ONLY
• Normalcdf: will find probability of area from lower bound to upper bound
• Invnorm (inverse normal): will find z-score for probability

Ways to Assess Normality
• Use graphs (dotplots, boxplots, or histograms)
• Normal probability (quantile) plot

Normal Scores
To construct a normal probability plot, you can use quantities called normal scores. The values of the normal scores depend on the sample size n. The normal scores when n = 10 are below:
-1.539 -1.001 -0.656 -0.376 -0.123 0.123 0.376 0.656 1.001 1.539

Think of selecting sample after sample of size 10 from a standard normal distribution. Then -1.539 is the average of the smallest observation from each sample & so on . . .

Suppose we have the following observations of widths of contact windows in integrated circuit chips:
3.21 2.49 2.94 4.38 4.02 3.34 3.81 3.62 3.30 2.85

Sketch a scatterplot by pairing the smallest normal score with the smallest observation from the data set & so on.
[Figure: scatterplot of normal scores against Widths of Contact Windows]
What should happen if our data set is normally distributed?

Normal Probability (Quantile) plots
• The observation (x) is plotted against known normal z-scores
• If the points on the quantile plot lie close to a straight line, then the data are approximately normally distributed
• Deviations from a straight line on the quantile plot indicate nonnormal data
• Points far away from the line indicate outliers
• Vertical stacks of points (repeated observations of the same number) are called granularity

Are these approximately normally distributed?
50 48 54 47 51 52 46 53 52 51 48 48 54 55 57 45 53 50 47 49 50 56 53 52
[Figure: histogram, boxplot, and normal probability plot of these data, with the callout "What is this called?"]
Both the histogram & boxplot are approximately symmetrical, so these data are approximately normal.
The normal probability plot is approximately linear, so these data are approximately normal.

Normal Approximation to the Binomial
Before widespread use of technology, binomial probability calculations were very tedious. Let's see how statisticians estimated these calculations in the past!

Premature babies are those born more than 3 weeks early. Newsweek (May 16, 1988) reported that 10% of the live births in the U.S. are premature. Suppose that 250 live births are randomly selected and that the number X of the "preemies" is determined. What is the probability that there are between 15 and 30 preemies, inclusive? (POD, p. 422)

1) Find this probability using the binomial distribution.
P(15 ≤ X ≤ 30) = binomialcdf(250,.1,30) – binomialcdf(250,.1,14) = .866

2) What is the mean and standard deviation of the above distribution?
μ = 25 & σ = 4.743

3) If we were to graph a histogram for the above binomial distribution, what shape do you think it will have?
Since the probability is only 10%, we would expect the histogram to be strongly skewed right.
Let's graph this distribution:
• Put the numbers 1-45 in L1
• In L2, use binomialpdf to find the probabilities.

4) What do you notice about the shape?
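Questions 1 to 3 can also be worked without a graphing calculator. The sketch below is one possible way to do it, assuming Python 3.8+ (for `math.comb`); the function names `binomial_pmf` and `binomial_between` are hypothetical stand-ins for the calculator's binomialpdf and a difference of two binomialcdf calls.

```python
from math import comb, sqrt


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binomial_between(low, high, n, p):
    """P(low <= X <= high), both endpoints included."""
    return sum(binomial_pmf(k, n, p) for k in range(low, high + 1))


n, p = 250, 0.1

# Question 1: between 15 and 30 preemies, inclusive.
exact_probability = binomial_between(15, 30, n, p)

# Question 2: mean and standard deviation of the binomial distribution.
mean = n * p
sd = sqrt(n * p * (1.0 - p))

# Question 3: the probabilities for x = 1..45, the same values listed in L1.
histogram_probabilities = {k: binomial_pmf(k, n, p) for k in range(1, 46)}

print(f"P(15 <= X <= 30) is about {exact_probability:.3f}")
print(f"mean = {mean:.0f}, sd is about {sd:.3f}")
```

Plotting `histogram_probabilities` as a bar chart gives the histogram that question 4 asks about.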
Overlay a normal curve on your histogram:
• In Y1 = normalpdf(X,μ,σ)

Normal distributions can be used to estimate probabilities for binomial distributions when:
1) the probability of success is close to .5
or
2) n is sufficiently large
Rule: n is large enough if np ≥ 10 & n(1 – p) ≥ 10

Why 10?
Normal distributions extend infinitely in both directions; however, binomial distributions are between 0 and n. If we use a normal distribution to estimate a binomial distribution, we must cut off the tails of the normal distribution. This is OK if the mean of the normal distribution (for which we use the mean of the binomial) is at least three standard deviations (3σ) from 0 and from n. (BVD, p. 334)

We require: $\mu - 3\sigma > 0$, or $\mu > 3\sigma$
As binomial: $np > 3\sqrt{np(1-p)}$
Square: $n^2p^2 > 9np(1-p)$
Simplify: $np > 9(1-p)$
Since $(1 - p) < 1$: $np > 9$ is enough
And since $p < 1$: $n(1-p) > 9$ is enough to keep the mean 3σ below n
Therefore, we say that np should be at least 10 and n(1 – p) should be at least 10.

Since a continuous distribution is used to estimate the probabilities of a discrete distribution, a continuity correction is used to make the discrete values similar to continuous values. (±.5 to discrete values)

Why?
Think about how discrete histograms are made. Each bar is centered over the discrete values. The bar for "1" actually goes from 0.5 to 1.5 & the bar for "2" goes from 1.5 to 2.5. Therefore, by adding or subtracting .5 from the discrete values, you find the actual width of the bars that you need to estimate with the normal curve.

(Back to our example)
Since P(preemie) = .1, which is not close to .5, is n large enough?
np = 250(.1) = 25 & n(1 - p) = 250(.9) = 225
Yes, OK to use normal to approximate binomial.

5) Use a normal distribution with the binomial mean and standard deviation above to estimate the probability that between 15 & 30 preemies, inclusive, are born in the 250 randomly selected babies.
Binomial: P(15 ≤ X ≤ 30)
Written as normal (w/ continuity correction): P(14.5 < X < 30.5) = Normalcdf(14.5,30.5,25,4.743) = .8635

6) How does the answer in question 5 compare to the answer in question 1 (Binomial answer = 0.866)?
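The estimate in question 5 can be reproduced as follows. This is a rough sketch, assuming Python 3.8+ and the standard-library `statistics.NormalDist`; the function name `normal_approx_binomial` and its choice to raise an error when the np and n(1 – p) rule fails are illustrative decisions, not something taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_binomial(low, high, n, p):
    """Estimate P(low <= X <= high) for a binomial X with a normal curve.

    Uses the binomial mean and standard deviation, and widens the interval
    by 0.5 on each side (continuity correction) so it covers the full
    histogram bars for `low` and `high`.
    """
    # Rule from the slides: np and n(1 - p) should both be at least 10.
    if n * p < 10 or n * (1.0 - p) < 10:
        raise ValueError("n is not large enough for the normal approximation")

    mean = n * p
    sd = sqrt(n * p * (1.0 - p))
    curve = NormalDist(mu=mean, sigma=sd)
    return curve.cdf(high + 0.5) - curve.cdf(low - 0.5)


# Preemie example: 250 births, 10% premature, between 15 and 30 inclusive.
approximation = normal_approx_binomial(15, 30, 250, 0.1)
print(f"Normal estimate of P(15 <= X <= 30) is about {approximation:.4f}")
```

Comparing the printed estimate with the binomial answer of 0.866 from question 1 shows how close the approximation is for this n and p.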