Stat 281 Chapter 6

Chapter 6
Continuous Distributions
The Gaussian (Normal) Distribution

When Discrete Distributions Aren't Enough
• Discrete distributions are used in situations involving counts. (Other uses are possible, but counts are the vast majority.)
• What happens when you want to measure things?
– Height
– Weight
– Miles per gallon
• These aren't counts. (Why not?)
• Measurements involve rounding and precision.
• When any level of precision is theoretically possible, we call the variable "continuous."
• Its values come from the set of real numbers, i.e., the number line.

Real Numbers
• Number line: -∞ … -3 -2 -1 0 1 2 3 … ∞
• The real numbers include all possible values between the pictured integers.
• That includes rational numbers like ½, 1/3, 237/573, etc.
• It also includes irrational numbers like π and √2.
• A real number's decimal expansion can go on forever (as with π = 3.14159…).
• There are "uncountably many" real numbers between any two distinct real numbers.

Intervals
• An interval is a "piece" of the number line, that is, a subset of the real numbers.
• There are no "gaps": for any two numbers in an interval, all real numbers between them are included.
• Therefore an interval is described by its endpoints, with a few special considerations.
• The endpoints may or may not be included. Round brackets exclude an endpoint; square brackets include it. Write the smaller endpoint first. Ex.: [0,1], (9,100), [3,6), (0,7].
• If an interval goes on to infinity, the ∞ or -∞ symbol is used with a round bracket, since infinity is not a number. Ex.: [0,∞), (-∞,-10).

Definition of Continuous R.V.
• A continuous random variable takes on values in some real interval.
• Number line: -∞ … -3 -2 -1 0 1 2 3 … ∞
• Suppose a r.v. X takes values in [0,1]. How many different values are there?
• Suppose you assign some tiny (fixed, positive) probability to each real number in [0,1]. What is the total probability?
• Suppose you divide [0,1] up into 10 subintervals.
• Can you assign probabilities to these so the total is 1?

Definition of Continuous R.V.
• This illustrates the problem with assigning probabilities to individual numbers, and the contrasting ease of assigning probability to intervals.
• Summary:
– Any continuous distribution has infinitely many values.
– No single point has a positive probability.
– Said another way: every individual value of a continuous random variable has probability zero.
– Intervals can be assigned positive probability.

The Paradox
• Obviously, a r.v., X, must take on some value, and when it does, that value clearly was not impossible. Yet every single value has probability zero.
• We never actually "mean" a single value. Measurements are given with a certain precision.
• Example: temperature is continuous, but when measured to the nearest degree, "70" really means the interval [69.5, 70.5).
• Intervals can have positive probability, and we can make them as small as we like.
• The fact that no single value of a continuous r.v. has positive probability agrees nicely with the fact that it is impossible to measure anything to an exact real-number value.
• Instead, we divide up our scale into equal-width subintervals based on the precision of the measuring device. These subintervals have positive probability.

Continuous Probabilities
• Probabilities for a continuous random variable, X, are given by a probability function, P.
• P(X=k)=0 for any k.
• We might find positive probabilities for expressions like
– P(X>k) (the interval is (k,∞)),
– P(X<k) (the interval is (-∞,k)), or
– P(a<X<b) (the interval is (a,b)).
• A formula that gives probabilities for X would need to give probabilities for intervals, rather than single values. Has anything prepared us for this?
• Tables of probability for discrete r.v.'s? Not if only individual values were given.
• Ungrouped histograms? No, same problem.
• Grouped histograms? Let's see…
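The subinterval argument above can be checked with a small Python sketch. It assumes X is spread evenly over [0,1] (the uniform case discussed later in this chapter), so the probability of an interval is the length of its overlap with [0,1] divided by the total length. The names `interval_prob` and `split_probabilities` are hypothetical, chosen for illustration; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def interval_prob(a, b, lo=0, hi=1):
    """Hypothetical helper: P(a < X < b) when X is spread evenly over [lo, hi].

    The probability is the length of the part of (a, b) inside [lo, hi],
    divided by the total length hi - lo.
    """
    overlap = min(b, hi) - max(a, lo)
    return Fraction(max(overlap, 0)) / (hi - lo)

def split_probabilities(n, lo=0, hi=1):
    """Divide [lo, hi] into n equal-width subintervals; return each one's probability."""
    width = Fraction(hi - lo, n)
    return [interval_prob(lo + i * width, lo + (i + 1) * width, lo, hi)
            for i in range(n)]

# Ten subintervals of [0,1]: each gets 1/10, and the total is 1.
pieces = split_probabilities(10)
print(pieces[0], sum(pieces))   # 1/10 1

# Shrinking an interval around the single point 1/2: its probability
# equals its width, so it can be made as small as we like.
for n in (10, 1000, 1_000_000):
    half_width = Fraction(1, 2 * n)
    print(n, interval_prob(Fraction(1, 2) - half_width, Fraction(1, 2) + half_width))

# A single point (an interval of width zero) gets probability zero.
print(interval_prob(Fraction(1, 2), Fraction(1, 2)))   # 0
```

The same overlap-over-length rule reproduces the [1,5] uniform examples later in the chapter; for instance, `interval_prob(0, 3, lo=1, hi=5)` gives 1/2 because there is no probability below 1.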
• Each bar represents a frequency for an interval, even though this is a discrete example. What about relative frequency histograms?
• Look at the histogram for the number of threes showing in a toss of two dice.
• Notice it shows the probabilities for 3 discrete values (0, 1, and 2).
• Replace the discrete values with the intervals [0,1), [1,2), and [2,3).
• Then this histogram looks like it belongs to a continuous distribution with values in [0,3).
[Figure: relative frequency histogram (vertical axis 0.0 to 0.7) of the number of threes face up (0, 1, 2) in a two-dice toss]

Making the Leap
• Change the horizontal axis to show that the bars belong to each interval.
• Each bar is 1 unit wide and its height represents the probability for that interval.
• Each bar is a rectangle, whose area is 1 × height.
• Since the heights add up to 1, the total area of the shaded region is 1.
• Make the transition to the continuous case: instead of representing probability by height, use area.

What did we leap over?
• This has been more of an analogy than an explanation. Many details that require calculus are glossed over.
• The problem: we can't represent probabilities by the height at a point, because points all have probability zero.
• Solution: switch to areas. The bottom boundary (on the x axis) represents an interval for which we want to determine probability, and the area of the graph above that interval represents its probability.
• In calculus, these areas are called "definite integrals." You don't really need to know that, but you may come across the symbol $\int_a^b$, which means "the integral from a to b."

Uniform Distribution
• A uniform distribution is defined on an interval outside of which there is no positive probability. (This keeps the total area finite, so that it can equal 1.)
• Inside that interval, every sub-interval of a given width has the same probability (hence "uniform").
• A uniform distribution on the interval [0,3] has a flat graph of height 1/3, because 3 × 1/3 = 1.
• However, we should not say that 1/3 is the probability of anything in particular.

Uniform Examples
• Let X be a uniform r.v. on the interval [1,5].
• Find P(X>3), P(X<5), P(2<X<3), and P(0<X<3).
• Solution: The width of the distribution is 4, so the height of the graph is ¼ between 1 and 5. The area for any interval inside [1,5] will be ¼ × the width of the interval.
– P(X>3)=(5-3)/4=1/2
– P(X<5)=(5-1)/4=1
– P(2<X<3)=(3-2)/4=1/4
– Careful! P(0<X<3)=P(1<X<3)=(3-1)/4=1/2, because there is no probability below 1.

More Uniform Examples
• Let X be a uniform r.v. on the interval [0,8].
– Find P(X>3), P(X<5), and P(2<X<3 or 7<X<8).
– Find the median and the 90th percentile.
• Solution: The width of the distribution is 8, so the height of the graph is 1/8.
– P(X>3)=(8-3)/8=5/8.
– P(X<5)=(5-0)/8=5/8.
– P(2<X<3 or 7<X<8)=P(2<X<3)+P(7<X<8)=1/8+1/8=1/4.
– The median must have half the probability above it and half below. Therefore the median is 4.
– P90 is a number such that 90% of the probability is below it, so (P90-0)/8=0.9, and P90=7.2.

Probability Density Function
• We have been dealing with the uniform distribution in terms of graphs. Before moving on, we need to put these ideas into mathematical notation.
• We were focusing on the areas of portions of a graph like the one for the uniform distribution on [1,5]. But how do we define the region we want the area for?
– The bottom boundary is the x axis.
– The sides are vertical lines going through the x values we want.
– The top of the region is a special "curve" (straight lines are curves too).
• This curve is defined by a function, called the probability density function, or pdf.
For our graph (the uniform distribution on [1,5]), it is:
$$f(x)=\begin{cases}1/4 & \text{if } 1\le x\le 5\\ 0 & \text{otherwise}\end{cases}$$

Normal Probability Distributions
• The normal probability distribution (Gaussian distribution) is the most important distribution in all of statistics.
• Many continuous random variables have normal or approximately normal distributions.
• A normal distribution is defined by its pdf.

The Normal pdf
$$f(x)=\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
• The parameters are μ and σ.
• The mean of the distribution is μ.
• The standard deviation is σ.
• The median and mode are also μ.
• There is a normal distribution for every combination of values of μ and σ (with σ>0).

Basic Shape
• Here we see the basic shape of a normal distribution: a symmetric bell curve centered at μ.
• The shaded band is an example of an "area under the curve" that we might want to calculate.
• This particular distribution has μ=110 and σ=10.
• The "x" axis represents values of the r.v. X.
[Figure: normal pdf with μ=110 and σ=10, with a shaded band under the curve]

Effect of Changing μ
• Changing μ just causes a horizontal shift, centering the graph in a different place.

Effect of Changing σ
• Changing σ causes the graph to stretch out or squeeze together around the mean. A larger σ gives a wider, flatter curve; a smaller σ gives a narrower, taller one.

What does this mean?
• The normal pdf is a complicated formula. Its area over an interval has no closed-form expression, so probabilities are not easy to calculate from it, even if you know calculus. So, we use tables (or computers).
• We can't have a table for every possible normal distribution.
• We have one table for the "standard" normal distribution, which has μ=0 and σ=1. This r.v. is called Z.
• It is easy to convert probability statements about other normal distributions into statements about Z.

Table 3, Appendix B entries:
• The table contains the area under the standard normal curve between 0 and a specific value of z.
[Figure: standard normal curve with the area between 0 and z shaded]

Example: Find the area under the standard normal curve between z = 0 and z = 1.45.
A portion of Table 3 (row 1.4, column 0.05):
z     0.00   0.01   0.02   0.03   0.04   0.05     0.06
1.4   …      …      …      …      …      0.4265   …
P(0 < Z < 1.45) = 0.4265

Example: Find the area under the normal curve to the right of Z = 1.45; P(Z > 1.45).
[Figure: the table area 0.4265 lies between 0 and 1.45; the area asked for lies to the right of 1.45]
P(Z > 1.45) = 0.5000 - 0.4265 = 0.0735

Example: Find the area to the left of Z = 1.45; P(Z < 1.45).
[Figure: the area 0.5000 to the left of 0 plus the table area 0.4265 between 0 and 1.45]
P(Z < 1.45) = 0.5000 + 0.4265 = 0.9265

Example: Find the area between Z = -1.26 and the mean (Z = 0).
• By symmetry, this is the same as the table area between 0 and 1.26, which is 0.3962.
P(-1.26 < Z < 0) = 0.3962

Example: Find the area to the left of -0.98; P(Z < -0.98).
• By symmetry, the area asked for is the same as the area to the right of 0.98. The table gives 0.3365 between 0 and 0.98.
P(Z < -0.98) = 0.5000 - 0.3365 = 0.1635

Applications of Normal Distributions
• Apply the techniques learned for the Z distribution to all normal distributions.
• Start with a probability question in terms of x-values.
• Convert, or transform, the question into an equivalent probability statement involving z-values.

Standardization
• Suppose X is a normal r.v. with mean μ and standard deviation σ.
• The r.v.
$$Z=\frac{X-\mu}{\sigma}$$
has a standard normal distribution.
[Figure: the x axis centered at μ lined up with the z axis centered at 0]

Example: A bottling machine is adjusted to fill bottles with a mean of 32.0 oz of soda and a standard deviation of 0.02 oz. Assume the amount of fill is normally distributed and a bottle is selected at random.
1. Find the probability the bottle contains between 32 oz and 32.025 oz.
2. Find the probability the bottle contains more than 31.97 oz.
Solution to 1:
When x = 32: z = (32 - 32)/0.02 = 0
When x = 32.025: z = (32.025 - 32)/0.02 = 1.25
$$P(32<X<32.025)=P\left(\frac{32-32}{0.02}<\frac{X-32}{0.02}<\frac{32.025-32}{0.02}\right)=P(0<Z<1.25)=0.3944$$

Other Normal Applications
• Find a cutoff point: a value of X such that there is a certain probability in a specified interval defined by x.
Example: The waiting time X at a certain bank is approximately normally distributed with a mean of 3.7 minutes and a standard deviation of 1.4 minutes. The bank would like to claim that 95% of all customers are waited on by a teller within c minutes. Find the value of c that makes this statement true.
Solution:
[Figure: area 0.5000 below the mean 3.7 (z = 0) plus 0.4500 between 3.7 and c, leaving 0.0500 above c]
$$P(X<c)=0.95$$
$$P\left(\frac{X-3.7}{1.4}<\frac{c-3.7}{1.4}\right)=0.95$$
$$P\left(Z<\frac{c-3.7}{1.4}\right)=0.95$$
The table area 0.4500 corresponds to z = 1.645, so
$$\frac{c-3.7}{1.4}=1.645$$
$$c=(1.645)(1.4)+3.7=6.003$$
c ≈ 6 minutes

Notation: If X is a normal random variable with mean μ and standard deviation σ, this is often denoted X ~ N(μ, σ²).
Example: Suppose X is a normal random variable with μ = 35 and σ = 6. A convenient notation to identify this random variable is X ~ N(35, 36), since σ² = 36.

z(α) and z_α are commonly used notations for the z-score (point on the z axis) such that there is α of the area (probability) to the right of z(α) or z_α.
Illustrations:
• z(0.10) represents the value of Z such that the area to the right under the standard normal curve is 0.10.
• z(0.80) represents the value of Z such that the area to the right under the standard normal curve is 0.80. Since more than half of the area lies to its right, z(0.80) is negative.

Example: Find the numerical value of z(0.10).
• The notation says the area to the right of z(0.10) is 0.10, so the area between 0 and z(0.10) is 0.5000 - 0.1000 = 0.4000, which is the kind of area Table 3 shows.
• Use Table 3: look for an area as close as possible to 0.4000.
z(0.10) = 1.28

Note: The values of Z that will be used regularly come from one of the following situations:
1. The z-score such that there is a specified area in one tail of the normal distribution.
2. The z-scores that bound a specified middle proportion of the normal distribution.

Example: Find the z-scores that bound the middle 0.99 of the normal distribution.
• The remaining 0.01 is split equally, 0.005 in each tail, so the area between 0 and each bound is 0.495.
• The bounds are z(0.005) on the right and z(0.995) = -z(0.005) on the left.
Use Table 3: z(0.005) ≈ 2.575 and z(0.995) = -z(0.005) ≈ -2.575.
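The Table 3 lookups and worked examples in this chapter can be cross-checked with Python's standard-library `statistics.NormalDist` (Python 3.8+), whose `cdf` gives P(X < x) and whose `inv_cdf` gives the value with a given area to its left. This is a sketch for checking answers, not a replacement for the table method; the helpers `normal_pdf`, `table3_area`, and `z_alpha` are hypothetical names. It also evaluates part 2 of the bottling example, P(X > 31.97). Exact values can differ from the table in the last digit: z(0.005) is about 2.5758, which the table reports as 2.575.

```python
import math
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # the standard normal r.v. Z

def normal_pdf(x, mu, sigma):
    """The normal pdf formula, written out term by term."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def table3_area(z):
    """Hypothetical helper: area under the standard normal curve between 0 and z (Table 3 style)."""
    return abs(Z.cdf(z) - 0.5)

def z_alpha(alpha):
    """Hypothetical helper: z(alpha), the z-score with area alpha to its right."""
    return Z.inv_cdf(1 - alpha)

# The formula agrees with the library's pdf, e.g. for mu=110, sigma=10 at x=115.
print(math.isclose(normal_pdf(115, 110, 10), NormalDist(110, 10).pdf(115)))  # True

# Table 3 examples
print(round(table3_area(1.45), 4))      # 0.4265  P(0 < Z < 1.45)
print(round(1 - Z.cdf(1.45), 4))        # 0.0735  P(Z > 1.45)
print(round(Z.cdf(1.45), 4))            # 0.9265  P(Z < 1.45)
print(round(table3_area(-1.26), 4))     # 0.3962  P(-1.26 < Z < 0)
print(round(Z.cdf(-0.98), 4))           # 0.1635  P(Z < -0.98)

# Bottling machine: X ~ N(32.0, 0.02^2)
fill = NormalDist(mu=32.0, sigma=0.02)
print(round(fill.cdf(32.025) - fill.cdf(32.0), 4))  # 0.3944  part 1
print(round(1 - fill.cdf(31.97), 4))                # 0.9332  part 2

# Bank waiting time: X ~ N(3.7, 1.4^2); find c with P(X < c) = 0.95
wait = NormalDist(mu=3.7, sigma=1.4)
print(round(wait.inv_cdf(0.95), 3))     # 6.003

# z(alpha) values
print(round(z_alpha(0.10), 2))          # 1.28
print(round(z_alpha(0.80), 2))          # -0.84
print(round(z_alpha(0.005), 3), round(z_alpha(0.995), 3))  # 2.576 -2.576
```

Using `inv_cdf` directly replaces the "look for the closest area in the table" step, which is why its answers can be slightly more precise than the table's.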
