Continuous Random Variables & The Normal Probability Distribution

Learning Objectives
1. Understand the characteristics of continuous random variables and probability distributions
2. Understand the uniform probability distribution
3. Graph a normal curve
4. State the properties of a normal curve
5. Understand the role of area in the normal density function
6. Understand the relation between a normal random variable and a standard normal random variable

Continuous Random Variable & Continuous Probability Distribution
• The outcomes of a continuous random variable consist of all possible values in an interval of the real number line.
• In other words, a continuous random variable has infinitely many possible outcomes.
• For instance, consider the birth weight of a randomly selected baby. Suppose the outcomes lie between 1000 and 5000 grams, with all 1-gram intervals of weight between 1000 and 5000 grams equally likely.
• The probability that an observed baby's weight is exactly 3250.326144 grams is essentially zero. There is one way to observe 3250.326144, but there are infinitely many possible values between 1000 and 5000. Under the classical probability approach, a probability is found by dividing the number of ways an event can occur by the total number of possibilities, so the probability of any single exact value is vanishingly small.
• To resolve this problem, we compute probabilities of continuous random variables over an interval of values. For instance, instead of asking for a weight of exactly 3250.326144 grams, we compute the probability that a selected baby's weight is between 3250 and 3251 grams.
• To find probabilities of continuous random variables, we use a probability density function.
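To make the interval idea concrete, here is a small Python sketch. It takes the birth-weight example's assumption that all weights between 1000 and 5000 grams are equally likely at face value; the function name and its default limits are illustrative and not part of the original slides.

```python
def interval_probability(a, b, low=1000.0, high=5000.0):
    """P(a <= X <= b) when X is equally likely anywhere on [low, high].

    Assumption: the 'equally likely' birth-weight model from the example.
    """
    left = max(a, low)    # clip the requested interval to the possible range
    right = min(b, high)
    if right <= left:
        return 0.0        # an empty or single-point interval has no length
    return (right - left) / (high - low)


p_interval = interval_probability(3250, 3251)             # a one-gram interval
p_exact = interval_probability(3250.326144, 3250.326144)  # one exact value

print(p_interval)  # 0.00025
print(p_exact)     # 0.0
```

Under this model the one-gram interval has probability 1/4000, while a single exact weight has probability zero, which is why probabilities are computed over intervals.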
Uniform Random Variable & Uniform Probability Distribution
• Sometimes we want to model a continuous random variable that is equally likely to fall anywhere between two limits
• When every number in an interval is equally likely, the variable has a uniform probability distribution
• Examples
– Choose a random time: the number of seconds past the minute is a random number in the interval from 0 to 60
– Observe a tire rolling at a high rate of speed and choose a random time: the angle of the tire valve to the vertical is a random number in the interval from 0 to 360 degrees

Example
• For the seconds-past-the-minute example
– Any specific number has zero probability of occurring
– The mathematically correct way to phrase "equally likely" is that any two intervals of equal length have the same probability
– Every interval of length 3 has probability 3/60
– The chance that it will be between 14.4 and 17.4 seconds after the minute is 3/60
– The chance that it will be between 31.2 and 34.2 seconds after the minute is 3/60
– The chance that it will be between 47.9 and 50.9 seconds after the minute is 3/60

Probability Density Function
• A probability density function is an equation used to specify and compute probabilities of a continuous random variable
• This equation must have two properties
– The total area under the graph of the equation is equal to 1 (the total probability is 1)
– The equation is always greater than or equal to zero (probabilities are always greater than or equal to zero)
• For the probability of X between two numbers
– Compute the area under the curve between the two numbers
– That area is the probability

Area is the Probability
• [Figure: a density curve with the area from 4 to 8 shaded; the shaded area is the probability of being between 4 and 8]
• An interpretation of the probability density function is
– The random variable is more likely to be in those regions where the function is larger
– The random variable is less likely to be in those regions where the function is smaller
– The random variable is never in those regions where the function is zero
• [Figure: a graph showing where the random variable has more likely and less likely values]

Uniform Probability Density Function
• The time example: uniform between 0 and 60
– All values between 0 and 60 are equally likely, thus the equation must have the same value between 0 and 60
– Values outside 0 and 60 are impossible, thus the equation must be zero outside 0 to 60
– Because the total area must be one, and the width of the rectangle is 60, the height must be 1/60. Therefore the uniform probability density is a constant: y = f(x) = 1/60 for 0 ≤ x ≤ 60
– The probability that the variable is between two numbers is the area under the curve between them, that is, the width of the interval times 1/60

Normal Random Variable & Normal Probability Distribution (Chapter 7 – Section 1)

Overview
• The normal distribution models bell shaped variables
• The normal distribution is the fundamental distribution underlying most of inferential statistics
• The normal curve has a very specific bell shape

Normal Random Variable
• A normally distributed random variable, or a variable with a normal probability distribution, is a random variable that has a relative frequency histogram in the shape of a normal curve
• This curve is also called the normal density curve, the normal density function, or simply the normal curve (a particular probability density function)

Normal Density Curve
• In drawing the normal curve, the mean µ and the standard deviation σ have specific roles
– The mean µ is the center of the curve
– The values (µ – σ) and (µ + σ) are the inflection points of the curve, where the concavity of the curve changes
• There is a normal curve for each combination of µ and σ
• The curves look different, but they all share the same bell shape
• Different values of µ shift the curve left and right
• Different values of σ change the spread: a larger σ gives a lower, wider curve, and a smaller σ gives a taller, narrower curve
• Two normal curves with different means (but the same standard deviation)
– The curves are shifted left and right
• Two normal curves with different standard deviations (but the same mean)
– The curves share the same center but differ in height and spread

Properties of Normal Curve
• Properties of the normal density curve
– The curve is symmetric about the mean
– The mean = median = mode, and this is the highest point of the curve
– The curve has inflection points at (µ – σ) and (µ + σ)
– The total area under the curve is equal to 1 (it is complicated to show this, but it is true)
– The area under the curve to the left of the mean is equal to the area under the curve to the right of the mean (each is 1/2)
– As x increases, the curve gets closer to zero but never reaches it; as x decreases, the curve likewise gets closer to zero but never reaches it
• In addition,
– The area within 1 standard deviation of the mean is approximately 0.68
– The area within 2 standard deviations of the mean is approximately 0.95
– The area within 3 standard deviations of the mean is approximately 0.997 (almost 100%)
• This is the so-called empirical rule. Therefore, a normal curve will be close to zero at about 3 standard deviations below and above the mean.
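The three areas quoted in the empirical rule can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `area_within` and the example values µ = 50, σ = 10 are illustrative assumptions, not taken from the slides.

```python
from statistics import NormalDist


def area_within(k, mu=0.0, sigma=1.0):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# The same areas hold for any mean and standard deviation
# (mu=50, sigma=10 are made-up values for illustration).
same_area = area_within(1, mu=50, sigma=10)
```

Because the areas depend only on how many standard deviations are covered, the result does not change with µ or σ.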
Empirical Rule (68-95-99.7 Rule)
• The empirical rule, or 68-95-99.7 rule, states that
– Approximately 68% of the values lie between (µ – σ) and (µ + σ)
– Approximately 95% of the values lie between (µ – 2σ) and (µ + 2σ)
– Approximately 99.7% of the values lie between (µ – 3σ) and (µ + 3σ)
• These are difficult calculations, but they are true
• [Figure: an illustration of the Empirical Rule]

Histogram & Density Curve
• When we collect data, we can draw a histogram to summarize the results
• However, using histograms has several drawbacks
• Histograms are grouped, so
– There are always grouping errors
– It is difficult to make detailed calculations
• Instead of using a histogram, we can use a probability density function that is an approximation of the histogram
• Probability density functions are not grouped, so
– There are no grouping errors
– They can be used to make detailed calculations

Normal Curve Approximation
• Frequently, histograms are bell shaped [Figure: a bell shaped histogram]
• We can approximate these with normal curves
• Lay a normal curve over the top of the histogram [Figure: histogram with a normal curve overlaid]
• In this case, the normal curve is close to the histogram, so the approximation should be accurate

Normal Probability Density Function
• The equation of the normal curve with mean µ and standard deviation σ is
y = 1 / (σ √(2π)) · e^( -(x – µ)² / (2σ²) )
• This is a complicated formula, but (thankfully) we will never need to use it directly to calculate probabilities

Modeling with Normal Curve
• When we model a distribution with a normal probability distribution, we use the area under the normal curve to
– Approximate the areas of the histogram being modeled
– Approximate probabilities that are too detailed to be computed from just the histogram

Example
• Assume that the distribution of giraffe weights has µ = 2200 pounds and σ = 200 pounds

Example Continued
• What is an interpretation of the area under the curve to the left of 2100?
• It is the proportion of giraffes that weigh 2100 pounds or less
Note: Area = Probability = Proportion

Standardize Normal Random Variable
• How do we calculate the areas under a normal curve?
– If we needed a table for every combination of µ and σ, this would rapidly become unmanageable
– We would like to be able to compute these probabilities using just one table
– The solution is to use the standard normal random variable

Standard Normal Random Variable
• The standard normal random variable is the specific normal random variable that has µ = 0 and σ = 1
• We can relate general normal random variables to the standard normal random variable using a so-called Z-score calculation
• If X is a general normal random variable with mean µ and standard deviation σ, then
Z = (X – µ) / σ
is a standard normal random variable (Z-score)
• This equation connects general normal random variables with the standard normal random variable
• We only need a standard normal table

Example
• Find the area to the left of 2100 for a normal curve with mean 2200 and standard deviation 200

Example Continued
• To compute the corresponding value of Z, we use the Z-score
Z = (X – µ) / σ = (2100 – 2200) / 200 = -1/2
• Thus the value of X = 2100 corresponds to a value of Z = -0.5

Summary
• Normal probability distributions can be used to model data that have bell shaped distributions
• Normal probability distributions are specified by their means and standard deviations
• Areas under the curve of general normal probability distributions can be related to areas under the curve of the standard normal probability distribution

The Standard Normal Distribution

Objectives
• Find the area under the standard normal curve
• Find Z-scores for a given area
• Interpret the area under the standard normal curve as a probability

How to Compute Area under Standard Normal Curve
• There are several ways to calculate the area under the standard normal curve
– We can use a table (such as Table IV on the inside back cover)
– We can use technology (a calculator or software)
• Three different area calculations
– Find the area to the left of
– Find the area to the right of
– Find the area between
• Two different methods are shown here
– From a table
– Using a TI graphing calculator (recommended method)
• Using technology is preferred

Finding Area under Standard Normal Curve using Z-table
• "Area to the left of", using a Z-table (Standard Normal Table)
– Calculate the area to the left of Z = 1.68
– Break up 1.68 as 1.6 + .08
– Find the row 1.6
– Find the column .08
– The probability is 0.9535
Note: The table always gives the area to the left of the Z-score.
• "Area to the right of", using a Z-table
– The area to the left of Z = 1.68 is 0.9535 from reading the table
– The area to the right is the remaining amount
– The two add up to 1, so the area to the right is 1 – 0.9535 = 0.0465
• "Area between", using a Z-table: between Z = -0.51 and Z = 1.87
– This is not a one step calculation
– The area to the left of 1.87 (which is 0.9693) includes too much
– It is too much by the area to the left of -0.51 (which is 0.3050)
– The area to the left of 1.87 minus the area to the left of -0.51 is 0.9693 – 0.3050 = 0.6643
– Thus the area under the standard normal curve between -0.51 and 1.87 is 0.6643
• A different way for "between": 1 – (0.3050 + 0.0307) = 0.6643
– The area to the left of -0.51, or 0.3050, is the extra on the left
– The area to the right of 1.87, or 0.0307, is the extra on the right
– The total area to get rid of is 0.3050 + 0.0307 = 0.3357
– Thus the area under the standard normal curve between -0.51 and 1.87 is 1 – 0.3357 = 0.6643

Finding Area under Standard Normal Curve using TI Graphing Calculator
• "Area to the left of" 1.68, using a TI graphing calculator
The function is normalcdf( ). Follow the key sequence below:
1. Press DISTR [2ND VARS], select 2:normalcdf, and press ENTER
2. Enter -E99,1.68,0,1) and press ENTER
The probability is 0.9535
Note:
1. -E99 = -10^99, a negative number so large in magnitude that it stands in for negative infinity. We use it as the left bound to obtain "less than or equal to" some value, that is, x ≤ a. The E symbol is entered by pressing EE on the calculator, using the key sequence [2ND ,].
2. normalcdf( ) (cdf means cumulative distribution function) accumulates the probability over an interval. It differs from 1:normalpdf( ) on the calculator, which calculates the normal density (the height of the curve).
3. The function normalcdf( ) takes four parameters. For instance, to find the probability that a normal variable lies in the interval from a to b, i.e. a ≤ x ≤ b: the 1st number entered is the left bound a of the interval; the 2nd number is the right bound b of the interval; the 3rd number is the mean of the normal variable (0 for a standard normal variable); the 4th number is the standard deviation of the normal variable (1 for a standard normal variable).
• "Area to the right of" Z = 1.68, using a TI graphing calculator
1. Press DISTR [2ND VARS], select 2:normalcdf, and press ENTER
2. Enter 1.68,E99,0,1) and press ENTER
The probability is 0.0465
Note: E99 = 10^99, a very large number that stands in for infinity. We use it as the right bound to obtain "greater than or equal to" some value, that is, x ≥ a.
• "Area between" Z = -0.51 and Z = 1.87, using a TI graphing calculator
1. Press DISTR [2ND VARS], select 2:normalcdf, and press ENTER
2. Enter -0.51,1.87,0,1) and press ENTER
The probability is 0.6642 (the table answer of 0.6643 differs only because the table entries are rounded)

Finding Z score from Probability
• We did the problem: Z-score → Area
• Now we will do the reverse: Area → Z-score
• This is finding the Z-score (value) that corresponds to a specified area (percentile)
• And, no surprise, we can do this with a table or with a TI graphing calculator

Locate Z Score from Table
• "To the left of", using a table
– Find the Z-score for which the area to the left of it is 0.32
– Look in the body of the table for 0.32
– The nearest entry to 0.32 is 0.3192, which corresponds to a Z-score of -0.47
• "To the right of", using a table
– Find the Z-score for which the area to the right of it is 0.4332
– The area to the right is .4332, so the area to the left is 1 – .4332 = .5668
– Look in the body of the table for 0.5668
– The nearest entry is 0.5675, which corresponds to a Z-score of 0.17
Note: The table always gives the area to the left of a Z-score, so we need to convert to the area to the left first.
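For readers without a TI calculator, the three area calculations above can be reproduced in software. The sketch below is a hypothetical Python stand-in for the calculator's `normalcdf(lower, upper, mean, sd)`, built on `NormalDist` from the standard `statistics` module; the Python function and the `E99` constant are written for this illustration and only mimic the calculator's parameter order.

```python
from statistics import NormalDist

E99 = 1e99  # mimics the calculator's E99, a stand-in for infinity


def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under the normal curve between lower and upper.

    Parameter order follows the TI function described above:
    (left bound, right bound, mean, standard deviation).
    """
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


left_of = normalcdf(-E99, 1.68, 0, 1)    # area to the left of Z = 1.68
right_of = normalcdf(1.68, E99, 0, 1)    # area to the right of Z = 1.68
between = normalcdf(-0.51, 1.87, 0, 1)   # area between Z = -0.51 and Z = 1.87

print(round(left_of, 4))   # 0.9535
print(round(right_of, 4))  # 0.0465
print(round(between, 4))   # 0.6642
```

The "between" result matches the calculator value of 0.6642; the table answer of 0.6643 comes from subtracting two entries that were each rounded to four decimals.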
Locate Z Score from TI Graphing Calculator
• "To the left of", using a TI graphing calculator
– Find the Z-score for which the area to the left of it is 0.32
1. Press DISTR [2ND VARS], select 3:invNorm(, and press ENTER
2. Enter 0.32,0,1) and press ENTER
Solution: The Z-score is -0.47
• "To the right of", using a TI graphing calculator
– Find the Z-score for which the area to the right of it is 0.4332
– The area to the right is .4332, so the area to the left is .5668
1. Press DISTR [2ND VARS], select 3:invNorm(, and press ENTER
2. Enter 0.5668,0,1) and press ENTER
Solution: The Z-score is 0.17
Note: invNorm( ) takes 3 parameters: the 1st is the area to the left of a Z-score; the 2nd is the mean; the 3rd is the standard deviation.

Finding a Middle Range
• We will often want to find a middle range of Z-scores, from z0 to z1. For instance, find the middle 90%, the middle 95%, or the middle 99% of a standard normal distribution
• [Figure: standard normal curve with the middle 90% shaded]

How to Find a Middle 90% Range
• 90% in the middle means 10% outside the middle, i.e. 5% off each end
• [Figure: standard normal curve with 5% to the left and 5% to the right]
• These problems can be solved in either of two equivalent ways. We could find
– The number for which 5% is to the left, or
– The number for which 5% is to the right (equivalently, 95% is to the left)
• Use the TI calculator: from invNorm(.05,0,1) we get a lower Z-score of approximately -1.645. From invNorm(0.95,0,1) we get an upper Z-score of approximately 1.645. So the range that covers the middle 90% of the values for a standard normal distribution is from -1.645 to 1.645.

What is zα?
• The number zα denotes the Z-score such that the area to the right of zα is α (the Greek letter alpha)
• Some commonly used zα values are
– z.10 = 1.28; the area between -1.28 and 1.28 is 0.80
– z.05 = 1.645; the area between -1.645 and 1.645 is 0.90
– z.025 = 1.96; the area between -1.96 and 1.96 is 0.95
– z.01 = 2.33; the area between -2.33 and 2.33 is 0.98
– z.005 = 2.58; the area between -2.58 and 2.58 is 0.99

Area as the Probability
• The area under a normal curve can be interpreted as a probability
• The standard normal curve can be interpreted as a probability density function
• We will use Z to represent a standard normal random variable, so it has probabilities such as
– P(a < Z < b)
– P(Z < a)
– P(Z > a)
Note: A normal random variable is a continuous random variable. The probability that a continuous random variable equals any single value is zero, as explained previously. So the probability is the same whether the inequalities are inclusive (include the endpoints) or exclusive (do not include the endpoints). For instance, P(Z < a) = P(Z ≤ a).

Summary
• Calculations for the standard normal curve can be done using tables or using technology
• One can calculate the area under the standard normal curve to the left of or to the right of each Z-score
• One can calculate the Z-score so that the area to the left of it or to the right of it is a certain value
• Areas and probabilities are two different representations of the same concept

Applications of the Normal Distribution

Learning Objectives
1. Find and interpret the area under a normal curve
2. Find the value of a normal random variable

General Normal Probability Distribution
• So far, we have learned to find areas under the standard normal curve. Now we want to calculate areas and values for general normal probability distributions
• We can relate these problems to the calculations for the standard normal distribution done previously.
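As a recap of the standard normal calculations before moving on, the Z-score lookups and the zα values listed above can be reproduced in software. The sketch below is a hypothetical Python counterpart to the calculator's `invNorm(area_left, mean, sd)`, using `NormalDist.inv_cdf` from the standard `statistics` module; the names `inv_norm` and `z_alpha` are introduced only for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def inv_norm(area_left, mean=0.0, sd=1.0):
    """Value with the given area to its LEFT (same convention as invNorm)."""
    return NormalDist(mean, sd).inv_cdf(area_left)


def z_alpha(alpha):
    """Z-score with area alpha to its RIGHT, i.e. area 1 - alpha to its left."""
    return inv_norm(1.0 - alpha)


z_left = inv_norm(0.32)           # area 0.32 to the left
z_right = inv_norm(1.0 - 0.4332)  # area 0.4332 to the right

for alpha in (0.10, 0.05, 0.025, 0.01, 0.005):
    z = z_alpha(alpha)
    middle = Z.cdf(z) - Z.cdf(-z)  # area between -z and z
    print(alpha, round(z, 3), round(middle, 2))
```

Note that an "area to the right" problem is always converted to an "area to the left" before calling the inverse function, exactly as in the table method.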
Standardize a General Normal Variable
• For a general normal random variable X with mean µ and standard deviation σ, the variable
Z = (X – µ) / σ
has a standard normal probability distribution
• This is a very useful relationship
• We can use this relationship to perform calculations for X from Z

Convert X to Z
• Values of X ↔ Values of Z
• If x is a value for X, then
z = (x – µ) / σ
is a value for Z

Example
• For example, if a normal variable X has µ = 3 and σ = 2, then a value of x = 4 for X corresponds to
z = (4 – 3) / 2 = 0.5
a value of z = 0.5 for Z

Find P(X < x) from P(Z < z)
• Because of the relationship z = (x – µ) / σ between values of X and values of Z, P(X < x) = P(Z < z)
• To find P(X < x) for a general normal random variable, we can calculate P(Z < z) for the corresponding standard normal random variable
• This relationship lets us compute all the different types of probabilities
• A different way to illustrate this relationship: [Figure: an X axis marked a, µ, b aligned with a Z axis marked (a – µ) / σ, 0, (b – µ) / σ]
• Probabilities for X are directly related to probabilities for Z using the (X – µ) / σ relationship
• With this relationship, the following method can be used to compute areas for a general normal random variable X
– Shade the desired area to be computed for X
– Convert all values of X to Z-scores using z = (x – µ) / σ
– Solve the problem for the standard normal Z
– The answer will be the same for the general normal X

Example
• For a general normal random variable X with µ = 3 and σ = 2, calculate P(X < 6)
• This corresponds to
z = (6 – 3) / 2 = 1.5
so P(X < 6) = P(Z < 1.5) = 0.9332 [use a Z-table, or a TI calculator with normalcdf(-E99,1.5,0,1)]

Example
• For a general normal random variable X with µ = -2 and σ = 4, calculate P(X > -3)
• This corresponds to
z = (-3 – (-2)) / 4 = -0.25
so P(X > -3) = P(Z > -0.25) = 0.5987 [use a Z-table, or a TI calculator with normalcdf(-0.25,E99,0,1)]

Example
• For a general normal random variable X with µ = 6 and σ = 4, calculate P(4 < X < 11)
• This corresponds to
z = (4 – 6) / 4 = -0.5 and z = (11 – 6) / 4 = 1.25
so P(4 < X < 11) = P(-0.5 < Z < 1.25) = 0.5858 [use a Z-table, or a TI calculator with normalcdf(-0.5,1.25,0,1)]

Calculate P(X < x) Directly
• Technology often has direct calculations for the general normal probability distribution
• For instance, for a general normal random variable X with µ = 6 and σ = 4, calculate P(4 < X < 11). Using a TI graphing calculator, we can obtain the answer directly from normalcdf(4,11,6,4) without converting X to Z.
Note: In general, to find the area under any normal curve over the interval from a to b, the sequence of parameters for the function normalcdf( ) is (a, b, mean, standard deviation). For a standard normal curve, you can just enter (a, b) instead of (a, b, 0, 1), because the TI calculator uses mean 0 and standard deviation 1 by default.

Compute X values from probabilities
• The inverse of the relationship
Z = (X – µ) / σ
is the relationship
X = µ + Zσ
• With this, we can solve value problems (convert a Z-score back to its original score) for the general normal probability distribution
• The following method can be used to compute values for a general normal random variable X
– Shade the desired area to be computed for X
– Find the Z-scores for the same probability problem
– Convert all the Z-scores to X using X = µ + Zσ

Example
• For a general normal random variable X with µ = 3 and σ = 2, find the value x such that P(X < x) = 0.3
• Since P(Z < -0.5244) = 0.3 (from a Z-table or calculator: invNorm(0.3,0,1) = -0.5244), we convert Z to X using X = µ + Zσ:
x = 3 + (-0.5244) × 2 = 1.95
so P(X < 1.95) = P(Z < -0.5244) = 0.3

Example
• For a general normal random variable X with µ = -2 and σ = 4, find the value x such that P(X > x) = 0.2
• Since P(Z > 0.8416) = 0.2 (from a Z-table or calculator: invNorm(0.8,0,1) = 0.8416), we convert the Z-score back to X using X = µ + Zσ:
x = -2 + 0.8416 × 4 = 1.37
so P(X > 1.37) = P(Z > 0.8416) = 0.2

Example
• We know that z.05 = 1.645, so P(-1.645 < Z < 1.645) = 0.90
• Thus for a general normal random variable X with µ = 6 and σ = 4, the middle 90% range is from -0.58 to 12.58:
x1 = 6 – 1.645 × 4 = -0.58
x2 = 6 + 1.645 × 4 = 12.58

Compute X values directly
• Technology often has direct calculations for the general normal probability distribution
• For instance, for a general normal random variable X with µ = 3 and σ = 2, find the value x such that P(X < x) = 0.3. We can solve it with a TI graphing calculator: invNorm(0.3,3,2) gives the answer 1.95.
Note: In general, to find the x value corresponding to a given area p to the left of x under any normal curve, the sequence of parameters for the function invNorm( ) is (p, mean, standard deviation). For a standard normal curve, you can just enter (p) instead of (p, 0, 1), because the TI calculator uses mean 0 and standard deviation 1 by default.

Summary
• We can perform calculations for general normal probability distributions based on calculations for the standard normal probability distribution
• For tables, and for interpretation, converting values to Z-scores can be used
• For technology, often the parameters of the general normal probability distribution can be entered directly into a routine

Summary
• The normal distribution
– Is the most important bell shaped distribution
– Will be used to model many random variables
• The standard normal probability distribution
– Has a mean of 0 and a standard deviation of 1
– Is the basis for normal distribution calculations
• The general normal probability distribution
– Has a general mean and general standard deviation
– Can be used in general modeling situations
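The two routes described in this section, standardizing to Z first or entering µ and σ directly, can be compared side by side in software. The following Python sketch uses `NormalDist` from the standard `statistics` module; the helper name `standardize` and the variable names are illustrative choices, not part of the slides.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def standardize(x, mu, sigma):
    """z = (x - mu) / sigma"""
    return (x - mu) / sigma


# P(4 < X < 11) for mu = 6, sigma = 4, computed two ways.
mu, sigma = 6, 4
p_via_z = Z.cdf(standardize(11, mu, sigma)) - Z.cdf(standardize(4, mu, sigma))
X = NormalDist(mu, sigma)
p_direct = X.cdf(11) - X.cdf(4)  # counterpart of normalcdf(4,11,6,4)

# Value x with P(X < x) = 0.3 for mu = 3, sigma = 2, computed two ways.
x_via_z = 3 + Z.inv_cdf(0.3) * 2          # x = mu + z * sigma
x_direct = NormalDist(3, 2).inv_cdf(0.3)  # counterpart of invNorm(0.3,3,2)

# Middle 90% range for mu = 6, sigma = 4.
z05 = Z.inv_cdf(0.95)  # z with 5% to its right
x1, x2 = mu - z05 * sigma, mu + z05 * sigma

print(round(p_direct, 4))            # 0.5858
print(round(x_direct, 2))            # 1.95
print(round(x1, 2), round(x2, 2))    # -0.58 12.58
```

Both routes give the same answers; standardizing is what makes a single Z-table sufficient, while the direct route is simply more convenient when software is available.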