Chapter 6 – Random Variables and the Normal Distribution

Random Variable
o A random variable is a variable whose values are determined by chance.

Discrete and Continuous Random Variables
o A discrete random variable can take either a finite or a countable number of values. Since these values may be written as a list of numbers, each value can be graphed as a separate point on a number line, with space between each point.
o A continuous random variable can take infinitely many values. Because there are infinitely many values, the values of a continuous random variable form an interval on the number line.

Probability Distribution of a Discrete Random Variable
o A probability distribution of a discrete random variable provides all the possible values that the random variable can assume, together with the probability associated with each value. The probability distribution can take the form of a table, graph, or formula. Probability distributions describe populations, not samples.

Requirements for the Probability Distribution of a Discrete Random Variable
o The sum of the probabilities of all the possible values of a discrete random variable must equal 1. That is, Σ P(X) = 1.
o The probability of each value of X must be between 0 and 1, inclusive. That is, 0 ≤ P(X) ≤ 1.

Example 6.1
Kristin’s probability estimates for five investment outcomes for a period of 12 months (initial investment $2000)

Scenario                                 Financial gain               Kristin’s estimated probability
Cape Ann Biotech does very well.         Gain $1000                   0.15
Cape Ann Biotech does fairly well.       Gain $500                    0.30
Cape Ann Biotech treads water.           Gain $0                      0.25
Cape Ann Biotech does not do very well.  Lose $200 (Gain -$200)       0.20
Cape Ann Biotech folds.                  Lose $2000 (Gain -$2000)     0.10

Once we define our random variable, this table can be converted into a probability distribution table.
Since Kristin is interested in what happens to her money, we define our random variable X to be
X = financial gain (the financial gain associated with the five possible scenarios)

a. Why is the variable X = financial gain a random variable?
X = financial gain is a random variable because we do not know, before the investment is made, the value that the variable will take.

b. What are the possible values that X can take?
X = {-$2000, -$200, $0, $500, $1000}

c. Use random variable notation to express the probabilities associated with each possible outcome of X.
P(X = -2000) = 0.10, P(X = -200) = 0.20, P(X = 0) = 0.25, P(X = 500) = 0.30, P(X = 1000) = 0.15

d. Construct the probability distribution of X = financial gain.

Probability distribution of Kristin’s financial gain
X = Financial gain in dollars    -2000    -200    0       500     1000
P(X)                             0.10     0.20    0.25    0.30    0.15

e. Find the probability that Kristin will make a profit on her investment.
P(gain $1000) + P(gain $500) = P(X = 1000) + P(X = 500) = 0.15 + 0.30 = 0.45

f. Find the probability that Kristin will take a loss on her investment.
P(X = -2000) + P(X = -200) = 0.10 + 0.20 = 0.30

Finding the Mean of a Discrete Random Variable
o The mean μ of a discrete random variable is found as follows:
  Multiply each possible value of X by its probability.
  Add the resulting products.
  The procedure is denoted as μ = Σ [X · P(X)].

Formulas for the Variance and Standard Deviation of a Discrete Random Variable
o Definition Formulas
  σ² = Σ (X – μ)² · P(X)
  σ = √( Σ (X – μ)² · P(X) )
o Computational Formulas
  σ² = Σ [X² · P(X)] – μ²
  σ = √( Σ [X² · P(X)] – μ² )

Example 6.2
Probability distribution of Kristin’s financial gain
X = Financial gain in dollars    -2000    -200    0       500     1000
P(X)                             0.10     0.20    0.25    0.30    0.15

a. Calculate the expected value for Kristin’s financial gain from her investment.
μ = (-2000)(0.10) + (-200)(0.20) + (0)(0.25) + (500)(0.30) + (1000)(0.15) = 60

b. Find the variance and standard deviation of Kristin’s financial gain, using the definition and computational formulas.

Definition formula:
X        P(X)    X – μ    (X – μ)²     (X – μ)² · P(X)
-2000    0.10    -2060    4,243,600    424,360
-200     0.20    -260     67,600       13,520
0        0.25    -60      3,600        900
500      0.30    440      193,600      58,080
1000     0.15    940      883,600      132,540

σ² = Σ (X – μ)² · P(X) = 629,400
σ = √σ² = √629,400 = 793.3473388 ≈ $793.35

Computational formula:
X        P(X)    X²           X² · P(X)
-2000    0.10    4,000,000    400,000
-200     0.20    40,000       8,000
0        0.25    0            0
500      0.30    250,000      75,000
1000     0.15    1,000,000    150,000

Σ [X² · P(X)] = 633,000
σ² = Σ [X² · P(X)] – μ² = 633,000 – 3,600 = 629,400
σ = √σ² = √629,400 = 793.3473388 ≈ $793.35

The Binomial Probability Distribution Formula
o The probability of observing exactly X successes in n trials of a binomial experiment is
  P(X) = C(n, X) · p^X · (1 – p)^(n – X)
  where p is the probability of success on each trial and C(n, X) = n! / (X! (n – X)!) is the number of ways to choose X successes from n trials.

Example 6.3
Suppose that Joshua is about to take a four-question multiple-choice statistics quiz. Josh did not study for the quiz, so he will have to take random guesses on each of the four questions. Each question has five possible alternatives, only one of which is correct.

There are four questions on the quiz, so the number of trials is n = 4. Next, we know that p = 1/5, since there are five choices and Joshua has a 1 in 5 chance of being correct if he chooses randomly. Thus,
p = probability of success = 1/5 = 0.2
Four of the five possible alternatives are incorrect. So,
(1 – p) = probability of failure = 4/5 = 0.8

a. What is the probability that Joshua will ace the quiz by answering all the questions correctly?
To find the probability of guessing the right answer on all four questions, Joshua is interested in observing X = 4 successes. Using the binomial formula, we obtain
P(X = 4) = C(4, 4) · (0.2)⁴ · (1 – 0.2)^(4 – 4) = (1)(0.0016)(1) = 0.0016
So Joshua’s chance of acing this quiz by making random guesses is very small, less than one-fifth of 1%.

b. What is the probability that Joshua will pass the quiz by answering at least three questions correctly?
To answer at least three questions correctly, Joshua must answer either X = 3 or X = 4 questions correctly.
Since these events are mutually exclusive, we find the required probability by using the Addition Rule for Mutually Exclusive Events:
P(X ≥ 3) = P(X = 3) + P(X = 4)
We already found P(X = 4) = 0.0016 in (a). Now we find
P(X = 3) = C(4, 3) · (0.2)³ · (1 – 0.2)^(4 – 3) = (4)(0.008)(0.8) = 0.0256
Therefore, the probability that Joshua will pass this quiz by random guessing is
0.0016 + 0.0256 = 0.0272
Since he has less than a 3% chance of even passing this quiz, we would tell Joshua that perhaps random guessing isn’t the best strategy for stats quizzes.

Example 6.4
Suppose that Joshua is about to take a four-question multiple-choice statistics quiz. Josh did not study for the quiz, so he will have to take random guesses on each of the four questions. Each question has five possible alternatives, only one of which is correct. Use the binomial table to find the following probability.

a. What is the probability that Joshua will ace the quiz by answering all the questions correctly?
Look under the n column until you find n = 4. That is the portion of the table you will use. Then go across the top of the table until you get to p = 0.20. That gives you your column. We are interested in finding the probability of observing X = 4, where X is the number of successes. So go down the column until you see 4 under the X column on the left (and in the subgroup with n = 4). The number in the p column is 0.0016.

Mean, Variance, and Standard Deviation of a Binomial Random Variable X
o Mean (or expected value): μ = n · p
o Variance: σ² = n · p · (1 – p)
o Standard deviation: σ = √( n · p · (1 – p) )

Example 6.5
Suppose we know that the population proportion p of left-handed students is 0.10.

a. In a sample of 200 students, how many would we expect to be left-handed?
E(X) = μ = n · p = (200)(0.10) = 20

b. Would 40 left-handed students out of 200 be considered unusual?
σ = √( n · p · (1 – p) ) = √( (200)(0.1)(1 – 0.1) ) = √18 ≈ 4.2426
How many standard deviations does 40 lie above the mean of 20?
(X – μ) / σ = (40 – 20) / 4.2426 ≈ 4.714
Finding 40 lefties in a sample of 200 is unusual because this value lies about 4.7 standard deviations above the mean.

Continuous Probability Distribution
o A continuous probability distribution is a graph that indicates on the horizontal axis the range of values that the continuous random variable X can take, and above which is drawn a curve, called the density curve. A continuous probability distribution must follow the Requirements for the Probability Distribution of a Continuous Random Variable.
o Requirements for the Probability Distribution of a Continuous Random Variable
  The total area under the density curve must equal 1 (this is the Law of Total Probability for Continuous Random Variables).
  The vertical height of the density curve can never be negative. That is, the density curve never goes below the horizontal axis.

Properties of the Normal Density Curve (Normal Curve)
o It is symmetric about the mean μ.
o The highest point occurs at X = μ, because symmetry implies that the mean equals the median, which equals the mode of the distribution.
o It has inflection points at μ – σ and μ + σ.
o The total area under the curve equals 1.
o Symmetry also implies that the area under the curve to the left of μ and the area under the curve to the right of μ are both equal to 0.5.
o The normal distribution is defined for values of X extending indefinitely in both the positive and negative directions. As X moves farther from the mean, the density curve approaches but never quite touches the horizontal axis.

Example 6.6
Q1. Many educators are concerned about grade inflation. One study shows that one low-SAT-score high school (with mean combined SAT score = 750) had a higher mean grade point average (mean GPA = 3.6) than a high-SAT-score school (with mean combined SAT score = 1050 and mean GPA = 2.6). Define the following random variable:
X = GPA at the high-SAT-score school
Assume that X is normally distributed with mean μ = 2.6 and standard deviation σ = 0.46.

a. What is the probability that a randomly chosen GPA at the high-SAT-score school will be between 3.06 and 3.52?
Since 3.06 = 2.6 + 0.46 and 3.52 = 2.6 + 2(0.46), the area under the curve between 3.06 and 3.52 is the area between μ + σ and μ + 2σ. By the Empirical Rule, the area between μ + σ and μ + 2σ is about 13.5% of the area under the curve. Therefore, the probability that a randomly chosen GPA at the high-SAT-score school will be between 3.06 and 3.52 is about 0.135.

b. Find the probability that a randomly chosen GPA at the high-SAT-score school will be greater than 3.52.
To find the area to the right of X = 3.52, we subtract 34% (the area between μ and μ + σ) and 13.5% (the area between μ + σ and μ + 2σ) from 50% (the area to the right of μ):
50% – 34% – 13.5% = 2.5%
Therefore, the probability that a randomly chosen GPA at the high-SAT-score school will be greater than 3.52 is about 0.025.

The Standard Normal (Z) Distribution
o The standard normal distribution is a normal distribution with mean μ = 0 and standard deviation σ = 1.

Case 1: Find the area to the left of Z1
  Step 1 Draw the standard normal curve. Label the Z-value Z1.
  Step 2 Shade in the area to the left of Z1.
  Step 3 Use the Z table to find the area to the left of Z1.

Case 2: Find the area to the right of Z1
  Step 1 Draw the standard normal curve. Label the Z-value Z1.
  Step 2 Shade in the area to the right of Z1.
  Step 3 Use the Z table to find the area to the left of Z1. The area to the right of Z1 is then equal to 1 – (area to the left of Z1).

Case 3: Find the area between Z1 and Z2
  Step 1 Draw the standard normal curve. Label the Z-values Z1 and Z2.
  Step 2 Shade in the area between Z1 and Z2.
  Step 3 Use the Z table to find the area to the left of Z1 and the area to the left of Z2. The area between Z1 and Z2 is then equal to (area to the left of Z2) – (area to the left of Z1).

Standardizing a Normal Random Variable
o Any normal random variable X can be transformed into the standard normal random variable Z by standardizing using the formula
  Z = (X – μ) / σ

Example 6.7
Q1. The state of Georgia reports that the average temperature statewide for the month of April from 1949 to 2006 was μ = 61.5 °F. Assume that the standard deviation is σ = 8 °F and that temperature in Georgia in April is normally distributed. Draw the normal curve for temperatures between 45.5 °F and 77.5 °F, and the corresponding Z curve. Find the probability that the temperature is between 45.5 °F and 77.5 °F in April in Georgia.

A1. Here we have a = 45.5 and b = 77.5, giving us
Za = (a – μ) / σ = (45.5 – 61.5) / 8 = -2 and Zb = (b – μ) / σ = (77.5 – 61.5) / 8 = 2
The area between 45.5 °F and 77.5 °F is the same as the area between Z = -2 and Z = 2.
[Figure: normal curve for X = temperature with 45.5 and 77.5 marked, and the corresponding Z curve with -2 and 2 marked]
P(45.5 < X < 77.5) = P(-2 < Z < 2) = 0.9772 – 0.0228 = 0.9544
The probability that the temperature is between 45.5 °F and 77.5 °F in April in Georgia is 0.9544.

Example 6.8
Q1. Edmunds.com reported that the average amount that people were paying for a 2007 Toyota Camry XLE was $23,400. Let X = price, and assume that price follows a normal distribution with μ = $23,400 and σ = $1000. Find the prices that separate the middle 95% of 2007 Toyota Camry XLE prices from the bottom 2.5% and the top 2.5%.

A1. The middle 95% leaves an area of 0.025 in each tail. From the Z table, the Z-value with area 0.025 to its left is Z1 = -1.96, and the Z-value with area 0.975 to its left is Z2 = 1.96. Solving Z = (X – μ) / σ for X gives X = Zσ + μ, so
X1 = Z1 σ + μ = (-1.96)(1000) + 23,400 = 21,440
X2 = Z2 σ + μ = (1.96)(1000) + 23,400 = 25,360
[Figure: normal curve centered at $23,400, with area 0.95 between X1 and X2 and area 0.025 in each tail]
The prices that separate the middle 95% of 2007 Toyota Camry XLE prices from the bottom 2.5% of prices and the top 2.5% of prices are $21,440 and $25,360.
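
The arithmetic in the worked examples of this chapter can be cross-checked numerically. The following is a minimal sketch in Python, not part of the original notes, and it assumes Python 3.8 or later so that math.comb and statistics.NormalDist are available. The names binomial_pmf, april_temp, and price are illustrative choices made here. It recomputes the mean and standard deviation from Example 6.2, the binomial probabilities from Example 6.3, and the normal probabilities and cutoffs from Examples 6.7 and 6.8.

```python
from math import comb, sqrt
from statistics import NormalDist

# Example 6.2: mean, variance, and standard deviation of a discrete random variable.
values = [-2000, -200, 0, 500, 1000]
probs = [0.10, 0.20, 0.25, 0.30, 0.15]
mu = sum(x * p for x, p in zip(values, probs))                     # sum of X * P(X)
variance = sum((x - mu) ** 2 * p for x, p in zip(values, probs))   # definition formula
sigma = sqrt(variance)


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials (hypothetical helper)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Example 6.3: four questions, success probability 0.2 per guess.
p_ace = binomial_pmf(4, 4, 0.2)
p_pass = binomial_pmf(3, 4, 0.2) + p_ace   # P(X >= 3) = P(X = 3) + P(X = 4)

# Example 6.7: April temperature in Georgia, normal with mean 61.5 and sd 8.
april_temp = NormalDist(mu=61.5, sigma=8)
p_between = april_temp.cdf(77.5) - april_temp.cdf(45.5)

# Example 6.8: prices bounding the middle 95% of a normal with mean 23,400 and sd 1000.
price = NormalDist(mu=23400, sigma=1000)
low, high = price.inv_cdf(0.025), price.inv_cdf(0.975)

print(f"mean = {mu:.2f}, variance = {variance:.2f}, sd = {sigma:.2f}")
print(f"P(ace) = {p_ace:.4f}, P(pass) = {p_pass:.4f}")
print(f"P(45.5 < X < 77.5) = {p_between:.4f}")
print(f"middle 95% of prices: {low:.2f} to {high:.2f}")
```

Because the software does not round Z-values or table entries, small differences from the hand calculations are expected: the temperature probability comes out near 0.9545 where the Z table's four-decimal entries give 0.9544, and the price cutoffs land within a few cents of $21,440 and $25,360 because the exact Z-value is slightly less than 1.96 in absolute value.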