Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Chapter 6 Section 6.1 Discrete Random Variables The Game of Greed! • The point of this game is to gain as many points as possible. Points are earned by rolling two dice= the sum of the dice equals the points earned for the roll. To earn points, a player must be actively “in the game”. Once a player opts “out of the game” they maintain their current point total, but can not earn any more points until the next round. Players who are “in the game” continue earning points until they opt out or until the “greed point” is rolled, whichever occurs first. The “greed point” determines the end of a round. Any players “in the game” when the greed point is rolled lose all points earned for that round. 2 Random Variable • A variable whose value is a numerical outcome of a random phenomenon • Use the variables X or Y because the values will vary 3 Random Variables and Probability Distributions A probability model describes the possible outcomes of a chance process and the likelihood that those outcomes will occur. Consider tossing a fair coin 3 times. Define X = the number of heads obtained X = 0: TTT X = 1: HTT THT TTH X = 2: HHT HTH THH X = 3: HHH Value (X) 0 1 2 3 Probability 1/8 3/8 3/8 1/8 A random variable takes numerical values that describe the outcomes of some chance process. The probability distribution of a random variable gives its possible values and their probabilities. Discrete Random Variables There are two main types of random variables: discrete and continuous. If we can find a way to list all possible outcomes for a random variable and assign probabilities to each one, we have a discrete random variable. Discrete Random Variables And Their Probability Distributions A discrete random variable X takes a fixed set of possible values with gaps between. The probability distribution of a discrete random variable X lists the values xi and their probabilities pi: Value: Probability: x1 p1 x2 p2 x3 p3 … … The probabilities pi must satisfy two requirements: 1.Every probability pi is a number between 0 and 1. 2.The sum of the probabilities is 1. To find the probability of any event, add the probabilities pi of the particular values xi that make up the event. Example A University posts the grade distributions for its courses online. Students in stats received 21% A’s, 43% B’s, 30% C’s, 5% D’s, 1% F’s. Choose a student at random- the student’s grade on a four point scale (A = 4) is a random variable. The value of X changes when we repeatedly choose students at random but it’s always either a 0, 1, 2, 3, or 4. • Here is the distribution: Value of X Probability 0 .01 1 2 3 4 .05 .30 .43 .21 • The probability that the student got a B or better is the sum of the probabilities of an A and a B (they’re independent) P(X ≥ 3) = P(X = 3) + P(X = 4) = .43 + .21 = .64 Probability Histograms • Idealized pictures of the results over many trials – Outcomes – x axis – Probability – y axis How many languages? Imagine selecting a U.S. high school student at random. Define the random variable X = number of languages spoken by the randomly selected student. The table below gives the probability distribution of X, based on a sample of students from the U.S. Census at School database. Languages: Probability: 1 0.630 2 0.295 3 0.065 4 0.008 5 0.002 Problem: (a) Show that the probability distribution for X is legitimate. (b) Make a histogram of the probability distribution. Describe what you see. (c) What is the probability that a randomly selected student speaks at least 3 languages? (a) All the probabilities are between 0 and 1 and they add to 1, so this is a legitimate probability distribution. 0.7 0.6 Probability 0.5 0.4 0.3 0.2 0.1 0.0 1 2 3 Languages spoken 4 5 (b) Shape: The histogram is strongly skewed to the right. Center: The median is 1, but the mean will be slightly higher due to the skewness. Spread: The number of languages varies from 1 to 5, but nearly all of the students speak just one or two languages. (c) P(X>3) = P(X = 3) + P(X = 4) + P(X = 5) = 0.065 + 0.008 + 0.002 = 0.075. There is a 0.075 probability of randomly selecting a student who speaks three or more languages. Winning (and Losing) at Roulette Let’s define the random variable X = net gain from a single $1 bet on red. The possible values of X are -$1 and $1. What are the corresponding probabilities? Create a probability distribution: Value: –$1 $1 Probability: 20/38 18/38 What is the player’s average gain (aka expected value)? 𝜇𝑋 = −1 20 + 1 38 𝜇𝑋 = −$0.05 18 38 Mean (Expected Value) of a Discrete Random Variable When analyzing discrete random variables, we follow the same strategy we used with quantitative data – describe the shape, center, and spread, and identify any outliers. The mean of any discrete random variable is an average of the possible outcomes, with each outcome weighted by its probability. Suppose that X is a discrete random variable whose probability distribution is Value: Probability: x1 p1 x2 p2 x3 p3 … … To find the mean (expected value) of X, multiply each possible value by its probability, then add all the products: x E ( X ) x1 p1 x2 p2 x3 p3 ... xi pi How many languages? In an earlier Alternate Example, we defined the random variable X to be the number of languages spoken by a randomly selected U.S. high school student. The table below gives the probability distribution of X: Languages: Probability: 1 0.630 2 0.295 3 0.065 4 0.008 5 0.002 Compute the mean of the random variable X and interpret this value in context. X E( X ) (1)(0.630) (2)(0.295) (3)(0.065) (4)(0.008) (5)(0.002) = 1.457. If we were to randomly select many, many U.S. high school students at random, the average number of languages spoken would be about 1.457. Warm up 1. A large auto dealership keeps track of sales made during each hour of the day. Let X = the number of cars sold during the first hour of business on a randomly selected Friday. Based on previous records, the probability distribution of X is as follows: Cars sold Probability 0 1 2 3 0.3 0.4 0.2 0.1 Compute and interpret the mean of X. 𝜇𝑋 = 0 0.3 + 1 0.4 + 2 0.2 + 3 0.1 = 1.1 If many, many Fridays are randomly selected, the average number of cars sold will be about 1.1. The variance of a random variable • The mean is a measure of the center of a distribution • The variance and the standard deviation are the measures of spread that accompany mean to measure center s2 is the symbol for the variance of a data set 2 X is the notation for the variance of a random variable X Standard Deviation of a Discrete Random Variable Since we use the mean as the measure of center for a discrete random variable, we use the standard deviation as our measure of spread. The definition of the variance of a random variable is similar to the definition of the variance for a set of quantitative data. Suppose that X is a discrete random variable whose probability distribution is Value: x1 x2 x3 … Probability: p1 p2 p3 … and that µX is the mean of X. The variance of X is Var ( X ) X2 ( x1 X ) 2 p1 ( x2 X ) 2 p2 ( x3 X ) 2 p3 ... ( xi X ) 2 pi To get the standard deviation of a random variable, take the square root of the variance. Using the warm up problem, compute and interpret the standard deviation of X. 𝜎2𝑋 = 0 − 1.1 3 − 1.1 2 2 0.3 + 1 − 1.1 0.1 2 0.4 + 2 − 1.1 2 0.2 + 𝜎𝑋 = 0.89 = 0.943 The number of cars sold on a randomly selected Friday will typically vary from the mean (1.1) by about 0.943 cars. How many languages Yesterday, we calculated the mean number of languages spoken for a randomly selected U.S. high school student to be = 1.457. The table below gives the probability distribution for X one more time. Languages: Probability: 1 0.630 2 0.295 3 0.065 4 0.008 5 0.002 Compute and interpret the standard deviation of the random variable X. X2 (1 1.457)2 (0.630) (2 1.457)2 (0.295) X 0.450 0.671 The number of languages spoken by a randomly selected U.S. high school student typically varies by about 0.671 languages from the mean (1.457). Continuous Random Variables When we choose a digit between 0 and 9, each number has .1 probability. What if we wanted to allow any number between 0 and 1 as our outcome? There are infinite possible Outcomes! How can we Assign probabilities? X in this case is a continuous Variable b/c its values are not Isolated numbers but an entire interval of numbers Continuous Random Variables A continuous random variable X takes on all values in an interval of numbers. The probability distribution of X is described by a density curve. The probability of any event is the area under the density curve and above the values of X that make up the event. The probability model of a discrete random variable X assigns a probability between 0 and 1 to each possible value of X. A continuous random variable Y has infinitely many possible values. All continuous probability models assign probability 0 to every individual outcome. Only intervals of values have positive probability. Normal Distributions The density curve we are most familiar with is a Normal Curve. –Remember N(μ, σ) is our notation for a normal distribution. –Z score is a standard Normal random variable having the distribution N (0, 1) 𝑥−𝜇 𝑍= 𝜎 Example: Normal probability distributions The heights of young women closely follow the Normal distribution with mean µ = 64 inches and standard deviation σ = 2.7 inches. Now choose one young woman at random. Call her height Y. If we repeat the random choice very many times, the distribution of values of Y is the same Normal distribution that describes the heights of all young women. Problem: What’s the probability that the chosen woman is between 68 and 70 inches tall? Example: Normal probability distributions Step 1: State the distribution and the values of interest. The height Y of a randomly chosen young woman has the N(64, 2.7) distribution. We want to find P(68 ≤ Y ≤ 70). Step 2: Perform calculations—show your work! The standardized scores for the two boundary values are Opinion poll An SRS of 1500 adults consider what is the most serious problem facing schools? Suppose we say 30% consider drugs. N (0.3, 0.0118) What is the probability that the poll results differs from the truth about the population by more than two percentage points? 24 By the addition rule for disjoint events, the desired probability is: 𝑃 𝑝 < 0.28 𝑜𝑟 𝑝 > 0.32 = 𝑃(𝑝 < 0.28) + P(𝑝 > 0.32) = 0.0455 + 0.0455 = 0.0910 Example: cheating in School True population probability that a student will report on another cheating student is 12%. Based on a survey of 400 random students, done many times, we expect the average probability to get close to this .12 value. (with a standard deviation of .016). What is the probability that if I conduct a survey my result will differ from the true probability by 2%? • • • • If the result is less than .10 or greater than .14 P(From table A or calculator, P(-1.25 ≤ Z ≤ 1.25) = .8944 - .1056 = .7888 So probability we seek is 1 - .7888 = .2112 which is 21% Weights of three-year-old females The weights of three-year-old females closely follow a Normal distribution with a mean of = 30.7 pounds and a standard deviation of 3.6 pounds. Randomly choose one three-year-old female and call her weight X. What is the probability that a randomly selected three-year-old female weighs at least 30 pounds? The weight X of a randomly chosen three-year-old female has the N(30.7, 3.6) distribution. We want to find P(X 30) as shown in the figure below. Perform calculations–show your work! The standardized score for the boundary value is = −0.19. From Table A, P(Z < −0.19) = 0.4247, so P(Z −0.19) = 1 − 0.4247 = 0.5753. Using technology: The command normalcdf(lower:30, upper: 1000, : 30.7, :3.6) gives an area of 0.5771. There is about a 58% chance that the randomly selected threeyear-old female will weigh at least 30 pounds.