Statistics
Chapter 7: Random Variables
Time: 2+ weeks

Introduction: Random Variables

The Game of Craps

The game of craps is one of the most famous of all gambling games played with dice. In this game, the player rolls a pair of dice, and the sum of the numbers that turn up on the two faces is noted. If the sum is 7 or 11, the player wins immediately. If the sum is 2, 3, or 12, the player loses immediately. If any other sum is obtained, the player continues to throw the dice until he either wins by repeating the first sum he obtained or loses by rolling a 7.

Your mission in this activity is to estimate the probability of a player winning at craps. But first, let’s get a feel for the game. For this activity, your class will be divided into groups of two, with a pair of dice for each group.

1. In your group, play a total of 20 games of craps. One person will roll the dice; the other will keep track of the sums and record the end result (win or lose). Switch jobs after 10 games, if you like. How many times out of 20 does the player win? What is the relative frequency (the proportion of wins, written as a decimal) of wins?

Sum    2    3    4    5    6    7    8    9    10   11   12
Freq   __   __   __   __   __   __   __   __   __   __   __

Wins: ____     Wins on 1st roll:    7: ____   11: ____
Losses: ____   Losses on 1st roll:  2: ____   3: ____   12: ____

2. Combine your results with the other groups in the class. What is the relative frequency of wins for the entire class?

Sum    2    3    4    5    6    7    8    9    10   11   12
Freq   __   __   __   __   __   __   __   __   __   __   __

Wins: ____     Wins on 1st roll:    7: ____   11: ____
Losses: ____   Losses on 1st roll:  2: ____   3: ____   12: ____

3. Use simulation techniques to simulate 25 games of craps, using either the table of random numbers or the calculator. What is the relative frequency of wins based on the 25 simulations? How does this number compare to the relative frequency you found in step 2?

Sum    2    3    4    5    6    7    8    9    10   11   12
Freq   __   __   __   __   __   __   __   __   __   __   __

Wins: ____     Wins on 1st roll:    7: ____   11: ____
Losses: ____   Losses on 1st roll:  2: ____   3: ____   12: ____

4. One of the ways you can win at craps is to roll a sum of 7 or 11 on your first roll. Using your results and those of your fellow students, determine the number of times a player won by rolling a sum of 7 on the first roll.
What is the relative frequency of rolling a sum of 7? Repeat these calculations for a sum of 11. Which of these sums appears more likely to occur, based on the class results?

5. One of the ways you can lose at craps is to roll a sum of 2, 3, or 12 on your first roll. Using your results and those of your fellow students, determine the number of times a player lost by rolling a sum of 2 on the first roll. What is the relative frequency of rolling a sum of 2? Repeat these calculations for a sum of 3 and a sum of 12. Which of these sums appears more likely to occur than the others, based on the class results?

6. Clearly, the key quantity of interest in craps is the sum of the numbers on the two dice. Let’s try to get a better idea of how this sum behaves in general by conducting a simulation. First, determine how you would simulate the roll of a single fair die. Then determine how you would simulate a roll of two fair dice, and determine the relative frequency of each of the possible sums.

7. Construct a relative frequency histogram of the results in step 6. What is the approximate shape of the distribution? Which sum appears most likely to occur? Which appears least likely to occur?

8. From the relative frequency data in step 6, compute the relative frequency of winning and the relative frequency of losing on your first roll in craps. How do these simulated results compare with what the class obtained?

A Random Variable is a variable whose value is a numerical outcome of a ______________.

Consider flipping a coin twice:
Possible outcomes are ______________.
A random variable could be the number of heads (X = # of heads).
What are the possible values for this random variable?

Suppose a survey is about hair styles. Random variables could be X = hair length, or Y = # of people with dyed hair.

Discrete Random Variable

A random variable, X, is discrete if its set of possible values is a collection of ______________ on the number line.
The probability distribution of X can be given in a table, probability histogram, or formula, and is such that:

0 ≤ P(X) ≤ 1
Σ P(X) = 1

Suppose I was a restaurant manager and wanted to collect data about my restaurant. Name 2 discrete random variables I could keep track of.

The probabilities in a distribution give information about the long-run behavior of the variable. That is, probabilities are predicted relative frequencies. For example, if P(X=2) = 0.30, then after many observations of X, the value X=2 will occur about 30% of the time.

Tossing Coins

What is the probability distribution of the discrete random variable X that counts the number of heads in four tosses of a coin? We can derive this distribution if we assume the coin is balanced and the coin tosses are independent.

List the outcomes for each value of X:
X=0:
X=1:
X=2:
X=3:
X=4:

P(X=0) =
P(X=1) =
P(X=2) =
P(X=3) =
P(X=4) =

Draw a histogram.

Check that Σ P(X) = 1 and that 0 ≤ P(X) ≤ 1 for every value. Can we assume this is a discrete random variable?

Continuous Random Variable

A continuous random variable takes on all of the values in an interval. The probability distribution for a continuous random variable X is denoted by f(X), and its graph is a smooth curve called a density curve.

f(X) ≥ 0
Total area under the density curve = 1
The probability that X falls in an interval is equal to the area under the density curve over that interval.

Probabilities are assigned to ______________ rather than individual outcomes. In fact, the probability is ______ for every individual outcome. Only intervals of values have positive probability.

Microwave Popcorn

Let x = the amount of time (in minutes) that it takes a bag of microwave popcorn to start popping, and suppose that the probability density function is given to be:

\[ f(x) = \begin{cases} 0.25 & 1 < x < 5 \\ 0 & \text{otherwise} \end{cases} \]

Graph f(x) and verify that the total area = 1.
Describe the shape of this distribution.
What are the possible values of x?
How many possible values of x are there?
Find the probability that it takes less than 2 minutes.
Find the probability that it takes less than 4.5 minutes.
Find the probability that it takes between 2 and 4.5 minutes.
Note: P(a<x<b) = P(x<b) - P(x<a)
Find the probability that it takes exactly 2 minutes.
P(x=2) = 0 since x = 2 is only one out of an infinite number of possibilities.
P(x=2) = 0 since the line above x=2 has an area of 0 (no width).
Note: for any continuous random variable x, P(x=a) = 0.
Note: P(x<a) = P(x≤a)

Isosceles Triangle

Suppose a continuous random variable x takes on values from 10 to 30 and that its density function is an isosceles triangle.
Graph the density function and find its height.
Find the probability that x is between 12 and 18.

Normal Distributions as Probability Distributions

Density curves are most familiar to us as normal curves. Remember, we show a normal distribution with a symmetric bell-shaped curve and the notation \(N(\mu, \sigma)\). We standardize variables with the formula:

\[ z = \frac{x - \mu}{\sigma} \]

For example: If x = length of a newborn baby and x ~ N(19, 2), sketch the distribution and shade the area that corresponds to P(x>21).

For example: If z ~ N(0, 1), find the probability that z is less than -1, that is, more than one standard deviation below the mean: P(z<-1) = 0.1587

Find the probability that z is:
Less than two standard deviations above the mean: P(z<2) =
More than 2 standard deviations above the mean: P(z>2) =
Less than 1.3 standard deviations above the mean: P(z<1.3) =
Within 2 standard deviations of the mean: P(-2<z<2) = P(z<2) - P(z<-2) =

Means and Variances of Random Variables

Remember: the mean, \(\bar{x}\), of a set of observations is the ordinary average. The mean of a random variable, X, is also an average of the possible values of X, but we must take into account that not all outcomes are equally likely.

The Tri-State Pick 3

Here is a simple lottery wager, from the Tri-State Pick 3 game that New Hampshire, Maine, and Vermont share. You choose a 3-digit number; the state chooses a 3-digit winning number at random and pays you $500 if your number is chosen.
Because there are 1000 three-digit numbers, you have probability 1/1000 of winning. Taking X to be the amount your ticket pays you, the probability distribution is

Payoff X:      $0      $500
Probability:   0.999   0.001

What is your average payoff from many tickets? The ordinary average of the 2 possible outcomes $0 and $500 is $250, but that makes no sense as the average because $500 is much less likely than $0. In the long run, you receive $500 once in every 1000 tickets and $0 on the remaining 999 of 1000 tickets. The long-run average payoff is:

______________

That number is the mean of the random variable X.

Mean of a Random Variable

We use the notation \(\mu_X\) to denote the mean of X, a random variable. The mean of a random variable X is also sometimes called the “expected value,” even though we don’t necessarily expect one observation on X to be close to its expected value.

So here’s the full definition: Suppose that X is a discrete random variable whose distribution is

Value of X:    x1   x2   x3   ...   xk
Probability:   p1   p2   p3   ...   pk

To find the mean of X, multiply each possible value by its probability, then add all the products:

\[ \mu_X = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k = \sum x_i p_i \]

Variance of a Random Variable

Remember that the measure of ______________ that accompanies the mean is the variance and standard deviation. The symbol we use for the variance of a random variable X is \(\sigma_X^2\), where

\[ \sigma_X^2 = \sum (x_i - \mu_X)^2 p_i \]

The standard deviation is the ______________ of the variance.

What is the Law of Large Numbers?

Try this: Write down a sequence of heads and tails that you think imitates 10 tosses of a balanced coin. How long was the longest run of consecutive heads or consecutive tails in your tosses? What is the probability that you will get a run of 3 heads or 3 tails?

For the Law of Large Numbers, just how large is large?
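The definitions of the mean and variance above, and the Law of Large Numbers question, can be explored with a short simulation. The sketch below is only one possible way to do it: it applies the two formulas to the Tri-State Pick 3 payoff distribution and then averages simulated tickets for increasing numbers of plays. The function names, the seed, and the ticket counts are illustrative assumptions, not part of the activity.

```python
import random

def mean_and_variance(values, probs):
    """Return (mean, variance) of a discrete random variable.

    mean     = sum of x_i * p_i
    variance = sum of (x_i - mean)^2 * p_i
    """
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, var

# Pick 3 payoff distribution taken from the text: $0 or $500.
payoffs = [0, 500]
probs = [0.999, 0.001]

mu, var = mean_and_variance(payoffs, probs)
sd = var ** 0.5  # standard deviation is the square root of the variance

def average_payoff(n_tickets, seed=1):
    """Average payoff over n_tickets simulated plays (seed chosen arbitrarily)."""
    rng = random.Random(seed)
    draws = rng.choices(payoffs, weights=probs, k=n_tickets)
    return sum(draws) / n_tickets

print(f"mean = {mu:.2f}, variance = {var:.2f}, sd = {sd:.2f}")
for n in (100, 10_000, 100_000):
    # With few tickets the average is often 0; with many it settles near the mean.
    print(n, average_payoff(n))
```

Under these assumptions the formulas give a mean payoff of $0.50 and a variance of 249.75. Because a win is so rare, the simulated average can stay far from $0.50 until the number of tickets is quite large, which is one way to think about "how large is large."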
Rules for Means

Rule 1: If X is a random variable and a and b are fixed numbers, then
\[ \mu_{a+bX} = a + b\mu_X \]

Rule 2: If X and Y are random variables, then
\[ \mu_{X+Y} = \mu_X + \mu_Y \]

Linear Functions & Linear Combinations

If x = the length of a taxi ride in miles, suppose that \(\mu_x = 5.2\) miles and \(\sigma_x = 2.8\) miles. Two taxi companies have different fare schedules: company A charges $2.50 per mile and company B charges $2 per mile plus an initial fee of $5. Thus, if we are interested in the total fare of a taxi ride (y), the functions are:

\(y_A = 2.50x\) and \(y_B = 5 + 2x\)

How can we find the mean and standard deviation of y in each case?

Gain Communications

The number X of communications units sold by the Gain Communications military division has distribution

X = units sold:   1000   3000   5000   10,000
Probability:      0.1    0.3    0.4    0.2

The corresponding sales estimates for the civilian division are

Y = units sold:   300    500    750
Probability:      0.4    0.5    0.1

Calculate \(\mu_X\) and \(\mu_Y\).

Gain makes a profit of $2000 on each military unit sold and $3500 on each civilian unit. Next year’s profit from military sales will be 2000X, and civilian profit will be 3500Y.

By rule 1:
By rule 2:

Rules for Variance

Rule 1: If X is a random variable and a and b are fixed numbers, then
\[ \sigma^2_{a+bX} = b^2 \sigma^2_X \]

Rule 2: If X and Y are independent random variables, then
\[ \sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y \quad\text{and}\quad \sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y \]

Rule 3: If X and Y have correlation \(\rho\), then
\[ \sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y + 2\rho\sigma_X\sigma_Y \quad\text{and}\quad \sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y - 2\rho\sigma_X\sigma_Y \]

Combining Normal Random Variables

Any linear combination of independent normal random variables is also ______________. If X and Y are independent normal random variables and a and b are any fixed numbers, aX + bY is also normally distributed.

Golf Buddies

Tom and George are playing in the club golf tournament. Their scores vary as they play the course repeatedly. Tom’s score X has the N(110,10) distribution, and George’s score Y varies from round to round according to the N(100,8) distribution.
If they play independently, what is the probability that Tom will score lower than George and thus do better in the tournament?

Chapter 7 problems: # 1, 2, 4, 5, 7, 8, 12, 16, 23, 29, 32, 34, 37, 41, 42, 44
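One way to check an answer to the Golf Buddies question is to apply the rules for means and variances to the difference D = X - Y and then use the fact that a difference of independent normal variables is also normal. The sketch below assumes that approach and uses Python's standard-library `statistics.NormalDist`; the variable names are illustrative, and this is a check on hand work rather than the worksheet's prescribed method.

```python
from math import sqrt
from statistics import NormalDist

# Score distributions from the text, written as N(mean, standard deviation).
tom = NormalDist(mu=110, sigma=10)     # X
george = NormalDist(mu=100, sigma=8)   # Y

# D = X - Y. Means subtract; for independent X and Y the variances ADD.
mean_d = tom.mean - george.mean
sd_d = sqrt(tom.variance + george.variance)

# D is normal because X and Y are independent normal random variables.
diff = NormalDist(mu=mean_d, sigma=sd_d)

# Tom scores lower than George exactly when D < 0.
p_tom_lower = diff.cdf(0)

# The same probability through standardizing: z = (0 - mean_d) / sd_d.
z = (0 - mean_d) / sd_d
p_from_z = NormalDist().cdf(z)

print(f"mean of D = {mean_d}, sd of D = {sd_d:.2f}")
print(f"P(X < Y) = {p_tom_lower:.4f}")
```

With these inputs the difference has mean 10 and standard deviation of about 12.8, so the probability comes out to roughly 0.22. Note that the standard deviations themselves do not add; only the variances do.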