AP Statistics
AP Review: Random Variables

Vocabulary

Random variable
Probability distribution
Discrete random variable
Mean (expected value) of a discrete random variable
Variance of a discrete random variable
Standard deviation of a discrete random variable
Continuous random variable
Linear transformation
Effect on a random variable of multiplying/dividing by a constant
Effect on a random variable of adding/subtracting a constant
Mean of the sum of random variables
Independent random variables
Variance of the sum of independent random variables
Mean of the difference of random variables
Variance of the difference of independent random variables
Binomial setting
Binomial random variable
Binomial coefficient
Binomial probability
Mean of a binomial random variable
Standard deviation of a binomial random variable
Normal approximation for binomial distributions
Geometric setting
Geometric random variable
Geometric probability
Mean of a geometric random variable

Summary

A random variable takes numerical values determined by the outcome of a chance process. The probability distribution of a random variable X tells us what the possible values of X are and how probabilities are assigned to those values. There are two types of random variables: discrete and continuous.

A discrete random variable has a fixed set of possible values with gaps between them. The probability distribution assigns each of these values a probability between 0 and 1 such that the sum of all the probabilities is exactly 1. The probability of any event is the sum of the probabilities of all the values that make up the event.

A continuous random variable takes all values in some interval of numbers. A density curve describes the probability distribution of a continuous random variable. The probability of any event is the area under the curve above the values that make up the event.

The mean of a random variable X is the balance point of the probability distribution histogram or density curve.
Because the mean is the long-run average value of the variable after many repetitions of the chance process, it is also known as the expected value of the random variable. If X is a discrete random variable, the mean is the average of the values of X, each weighted by its probability: \( \mu_X = E(X) = \sum x_i p_i \).

The variance of a random variable, \( \sigma_X^2 \), is the "average" squared deviation of the values of the variable from their mean. The standard deviation \( \sigma_X \) is the square root of the variance. The standard deviation measures the typical distance of the values in the distribution from the mean. For a discrete random variable X, the variance is \( \sigma_X^2 = \sum (x_i - \mu_X)^2 p_i \) and the standard deviation is \( \sigma_X = \sqrt{\sum (x_i - \mu_X)^2 p_i} \).

Adding a positive constant a to (subtracting a from) a random variable increases (decreases) the mean of the random variable by a but does not affect the standard deviation or the shape of its probability distribution.

Multiplying (dividing) a random variable by a positive constant b multiplies (divides) the mean of the random variable by b and the standard deviation by b but does not change the shape of the probability distribution.

A linear transformation of a random variable involves adding or subtracting a constant a, multiplying or dividing by a constant b, or both. We can write a linear transformation of the random variable X in the form Y = a + bX. The shape, center, and spread of the probability distribution of Y are as follows:
Shape: same as the probability distribution of X if b > 0.
Center: \( \mu_Y = a + b\mu_X \)
Spread: \( \sigma_Y = |b|\,\sigma_X \)

If X and Y are any two random variables:
The mean of the sum of two random variables is the sum of their means: \( \mu_{X+Y} = \mu_X + \mu_Y \).
The mean of the difference of two random variables is the difference of their means: \( \mu_{X-Y} = \mu_X - \mu_Y \).

If X and Y are two independent random variables, then knowing the value of one variable tells you nothing about the value of the other. In that case, variances add:
The variance of the sum of two independent variables is the sum of their variances: \( \sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2 \).
The variance of the difference of two independent variables is also the sum of their variances: \( \sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2 \).

The sum or difference of independent Normal random variables follows a Normal distribution.

A binomial setting consists of n independent trials of the same chance process, each resulting in a success or a failure, with probability of success p on each trial (BINS: Binary, Independent, Number, Success). The count X of successes is a binomial random variable. Its probability distribution is a binomial distribution.

The binomial coefficient \( \binom{n}{k} = \frac{n!}{k!\,(n-k)!} \) counts the number of ways k successes can be arranged among n trials. The factorial of n is \( n! = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1 \) for positive whole numbers n, and 0! = 1.

If X has the binomial distribution with parameters n and p, the possible values of X are the whole numbers 0, 1, 2, ..., n. The binomial probability of observing k successes in n trials is \( P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \). The mean and standard deviation of a binomial random variable X are \( \mu_X = np \) and \( \sigma_X = \sqrt{np(1-p)} \).

The binomial distribution with n trials and probability p of success gives a good approximation to the count of successes in an SRS of size n from a large population containing proportion p of successes. This is true as long as the sample size n is no more than 10% of the population size N.

The Normal approximation to the binomial distribution says that if X is a count of successes having the binomial distribution with parameters n and p, then when n is large, X is approximately Normally distributed with mean \( np \) and standard deviation \( \sqrt{np(1-p)} \). We will use this approximation when \( np \ge 10 \) and \( n(1-p) \ge 10 \).

A geometric setting consists of repeated independent trials of the same chance process in which the probability p of success is the same on each trial, and the goal is to count the number of trials it takes to get one success. If Y = the number of trials required to obtain the first success, then Y is a geometric random variable. Its probability distribution is called a geometric distribution.
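As a rough numerical check of the binomial formulas in this summary, the following Python sketch computes binomial probabilities from the binomial coefficient and compares an exact cumulative probability with its Normal approximation. The function names and the values n = 100 and p = 0.3 are illustrative assumptions, not part of the review sheet, and no continuity correction is applied.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p))."""
    return n * p, sqrt(n * p * (1 - p))


def normal_approx_ok(n, p):
    """Rule of thumb from the summary: np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10


# Hypothetical example values, chosen only for illustration.
n, p = 100, 0.3

# Exact P(X <= 25), summing the binomial probabilities for k = 0, ..., 25.
exact = sum(binomial_pmf(k, n, p) for k in range(26))

# Normal approximation to the same probability.
mean, sd = binomial_mean_sd(n, p)
approx = NormalDist(mean, sd).cdf(25)

print(f"mean = {mean:.2f}, sd = {sd:.4f}")
print(f"approximation conditions met: {normal_approx_ok(n, p)}")
print(f"exact P(X <= 25) = {exact:.4f}, Normal approx = {approx:.4f}")
```

The exact and approximate values should be close but not identical; the gap shrinks as n grows with p held fixed.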
If Y has the geometric distribution with probability of success p, the possible values of Y are the positive integers 1, 2, 3, .... The geometric probability that Y takes any value k is \( P(Y = k) = (1-p)^{k-1} p \). The mean (expected value) of a geometric random variable Y is \( \mu_Y = E(Y) = \frac{1}{p} \).

Problems

1. Consider two 4-sided dice, each having sides labeled 1, 2, 3, 4. Let X = the sum of the numbers that appear after a roll of the dice.

A. Is X a discrete or a continuous random variable? Sketch the probability distribution of X. Describe what you see.

Possible sums (first die across, second die down):

      1  2  3  4
  1   2  3  4  5
  2   3  4  5  6
  3   4  5  6  7
  4   5  6  7  8

[Figure: probability histogram of X; horizontal axis: sum (1 to 8); vertical axis: probability (0 to 0.3)]

X is a discrete random variable. Its distribution is symmetric with a single peak: we are most likely to roll a sum of 5 (probability 4/16) and least likely to roll a sum of 2 or 8 (probability 1/16 each).

B. If someone rolled the dice 10 times and got a sum less than 3 each time, would you be surprised? Why or why not?

Yes, we should be surprised. A sum of 3 or less has probability 3/16, so in 10 rolls we would expect to see such a sum only about once or twice, and a sum less than 3 (probability 1/16) is even rarer.

2. Suppose the random variable Y = number of goals in a randomly selected high school hockey game has the following probability distribution:

Goals         0      1      2      3      4
Probability   0.155  0.195  0.243  0.233  0.174

Sketch the probability distribution. Then calculate the mean and standard deviation of Y and interpret them in the context of the situation.

\( E(Y) = 0(0.155) + 1(0.195) + 2(0.243) + 3(0.233) + 4(0.174) = 2.076 \). In the long run, we'd expect to see an average of 2.076 goals per game over many, many games.

\( \mathrm{Var}(Y) = (0 - 2.076)^2(0.155) + (1 - 2.076)^2(0.195) + (2 - 2.076)^2(0.243) + (3 - 2.076)^2(0.233) + (4 - 2.076)^2(0.174) \approx 1.7382 \).

Standard deviation \( = \sqrt{1.7382} \approx 1.3184 \). In the long run, the number of goals per game will typically vary by about 1.3184 from the mean of 2.076.

3. The weights of toddler boys follow an approximately Normal distribution with mean 34 pounds and standard deviation 3.5 pounds. Suppose you randomly choose one toddler boy and record his weight. What is the probability that the randomly selected boy weighs less than 31 pounds?

\( z = \frac{31 - 34}{3.5} \approx -0.8571 \), so \( P(z < -0.8571) \approx 0.1956 \). You can also calculate this probability using the normalcdf function on your calculator (low = -999, high = 31, mean = 34, standard deviation = 3.5).

4. A carnival game involves tossing a ball into numbered baskets with the goal of having your ball land in a high-numbered basket. The probability distribution of X = value of the basket on a randomly selected toss is shown below.

Value         0    1    2    3
Probability   0.3  0.4  0.2  0.1

The expected value of X is 1.1 and its standard deviation is 0.9434. Suppose it costs $2 to play and you earn $1.50 for each point earned on your toss. That is, if you land in a basket labeled "2," you earn $3.00. Define Y to be the amount of profit you make on a randomly selected toss, so Y = 1.5X - 2. Describe the shape, center, and spread of the probability distribution of Y in the context of the situation.

The shape will be slightly skewed to the right, like the distribution of X.

\( E(Y) = 1.5\,E(X) - 2 = 1.5(1.1) - 2 = -0.35 \). In the long run, we would expect to lose $0.35 each time we play the game, on average.

\( \sigma_Y = 1.5\,\sigma_X = 1.5(0.9434) \approx 1.415 \). Our profit on a single toss will typically vary by about $1.42 from the mean, a loss of $0.35.

5. Students in Mr. Costello's class are expected to check their homework in groups of 4 at the beginning of class each day. Students must check it as quickly as possible, one at a time. The means and standard deviations of the time it takes to check homework for the 4 students in one group are noted. Assume their times are independent.

          Mean      Standard deviation
Alan      1.4 min   0.1 min
Barb      1.2 min   0.4 min
Corey     0.9 min   0.8 min
Doug      1.0 min   0.7 min

A. If each student checks one after the other, what are the mean and standard deviation of the total time necessary for these four students to check their homework on a randomly chosen day?

Mean \( = 1.4 + 1.2 + 0.9 + 1.0 = 4.5 \) min.
Standard deviation \( = \sqrt{0.1^2 + 0.4^2 + 0.8^2 + 0.7^2} = \sqrt{1.30} \approx 1.14 \) min.

B. Suppose Alan and Doug like to race to see who can check their homework faster. What are the mean and standard deviation for the difference between their times (Doug - Alan)?
Interpret these values in the context of the situation.

Mean \( = 1.0 - 1.4 = -0.4 \) min. On average, Doug is faster by 0.4 min.
Standard deviation \( = \sqrt{0.7^2 + 0.1^2} = \sqrt{0.50} \approx 0.7071 \) min. The difference between Doug's and Alan's times will typically vary by about 0.7071 min from the mean difference of -0.4 min.

6. Mr. Molesky and Mr. Liberty are avid video game golfers. Both like to compare times to complete a particular course on their favorite game. Mr. Molesky's times are Normally distributed with a mean of 110 minutes and standard deviation of 10 minutes. Mr. Liberty's times are Normally distributed with mean 100 minutes and standard deviation 8 minutes.

A. Find the mean and standard deviation of the difference of their times (Molesky - Liberty). Assume their times are independent.

\( \mu_{M-L} = 110 - 100 = 10 \) minutes.
\( \sigma_{M-L} = \sqrt{10^2 + 8^2} = \sqrt{164} \approx 12.81 \) minutes.

B. Find the probability that Mr. Molesky will finish his game before Mr. Liberty on any given day.

\( P(M - L < 0) = P\left(z < \frac{0 - 10}{12.81}\right) = P(z < -0.78) \approx 0.2177 \). There is about a 21.77% chance Mr. Molesky will finish before Mr. Liberty on any given day.

7. Recall that there are 4 suits (spades, hearts, clubs, and diamonds) in a standard deck of playing cards. Suppose you play a game in which you draw a card, record the suit, replace it, shuffle, and repeat until you have observed 10 cards. Define X = number of hearts observed.

A. Show that X is a binomial random variable.

B: A card is either a heart or it isn't.
I: Each draw is independent, since cards are replaced and the deck is shuffled.
N: There are 10 observations in each game.
S: P(heart) = 0.25 on each draw.

B. Find the probability of observing fewer than 4 hearts in this game.

\( P(X < 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) \approx 0.7759 \).

8. Suppose 72% of students in the U.S. would give their teachers a positive rating if asked to score their effectiveness. A survey is conducted in which 500 students are randomly selected and asked to rate their teachers. Let X = the number of students in the sample who would give their teachers a positive rating.

A. Show that X is approximately a binomial random variable.

B: Students give either a positive or a negative rating.
I: Since there are more than 10(500) = 5000 students in the population, we can assume independence.
N: 500 students were selected.
S: P(positive rating) = 0.72 for each student.

B. Use a Normal approximation to find the probability that 400 or more students would give their teacher a positive rating in this sample.

Mean \( = np = 500(0.72) = 360 \).
Standard deviation \( = \sqrt{np(1-p)} = \sqrt{500(0.72)(0.28)} \approx 10.04 \).
\( P(X \ge 400) = P\left(z \ge \frac{400 - 360}{10.04}\right) = P(z \ge 3.98) \approx 0.000034 \).

9. Suppose 20% of Super Crunch cereal boxes contain a secret decoder ring. Let X = the number of boxes of Super Crunch that must be opened until a ring is found.

A. Show that X is a geometric random variable.

There are two outcomes (ring or no ring). Each box is independent. The probability of a ring in any given box is 0.2. We are interested in how long it will take to find a ring.

B. Find the probability that you will have to open 7 boxes to find a ring.

\( P(X = 7) = (0.8)^6 (0.2) \approx 0.0524 \).

C. Find the probability that it will take fewer than 4 boxes to find a ring.

\( P(X < 4) = P(X = 1) + P(X = 2) + P(X = 3) = 0.2 + 0.16 + 0.128 = 0.488 \).

D. How many boxes would you expect to have to open to find a ring?

\( E(X) = \frac{1}{0.2} = 5 \) boxes.
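The numerical answers to Problems 6 through 9 can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative assumptions, and small differences from the worked answers come from rounding z to two decimal places before looking up a probability.

```python
from math import comb, sqrt
from statistics import NormalDist

# Problem 6B: M - L is Normal with mean 10 and SD sqrt(10^2 + 8^2).
sd6 = sqrt(10**2 + 8**2)
p6b = NormalDist(110 - 100, sd6).cdf(0)  # P(M - L < 0)

# Problem 7B: X is binomial with n = 10, p = 0.25; P(X < 4) = P(X <= 3).
p7b = sum(comb(10, k) * 0.25**k * 0.75 ** (10 - k) for k in range(4))

# Problem 8B: Normal approximation to a binomial with n = 500, p = 0.72.
mean8 = 500 * 0.72
sd8 = sqrt(500 * 0.72 * (1 - 0.72))
p8b = 1 - NormalDist(mean8, sd8).cdf(400)  # P(X >= 400)

# Problem 9: X is geometric with p = 0.2.
p9b = 0.8**6 * 0.2   # P(X = 7): six failures, then a success
p9c = 1 - 0.8**3     # P(X < 4): not all of the first three boxes fail
e9 = 1 / 0.2         # expected number of boxes

print(f"6B: sd = {sd6:.2f}, P(M - L < 0) = {p6b:.4f}")
print(f"7B: P(X < 4) = {p7b:.4f}")
print(f"8B: mean = {mean8:.0f}, sd = {sd8:.2f}, P(X >= 400) = {p8b:.6f}")
print(f"9B: {p9b:.4f}  9C: {p9c:.3f}  9D: {e9:.0f} boxes")
```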