Statistics Chapter 7
Random Variables
Time: 2 + weeks
Introduction: Random Variables
The Game of Craps
The game of craps is one of the most famous of all gambling games
played with dice. In this game, the player rolls a pair of dice, and the
sum of the numbers that turn up on the two faces is noted. If the sum
is 7 or 11, then the player wins immediately. If the sum is 2, 3, or 12,
then the player loses immediately. If any other sum is obtained, then
the player continues to throw the dice until he either wins by repeating
the first sum he obtained or loses by rolling a 7. Your mission in this
activity is to estimate the probability of a player winning at craps. But
first, let’s get a feel for the game. For this activity, your class will be
divided into groups of two, with a pair of dice for each group.
1. In your group, play a total of 20 games of craps. One person will
roll the dice; the other will keep track of the sums and record the end
result (win or lose). Switch jobs after 10 games, if you like. How
many times out of 20 does the player win? What is the relative
frequency (percentage, written as a decimal) of wins?
Sum:      2   3   4   5   6   7   8   9   10   11   12
Freq:
Wins:
Losses:
Win on 1st roll (sum of 7 or 11):
Loss on 1st roll (sum of 2, 3, or 12):
2. Combine your results with the other groups in the class. What is
the relative frequency of wins for the entire class?
Sum:      2   3   4   5   6   7   8   9   10   11   12
Freq:
Wins:
Losses:
Win on 1st roll (sum of 7 or 11):
Loss on 1st roll (sum of 2, 3, or 12):
3. Use simulation techniques to represent 25 games of craps, using
either the table of random numbers or the calculator. What is the
relative frequency of wins based on the 25 simulations? How does this
number compare to the relative frequency you found in step 2?
Sum:      2   3   4   5   6   7   8   9   10   11   12
Freq:
Wins:
Losses:
Win on 1st roll (sum of 7 or 11):
Loss on 1st roll (sum of 2, 3, or 12):
4. One of the ways you can win at craps is to roll a sum of 7 or 11 on
your first roll. Using your results and those of your fellow students,
determine the number of times a player won by rolling a sum of 7 on
the first roll. What is the relative frequency of rolling a sum of 7?
Repeat these calculations for a sum of 11. Which of these sums
appears more likely to occur than the other based on class results?
5. One of the ways you can lose at craps is to roll a sum of 2, 3, or 12
on your first roll. Using your results and those of your fellow students,
determine the number of times a player lost by rolling a sum of 2 on
the first roll. What is the relative frequency of rolling a sum of 2?
Repeat these calculations for a sum of 3 and a sum of 12. Which of
these sums appears more likely to occur than the others, based on the
class results?
6. Clearly, the key quantity of interest in craps is the sum of the
numbers on the two dice. Let’s try to get a better idea of how this sum
behaves in general by conducting a simulation. First, determine how
you would simulate the roll of a single fair die. Then determine how
you would simulate a roll of two fair dice and determine the relative
frequency of each of the possible sums.
7. Construct a relative frequency histogram of the relative frequency
results in step 6. What is the approximate shape of the distribution?
What sum appears most likely to occur? Which appears least likely to
occur?
8. From the relative frequency data in step 6, compute the relative
frequency of winning and the relative frequency of losing on your first
roll in craps. How do these simulated results compare with what the
class obtained?
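The simulation steps above can be sketched in Python. This is only one possible approach, not the method the activity prescribes: it assumes fair six-sided dice, uses Python's built-in random number generator in place of a random number table or calculator, and the function names, seed, and number of games are illustrative choices. It also computes the exact win probability from the 36 equally likely rolls, so the simulated relative frequency has something to be compared against.

```python
import random
from fractions import Fraction

# Exact distribution of the sum of two fair dice: count the 36 equally likely rolls.
sum_prob = {s: Fraction(0) for s in range(2, 13)}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        sum_prob[d1 + d2] += Fraction(1, 36)

def exact_win_probability():
    """Exact P(win) under the rules stated in the activity."""
    p_win = sum_prob[7] + sum_prob[11]  # win immediately on the first roll
    for point in (4, 5, 6, 8, 9, 10):
        # Once `point` is the first sum, only `point` (win) or 7 (loss) ends the game.
        p_repeat = sum_prob[point] / (sum_prob[point] + sum_prob[7])
        p_win += sum_prob[point] * p_repeat
    return p_win

def play_craps(rng):
    """Simulate one game of craps; return True if the player wins."""
    first = rng.randint(1, 6) + rng.randint(1, 6)
    if first in (7, 11):
        return True
    if first in (2, 3, 12):
        return False
    while True:
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        if roll == first:
            return True
        if roll == 7:
            return False

rng = random.Random(7)   # fixed seed (arbitrary) so the run is repeatable
n_games = 20000          # illustrative; the activity itself uses 20 or 25 games
wins = sum(play_craps(rng) for _ in range(n_games))
simulated = wins / n_games

print("exact P(win) =", exact_win_probability(), "~", float(exact_win_probability()))
print("simulated relative frequency of wins =", simulated)
```

With only 20 or 25 games the relative frequency can land far from the exact value; the larger run here is meant to show where it tends to settle.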
A Random Variable is a variable whose value is a numerical outcome
of a ________________.
   • Consider flipping a coin twice:
     Possible outcomes are ________________.
   • The random variable could be the number of heads (X = # of heads).
   • What are the possible values for this random variable?
Suppose a survey is about hair styles.
Random variables could be X = hair length, or Y = # of people
with dyed hair.
Discrete Random Variable
• A random variable, X, is discrete if its set of possible values is
a collection of ________________ on the number line.
• The probability distribution of X can be given in a table,
probability histogram, or formula and is such that:
0 ≤ P(X) ≤ 1
Σ P(X) = 1
Suppose I was a restaurant manager and wanted to collect data about
my restaurant. Name 2 discrete random variables I could keep track
of.
The probabilities in a distribution give information about the long-run
behavior of the variable. That is, probabilities are predicted relative
frequencies. For example:
If P(X = 2) = 0.30, then after many observations of X, the value X = 2 will
occur about 30% of the time.
Tossing Coins
What is the probability distribution of the discrete random variable X
that counts the number of heads in four tosses of a coin? We can
derive this distribution if we assume the coin is balanced and the coin
tosses are completely independent.
Value of X:     X=0         X=1         X=2         X=3         X=4
Probability:    P(X=0) =    P(X=1) =    P(X=2) =    P(X=3) =    P(X=4) =
Draw a histogram.
Check that Σ P(X) = 1 and 0 ≤ P(X) ≤ 1. Can we assume this is a discrete
random variable?
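One way to check a hand-derived table is to list all 16 equally likely outcomes of four tosses and count heads. The sketch below assumes a balanced coin and independent tosses, as stated above; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 2^4 = 16 equally likely sequences of H and T.
outcomes = list(product("HT", repeat=4))

# Accumulate the probability of each possible number of heads.
dist = {k: Fraction(0) for k in range(5)}
for outcome in outcomes:
    dist[outcome.count("H")] += Fraction(1, len(outcomes))

for k, p in dist.items():
    print(f"P(X={k}) = {p}")

print("sum of probabilities =", sum(dist.values()))
```

Exact fractions are used so that the check that the probabilities sum to 1 is not blurred by rounding.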
Continuous Random Variable
• A continuous random variable takes all of the values in an
interval.
• The probability distribution for a continuous random variable
X is denoted by f(X) and the graph is a smooth curve called a
density curve.
f(X) ≥ 0
Total area under density curve = 1
The probability that X falls in an interval is equal to the area under
the density curve over that interval.
   • Probabilities are assigned to ________________ rather
     than individual outcomes.
   • In fact, the probability is ________________ for every individual
     outcome.
   • Only intervals of values have positive probability.
Microwave Popcorn
Let x = amount of time (in minutes) that it takes a bag of microwave popcorn to
start popping and suppose that the probability density function is given
to be:
f(x) = 0.25 for 1 < x < 5
f(x) = 0 otherwise
Graph f(x) and verify that the total area = 1.
Describe the shape of this distribution.
What are the possible values of x?
How many possible values of x are there?
Find the probability that it takes less than 2 minutes.
Find the probability that it takes less than 4.5 minutes.
Find the probability that it takes between 2 and 4.5 minutes.
Note: P(a < x < b) = P(x < b) - P(x < a)
Find the probability that it takes exactly 2 minutes.
P(x = 2) = 0 since x = 2 is only one out of an infinite number of
possibilities.
P(x = 2) = 0 since the line above x = 2 has an area of 0 (no width).
Note: for any continuous random variable x, P(x = a) = 0.
Note: P(x ≤ a) = P(x < a)
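Because the density is flat, each probability in this example is just the area of a rectangle. The sketch below assumes the density f(x) = 0.25 on 1 < x < 5 given above; the helper name popcorn_cdf is illustrative.

```python
def popcorn_cdf(x, low=1.0, high=5.0, height=0.25):
    """P(X < x): area under the flat density to the left of x."""
    clipped = min(max(x, low), high)   # no area outside the interval (low, high)
    return height * (clipped - low)    # rectangle: height times width

total_area = popcorn_cdf(5.0)          # the whole rectangle
p_less_2 = popcorn_cdf(2.0)
p_less_4_5 = popcorn_cdf(4.5)
p_between = p_less_4_5 - p_less_2      # P(a < x < b) = P(x < b) - P(x < a)
p_exactly_2 = popcorn_cdf(2.0) - popcorn_cdf(2.0)  # an interval of zero width

print("total area        =", total_area)
print("P(x < 2)          =", p_less_2)
print("P(x < 4.5)        =", p_less_4_5)
print("P(2 < x < 4.5)    =", p_between)
print("P(x = 2)          =", p_exactly_2)
```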
Isosceles Triangle
Suppose a continuous random variable x takes on values from 10 to 30
and that its density function is an isosceles triangle.
Graph the density function and find its height.
Find the probability that x is between 12 and 18.
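A possible way to check this exercise numerically: the triangle must have total area 1, which fixes its height, and probabilities are areas of smaller triangles. The sketch assumes the peak sits at the midpoint of 10 and 30 (which is what "isosceles" implies here); the function name is illustrative.

```python
def triangle_cdf(x, low=10.0, high=30.0):
    """P(X < x) for an isosceles-triangle density on (low, high)."""
    peak = (low + high) / 2            # isosceles: the peak is at the midpoint
    height = 2 / (high - low)          # from (1/2) * base * height = 1
    slope = height / (peak - low)      # rise of the left side per unit of x
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    if x <= peak:
        # Area of the small triangle between low and x.
        return 0.5 * slope * (x - low) ** 2
    # By symmetry, subtract the small triangle between x and high.
    return 1.0 - 0.5 * slope * (high - x) ** 2

height = 2 / (30.0 - 10.0)
p_12_to_18 = triangle_cdf(18) - triangle_cdf(12)

print("height of the density =", height)
print("P(12 < x < 18) =", round(p_12_to_18, 4))
```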
Normal Distributions as Probability Distributions
• Density curves are most familiar to us as normal curves.
• Remember, we show a normal distribution with a symmetric
bell-shaped curve and the notation N(μ, σ).
• And we standardize variables with the formula:
z = (x - μ) / σ
For example:
If x = length of a newborn baby and x ~ N(19, 2), sketch the
distribution and shade the area that corresponds to P(x > 21).
For example:
If z ~ N(0, 1), find the probability that z is more than one standard
deviation below the mean: P(z < -1) = 0.1587
Find the probability that z is:
Less than two standard deviations above the mean: P(z<2)=
More than 2 standard deviations above the mean: P(z>2) =
Less than 1.3 standard deviations above the mean: P(z<1.3) =
Within 2 standard deviations of the mean:
P(-2<z<2) = P(z<2)-P(z<-2) =
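Table lookups like these can be checked with Python's standard library. The sketch below uses statistics.NormalDist (available in Python 3.8 and later) as a stand-in for a printed normal table or calculator, so values may differ from table answers in the fourth decimal place.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)              # standard normal N(0, 1)

p_below_minus_1 = z.cdf(-1)                # P(z < -1)
p_below_2 = z.cdf(2)                       # P(z < 2)
p_above_2 = 1 - z.cdf(2)                   # P(z > 2)
p_below_1_3 = z.cdf(1.3)                   # P(z < 1.3)
p_within_2 = z.cdf(2) - z.cdf(-2)          # P(-2 < z < 2) = P(z < 2) - P(z < -2)

# Newborn length example: x ~ N(19, 2), so x = 21 standardizes to z = (21 - 19) / 2 = 1.
length = NormalDist(mu=19, sigma=2)
p_longer_than_21 = 1 - length.cdf(21)      # P(x > 21)

for label, p in [
    ("P(z < -1)", p_below_minus_1),
    ("P(z < 2)", p_below_2),
    ("P(z > 2)", p_above_2),
    ("P(z < 1.3)", p_below_1_3),
    ("P(-2 < z < 2)", p_within_2),
    ("P(x > 21)", p_longer_than_21),
]:
    print(f"{label} = {p:.4f}")
```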
Means and Variances of Random Variables
• Remember: the mean, x̄, of a set of observations is the ordinary
average.
• The mean of a random variable, X, is also an average of the
possible values of X, but we must take into account that not all
outcomes are equally likely.
The Tri-State Pick 3
Here is a simple lottery wager, from the Tri-State Pick 3 game that
New Hampshire, Maine, and Vermont share. You choose a 3-digit
number; the state chooses a 3-digit winning number at random and
pays you $500 if your number is chosen. Because there are 1000 3-digit
numbers, you have probability 1/1000 of winning. Taking X to
be the amount your ticket pays you, the probability distribution is
Payoff X:       $0       $500
Probability:    0.999    0.001
What is your average payoff from many tickets?
The ordinary average of the 2 possible outcomes $0 and $500
is $250, but that makes no sense as the average because $500 is
much less likely than $0. In the long run, you receive $500
once in every 1000 tickets and $0 in the remaining 999 of 1000
tickets. The long-run average payoff is: ________________
That number is the mean of the random variable X.
Mean of a Random Variable
• We use the notation μ_X to denote the mean of X, a random
variable.
• The mean of a random variable X is also sometimes called the
“expected value” even though we don’t necessarily expect one
observation on X to be close to its expected value.
• So here’s the full definition:
Suppose that X is a discrete random variable whose distribution is
Value of X:     x1    x2    x3    ...    xk
Probability:    p1    p2    p3    ...    pk
To find the mean of X, multiply each possible value by its probability,
then add all the products:
μ_X = x1 p1 + x2 p2 + ... + xk pk
    = Σ xi pi
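The definition translates directly into a few lines of code. This sketch applies it to the Tri-State Pick 3 payoff table above; the function name discrete_mean is an illustrative choice.

```python
def discrete_mean(values, probs):
    """Mean of a discrete random variable: multiply each value by its probability, then add."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))

# Tri-State Pick 3: payoff of $0 with probability 0.999, $500 with probability 0.001.
pick3_mean = discrete_mean([0, 500], [0.999, 0.001])
print(f"mean payoff per ticket = ${pick3_mean:.2f}")
```

The result, about fifty cents per ticket, is not a payoff any single ticket can produce, which illustrates the remark above about the "expected value" not being a value we expect on one observation.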
Variance of a Random Variable
• Remember that the measure of ________________ that
accompanies the mean is the variance and standard deviation.
• The symbol we use for the variance of a random
variable X is: σ²_X
• Or σ²_X = Σ (xi - μ_X)² pi
• The standard deviation is the ________________
of the variance.
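As a worked illustration of the variance formula, the sketch below reuses the Tri-State Pick 3 payoff table from earlier in these notes. The function names are illustrative, and the standard deviation is computed from the variance in the usual way.

```python
import math

def discrete_mean(values, probs):
    """Mean: sum of value times probability."""
    return sum(x * p for x, p in zip(values, probs))

def discrete_variance(values, probs):
    """Variance: probability-weighted average of squared deviations from the mean."""
    mu = discrete_mean(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))

payoffs = [0, 500]
probs = [0.999, 0.001]

pick3_var = discrete_variance(payoffs, probs)
pick3_sd = math.sqrt(pick3_var)

print(f"variance = {pick3_var:.2f}")
print(f"standard deviation = {pick3_sd:.2f}")
```

The standard deviation comes out much larger than the mean payoff, reflecting how rarely the $500 payoff occurs.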
What is the Law of Large Numbers?
Try this:
Write down a sequence of heads and tails that you think imitates 10
tosses of a balanced coin.
How long was the longest run of consecutive heads or consecutive
tails in your tosses?
What is the probability that you will get a run of 3 heads or 3 tails?
For the Law of Large Numbers, just how large is large?
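Both questions can be explored by computer. The sketch below first lists all 1024 equally likely sequences of 10 tosses and counts those containing a run of at least 3 identical results (reading "a run of 3" as "3 or more in a row" is an assumption). It then simulates coin tosses to show the proportion of heads for increasing numbers of tosses; the seed and sample sizes are arbitrary illustrative choices.

```python
import random
from fractions import Fraction
from itertools import groupby, product

def longest_run(seq):
    """Length of the longest block of identical consecutive results."""
    return max(len(list(group)) for _, group in groupby(seq))

# Exact probability of a run of 3 or more heads or tails in 10 tosses.
sequences = list(product("HT", repeat=10))
with_run = sum(1 for s in sequences if longest_run(s) >= 3)
p_run_3 = Fraction(with_run, len(sequences))
print("P(run of 3 or more in 10 tosses) =", p_run_3, "~", round(float(p_run_3), 4))

# Law of large numbers: the proportion of heads settles near 0.5 as tosses accumulate.
rng = random.Random(1)  # fixed seed (arbitrary) so the run is repeatable
proportions = {}
for n in (10, 100, 1000, 10000):
    heads = sum(rng.random() < 0.5 for _ in range(n))
    proportions[n] = heads / n
    print(f"{n:>6} tosses: proportion of heads = {proportions[n]:.4f}")
```

Runs of three or more turn out to be the rule rather than the exception in 10 tosses, which is why hand-written "random" sequences that avoid them tend to look too regular.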
Rules for Means:
• Rule #1: If X is a random variable and a and b are fixed
numbers, then μ_{a+bX} = a + b μ_X
• Rule #2: If X and Y are random variables, then
μ_{X+Y} = μ_X + μ_Y
Linear Functions & Linear Combinations:
If x = the length of a taxi ride in miles, suppose that μ_x = 5.2 miles and
σ_x = 2.8 miles. Two taxi companies have different fare schedules:
company A charges $2.50 per mile and company B charges $2 per
mile plus an initial fee of $5. Thus, if we are interested in the total fare
of a taxi ride (y), the functions are:
y_A = 2.50x
and y_B = 5 + 2x
How can we find the mean and standard deviation for y in each case?
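One way to approach this question is to treat each fare as a linear function a + bx of the ride length. The sketch below takes 5.2 miles as the mean ride length and 2.8 miles as its standard deviation, applies Rule #1 for means, and gets the standard deviation from the variance rule for a + bX stated later in these notes (the variance is multiplied by b², so the standard deviation is multiplied by |b|). The function name is illustrative.

```python
def linear_transform(mu, sigma, a, b):
    """Mean and standard deviation of y = a + b*x, given those of x."""
    mean_y = a + b * mu        # Rule #1 for means
    sd_y = abs(b) * sigma      # adding a constant does not change the spread
    return mean_y, sd_y

mu_x, sigma_x = 5.2, 2.8       # ride length in miles

mean_A, sd_A = linear_transform(mu_x, sigma_x, a=0.0, b=2.50)   # y_A = 2.50x
mean_B, sd_B = linear_transform(mu_x, sigma_x, a=5.0, b=2.00)   # y_B = 5 + 2x

print(f"Company A: mean fare = ${mean_A:.2f}, sd = ${sd_A:.2f}")
print(f"Company B: mean fare = ${mean_B:.2f}, sd = ${sd_B:.2f}")
```

Company B's fixed fee raises its mean fare but leaves its spread untouched; only the per-mile rate scales the standard deviation.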
Gain Communications
The number X of communications units sold by the Gain
Communications military division has distribution
X = units sold:    1000    3000    5000    10,000
Probability:       0.1     0.3     0.4     0.2
The corresponding sales estimates for the civilian division are
Y = units sold:    300     500     750
Probability:       0.4     0.5     0.1
Calculate μ_X and μ_Y.
Gain makes a profit of $2000 on each military unit sold and $3500 on
each civilian unit. Next year’s profit from military sales will be
2000X and civilian profit is 3500Y.
By rule 1:
By rule 2:
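The Gain Communications calculation can be sketched as follows, using the two distributions and per-unit profits given above. Variable and function names are illustrative, and the sketch assumes total profit is simply the military profit plus the civilian profit.

```python
def discrete_mean(values, probs):
    """Mean of a discrete random variable: sum of value times probability."""
    return sum(x * p for x, p in zip(values, probs))

# Military division: X = units sold.
mu_X = discrete_mean([1000, 3000, 5000, 10000], [0.1, 0.3, 0.4, 0.2])
# Civilian division: Y = units sold.
mu_Y = discrete_mean([300, 500, 750], [0.4, 0.5, 0.1])

# Rule 1 (with a = 0): the mean of bX is b times the mean of X.
mean_military_profit = 2000 * mu_X
mean_civilian_profit = 3500 * mu_Y

# Rule 2: the mean of a sum is the sum of the means.
mean_total_profit = mean_military_profit + mean_civilian_profit

print(f"mean units sold: military = {mu_X:.0f}, civilian = {mu_Y:.0f}")
print(f"mean military profit = ${mean_military_profit:,.0f}")
print(f"mean civilian profit = ${mean_civilian_profit:,.0f}")
print(f"mean total profit    = ${mean_total_profit:,.0f}")
```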
Rules for Variance
• Rule 1: If X is a random variable and a and b are fixed
numbers, then σ²_{a+bX} = b² σ²_X
• Rule 2: If X and Y are independent random variables, then
σ²_{X+Y} = σ²_X + σ²_Y and
σ²_{X-Y} = σ²_X + σ²_Y
• Rule 3: If X and Y have correlation ρ, then
σ²_{X+Y} = σ²_X + σ²_Y + 2ρ σ_X σ_Y
and
σ²_{X-Y} = σ²_X + σ²_Y - 2ρ σ_X σ_Y
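A short sketch of Rules 2 and 3 in code may help show how the correlation term works. The standard deviations 10 and 8 match the golf example that follows in these notes; the correlation of 0.5 is a made-up value for illustration only, and the function names are hypothetical.

```python
def var_sum(sd_x, sd_y, rho=0.0):
    """Variance of X + Y when X and Y have correlation rho (Rule 3)."""
    return sd_x ** 2 + sd_y ** 2 + 2 * rho * sd_x * sd_y

def var_diff(sd_x, sd_y, rho=0.0):
    """Variance of X - Y when X and Y have correlation rho (Rule 3)."""
    return sd_x ** 2 + sd_y ** 2 - 2 * rho * sd_x * sd_y

# Independent variables have rho = 0, so Rule 3 reduces to Rule 2:
# the variances add for both the sum and the difference.
independent_sum = var_sum(10, 8)
independent_diff = var_diff(10, 8)

# Hypothetical positive correlation: the sum varies more, the difference less.
correlated_sum = var_sum(10, 8, rho=0.5)
correlated_diff = var_diff(10, 8, rho=0.5)

print("rho = 0:   var(X+Y) =", independent_sum, " var(X-Y) =", independent_diff)
print("rho = 0.5: var(X+Y) =", correlated_sum, " var(X-Y) =", correlated_diff)
```

Note that it is variances that add, not standard deviations: for the independent case the standard deviation of the sum is the square root of 164, not 10 + 8.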
Combining Normal Random Variables
• Any linear combination of independent normal random
variables is also ________________.
• If X and Y are independent normal random variables and a and
b are any fixed numbers, aX + bY is also normally distributed.
Golf Buddies
Tom and George are playing in the club golf tournament. Their scores
vary as they play the course repeatedly. Tom’s score X has the
N(110,10) distribution, and George’s score Y varies from round to
round according to the N(100,8) distribution. If they play
independently, what is the probability that Tom will score lower than
George and thus do better in the tournament?
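One way to set this problem up is to look at the difference X - Y, which is normal because X and Y are independent normal variables. Tom scores lower than George exactly when X - Y < 0. The sketch below applies the rules for means and variances and uses statistics.NormalDist (Python 3.8 and later) in place of a normal table, so the answer may differ slightly from a table-based one that rounds z to two decimals.

```python
import math
from statistics import NormalDist

mu_tom, sd_tom = 110, 10          # X ~ N(110, 10)
mu_george, sd_george = 100, 8     # Y ~ N(100, 8)

# Mean of the difference: means subtract.
mu_diff = mu_tom - mu_george
# Independent scores: variances add, even for a difference.
sd_diff = math.sqrt(sd_tom ** 2 + sd_george ** 2)

difference = NormalDist(mu=mu_diff, sigma=sd_diff)
p_tom_lower = difference.cdf(0)   # P(X - Y < 0) = P(X < Y)

print(f"X - Y ~ N({mu_diff}, {sd_diff:.2f})")
print(f"P(Tom scores lower than George) = {p_tom_lower:.4f}")
```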
Chapter 7 problems: # 1, 2, 4, 5, 7, 8, 12, 16, 23, 29, 32, 34, 37, 41, 42, 44