Download X - jmullenkhs

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Probability wikipedia , lookup

Randomness wikipedia , lookup

Transcript
+
Chapter 6: Random Variables
Section 6.1
Discrete and Continuous Random Variables
The Practice of Statistics, 4th edition – For AP*
STARNES, YATES, MOORE
+
Chapter 6
Random Variables
 6.1
Discrete and Continuous Random Variables
 6.2
Transforming and Combining Random Variables
 6.3
Binomial and Geometric Random Variables
+ Section 6.1
Discrete and Continuous Random Variables
Learning Objectives
After this section, you should be able to…

APPLY the concept of discrete random variables to a variety of
statistical settings

CALCULATE and INTERPRET the mean (expected value) of a
discrete random variable

CALCULATE and INTERPRET the standard deviation (and variance)
of a discrete random variable

DESCRIBE continuous random variables
Pig
+
 Greedy
The Game: The match consists of 5 rounds. Each round
consists of a number of games. Before each game, each
person decides whether to stop (and retain the points earned in
that round) or continue to play (and win more points, or lost all
points gained in the round, depending on the outcome.
•All players get the first two
throws for free. A die is tossed
twice. The points are added
together.
•All players stand up.
•Each player has two options:
quit, sit down, and keep the score
that is the total of the two dice for
Round 1; or continue to play and
remain standing.
Round
1
2
3
4
5
TOTAL
Points
Randomness, Probability, and Simulation
How can we use probability to make and justify decisions?
Pig
The Game: The match consists of 5 rounds. Each round
consists of a number of games. Before each game, each
person decides whether to stop (and retain the points earned in
that round) or continue to play (and win more points, or lost all
points gained in the round, depending on the outcome.
•If players continue to play, they
Round
Points
remain standing. The die is
1
tossed. If the number 1, 3, 4, 5,
2
or 6 is rolled, the player adds this
number to his or her total for that
3
round. But if the number is 2, the
4
students that are still standing
5
lose all points for that round, and
record a score of 0 for that round.
TOTAL
•This continues until all players
The object of the game is to determine
have sat down, or a 2 is rolled.
a strategy that in the long term will
Then that round is over.
maximize the total points.
Randomness, Probability, and Simulation
How can we use probability to make and justify decisions?
+
 Greedy
Pig
Randomness, Probability, and Simulation
•What strategies did you use to decide when to stop?
•Choose a strategy to use to play again. Choose this strategy
and stick to it. Play the game again. How did this strategy work
for you?
•Record your results in a back-to-back stem-and-leaf plot.
Compare the results.
+
 Greedy
Water vs. Tap Water
+
 Bottled
Station
Bottled Water Cup?
Truth
Discrete and Continuous Random Variables
1) Look at your index card with your station written on it.
2) Go to the corresponding station. Pick up three cups (one each A,
B, and C) and take them back to your seat.
3) Your task is to determine which one of the three cups contains the
bottled water. Drink all the water in Cup A first, then the water in
Cup B, and finally the water in Cup C. Write down the letter of the
cup that you think held the bottled water. Do not discuss your
results with any of your classmates.
4) When Ms. Raskin tells you to do so, go to the board and record
your station number and the letter of the cup you identified as
containing bottled water.
Water vs. Tap Water
When my Nerd Camp class did this activity, 13 out of the 21 teachers
made the correct identifications. If you assume that the teachers can’t
tell bottled water from tap water, you would assume that 7 teachers
would guess correctly (1/3 of the total). How likely is it that 13 of the 21
teachers would guess correctly? To answer this, we will need a different
kind of probability model than what we have been using.
Discrete and Continuous Random Variables
5) Let’s assume that no one in the class can distinguish tap water
from bottled water. In that case, students would just be guessing
which cup of water tastes different. If so, what’s the probability that
an individual student would guess correctly?
6) How many correct identifications would you need to see to be
convinced that the students in your class aren’t just guessing?
Choose a partner and design a simulation to answer this question.
What do you conclude about your class’s ability to distinguish
tap water from bottled water?
+
 Bottled
Water vs. Tap Water
X
X
X
X X X
X
X X X
X
X X X
X X X X X
X X X X X
X X X X X X
X X X X X X
X X X X X X
X X X X X X
X X X X X X
X X X X X X
X X X X X X X
X X X X X X X X
X X X X X X X X X
X X X X X X X X X
X X X X X X X X X X X
2 3 4 5 6 7 8 9 10 11 12 13 14
How likely is it that 13 of the 21 teachers would guess
correctly? To answer this, we will need a different kind of
probability model than what we have been using.
Discrete and Continuous Random Variables
When my Nerd Camp class did this activity, 13 out of the 21 teachers made the
correct identifications. If you assume that the teachers can’t tell bottled water
from tap water, you would assume that 7 teachers would guess correctly (1/3 of
the total). Here is a dotplot showing 100 trials of the simulation to see how often
there are 13 or more correct guesses (using RandInt; letting 1 be a correct guess
and 2 & 3 be incorrect; looking in groups of 21; in the simulation below, there
were 4 trials in which three teachers guessed correctly).
+
 Bottled
Variable and Probability Distribution
A numerical variable that describes the outcomes of a chance process
is called a random variable. The probability model for a random
variable is its probability distribution
Definition:
A random variable takes numerical values that describe the outcomes
of some chance process. The probability distribution of a random
variable gives its possible values and their probabilities.
Example: Consider tossing a fair coin 3 times.
Define X = the number of heads obtained
X = 0: TTT
X = 1: HTT THT TTH
X = 2: HHT HTH THH
X = 3: HHH
Value
0
1
2
3
Probability
1/8
3/8
3/8
1/8
What’s the
probability that you
will get at least one
“heads” in three
tosses?
How would you
interpret that
probability?
Discrete and Continuous Random Variables
A probability model describes the possible outcomes of a chance
process and the likelihood that those outcomes will occur.
+
 Random
Random Variables
Shoe size
(discrete)
Foot length
(continuous)
Discrete and Continuous Random Variables
There are two main types of random variables: discrete and
continuous. If we can find a way to list all possible outcomes
for a random variable and assign probabilities to each one, we
have a discrete random variable.
+
 Discrete
Random Variables
+
 Discrete
Discrete Random Variables and Their Probability Distributions
A discrete random variable X takes a fixed set of possible values with
gaps between. The probability distribution of a discrete random variable
X lists the values xi and their probabilities pi:
Value:
x1
Probability: p1
x2
p2
x3
p3
…
…
The probabilities pi must satisfy two requirements:
1. Every probability pi is a number between 0 and 1.
2. The sum of the probabilities is 1.
To find the probability of any event, add the probabilities pi of the particular
values xi that make up the event.
Discrete and Continuous Random Variables
There are two main types of random variables: discrete and
continuous. If we can find a way to list all possible outcomes
for a random variable and assign probabilities to each one, we
have a discrete random variable.
Babies’ Health at Birth (Apgar Scores)
+
 Example:
In 1952, Dr. Virginia Apgar suggested five criteria for measuring a baby’s health
at birth: skin color, heart rate, muscle tone, breathing, and response to stimuli.
She developed a 0 – 1 – 2 scale to rate a newborn on each of the five criteria.
A baby’s Apgar score is the sum of the ratings on the five scales, which gives
a whole-number value from 0 to 10.
What Apgar scores are typical? To find out, researchers recorded the Apgar
scores of over 2 million newborn babies in a single year. Imagine selecting
one of these newborns at random. (That’s our chance process.)
Define X as the Apgar score of a randomly-selected baby one minute after birth.
The table on the next slide gives the probability description for X.
Babies’ Health at Birth (Apgar Scores)
+
 Example:
(a) Show
that the probability distribution for X is legitimate.
(b) Make a histogram of the probability distribution. Describe what you see.
(c) Apgar scores of 7 or higher indicate a healthy baby. What is P(X ≥ 7)?
Value:
0
1
2
3
4
5
6
7
8
9
10
Probability:
0.001
0.006
0.007
0.008
0.012
0.020
0.038
0.099
0.319
0.437
0.053
(a) All probabilities
are between 0 and 1
and they add up to 1.
This is a legitimate
probability
distribution.
(c) P(X ≥ 7) = .908
We’d have a 91 %
chance of randomly
choosing a healthy
baby.
Notice the
(b) The left-skewed shape of the distribution suggests a randomly
selected newborn will have an Apgar score at the high end of the scale.
There is a small chance of getting a baby with a score of 5 or lower.
difference
between > and ≥.
With discrete
variables, these
are different but
not with
continuous
variables.
NHL Goals
+
 Example:
(a) Show
that the probability distribution for X is legitimate.
(b) Make a histogram of the probability distribution. Describe what you see.
(c) What is the probability that the number of goals scored by a randomlyselected game is at least 6?
Goals:
0
1
2
3
4
5
6
7
8
9
Probability:
0.061
0.154
0.228
0.229
0.173
0.094
0.041
0.015
0.004
0.001
(a) All probabilities
are between 0 and 1
and they add up to 1.
This is a legitimate
probability
distribution.
(c) P(X ≥ 6) = .041 +
.015 + .004 + .001 =
0.061.
We’d have a 6.1 %
chance that the number
of goals scored is at
least 6.
(b) The histogram is right-skewed, which means that the majority of
games are relatively low scoring. It is pretty unusual for a team to score
6 or more goals.
YOUR UNDERSTANDING
+
 CHECK
Value of X
0
1
2
3
4
Probability
0.02
0.10
0.20
0.42
0.26
1) Say in words what the meaning of P(X ≥ 3) is. What is this
probability?
2) Write the event “the student got a grade worse than C” in
terms of values of the random variable X. What is the
probability of this event?
3) Sketch a graph of the probability distribution. Describe what
you see.
Discrete and Continuous Random Variables
North Carolina State University posts the grade distribution for its
courses online. Students in Statistics 101 in a recent semester received
26% As, 42% Bs, 20% Cs, 10% Ds, and 2% Fs. Choose a Statistics
student at random. The student’s grade on a four-point scale is a
discrete random variable X with this probability distribution:
YOUR UNDERSTANDING
+
 CHECK
Value of X
0
1
2
3
4
Probability
0.02
0.10
0.20
0.42
0.26
1) Say in words what the meaning of P(X ≥ 3) is. What is this
probability? The probability that the student gets either an A
or a B is 0.68.
2) Write the event “the student got a grade worse than C” in
terms of values of the random variable X. What is the
probability of this event? P(X < 2) = 0.12
3) Sketch a graph of the probability distribution. Describe what
you see. The histogram is left-skewed. Higher grades are
more likely, but there are a few lower grades.
Discrete and Continuous Random Variables
North Carolina State University posts the grade distribution for its
courses online. Students in Statistics 101 in a recent semester received
26% As, 42% Bs, 20% Cs, 10% Ds, and 2% Fs. Choose a Statistics
student at random. The student’s grade on a four-point scale is a
discrete random variable X with this probability distribution:
of a Discrete Random Variable
The mean of any discrete random variable is an average of the
possible outcomes, with each outcome weighted by its
probability.
Definition:
Suppose that X is a discrete random variable whose probability
distribution is
Value:
x1 x2 x3 …
Probability: p1 p2 p3 …
To find the mean (expected value) of X, multiply each possible value
by its probability, then add all the products:
 x  E(X)  x1 p1  x 2 p2  x 3 p3  ...
  x i pi
Discrete and Continuous Random Variables
When analyzing discrete random variables, we’ll follow the same
strategy we used with quantitative data – describe the shape,
center, and spread, and identify any outliers.
+
 Mean
Apgar Scores – What’s Typical?
+
 Example:
Consider the random variable X = Apgar Score
Compute the mean of the random variable X and interpret it in context.
Value:
0
1
2
3
4
5
6
7
8
9
10
Probability:
0.001
0.006
0.007
0.008
0.012
0.020
0.038
0.099
0.319
0.437
0.053
x  E(X)   xi pi
 (0)(0.001)  (1)(0.006)  (2)(0.007)  ... (10)(0.053)
 8.128
The mean Apgar score of a randomly selected newborn is 8.128. This is the longterm average Agar score of many, many randomly chosen babies.

Note: The expected value does not need to be a possible value of X or an integer!
It is a long-term average over many repetitions.
Winning (and Losing) at Roulette
Consider the random variable X = net gain from a single $1 bet on red.
Compute the mean of the random variable X and interpret it in context.
Value:
-1
1
Probability:
20/38
18/38
x  E(X)   xi pi
 ($1)(20 /38)  ($1)(18 /38)
 $0.05
In the long run, the player loses (and the casino gains) five cents per bet.
The ordinary average of -1 and 1 is 0, but $0 isn’t the average winnings because
the player is less likely to win $1 than to lose $1.
+
 Example:
More Roulette
+
 Example:
Another wager players can make in roulette is called a “corner bet.” To make
this bet, a player places his chips on the intersection of four numbered
squares on the roulette table. If one of these numbers comes up on the wheel
and the player bet $1, the player gets his $1 back plus $8 more. Otherwise,
the casino keeps the original $1 bet.
Consider the random variable X = net gain from a single $1 corner bet
Compute the mean of the random variable X and interpret it in context.
Value:
-1
8
Probability:
34/38
4/38
x  E(X)   xi pi
 ($1)(34 /38)  ($8)(4 /38)
 $0.05
If a player were to make $1 corner bets many, many times, the average gain would
be about -$0.05 per bet.

NHL Goals
+
 Example:
Consider the random variable X = Number of goals scored by a randomlyselected team in a randomly-selected game
Goals:
0
1
2
3
4
5
6
7
8
9
Probability:
0.061
0.154
0.228
0.229
0.173
0.094
0.041
0.015
0.004
0.001
x  E(X)   xi pi
 (0)(0.061)  (1)(0.154)  ... (9)(0.001)
 2.851
The number of goals for a randomly-selected team in a randomly-selected game is
2.851. If you were to repeat the random sampling process over and over again,
the mean number of goals scored would be about 2.851 in the long run.

ERROR ALERT!
Discrete and Continuous Random Variables
Many students incorrectly believe that the expected value of a random
variable must be equal to one of the possible values of the variable.
This is not the case. In the roulette examples, the expected value was
-$0.05, even though this was not a possible gain from a $1 single bet.
In the Apgar example, the average score 8.128, is not a possible value
of the random variable. If you think of the mean as a long-run average
over many repetitions, these facts make sense.
+
 AP
Deviation of a Discrete Random Variable
Definition:
Suppose that X is a discrete random variable whose probability
distribution is
Value:
x1 x2 x3 …
Probability: p1 p2 p3 …
and that µX is the mean of X. The variance of X is
Var(X)   X2  (x1   X ) 2 p1  (x 2   X ) 2 p2  (x 3   X ) 2 p3  ...
  (x i   X ) 2 pi
This formula is given to you on the
formula sheet of the AP exam.
To get the standard deviation of a random variable, take the square root
of the variance.
Discrete and Continuous Random Variables
Since we use the mean as the measure of center for a discrete
random variable, we’ll use the standard deviation as our measure of
spread. The definition of the variance of a random variable is
similar to the definition of the variance for a set of quantitative data.
+
 Standard
Apgar Scores – How Variable Are They?
+
 Example:
Consider the random variable X = Apgar Score
Compute the standard deviation of the random variable X and interpret it in
context. Recall that the mean Apgar score was 8.128.
Value:
0
1
2
3
4
5
6
7
8
9
10
Probability:
0.001
0.006
0.007
0.008
0.012
0.020
0.038
0.099
0.319
0.437
0.053
  (x i X ) pi
2
X
2
 (0  8.128)2 (0.001)  (1 8.128)2 (0.006)  ... (10  8.128)2 (0.053)
Variance
 2.066
 X  2.066 1.437
The standard deviation of X is 1.437. On average, a randomly selected baby’s
Apgar score will differ from the mean 8.128 by about 1.4 units.
Analyzing Random Variables on the
1) Entering the values of the random variable in L1 and the
corresponding probabilities in L2. (Practice by using the Apgar
Scores)
2) To graph a histogram of the probability distribution:
1) Set up a statistics plot with Xlist: L1 and Freq: L2
2) Adjust your window settings to:
1) Xmin = -1
When you are done
2) Xmax = 11
with the histogram,
3) Xscl = 1
remember to reset Freq
4) Ymin = -0.1
back to 1.
5) Ymax = 0.5
6) Yscl = 0.1
3) Press GRAPH (F3 on TI-89)
3) To calculate the mean and standard deviation of the random
variable, use one-variable statistics with the values in L1 and the
probabilities in L2:
1) TI-83/84: Execute the command 1-Var Stats L1, L2
2) TI-89: In the Statistics/List Editor, press F4 (Calc) and choose 1:
1-Var Stats. Use the inputs List: list1 and Freq: list 2
Discrete and Continuous Random Variables
Calculator
+
 Technology:
Discrete and Continuous Random Variables
Calculator
Analyzing Random Variables on the
+
 Technology:
Discrete and Continuous Random Variables
Calculator
Analyzing Random Variables on the
+
 Technology:
NHL Goals
+
 Example:
Consider the random variable X = goals scored
Compute the standard deviation of the random variable X and interpret it in
context. Recall that the mean goals scored was 2.851.
Goals
0
1
2
3
4
5
6
7
8
9
Probability:
0.061
0.154
0.228
0.229
0.173
0.094
0.041
0.015
0.004
0.001
  (x i X ) pi
2
X
2
 (0  2.851)2 (0.061)  (1 2.851)2 (0.154)  ... (9 1.54)2 (0.001)
 2.66
X  2.66 1.63
The standard deviation of X is 1.63. On average, a randomly-selected team’s
number of goals in a randomly-selected game will differ from the mean by about
1.63 goals.
YOUR UNDERSTANDING
+
 CHECK
Cars Sold
0
1
2
3
Probability
0.3
0.4
0.2
0.1
1) Compute and interpret the meaning of X.
2) Compute and interpret the standard deviation of X.
Discrete and Continuous Random Variables
A large auto dealership keeps track of sales made during each hour of
the day. Let X = the number of cars sold during the first hour our of
business on a randomly-selected Friday. Based on previous records,
the probability distribution of X is as follows:
YOUR UNDERSTANDING
+
 CHECK
Cars Sold
0
1
2
3
Probability
0.3
0.4
0.2
0.1
1) Compute and interpret the meaning of X.
μx = 1.1. The long-run average, over many Friday mornings,
will be about 1.1 cars sold.
2) Compute and interpret the standard deviation of X.
σx = 0.943. On average, the number of cars sold on a
randomly-selected Friday will differ from the mean (1.1) by
about 0.943 cars sold.
Discrete and Continuous Random Variables
A large auto dealership keeps track of sales made during each hour of
the day. Let X = the number of cars sold during the first hour our of
business on a randomly-selected Friday. Based on previous records,
the probability distribution of X is as follows:
ERROR ALERT!
Discrete and Continuous Random Variables
In many cases where you are asked to calculate the mean (expected
value) or standard deviation of a random variable, you will lose credit for
not showing adequate work – or for not showing work at all. To get full
credit, you need to go beyond just reporting your commands used on
your graphing calculator. Show the first couple of terms with an ellipsis
(. . .) for the calculation of the mean or standard deviation.
+
 AP
Random Variables
Definition:
A continuous random variable X takes on all values in an interval of
numbers. The probability distribution of X is described by a density
curve. The probability of any event is the area under the density curve
and above the values of X that make up the event.
The probability model of a discrete random variable X assigns a
probability between 0 and 1 to each possible value of X.
A continuous random variable Y has infinitely many possible values.
All continuous probability models assign probability 0 to every
individual outcome. Only intervals of values have positive probability.
Discrete and Continuous Random Variables
Discrete random variables commonly arise from situations that
involve counting something. Situations that involve measuring
something often result in a continuous random variable.
+
 Continuous
Random Variables
The command 2rand will generate a random number from 0 to
2.
To generate a random number from 1 to 2, use the command
rand+1.
You can combine these commands to produce other intervals.
Discrete and Continuous Random Variables
Calculator Commands:
The calculator command rand will generate a random number
from 0 to 1.
+
 Continuous
Random Variables
The random number generator
will spread its output uniformly
across the entire interval from 0
to 1 as we allow it to generate
a long sequence of random
numbers. The results of many
trials are represented by the
density curve of a uniform
distribution. This density curve
appears at right. It has height
1 over the interval from 0 to 1.
The area under the density
curve is 1, and the probability
of any event is the area under
the density curve and above
the event in question.
Remember that each individual
probability is 0.
Discrete and Continuous Random Variables
Random Numbers:
+
 Continuous
Young Women’s Heights
+
 Example:
The heights of young women closely follow the Normal distribution with mean =
64 inches and standard deviation = 2.7 inches. This is a distribution for a
large set of data. Choose one young woman at random. Call her height Y. If
we repeat the random choice many, many times, the distribution of values of Y
is the same Normal distribution that describes the heights of all young women.
Find the probability that the chosen woman is between 68 and 70 inches tall.
Young Women’s Heights
+
 Example:
Define Y as the height of a randomly chosen young woman. Y is a continuous
random variable whose probability distribution is N(64, 2.7).
What is the probability that a randomly chosen young woman has height
between 68 and 70 inches?
P(68 ≤ Y ≤ 70) = ???
68  64
2.7
 1.48
z
70  64
2.7
 2.22
z
P(1.48 ≤ Z ≤ 2.22) = P(Z ≤ 2.22) – P(Z ≤ 1.48)

= 0.9868 – 0.9306

= 0.0562
There is about a 5.6% chance that a randomly chosen young woman
has a height between 68 and 70 inches.
Young Women’s Heights – Using a
Calculator
2nd VARS (gives you DISTR)
Use normalcdf(68, 70, 64, 2.7)
+
 Example:
Weights of 3-yr.-old Females
+
 Example:
Define Y as the weight of a randomly-chosen 3-yr.-old female. Y is a continuous
random variable whose probability distribution is N(30.7, 3.6).
What is the probability that a randomly chosen 3-yr.-old female weighs at
least 30 pounds?
P(Y ≥ 30) = ???
30  30.7
3.6
 0.19
z
P(Z ≥ -0.19) = 1 – P(Z < -0.19)

= 1 – 0.4247
= 0.5753
There is about a 58% chance that the randomly selected 3-yr.-old female
will weigh at least 30 pounds.
ERROR ALERT!
Discrete and Continuous Random Variables
When you solve problems involving random variables, start by defining
the random variable of interest. For example, let X = the Apgar score of
a randomly-selected baby or let Y = the height of a randomly-selected
young woman. Then state the probability you’re trying to find in terms of
the random variable P(68 ≤ Y ≤ 70) or P(X ≥ 7).
+
 AP
+ Section 6.1
Discrete and Continuous Random Variables
Summary
In this section, we learned that…

A random variable is a variable taking numerical values determined
by the outcome of a chance process. The probability distribution of
a random variable X tells us what the possible values of X are and
how probabilities are assigned to those values.

A discrete random variable has a fixed set of possible values with
gaps between them. The probability distribution assigns each of
these values a probability between 0 and 1 such that the sum of all
the probabilities is exactly 1.

A continuous random variable takes all values in some interval of
numbers. A density curve describes the probability distribution of a
continuous random variable.
+ Section 6.1
Discrete and Continuous Random Variables
Summary
In this section, we learned that…

The mean of a random variable is the long-run average value of
the variable after many repetitions of the chance process. It is also
known as the expected value of the random variable.

The expected value of a discrete random variable X is

The variance of a random variable is the average squared
deviation of the values of the variable from their mean. The standard
deviation is the square root of the variance. For a discrete random
variable X,
x   xi pi  x1 p1  x2 p2  x3 p3  ...
2
 X  (xi X )2 pi  (x1 X )2 p1  (x2 X )2 p2  (x3 X )2 p3  ...
+
Looking Ahead…
In the next Section…
We’ll learn how to determine the mean and standard
deviation when we transform or combine random
variables.
We’ll learn about
 Linear Transformations
 Combining Random Variables
 Combining Normal Random Variables