Chapter 6
Section 6.1 Discrete Random Variables
The Game of Greed!
• The point of this game is to gain as many points as
possible. Points are earned by rolling two dice: the
sum of the dice equals the points earned for the roll. To
earn points, a player must be actively “in the game”.
Once a player opts “out of the game” they maintain
their current point total, but cannot earn any more
points until the next round. Players who are “in the
game” continue earning points until they opt out or
until the “greed point” is rolled, whichever occurs first.
The “greed point” determines the end of a round. Any
players “in the game” when the greed point is rolled
lose all points earned for that round.
Random Variable
• A variable whose value is a numerical
outcome of a random phenomenon
• Use the variables X or Y because the
values will vary
Random Variables and Probability Distributions
A probability model describes the possible outcomes of a
chance process and the likelihood that those outcomes will occur.
Consider tossing a fair coin 3 times.
Define X = the number of heads obtained
X = 0: TTT
X = 1: HTT THT TTH
X = 2: HHT HTH THH
X = 3: HHH
Value (X):     0     1     2     3
Probability:  1/8   3/8   3/8   1/8
A random variable takes numerical values that describe the
outcomes of some chance process. The probability distribution of
a random variable gives its possible values and their probabilities.
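The coin-toss distribution above can be checked by listing the sample space. A minimal sketch, assuming a fair coin so that all 8 outcomes are equally likely (the variable names are illustrative only):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# List all 2**3 equally likely outcomes of three fair coin tosses.
outcomes = ["".join(toss) for toss in product("HT", repeat=3)]

# X = number of heads in each outcome; count how many outcomes give each value.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Each outcome has probability 1/8, so P(X = x) = (outcomes with x heads) / 8.
distribution = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

Exact fractions are used here so the probabilities match the 1/8 and 3/8 entries in the table without rounding.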
Discrete Random Variables
There are two main types of random variables: discrete and
continuous. If we can find a way to list all possible outcomes for a
random variable and assign probabilities to each one, we have a
discrete random variable.
Discrete Random Variables And Their Probability Distributions
A discrete random variable X takes a fixed set of possible values with
gaps between. The probability distribution of a discrete random variable
X lists the values xi and their probabilities pi:
Value:        x1   x2   x3   …
Probability:  p1   p2   p3   …
The probabilities pi must satisfy two requirements:
1. Every probability pi is a number between 0 and 1.
2. The sum of the probabilities is 1.
To find the probability of any event, add the probabilities pi of the
particular values xi that make up the event.
Example
A university posts the grade distributions for its
courses online. Students in stats received 21% A’s,
43% B’s, 30% C’s, 5% D’s, 1% F’s. Choose a student
at random. The student’s grade on a four-point
scale (A = 4) is a random variable X. The value of X
changes when we repeatedly choose students at
random, but it’s always either 0, 1, 2, 3, or 4.
• Here is the distribution:
Value of X:    0     1     2     3     4
Probability:  .01   .05   .30   .43   .21
• The probability that the student got a B or
better is the sum of the probabilities of an
A and a B (the two events are disjoint)
P(X ≥ 3) = P(X = 3) + P(X = 4)
= .43 + .21 = .64
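The two requirements and the "add the probabilities" rule can be sketched in a few lines. This is only an illustration; the helper names is_legitimate and event_probability are made up for this example:

```python
import math

# Grade distribution from the example: value of X -> probability.
grades = {0: 0.01, 1: 0.05, 2: 0.30, 3: 0.43, 4: 0.21}

def is_legitimate(dist, tol=1e-9):
    """Check both requirements: each p is in [0, 1] and the p's sum to 1."""
    in_range = all(0 <= p <= 1 for p in dist.values())
    sums_to_one = math.isclose(sum(dist.values()), 1.0, abs_tol=tol)
    return in_range and sums_to_one

def event_probability(dist, event):
    """Add the probabilities of the values that make up the event."""
    return sum(p for x, p in dist.items() if event(x))

# Event "B or better" is X >= 3.
b_or_better = event_probability(grades, lambda x: x >= 3)
print(is_legitimate(grades), round(b_or_better, 2))
```

The sum is compared to 1 with a small tolerance because decimal probabilities are not stored exactly in floating point.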
Probability Histograms
• Idealized pictures of the results over many trials
– Outcomes – x axis
– Probability – y axis
How many languages?
Imagine selecting a U.S. high school student at random.
Define the random variable X = number of languages spoken
by the randomly selected student. The table below gives the
probability distribution of X, based on a sample of students
from the U.S. Census at School database.
Languages:      1       2       3       4       5
Probability:  0.630   0.295   0.065   0.008   0.002
Problem:
(a) Show that the probability distribution for X is legitimate.
(b) Make a histogram of the probability distribution. Describe
what you see.
(c) What is the probability that a randomly selected student
speaks at least 3 languages?
(a) All the probabilities are between 0 and 1 and they add to 1,
so this is a legitimate probability distribution.
[Figure: probability histogram of X. x-axis: Languages spoken (1 to 5); y-axis: Probability (0.0 to 0.7)]
(b) Shape: The histogram is strongly skewed to the right.
Center: The median is 1, but the mean will be slightly higher
due to the skewness. Spread: The number of languages varies
from 1 to 5, but nearly all of the students speak just one or two
languages.
(c) P(X ≥ 3) = P(X = 3) + P(X = 4) + P(X = 5) = 0.065 + 0.008 +
0.002 = 0.075. There is a 0.075 probability of randomly
selecting a student who speaks three or more languages.
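Parts (b) and (c) can be reproduced roughly in code. The sketch below draws a text histogram, finds the median as the smallest value whose cumulative probability reaches 0.5, and adds up P(X ≥ 3). The function names and the bar width are illustrative choices, not part of the original example:

```python
# Probability distribution of X = number of languages spoken.
languages = {1: 0.630, 2: 0.295, 3: 0.065, 4: 0.008, 5: 0.002}

def text_histogram(dist, width=50):
    """Return one bar per value; bar length is proportional to probability."""
    lines = []
    for x, p in sorted(dist.items()):
        lines.append(f"{x} | {'#' * round(p * width)} {p:.3f}")
    return lines

def median(dist):
    """Smallest value whose cumulative probability reaches 0.5."""
    cumulative = 0.0
    for x, p in sorted(dist.items()):
        cumulative += p
        if cumulative >= 0.5:
            return x

print("\n".join(text_histogram(languages)))
print("median:", median(languages))

# Part (c): add the probabilities of the values 3, 4, and 5.
at_least_3 = sum(p for x, p in languages.items() if x >= 3)
print("P(X >= 3):", round(at_least_3, 3))
```

The long tail of short bars to the right is the right skew described in part (b).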
Winning (and Losing) at Roulette
Let’s define the random variable X = net gain from a single $1
bet on red. The possible values of X are -$1 and $1. What are
the corresponding probabilities? Create a probability
distribution:
Value:         –$1      $1
Probability:  20/38   18/38
What is the player’s average gain (aka expected value)?
µX = (−1)(20/38) + (1)(18/38)
µX ≈ −$0.05
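The roulette expected value can be checked with exact fractions. A short sketch, assuming the usual 38-slot wheel with 18 red slots implied by the probabilities 18/38 and 20/38:

```python
from fractions import Fraction

# Net gain from a $1 bet on red: win $1 with probability 18/38, lose $1 otherwise.
roulette = {-1: Fraction(20, 38), 1: Fraction(18, 38)}

# Expected value: multiply each value by its probability, then add.
expected_gain = sum(x * p for x, p in roulette.items())

print(expected_gain, round(float(expected_gain), 4))
```

The exact value is −1/19 of a dollar, which rounds to the −$0.05 per bet quoted above.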
Mean (Expected Value) of a Discrete Random Variable
When analyzing discrete random variables, we follow the same strategy we
used with quantitative data – describe the shape, center, and spread, and
identify any outliers.
The mean of any discrete random variable is an average of the possible
outcomes, with each outcome weighted by its probability.
Suppose that X is a discrete random variable whose probability
distribution is
Value:        x1   x2   x3   …
Probability:  p1   p2   p3   …
To find the mean (expected value) of X, multiply each possible
value by its probability, then add all the products:
 x  E ( X )  x1 p1  x2 p2  x3 p3  ...
  xi pi
How many languages?
In an earlier Alternate Example, we defined the
random variable X to be the number of languages
spoken by a randomly selected U.S. high school
student. The table below gives the probability
distribution of X:
Languages:      1       2       3       4       5
Probability:  0.630   0.295   0.065   0.008   0.002
Compute the mean of the random variable X and
interpret this value in context.
 X  E( X )  (1)(0.630)  (2)(0.295)  (3)(0.065)  (4)(0.008)  (5)(0.002)
= 1.457.
If we were to randomly select many, many U.S. high school students,
the average number of languages spoken would be about 1.457.
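The "many, many students" interpretation can be illustrated with a small simulation. This is a sketch only: the seed, the sample size of 100,000, and the helper name mean are arbitrary choices made for the example.

```python
import random

# Probability distribution of X = number of languages spoken.
languages = {1: 0.630, 2: 0.295, 3: 0.065, 4: 0.008, 5: 0.002}

def mean(dist):
    """Expected value: each value weighted by its probability."""
    return sum(x * p for x, p in dist.items())

mu = mean(languages)

# Simulate selecting many students at random and average their values of X.
rng = random.Random(0)  # fixed seed so the run is repeatable
draws = rng.choices(list(languages), weights=list(languages.values()), k=100_000)
sample_mean = sum(draws) / len(draws)

print(round(mu, 3), round(sample_mean, 3))
```

The simulated average should land close to, but not exactly on, the expected value.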
Warm up
1. A large auto dealership keeps track of sales made
during each hour of the day. Let X = the number
of cars sold during the first hour of business on a
randomly selected Friday. Based on previous
records, the probability distribution of X is as
follows:
Cars sold:     0     1     2     3
Probability:  0.3   0.4   0.2   0.1
Compute and interpret the mean of X.
µX = (0)(0.3) + (1)(0.4) + (2)(0.2) + (3)(0.1) = 1.1
If many, many Fridays are randomly selected, the
average number of cars sold will be about 1.1.
The variance of a random variable
• The mean is a measure of the center of a
distribution
• The variance and the standard deviation are
the measures of spread that accompany the mean
as the measure of center
s² is the symbol for the variance of a data set
σ²X is the notation for the variance of a random
variable X
Standard Deviation of a Discrete Random Variable
Since we use the mean as the measure of center for a discrete random
variable, we use the standard deviation as our measure of spread. The
definition of the variance of a random variable is similar to the
definition of the variance for a set of quantitative data.
Suppose that X is a discrete random variable whose probability
distribution is
Value:        x1   x2   x3   …
Probability:  p1   p2   p3   …
and that µX is the mean of X. The variance of X is
Var(X) = σ²X = (x1 − µX)² p1 + (x2 − µX)² p2 + (x3 − µX)² p3 + ...
       = Σ (xi − µX)² pi
To get the standard deviation of a random variable, take the square
root of the variance.
Using the warm up problem, compute and
interpret the standard deviation of X.
σ²X = (0 − 1.1)²(0.3) + (1 − 1.1)²(0.4) + (2 − 1.1)²(0.2) + (3 − 1.1)²(0.1) = 0.89
σX = √0.89 = 0.943
The number of cars sold during the first hour of business on a randomly
selected Friday will typically vary from the mean (1.1) by about 0.943 cars.
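The warm-up calculation follows the definitions directly. A minimal sketch (the function names mean and variance are illustrative, not from the slides):

```python
import math

# Warm-up distribution: cars sold in the first hour -> probability.
cars = {0: 0.3, 1: 0.4, 2: 0.2, 3: 0.1}

def mean(dist):
    """Weighted average of the values, using the probabilities as weights."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Weighted average of the squared deviations from the mean."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

mu = mean(cars)
var = variance(cars)
sd = math.sqrt(var)  # standard deviation is the square root of the variance

print(round(mu, 3), round(var, 3), round(sd, 3))
```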
How many languages
Yesterday, we calculated the mean number of
languages spoken for a randomly selected U.S. high
school student to be µX = 1.457. The table below gives
the probability distribution for X one more time.
Languages:      1       2       3       4       5
Probability:  0.630   0.295   0.065   0.008   0.002
Compute and interpret the standard deviation of the
random variable X.
 X2  (1  1.457)2 (0.630)  (2  1.457)2 (0.295) 
 X  0.450  0.671
The number of languages spoken by a randomly selected U.S. high
school student typically varies by about 0.671 languages from the mean
(1.457).
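To see where the variance comes from, each term (xi − µX)² pi can be printed separately. A sketch using the table above; the variable names are illustrative:

```python
import math

# Probability distribution of X = number of languages spoken.
languages = {1: 0.630, 2: 0.295, 3: 0.065, 4: 0.008, 5: 0.002}

mu = sum(x * p for x, p in languages.items())

# One term of the variance sum per possible value of X.
terms = {x: (x - mu) ** 2 * p for x, p in languages.items()}
for x, term in terms.items():
    print(f"({x} - {mu:.3f})^2 * {languages[x]:.3f} = {term:.4f}")

variance = sum(terms.values())
sd = math.sqrt(variance)
print(f"variance = {variance:.3f}, sd = {sd:.3f}")
```

The rare values 4 and 5 still contribute noticeably because their squared deviations from the mean are large.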
Continuous Random Variables
When we choose a digit between 0 and 9 at random, each
digit has probability .1. What if we wanted
to allow any number between 0 and 1 as our
outcome?
There are infinitely many possible
outcomes! How can we
assign probabilities?
X in this case is a continuous
random variable because its values are not
isolated numbers but an entire interval of numbers.
Continuous Random Variables
A continuous random variable X takes on all values in an
interval of numbers. The probability distribution of X is
described by a density curve. The probability of any event is
the area under the density curve and above the values of X
that make up the event.
The probability model of a discrete random variable X
assigns a probability between 0 and 1 to each possible
value of X.
A continuous random variable Y has infinitely many
possible values. All continuous probability models assign
probability 0 to every individual outcome. Only intervals of
values have positive probability.
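For the "any number between 0 and 1" case, probability as area can be sketched with a flat density curve. This assumes the uniform density of height 1 on the interval from 0 to 1, which the slides do not state explicitly; the function name is made up for the example:

```python
def uniform_probability(a, b, low=0.0, high=1.0):
    """P(a <= X <= b) for X uniform on [low, high]: area = width * height."""
    height = 1.0 / (high - low)             # total area under the curve is 1
    left, right = max(a, low), min(b, high)  # keep only the part inside [low, high]
    return max(right - left, 0.0) * height

print(uniform_probability(0.3, 0.7))   # an interval has positive probability
print(uniform_probability(0.5, 0.5))   # a single outcome has zero width, so area 0
```

The second call shows why every individual outcome gets probability 0: a rectangle with no width has no area.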
Normal Distributions
The density curve we are most familiar
with is a Normal Curve.
–Remember N(μ, σ) is our notation for a
Normal distribution.
–If X is N(μ, σ), the z-score Z = (X − μ)/σ is a standard
Normal random variable having the distribution N(0, 1)
Example: Normal probability distributions
The heights of young women closely follow the Normal distribution with
mean µ = 64 inches and standard deviation σ = 2.7 inches.
Now choose one young woman at random. Call her height Y.
If we repeat the random choice very many times, the distribution of values
of Y is the same Normal distribution that describes the heights of all young
women.
Problem: What’s the
probability that the chosen
woman is between 68 and 70
inches tall?
Example: Normal probability distributions
Step 1: State the distribution and the values of interest.
The height Y of a randomly chosen young woman has the
N(64, 2.7) distribution. We want to find P(68 ≤ Y ≤ 70).
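Step 2: Perform calculations–show your work!
The standardized scores for the two boundary values are
z = (68 − 64)/2.7 = 1.48 and z = (70 − 64)/2.7 = 2.22.
From Table A, P(1.48 ≤ Z ≤ 2.22) = 0.9868 − 0.9306 = 0.0562.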
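The two standardized scores and the area between them can be computed with Python's standard library. A sketch using statistics.NormalDist; a Table A lookup with z rounded to two decimals may differ from this result in the fourth decimal place:

```python
from statistics import NormalDist

# Heights of young women: N(64, 2.7), in inches.
heights = NormalDist(mu=64, sigma=2.7)

# Standardize the two boundary values: z = (x - mu) / sigma.
z_low = (68 - 64) / 2.7
z_high = (70 - 64) / 2.7

# P(68 <= Y <= 70) is the area under the density curve between 68 and 70.
probability = heights.cdf(70) - heights.cdf(68)

print(round(z_low, 2), round(z_high, 2), round(probability, 4))
```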
Opinion poll
An SRS of 1500 adults is asked what they consider the
most serious problem facing schools.
Suppose that 30% of all adults would say drugs. The sample
proportion p̂ then has approximately the N(0.3, 0.0118) distribution.
What is the probability that the poll result
differs from the truth about the population
by more than two percentage points?
By the addition rule for disjoint events, the
desired probability for the sample proportion p̂ is:
P(p̂ < 0.28 or p̂ > 0.32) = P(p̂ < 0.28) + P(p̂ > 0.32)
= 0.0455 + 0.0455 = 0.0910
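The poll calculation can be reproduced as follows. The standard deviation 0.0118 is consistent with the usual formula √(p(1 − p)/n) for a sample proportion; that formula is an assumption here, since the slide only states the value.

```python
import math
from statistics import NormalDist

p, n = 0.30, 1500

# Assumed formula for the standard deviation of a sample proportion.
sd = math.sqrt(p * (1 - p) / n)

# Approximate Normal distribution of the sample proportion.
sampling = NormalDist(mu=p, sigma=sd)

# Two disjoint events: result below 0.28, or result above 0.32.
below = sampling.cdf(0.28)
above = 1 - sampling.cdf(0.32)
prob = below + above

print(round(sd, 4), round(below, 4), round(above, 4), round(prob, 3))
```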
Example: cheating in School
The true population proportion of students who will report
another student for cheating is 12%. If we survey 400 randomly
chosen students, and repeat the survey many times, we
expect the average sample proportion to be close to this .12
value (with a standard deviation of .016). What is
the probability that the result of one survey will
differ from the true proportion by more than 2 percentage points?
The result differs by more than 2 percentage points if it is less than .10 or greater than .14.
From Table A or a calculator, P(.10 ≤ p̂ ≤ .14) = P(−1.25 ≤ Z ≤ 1.25)
= .8944 − .1056 = .7888
So the probability we seek is 1 − .7888 = .2112, which is about 21%.
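The complement approach used here can be sketched with the standard Normal distribution. The numbers .12, .016, and the 2-point margin come from the example; the variable names are illustrative:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # N(0, 1)

# A 2-percentage-point margin in standard deviation units.
margin_z = 0.02 / 0.016

# Probability of landing within the margin, then its complement.
within = standard_normal.cdf(margin_z) - standard_normal.cdf(-margin_z)
outside = 1 - within

print(round(margin_z, 2), round(within, 4), round(outside, 4))
```

Small differences from the Table A figures in the last decimal place come from the table's rounding.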
Weights of three-year-old females
The weights of three-year-old females closely
follow a Normal distribution with a mean of µ = 30.7
pounds and a standard deviation of 3.6 pounds.
Randomly choose one three-year-old female and
call her weight X. What is the probability that a
randomly selected three-year-old female weighs at
least 30 pounds?
The weight X of a randomly chosen three-year-old female has the
N(30.7, 3.6) distribution. We want to find P(X ≥ 30) as shown in
the figure below.
Perform calculations–show your work! The standardized score
for the boundary value is z = (30 − 30.7)/3.6 = −0.19. From Table A, P(Z <
−0.19) = 0.4247, so P(Z ≥ −0.19) = 1 − 0.4247 = 0.5753. Using
technology: The command normalcdf(lower: 30, upper: 1000, μ:
30.7, σ: 3.6) gives an area of 0.5771.
There is about a 58% chance that the randomly selected three-year-old
female will weigh at least 30 pounds.
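The gap between the Table A answer (0.5753) and the technology answer (0.5771) comes from rounding z to two decimals before the lookup. A sketch comparing the two routes, with Python's statistics.NormalDist standing in for the calculator command:

```python
from statistics import NormalDist

# Weights of three-year-old females: N(30.7, 3.6), in pounds.
weights = NormalDist(mu=30.7, sigma=3.6)

# Technology route: use the unrounded boundary value directly.
exact = 1 - weights.cdf(30)

# Table route: round z to two decimals first, as Table A requires.
z_rounded = round((30 - 30.7) / 3.6, 2)
table_style = 1 - NormalDist().cdf(z_rounded)

print(round(exact, 4), z_rounded, round(table_style, 4))
```

Both routes round to about a 58% chance.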