Chapter 6 – Random Variables and the Normal Distribution
 Random Variable
o A random variable is a variable whose values are determined by
chance.
 Discrete and Continuous Random Variables
o A discrete random variable can take either a finite or a countable
number of values. Since these values may be written as a list of
numbers, each value can be graphed as a separate point on a number
line, with space between each point.
o A continuous random variable can take infinitely many values.
Because there are infinitely many values, the values of a continuous
random variable form an interval on the number line.
 Probability Distribution of A Discrete Random Variable
o A probability distribution of a discrete random variable provides all
the possible values that the random variable can assume, together with
the probability associated with each value. The probability
distribution can take the form of a table, graph, or formula. Probability
distributions describe populations, not samples.
 Requirements for the probability Distribution of a Discrete Random
Variable
o The sum of the probabilities of all the possible values of a discrete
random variable must equal 1. That is, ∑ P(X) = 1.
o The probability of each value of X must be between 0 and 1, inclusive.
That is, 0 ≤ P(X) ≤ 1.
Example 6.1
Kristin’s probability estimates for five investment outcomes for a period of 12 months (initial investment $2000)
Scenario | Financial gain | Kristin’s estimated probability
Cape Ann Biotech does very well. | Gain $1000 | 0.15
Cape Ann Biotech does fairly well. | Gain $500 | 0.30
Cape Ann Biotech treads water. | Gain $0 | 0.25
Cape Ann Biotech does not do very well. | Lose $200 (Gain -$200) | 0.20
Cape Ann Biotech folds. | Lose $2000 (Gain -$2000) | 0.10
Once we define our random variable, this table can be converted into a probability distribution table.
Since Kristin is interested in what happens to her money, we define our random variable X to be
X = financial gain (the financial gain associated with the five possible scenarios)
a. Why is the variable X = Financial gain a random variable?
X = financial gain is a random variable because we do not know, before the
investment is made, the value that the variable will take.
b. What are the possible values that X can take?
X = {-$2000, -$200, $0, $500, $1000}
c. Use random variable notation to express the probabilities associated with each possible
outcome of X.
P(X = -2000) = 0.10, P(X = -200) = 0.20, P(X = 0) = 0.25, P(X = 500) = 0.30, P(X = 1000) = 0.15
d. Construct the probability distribution of X = financial gain.
Probability distribution of Kristin’s financial gain
X = Financial gain in dollars | -2000 | -200 | 0 | 500 | 1000
P(X) | 0.10 | 0.20 | 0.25 | 0.30 | 0.15
e. Find the probability that Kristin will make a profit on her investment.
P(gain $1000) + P(gain $500) = P(X = 1000) + P(X = 500) = 0.15 + 0.30 = 0.45
f. Find the probability that Kristin will take a loss on her investment.
P(X = -2000) + P(X = -200) = 0.10 + 0.20 = 0.30
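The checks in Example 6.1 can also be sketched in Python. This is only an illustration, not part of the original notes: the names distribution, is_valid_distribution, and probability are hypothetical, and the small tolerance is an assumption to allow for floating-point rounding.

```python
# Kristin's distribution from Example 6.1: financial gain (dollars) -> probability.
distribution = {-2000: 0.10, -200: 0.20, 0: 0.25, 500: 0.30, 1000: 0.15}


def is_valid_distribution(dist, tol=1e-9):
    """Check both requirements for a discrete probability distribution."""
    # Requirement: every probability lies between 0 and 1, inclusive.
    each_in_range = all(0 <= p <= 1 for p in dist.values())
    # Requirement: the probabilities sum to 1 (up to rounding error).
    sums_to_one = abs(sum(dist.values()) - 1) <= tol
    return each_in_range and sums_to_one


def probability(dist, event):
    """Add P(X = x) over all values x for which event(x) is true."""
    return sum(p for x, p in dist.items() if event(x))


valid = is_valid_distribution(distribution)
p_profit = probability(distribution, lambda x: x > 0)  # parts (e): gain > 0
p_loss = probability(distribution, lambda x: x < 0)    # part (f): gain < 0

print(valid)
print(round(p_profit, 2), round(p_loss, 2))
```

Passing the event as a function keeps the same helper usable for other questions, such as the probability of breaking even (lambda x: x == 0).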
 Finding the mean of a Discrete Random Variable
o The mean μ of a discrete random variable is found as follows:
 Multiply each possible value of X by its probability.
 Add the resulting products.
The procedure is denoted as μ = ∑ [X · P(X)].
 Formulas for the Variance and Standard Deviation of a Discrete Random Variable
o Definition Formulas
σ² = ∑ (X – μ)² · P(X)
σ = √[∑ (X – μ)² · P(X)]
o Computational Formulas
σ² = ∑ [X² · P(X)] – μ²
σ = √(∑ [X² · P(X)] – μ²)
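The two forms of the variance give the same result. A short derivation, added here as a supplement to the notes, expands the square in the definition formula and uses only μ = ∑ X · P(X) and ∑ P(X) = 1:

```latex
\begin{align*}
\sigma^2 &= \sum (X - \mu)^2 \, P(X) \\
         &= \sum X^2 \, P(X) \;-\; 2\mu \sum X \, P(X) \;+\; \mu^2 \sum P(X) \\
         &= \sum X^2 \, P(X) \;-\; 2\mu \cdot \mu \;+\; \mu^2 \cdot 1 \\
         &= \sum X^2 \, P(X) \;-\; \mu^2
\end{align*}
```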
Example 6.2
Probability distribution of Kristin’s financial gain
X = Financial gain in dollars | -2000 | -200 | 0 | 500 | 1000
P(X) | 0.10 | 0.20 | 0.25 | 0.30 | 0.15
a. Calculate the expected value for Kristin’s financial gain from her investment.
μ = (-2000)(0.10) + (-200)(0.20)+(0)(0.25) + (500)(0.30) + (1000)(0.15) = 60
b. Find the variance and standard deviation of Kristin’s financial gain, using the definition and computational
formulas.
Definition formula:
X | P(X) | X – μ | (X – μ)² | (X – μ)² · P(X)
-2000 | 0.10 | -2060 | 4,243,600 | 424,360
-200 | 0.20 | -260 | 67,600 | 13,520
0 | 0.25 | -60 | 3,600 | 900
500 | 0.30 | 440 | 193,600 | 58,080
1000 | 0.15 | 940 | 883,600 | 132,540
σ² = ∑ (X – μ)² · P(X) = 629,400
σ = √σ² = √629,400 = 793.3473388 ≈ $793.35
Computational formula:
X | P(X) | X² | X² · P(X)
-2000 | 0.10 | 4,000,000 | 400,000
-200 | 0.20 | 40,000 | 8,000
0 | 0.25 | 0 | 0
500 | 0.30 | 250,000 | 75,000
1000 | 0.15 | 1,000,000 | 150,000
∑ X² · P(X) = 633,000
σ² = ∑ X² · P(X) – μ² = 633,000 – 60² = 629,400
σ = √σ² = √629,400 = 793.3473388 ≈ $793.35
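The arithmetic in Example 6.2 can be reproduced with a few lines of Python. This is a minimal sketch rather than anything prescribed by the notes; the variable names are illustrative assumptions. It computes the mean, then the variance by both the definition and the computational formula, so the two results can be compared.

```python
from math import sqrt

# Kristin's distribution: possible gains (dollars) and their probabilities.
values = [-2000, -200, 0, 500, 1000]
probs = [0.10, 0.20, 0.25, 0.30, 0.15]

# Mean: multiply each value by its probability and add the products.
mu = sum(x * p for x, p in zip(values, probs))

# Definition formula: sum of (X - mu)^2 * P(X).
var_definition = sum((x - mu) ** 2 * p for x, p in zip(values, probs))

# Computational formula: sum of X^2 * P(X), minus mu^2.
var_computational = sum(x ** 2 * p for x, p in zip(values, probs)) - mu ** 2

# Standard deviation is the square root of the variance.
sigma = sqrt(var_definition)

print(round(mu, 2), round(var_definition, 2),
      round(var_computational, 2), round(sigma, 2))
```

Up to floating-point rounding, both variance calculations should agree with the 629,400 found in the tables, and the standard deviation with the $793.35 reported above.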
 The Binomial Probability Distribution Formula
o The probability of observing exactly X successes in n trials of a
binomial experiment is
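P(X) = C(n, X) · p^X · (1 – p)^(n – X)
where C(n, X) = n! / [X! (n – X)!] is the number of combinations of n trials taken X at a time,
p is the probability of success on a single trial, and (1 – p) is the probability of failure.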
Example 6.3
Suppose that Joshua is about to take a four-question multiple-choice statistics quiz. Joshua did not study
for the quiz, so he will have to guess randomly on each of the four questions. Each question has
five possible alternatives, only one of which is correct.
There are four questions on the quiz, so the number of trials is n = 4. Next, we know that p = 1/5,
since there are five choices and Joshua has a 1 in 5 chance of being correct if he chooses
randomly. Thus,
p = probability of success = 1/5 = 0.2
Four of the five possible alternatives are incorrect. So,
(1 – p) = probability of failure = 4/5 = 0.8
a. What is the probability that Joshua will ace the quiz by answering all the questions correctly?
To find the probability of correctly guessing the right answer on all four questions,
Joshua is interested in observing X = 4 successes. Using the binomial formula, we obtain
P(X = 4) = C(4, 4)(0.2^4)(1 – 0.2)^(4 – 4) = (1)(0.0016)(1) = 0.0016
So Joshua’s chance of acing this quiz by making random guesses is very small, less than
one-fifth of 1%.
b. What is the probability that Joshua will pass the quiz by answering at least three questions
correctly?
To answer at least three questions correctly, Joshua must answer either X = 3 or X = 4
questions correctly. Since these events are mutually exclusive, we find the required
probability by using the Addition Rule for Mutually Exclusive Events,
P(X ≥ 3) = P(X = 3) + P(X = 4)
We already found P(X = 4) = 0.0016 in (a). Now we find
P(X = 3) = C(4, 3)(0.2^3)(1 – 0.2)^(4 – 3) = (4)(0.008)(0.8) = 0.0256
Therefore, the probability that Joshua will pass this quiz by random guessing is
0.0016 + 0.0256 = 0.0272
Since he has less than a 3% chance of even passing this quiz, we would tell Joshua that
perhaps random guessing isn’t the best strategy for stats quizzes.
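As a cross-check on Example 6.3, the binomial formula can be written as a small Python function. This is a sketch under the example's assumptions (n = 4 independent guesses, p = 0.2); the function name binomial_pmf is hypothetical, while math.comb is the standard-library function for the number of combinations.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials, each with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 4, 0.2  # four questions, one correct answer out of five choices

# Part (a): all four answers correct.
p_ace = binomial_pmf(4, n, p)

# Part (b): at least three correct, i.e. X = 3 or X = 4 (mutually exclusive).
p_pass = sum(binomial_pmf(x, n, p) for x in (3, 4))

print(round(p_ace, 4), round(p_pass, 4))
```

Summing the function over a set of X values is the Addition Rule for Mutually Exclusive Events applied directly.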
Example 6.4
Suppose that Joshua is about to take a four-question multiple-choice statistics quiz. Joshua did not study
for the quiz, so he will have to guess randomly on each of the four questions. Each question has
five possible alternatives, only one of which is correct.
Use the binomial table to find the following probability.
a. What is the probability that Joshua will ace the quiz by answering all the questions correctly?
 Look under the n column until you find n = 4. That is the portion of the table
you will use.
 Then go across the top of the table until you get to p = 0.20. That gives you your
column.
 We are interested in finding the probability of observing X = 4, where X is the
number of successes. So go down the column until you see 4 under the X column
on the left (and in the subgroup with n = 4).
 The number in the p column is 0.0016.
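The portion of the binomial table used in Example 6.4 can be regenerated from the binomial formula. The sketch below assumes the printed table rounds to four decimal places, as the entry 0.0016 suggests; the function name binomial_table_column is hypothetical.

```python
from math import comb


def binomial_table_column(n, p):
    """Return P(X = x) for x = 0, 1, ..., n, rounded to four decimals."""
    return [round(comb(n, x) * p ** x * (1 - p) ** (n - x), 4)
            for x in range(n + 1)]


# The n = 4 subgroup of the table, in the p = 0.20 column.
column = binomial_table_column(4, 0.20)
for x, prob in enumerate(column):
    print(f"n = 4, X = {x}: {prob:.4f}")
```

The last entry in the list corresponds to X = 4, the value looked up in part (a).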
 Mean, Variance, and Standard Deviation of a Binomial Random Variable X
o Mean (or expected value): μ = n * p
o Variance: σ² = n * p * (1 – p)
o Standard deviation: σ = √[n * p * (1 – p)]
Example 6.5
Suppose we know that the population proportion p of left-handed students is 0.10.
a. In a sample of 200 students, how many would we expect to be left-handed?
E(X) = μ = n * p = (200)(0.10) = 20
b. Would 40 left-handed students out of 200 be considered unusual?
σ = √[n * p * (1 – p)] = √[(200)(0.1)(1 – 0.1)] = √18 ≈ 4.2426
How many standard deviations does 40 lie above the mean of 20?
(X – μ) / σ = (40 – 20) / 4.2426 ≈ 4.714
Finding 40 lefties in a sample of 200 is unusual because this value lies about 4.7 standard
deviations above the mean.
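The calculation in Example 6.5 follows the same pattern for any binomial random variable, so it can be sketched as a small Python helper. The function name binomial_mean_sd is an illustrative assumption, not something from the notes.

```python
from math import sqrt


def binomial_mean_sd(n, p):
    """Return the mean n*p and standard deviation sqrt(n*p*(1-p)) of a binomial variable."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return mean, sd


# Example 6.5: 200 students, proportion of left-handers p = 0.10.
mean, sd = binomial_mean_sd(200, 0.10)

# Number of standard deviations that 40 lies above the mean.
z = (40 - mean) / sd

print(round(mean, 2), round(sd, 4), round(z, 3))
```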
 Continuous Probability Distribution
o A continuous probability distribution is a graph that indicates on the
horizontal axis the range of values that the continuous random variable
X can take, and above which is drawn a curve, called the density
curve. A continuous probability distribution must follow the
Requirements for the Probability Distribution of a Continuous
Random Variable.
o Requirements for the Probability Distribution of a Continuous
Random Variable
 The total area under the density curve must equal 1 (this is the
Law of Total Probability for Continuous Random Variables).
 The vertical height of the density curve can never be negative.
That is, the density curve never goes below the horizontal axis.
 Properties of the Normal Density Curve (Normal Curve)
o It is symmetric about the mean μ.
o The highest point occurs at X = μ, because symmetry implies that the
mean equals the median, which equals the mode of the distribution.
o It has inflection points at μ – σ and μ + σ.
o The total area under the curve equals 1.
o Symmetry also implies that the area under the curve to the left of μ
and the area under the curve to the right of μ are both equal to 0.5.
o The normal distribution is defined for values of X extending
indefinitely in both the positive and negative directions. As X moves
farther from the mean, the density curve approaches but never quite
touches the horizontal axis.
Example 6.6
Q1. Many educators are concerned about grade inflation. One study shows that one low-SAT-score
high school (with mean combined SAT score = 750) had a higher mean grade point average (mean GPA
= 3.6) than a high-SAT-score school (with mean combined SAT score = 1050 and mean GPA = 2.6).
Define the following random variable:
X = GPA at the high-SAT-score school
Assume that X is normally distributed with mean μ = 2.6 and standard deviation σ = 0.46.
a. What is the probability that a randomly chosen GPA at the high-SAT-score school will be between
3.06 and 3.52?
Since 3.06 = 2.6 + 0.46 and 3.52 = 2.6 + 2(0.46), the area under the curve between 3.06 and 3.52
is the area between μ + σ and μ + 2σ. By the Empirical Rule, the area between μ + σ and μ + 2σ is
about 13.5% of the area under the curve. Therefore, the probability that a randomly chosen GPA at
the high-SAT-score school will be between 3.06 and 3.52 is about 0.135.
b. Find the probability that a randomly chosen GPA at the high-SAT-score school will be greater than
3.52.
The area to the right of the mean is 50%. To find the area to the right of X = 3.52, we subtract
from this 50% the 34% between μ and μ + σ and the 13.5% between μ + σ and μ + 2σ:
50% – 34% – 13.5% = 2.5%
Therefore, the probability that a randomly chosen GPA at the high-SAT-score school will be
greater than 3.52 is about 0.025.
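The Empirical Rule values in Example 6.6 are approximations. As a hedged comparison, the sketch below uses statistics.NormalDist from the Python standard library (Python 3.8 or later) to compute the same two areas from the normal cumulative distribution function; the variable names are illustrative.

```python
from statistics import NormalDist

# X = GPA at the high-SAT-score school, assumed normal as in the example.
gpa = NormalDist(mu=2.6, sigma=0.46)

# Part (a): area between mu + sigma = 3.06 and mu + 2*sigma = 3.52.
p_between = gpa.cdf(3.52) - gpa.cdf(3.06)

# Part (b): area to the right of 3.52.
p_above = 1 - gpa.cdf(3.52)

print(round(p_between, 4), round(p_above, 4))
```

The computed areas should come out close to, but not exactly equal to, the rounded Empirical Rule figures of 0.135 and 0.025.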
 The Standard Normal (Z) Distribution
o The standard normal distribution is a normal distribution with
 Mean μ = 0 and
 Standard deviation σ = 1.
Case 1: Find the area to the left of Z1
Step 1 Draw the standard normal curve. Label the Z-value Z1.
Step 2 Shade in the area to the left of Z1.
Step 3 Use the Z table to find the area to the left of Z1.
Case 2: Find the area to the right of Z1
Step 1 Draw the standard normal curve. Label the Z-value Z1.
Step 2 Shade in the area to the right of Z1.
Step 3 Use the Z table to find the area to the left of Z1. The area to the right
of Z1 is then equal to
1 – (area to the left of Z1)
Case 3: Find the area between Z1 and Z2
Step 1 Draw the standard normal curve. Label the Z-values Z1 and Z2.
Step 2 Shade in the area between Z1 and Z2.
Step 3 Use the Z table to find the area to the left of Z1 and the area to the left of Z2. The
area between Z1 and Z2 is then equal to
(area to the left of Z2) – (area to the left of Z1)
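The three cases above can be sketched in Python, with the standard normal cumulative distribution function standing in for the Z table. This is an illustration only: the function names area_left, area_right, and area_between are hypothetical, and statistics.NormalDist requires Python 3.8 or later.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0, sigma=1)


def area_left(z1):
    """Case 1: area to the left of z1 (what the Z table reports)."""
    return Z.cdf(z1)


def area_right(z1):
    """Case 2: 1 minus the area to the left of z1."""
    return 1 - Z.cdf(z1)


def area_between(z1, z2):
    """Case 3: (area to the left of z2) minus (area to the left of z1)."""
    return Z.cdf(z2) - Z.cdf(z1)


print(round(area_left(0), 4))
print(round(area_right(1.96), 4))
print(round(area_between(-1, 1), 4))
```

Cases 2 and 3 are both built from the left-tail area, mirroring the way the Z table is used in Step 3 of each case.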
 Standardizing a Normal Random Variable
o Any normal random variable X can be transformed into the standard
normal random variable Z by standardizing using the formula
Z = (X – μ) / σ
Example 6.7
Q1. The state of Georgia reports that the average temperature statewide for the month of April from
1949 to 2006 was μ = 61.5°F. Assume that the standard deviation is σ = 8°F and that temperature in
Georgia in April is normally distributed. Draw the normal curve for temperatures between 45.5°F and
77.5°F, and the corresponding Z curve. Find the probability that the temperature is between 45.5°F
and 77.5°F in April in Georgia.
A1. Here we have a = 45.5 and b = 77.5, giving us
Za = (a – μ) / σ = (45.5 – 61.5) / 8 = -2 and Zb = (b – μ) / σ = (77.5 – 61.5) / 8 = 2
The area between 45.5°F and 77.5°F is the same as between Z = -2 and Z = 2.
[Figure: normal curve for X = Temp. marked at 45.5 and 77.5, with the corresponding Z curve marked at -2 and 2.]
P(45.5 < X < 77.5) = P(-2 < Z < 2) = 0.9772 – 0.0228 = 0.9544.
The probability that the temperature is between 45.5°F and 77.5°F in April in Georgia is 0.9544.
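The standardization step in Example 6.7 can be checked with a short Python sketch. The function name standardize is an assumption made for illustration, and the cumulative distribution function from statistics.NormalDist (Python 3.8 or later) replaces the Z table lookup.

```python
from statistics import NormalDist


def standardize(x, mu, sigma):
    """Convert a value x of a normal variable into its Z-value."""
    return (x - mu) / sigma


mu, sigma = 61.5, 8.0  # April temperature in Georgia, degrees Fahrenheit

z_a = standardize(45.5, mu, sigma)
z_b = standardize(77.5, mu, sigma)

# Area between the two Z-values under the standard normal curve.
std_normal = NormalDist()
p = std_normal.cdf(z_b) - std_normal.cdf(z_a)

print(z_a, z_b, round(p, 4))
```

Because the Z table entries 0.9772 and 0.0228 are each rounded to four decimals, the table-based answer of 0.9544 can differ from the computed value in the last decimal place.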
Example 6.8
Q1. Edmunds.com reported that the average amount that people were paying for a 2007 Toyota
Camry XLE was $23,400. Let X = price, and assume that price follows a normal distribution with μ =
$23,400 and σ = $1000. Find the prices that separate the middle 95% of 2007 Toyota Camry XLE
prices from the bottom 2.5% and the top 2.5%.
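A1. The middle 95% of the area under the standard normal curve lies between Z1 = -1.96 and
Z2 = 1.96, leaving an area of 0.025 in each tail. Converting these Z-values back to prices gives
X1 = Z1 σ + μ = (-1.96)(1000) + 23,400 = 21,440
X2 = Z2 σ + μ = (1.96)(1000) + 23,400 = 25,360
[Figure: normal curve of price centered at $23,400, with Area = 0.95 between X1 and X2 and Area = 0.025 in each tail.]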
The prices that separate the middle 95% of 2007 Toyota Camry XLE prices from the bottom
2.5% of prices and the top 2.5% of prices are $21,440 and $25,360.
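Example 6.8 runs the Z table in reverse: it starts from an area and finds the value. A minimal Python sketch of the same idea follows, using NormalDist.inv_cdf from the standard library (Python 3.8 or later) in place of the table; the variable names are illustrative assumptions.

```python
from statistics import NormalDist

mu, sigma = 23400, 1000  # 2007 Toyota Camry XLE price, dollars

std_normal = NormalDist()

# Z-values with area 0.025 to the left and area 0.975 to the left.
z_low = std_normal.inv_cdf(0.025)
z_high = std_normal.inv_cdf(0.975)

# Convert the Z-values back to prices with X = Z * sigma + mu.
x_low = z_low * sigma + mu
x_high = z_high * sigma + mu

print(round(z_low, 2), round(z_high, 2))
print(round(x_low), round(x_high))
```

The computed Z-values are slightly more precise than the table value of 1.96, so the resulting prices may differ from $21,440 and $25,360 by a few cents before rounding.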