Sta220 - Statistics
Mr. Smith
Room 310
Class #11
Section 4.1-4.2
4.1 – Two Types of Random Variables
A random variable is a variable that assumes
numerical values associated with the random
outcomes of an experiment, where one (and
only one) numerical value is assigned to each
sample point.
A random variable is a numerical quantity
whose value depends on chance.
Example 1
You are tossing a coin twice and will bet on the number of heads. The
outcome is a number (0, 1, 2) which depends on chance. The number
of heads is a random variable.
Example 2
You are tossing a coin twice and will bet on a specific outcome such as
“first a head then a tail” or HT. The outcome depends on chance, but is
not a number. This is NOT a random variable.
Example 3
You go to Las Vegas and begin to put quarters in a slot machine. Let X
be the number of quarters you play before your first win of any
amount. X is a number and depends on chance. X is a random variable.
Example 4.1
A panel of 10 experts for the Wine Spectator (a
national publication) is asked to taste a new
white wine and assign it a rating of 0, 1, 2, 3. A
score is then obtained by adding together
the ratings of the 10 experts. How many values
can this random variable assume?
Solution
A sample point is a sequence of 10 numbers associated
with the rating of each expert.
For example: {1, 0, 0, 1, 2, 0, 0, 3, 1, 0}
So the lowest score is 0 while the highest would be a 30.
So possible scores range from 0 to 30 (x = 0, 1, 2, …, 30).
The random variable denoted by the symbol x can
assume 31 values.
Note: Our sample point shown here gives x = 8.
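The count of 31 can be checked by brute force. The following is a minimal sketch, assuming each of the 10 experts independently picks a rating from 0, 1, 2, 3; the names RATINGS, N_EXPERTS, and totals are illustrative and not from the slides.

```python
# Build the set of reachable total scores one expert at a time.
RATINGS = (0, 1, 2, 3)   # allowed rating per expert (from Example 4.1)
N_EXPERTS = 10           # panel size (from Example 4.1)

totals = {0}             # with no experts counted yet, the only total is 0
for _ in range(N_EXPERTS):
    # Every existing total can be extended by any allowed rating.
    totals = {t + r for t in totals for r in RATINGS}

# The sample point shown on the slide.
sample_point = [1, 0, 0, 1, 2, 0, 0, 3, 1, 0]

print(len(totals), min(totals), max(totals))  # number of values, lowest, highest
print(sum(sample_point))                      # value of x for the sample point
```

Building the set step by step avoids listing all 4^10 sample points, since only the distinct totals matter here.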
There are two different types of random
variables, discrete and continuous.
Random variables that can assume a countable
number of values are called discrete.
Random variables that can assume values
corresponding to any of the points contained in
an interval are called continuous.
Examples of Discrete Random
Variables
1. The number of seizures an epileptic patient
has in a given week.
2. The shoe size of a tennis player: x = …5, 5.5,
6, 6.5, 7, 7.5, ….
3. The change received for paying a bill: x =
$0.01, $0.02,…, $1, $1.01, $1.02, ….
4. The number of customers waiting to be
served in a restaurant at a particular time: x =
0, 1, 2, …
Examples of Continuous Random
Variables
1. The length of time (in seconds) between
arrivals at a hospital clinic: 0 ≤ 𝑥 < ∞
2. The length of time (in minutes) it takes a
student to complete a one-hour exam:
0 ≤ 𝑥 ≤ 60
3. The depth (in feet) at which a successful oil-drilling venture first strikes oil: 0 ≤ 𝑥 ≤ 𝑐, where
c is the maximum depth obtainable.
Watch out for similar situations.
The number of checkout lanes open at a grocery
store is a discrete random variable, while the
amount of time spent standing in line is a
continuous random variable.
4.2 – Probability Distributions for
Discrete Random Variables
A complete description of a discrete random
variable requires that we specify all the values
the random variable can assume and the
probability associated with each value.
The probability distribution of a discrete random
variable is a graph, table, or formula that
specifies the probability associated with each
possible value that the random variable can
assume.
Requirements for Probability Distribution of a
Discrete Random Variable x
1. p(x) ≥ 0 for all values of x
2. ∑p(x) = 1
where the summation of p(x) is over all possible
values of x.
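Both requirements are easy to check mechanically. The sketch below assumes the distribution is stored as a Python dict mapping each value of x to p(x); the function name is_valid_distribution and the tolerance are illustrative choices, not part of the course material.

```python
import math

def is_valid_distribution(p, tol=1e-9):
    """Return True if p (a dict of value -> probability) meets both requirements."""
    # Requirement 1: p(x) >= 0 for all values of x.
    nonnegative = all(prob >= 0 for prob in p.values())
    # Requirement 2: the probabilities sum to 1 (within floating-point tolerance).
    sums_to_one = math.isclose(sum(p.values()), 1.0, abs_tol=tol)
    return nonnegative and sums_to_one

# Number of heads in two tosses of a fair coin.
two_coins = {0: 0.25, 1: 0.5, 2: 0.25}
print(is_valid_distribution(two_coins))         # valid distribution
print(is_valid_distribution({0: 0.5, 1: 0.6}))  # probabilities sum to 1.1
```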
Example 4.4-1
Recall the experiment of tossing two coins, and
let x be the number of heads observed. Find the
probability associated with each value of the
random variable x, assuming that the two coins
are fair.
Solution
Sample space and sample points for this
experiment are reproduced in the following
figure.
Note that the random variable x can assume
values 0, 1, 2.
The probabilities associated with each value of x
are:
P(x = 0) = p(0) = ¼
P(x = 1) = p(1) = ½
P(x = 2) = p(2) = ¼
This dual specification completely describes the
random variable and is referred to as the
probability distribution, denoted by the symbol
p(x).
Example 4.4-2
Let's look at the experiment of tossing three
coins, and let x be the number of heads
observed. Find the probability associated with
each value of the random variable x, assuming
that the three coins are fair.
Picture on the white board.
Note that the random variable x can assume values 0, 1, 2, 3.
The probabilities associated with each value of x are:
P(x = 0) = p(0) = 1/8
P(x = 1) = p(1) = 3/8
P(x = 2) = p(2) = 3/8
P(x = 3) = p(3) = 1/8
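The coin-tossing distributions can be reproduced by listing the sample points and counting heads. This is a minimal sketch assuming fair coins, so that every sample point is equally likely; the function name heads_distribution is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n_coins):
    """Return {x: p(x)} for x = number of heads in n_coins fair tosses."""
    # Each sample point is a tuple such as ('H', 'T', 'H').
    outcomes = list(product("HT", repeat=n_coins))
    # Count how many sample points give each number of heads.
    counts = Counter(outcome.count("H") for outcome in outcomes)
    # Equally likely sample points: p(x) = (points with x heads) / (all points).
    return {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for n in (2, 3):
    dist = heads_distribution(n)
    print(n, {x: str(px) for x, px in dist.items()})
```

Using Fraction keeps the probabilities exact, so they can be compared directly with the fractions on the slides.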
Since probability distributions are related to the
relative frequency distributions of Chapter 2, it
should be no surprise that the mean and
standard deviation are useful descriptive
measures.
Measuring Central Tendency; Expected
Value
The mean, or expected value, of a discrete
random variable x is
$\mu = E(x) = \sum x\,p(x)$
The expected value is the mean of the
probability distribution, or a measure of its
central tendency.
Example 4.7
Suppose you work for an insurance company
and you sell a $10,000 one-year term insurance
policy at an annual premium of $290. Actuarial
tables show that the probability of death during
the next year for a person of your customer's
age, sex, health, etc., is 0.001. What is the
expected gain (amount of money made by the
company) for a policy of this type?
Solution
The experiment is to observe whether the
customer survives the upcoming year. There are
two sample points, Live and Die, with probabilities
.999 and .001, respectively.
If the customer lives, the company gains the $290
premium as profit. If the customer dies, the gain is
negative because the company must pay $10,000,
for a net ‘gain’ of $(290 - 10,000) = -$9,710.
The random variable you are interested in is the
gain x, which can assume the values shown in
the following table:
Gain x     Sample Point   Probability
$290       Lives          .999
-$9,710    Dies           .001
The expected gain is therefore
𝜇 = 𝐸(𝑥) = ∑𝑥𝑝(𝑥) =
(290)(.999) + (−9,710)(.001) = $280
In other words, if the company were to sell a very
large number of $10,000 one-year policies to
customers possessing the characteristics described, it
would (on the average) net $280 per sale in the
next year.
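The expected-gain arithmetic can be sketched in a few lines. The helper expected_value and the dict layout are assumptions made for illustration; the gains and probabilities are the ones given in Example 4.7.

```python
from fractions import Fraction

def expected_value(dist):
    """Return the sum of x * p(x) over a dict mapping value x -> probability p(x)."""
    return sum(x * p for x, p in dist.items())

# Gain to the company (in dollars) and its probability, from Example 4.7.
gain = {
    290: Fraction(999, 1000),   # customer lives: company keeps the premium
    -9710: Fraction(1, 1000),   # customer dies: 290 - 10,000
}

mu = expected_value(gain)
print(float(mu))  # expected gain per policy, in dollars
```

Exact fractions are used so that the result is not affected by floating-point rounding.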
NOTE
The E(x) need not equal a possible value of x.
That is, the expected value is $280, but x will
equal either -$9,710 or $290 each time the
experiment is performed. The expected value is
a measure of central tendency – and in this case
represents the average over a very large number
of one-year policies – but is not a possible value
of x.
The variance of a random variable x is
$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x) = \sum x^2 p(x) - \mu^2$
The standard deviation of a discrete random
variable is equal to the square root of the variance,
or $\sigma = \sqrt{\sigma^2}$
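The two forms of the variance agree because of the definition of the mean and the requirement that the probabilities sum to 1. A short derivation, using only $\sum x\,p(x) = \mu$ and $\sum p(x) = 1$:

```latex
\begin{align*}
\sigma^2 &= \sum (x - \mu)^2\, p(x) \\
         &= \sum \left(x^2 - 2\mu x + \mu^2\right) p(x) \\
         &= \sum x^2 p(x) - 2\mu \sum x\, p(x) + \mu^2 \sum p(x) \\
         &= \sum x^2 p(x) - 2\mu^2 + \mu^2 \\
         &= \sum x^2 p(x) - \mu^2
\end{align*}
```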
Procedure
Copyright © 2013 Pearson Education, Inc. All rights reserved.
Example 4.8
Medical research has shown that a certain type
of chemotherapy is successful 70% of the time
when used to treat skin cancer. Suppose five
skin cancer patients are treated with this type of
chemotherapy, and let x equal the number of
successful cures out of the five. The probability
distribution for the number x of successful cures
out of five is given in the following table:
x       0      1      2      3      4      5
p(x)    .002   .029   .132   .309   .360   .168
a. Find 𝜇 = 𝐸(𝑥). Interpret the results.
b. Find $\sigma = \sqrt{E[(x - \mu)^2]}$. Interpret the result.
c. Graph p(x). Locate 𝜇 and the interval 𝜇 ± 2𝜎 on
the graph. Use either Chebyshev’s rule or the
empirical rule to approximate the probability
that x falls into the interval. Compare your result
with the actual probability.
d. Would you expect to observe fewer than two
successful cures out of five?
Solution
a. Applying the formula for 𝜇, we obtain
$\mu = E(x) = \sum x\,p(x)$
= 0(.002) + 1(.029) + 2(.132) + 3(.309) + 4(.36) + 5(.168)
= 3.5
On average, the number of successful cures out of five
skin cancer patients treated with chemotherapy will
equal 3.5. Remember that this expected value has
meaning only when the experiment (treating five skin
cancer patients with chemotherapy) is repeated a large
number of times.
b. Now we calculate the variance of x, using 𝜇 = 3.5:
$\sigma^2 = E[(x - \mu)^2] = \sum (x - \mu)^2 p(x)$
$= (0 - 3.5)^2(.002) + (1 - 3.5)^2(.029) + (2 - 3.5)^2(.132)$
$+ (3 - 3.5)^2(.309) + (4 - 3.5)^2(.36) + (5 - 3.5)^2(.168)$
$= 1.05$
The standard deviation is
$\sigma = \sqrt{1.05} = 1.02$
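Parts a and b can be checked numerically. The sketch below stores the table from Example 4.8 in a dict (an illustrative layout, not something from the slides) and computes the variance both from the definition and from the shortcut form ∑x²p(x) − 𝜇².

```python
import math

# Probability distribution from Example 4.8: x = number of successful cures.
p = {0: 0.002, 1: 0.029, 2: 0.132, 3: 0.309, 4: 0.360, 5: 0.168}

# Mean: sum of x * p(x).
mu = sum(x * px for x, px in p.items())

# Variance from the definition: sum of (x - mu)^2 * p(x).
var = sum((x - mu) ** 2 * px for x, px in p.items())

# Variance from the shortcut form: sum of x^2 * p(x), minus mu^2.
var_shortcut = sum(x**2 * px for x, px in p.items()) - mu**2

sigma = math.sqrt(var)

print(round(mu, 2), round(var, 2), round(sigma, 2))
```

The unrounded variance is 1.048, which the slides report as 1.05; both round to a standard deviation of 1.02.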
This value measures the spread of the probability
distribution of x, the number of successful cures out of five.
A more useful interpretation is obtained by
answering c and d.
c. Graph p(x).
The interval 𝜇 ± 2𝜎 = 3.5 ± 2(1.02) is (1.46, 5.54).
Note particularly that 𝜇 = 3.5 locates the center of the
probability distribution. Since this distribution is a
theoretical relative frequency distribution that is
moderately mound shaped, we expect at least 75% (by Chebyshev’s
rule) and, more likely, approximately 95% (by the empirical rule)
of observed x values to fall between 1.46 and 5.54.
The actual probability that x falls in the interval is
the sum of p(x) for the values x = 2, x = 3, x = 4, and x = 5.
The probability is
p(2) + p(3) + p(4) + p(5)
= .969.
Therefore 96.9% of the probability distribution
lies within two standard deviations of the mean.
This percentage is consistent with both
Chebyshev’s rule and the empirical rule.
d. Fewer than two successful cures out of five
implies that x = 0 or x = 1. Both these values of x
lie outside the interval 𝜇 ± 2𝜎, and the
empirical rule tells us that such a result is
unlikely.
The exact probability, P(x ≤ 1), is
p(0) + p(1) = .002 + .029 = .031.
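The probabilities in parts c and d can be verified with a short script. This is a sketch under the same assumptions as the example; it uses the unrounded standard deviation, so the interval endpoints differ slightly from (1.46, 5.54), but the same values x = 2, 3, 4, 5 fall inside. The variable names are illustrative.

```python
import math

# Probability distribution from Example 4.8.
p = {0: 0.002, 1: 0.029, 2: 0.132, 3: 0.309, 4: 0.360, 5: 0.168}

mu = sum(x * px for x, px in p.items())
sigma = math.sqrt(sum((x - mu) ** 2 * px for x, px in p.items()))

# Part c: probability that x lies within two standard deviations of the mean.
low, high = mu - 2 * sigma, mu + 2 * sigma
inside = sum(px for x, px in p.items() if low <= x <= high)

# Chebyshev's rule guarantees at least 1 - 1/k^2 within k standard deviations.
chebyshev_bound = 1 - 1 / 2**2

# Part d: probability of fewer than two successful cures.
fewer_than_two = sum(px for x, px in p.items() if x < 2)

print(round(inside, 3), chebyshev_bound, round(fewer_than_two, 3))
```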