CHAPTER 16
Random Variables and
Probability Distributions
Streamline Treatment of
Probability
Sample spaces and events are good
starting points for probability
Sample spaces and events become quite
cumbersome when applied to real-life
business-related processes
Random variables allow us to apply
probability, risk and uncertainty to
meaningful business-related situations
Bring Together Numerical Summaries
of Data and Probability
In previous chapters we saw that data
could be graphically and numerically
summarized in terms of midpoints,
spreads, outliers, etc.
In basic probability we saw how
probabilities could be assigned to
outcomes of an experiment. Now we bring
them together
First: Two Quick Examples
1. Hardee’s vs. The Colonel
Out of 100 taste-testers, 63 preferred
Hardee’s fried chicken, 37 preferred KFC
Evidence that Hardee’s is better? A
landslide?
What if there is no difference in the
chicken? (p=1/2, flip a fair coin)
Is 63 heads out of 100 tosses that
unusual?
Example 2.
Mothers Identify Newborns
After spending 1 hour with their newborns,
blindfolded and nose-covered mothers were
asked to choose their child from 3 sleeping
babies by feeling the backs of the babies’ hands
22 of 32 women (69%) selected their own
newborn
“far better than 33% one would expect…”
Is it possible the mothers are guessing?
Can we quantify “far better”?
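One way to put a number on both questions is sketched below. It assumes a binomial model, which the slides have not introduced at this point: each taste-tester (or mother) is treated as an independent trial with the same success probability. The function name binomial_upper_tail is hypothetical, chosen only for this illustration.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k), where X counts successes in n independent trials
    that each succeed with probability p (an assumed binomial model)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Example 1: at least 63 of 100 testers prefer Hardee's if each choice is a fair coin flip.
p_chicken = binomial_upper_tail(100, 63, 0.5)

# Example 2: at least 22 of 32 mothers choose correctly if each guesses among 3 babies.
p_mothers = binomial_upper_tail(32, 22, 1 / 3)

print(f"P(at least 63 of 100 | p = 1/2) = {p_chicken:.4f}")
print(f"P(at least 22 of 32  | p = 1/3) = {p_mothers:.6f}")
```

Under these assumptions both probabilities come out small, which is one way to quantify "a landslide" and "far better".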
Graphically and
Numerically Summarize a
Random Experiment
Principal vehicle by which we do this:
random variables
A random variable assigns a number to
each outcome of an experiment
Random Variables
Definition:
A random variable is a numerical-valued
function defined on the outcomes of an
experiment
[Figure: a random variable maps each outcome in the sample space S to a point on the number line]
Examples
S = {HH, TH, HT, TT}
the random variable:
x = # of heads in 2 tosses of a coin
Possible values of x = 0, 1, 2
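The coin example can be written out as a literal function on the sample space. This is a minimal sketch; the name number_of_heads is an illustrative choice, not something from the slides.

```python
# Sample space for two tosses of a coin, as listed above.
sample_space = ["HH", "TH", "HT", "TT"]


def number_of_heads(outcome: str) -> int:
    """The random variable x: assigns to each outcome its count of heads."""
    return outcome.count("H")


# Apply the random variable to every outcome in the sample space.
values = {outcome: number_of_heads(outcome) for outcome in sample_space}
possible_values = sorted(set(values.values()))

print(values)
print(possible_values)
```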
Two Types of Random
Variables
Discrete: random variables that have a
finite or countably infinite number of
possible values
Test: for any given value of the random
variable, you can designate the next
largest or next smallest value of the
random variable
Examples: Discrete rv’s
Number of girls in a 5 child family
Number of customers that use an ATM in
a 1-hour period.
Number of tosses of a fair coin that is
required until you get 3 heads in a row
(note that this discrete random variable
has a countably infinite number of
possible values: x=3, 4, 5, 6, 7, . . .)
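The last example can be explored with a short simulation. This is only a sketch: the seed and the number of repetitions are arbitrary choices, and a simulation shows only some of the infinitely many possible values.

```python
import random


def tosses_until_three_heads(rng: random.Random) -> int:
    """Toss a fair coin until 3 heads appear in a row; return the number of tosses."""
    tosses = 0
    streak = 0
    while streak < 3:
        tosses += 1
        if rng.random() < 0.5:  # heads extends the current run
            streak += 1
        else:                   # tails resets the run
            streak = 0
    return tosses


rng = random.Random(16)  # fixed seed (arbitrary) so repeated runs give the same output
samples = [tosses_until_three_heads(rng) for _ in range(10_000)]

print("smallest observed value:", min(samples))
print("largest observed value: ", max(samples))
print("distinct values seen:   ", len(set(samples)))
```

No value below 3 can occur, and there is no upper limit on how large a value can be, which is what "countably infinite" means here.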
Two types (cont.)
Continuous: a random variable that can
take on all possible values in an interval of
numbers
Test: given a particular value of the
random variable, you cannot designate
the next largest or next smallest value
Which is it, Discrete or
Continuous?
Discrete random variables “count”
Continuous random variables “measure”
(length, width, height, area, volume,
distance, time, etc.)
Examples: continuous rv’s
The time it takes to run the 100-yard dash
(measure)
The time between arrivals at an ATM
(measure)
Time spent waiting in line at the “express”
checkout at the grocery store (the
probability is 1 that the person in front of
you is buying a loaf of bread with a third
party check drawn on a Hungarian bank)
(measure)
Examples: cont. rv’s (cont.)
The length of a precision-engineered
magnesium rod (measure)
The area of a silicon wafer for a computer
chip coming off a production line
(measure)
Classify as discrete or
continuous
a. x = the number of customers who enter a
particular bank during the noon hour on a
particular day
Answer: discrete; x = 0, 1, 2, 3, …
b. x = time (in seconds) required for a teller to
serve a bank customer
Answer: continuous; x > 0
Classify (cont.)
c. x = the distance (in miles) between a
randomly selected home in a community
and the nearest pharmacy
Answer: continuous; x > 0
d. x = the diameter of precision-engineered “5
inch diameter” ball bearings coming off an
assembly line
Answer: continuous; range could be 4.5 < x < 5.5
Classify (cont.)
e. x = the number of tosses of a fair coin
required to observe at least 3 heads in
succession
Answer: discrete; x = 3, 4, 5, ...
Data Variables and Data Distributions
CUSIP     IND  CONAME                       PE    NPM
60855410  4    MOLEX INC                    24.7  8.7
40262810  5    GULFMARK INTL INC            21.4  8.1
81180410  4    SEAGATE TECHNOLOGY           21.3  2.2
46489010  9    ISOMEDIX INC                 25.2  21.1
69318010  9    PCA INTERNATIONAL INC        21.4  4.7
26157010  7    DRESS BARN INC               24.5  4.5
90249410  4    TYSON FOODS INC              20.9  3.9
4886910   5    ATLANTIC SOUTHEAST AIRLINES  20.1  15.7
87183910  9    SYSTEM SOFTWARE ASSOC INC    23.7  11.6
62475210  4    MUELLER (PAUL) CO            14.5  3.9
36473510  7    GANTOS INC                   15.7  1.8
00755P10  9    ADVANTAGE HEALTH CORP        23.3  5.3
23935910  2    DAWSON GEOPHYSICAL CO        14.9  9.3
68555910  4    ORBIT INTERNATIONAL CP       15.0  3.0
16278010  4    CHECK TECHNOLOGY CORP        17.1  3.2
51460610  4    LANCE INC                    19.0  8.5
4523710   4    ASPECT TELECOMMUNICATIONS    25.7  8.2
74555310  4    PULASKI FURNITURE CORP       22.0  2.1
80819410  4    SCHULMAN (A.) INC            19.4  6.0
19770920  9    COLUMBIA HOSPITAL CORP       18.3  3.1
23790310  4    DATA MEASUREMENT CORP        11.3  2.6
11457710  4    BROOKTREE CORP               13.8  13.6
00431L10  9    ACCESS HEALTH MARKETING INC  22.4  11.0
29605610  4    ESCALADE INC                 10.8  2.0
23303110  4    DBA SYSTEMS INC              6.3   5.0
64124610  4    NEUTROGENA CORP              27.2  9.0
59492810  6    MICROAGE INC                 9.0   0.5
22821010  7    CROWN BOOKS CORP             24.4  1.8
190710    4    AST RESEARCH INC             9.7   7.3
46978310  6    JACO ELECTRONICS INC         31.9  0.4
531320    4    ADAC LABORATORIES            18.5  10.6
49766010  4    KIRSCHNER MEDICAL CORP       33.0  0.8
30205210  4    EXIDE ELECTRS GROUP INC      29.0  2.4
46065P10  5    INTERPROVINCIAL PIPE LN      11.9  19.2
19247910  4    COHERENT INC                 40.2  1.2
Data variables are
known outcomes.
Data Variables and Data Distributions
DATA DISTRIBUTION
Price-Earnings Ratios
Class (bin)  Class Boundary  Tally           Frequency  Relative Frequency
1            6.00-12.99      |||| |          6          6/35 = 0.171
2            13.00-19.99     |||| ||||       10         10/35 = 0.286
3            20.00-26.99     |||| |||| ||||  14         14/35 = 0.400
4            27.00-33.99     ||||            4          4/35 = 0.114
5            34.00-40.99     |               1          1/35 = 0.029
Data variables are
known outcomes.
Data distributions
tell us what happened.
Handout 2.1, P. 10
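The data distribution of the price-earnings ratios can be reproduced with a few lines of code. This is a sketch: the list pe is copied from the PE column of the 35-company data table, and the class boundaries are the five bins used for the Price-Earnings Ratios distribution.

```python
# PE values for the 35 companies, in table order.
pe = [
    24.7, 21.4, 21.3, 25.2, 21.4, 24.5, 20.9, 20.1, 23.7, 14.5,
    15.7, 23.3, 14.9, 15.0, 17.1, 19.0, 25.7, 22.0, 19.4, 18.3,
    11.3, 13.8, 22.4, 10.8, 6.3, 27.2, 9.0, 24.4, 9.7, 31.9,
    18.5, 33.0, 29.0, 11.9, 40.2,
]

# Class boundaries of the five bins.
bins = [(6.00, 12.99), (13.00, 19.99), (20.00, 26.99), (27.00, 33.99), (34.00, 40.99)]

# Frequency = number of PE values falling in each class.
frequencies = [sum(low <= value <= high for value in pe) for low, high in bins]
relative = [f / len(pe) for f in frequencies]

for (low, high), f, r in zip(bins, frequencies, relative):
    print(f"{low:5.2f}-{high:5.2f}  {f:2d}  {f}/{len(pe)} = {r:.3f}")
```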
Random Variables and
Probability Distributions
Random variables are
unknown chance
outcomes.
Probability distributions
tell us what is likely
to happen.
Data variables are
known outcomes.
Data distributions
tell us what happened.
Handout 4.1, P. 3
Profit Scenarios
Economic Scenario   Profit ($ Millions)
Great               10
Good                5
OK                  1
Lousy               -4
Probability
Economic Scenario   Profit ($ Millions)   Probability
Great               10                    0.20
Good                5                     0.40
OK                  1                     0.25
Lousy               -4                    0.15
The proportion of the time an outcome is
expected to happen.
Probability Distribution
Economic Scenario   Profit ($ Millions)   Probability
Great               10                    0.20
Good                5                     0.40
OK                  1                     0.25
Lousy               -4                    0.15
Shows all possible values of a random
variable and the probability associated
with each outcome.
Notation
Economic Scenario   Profit X ($ Millions)   Probability
Great               x1 = 10                 0.20
Good                x2 = 5                  0.40
OK                  x3 = 1                  0.25
Lousy               x4 = -4                 0.15
X = the random variable (profits)
xi = outcome i
x1 = 10
x2 = 5
x3 = 1
x4 = -4
Notation
Economic Scenario   Profit X ($ Millions)   Probability
Great               x1 = 10                 Pr(X=x1) = 0.20
Good                x2 = 5                  Pr(X=x2) = 0.40
OK                  x3 = 1                  Pr(X=x3) = 0.25
Lousy               x4 = -4                 Pr(X=x4) = 0.15
P denotes probability
p(xi) = Pr(X = xi) is the probability of X being
outcome xi
p(x1) = Pr(X = 10) = .20
p(x2) = Pr(X = 5) = .40
p(x3) = Pr(X = 1) = .25
p(x4) = Pr(X = -4) = .15
What are the chances?
Economic Scenario   Profit X ($ Millions)   Probability
Great               x1 = 10                 0.20
Good                x2 = 5                  0.40
OK                  x3 = 1                  0.25
Lousy               x4 = -4                 0.15
What are the chances that profits will be
less than $5 million in 2009?
P(X < 5)
= P(X = 1 or X = -4)
= P(X = 1) + P(X = -4)
= .25 + .15 = .40
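The same calculation can be sketched in code by storing the distribution as a mapping from each value of X to its probability. The helper name prob is hypothetical, introduced only for this illustration.

```python
# Probability distribution of profit X ($ millions) from the table above.
pmf = {10: 0.20, 5: 0.40, 1: 0.25, -4: 0.15}


def prob(event, pmf):
    """Add up p(x) over every value x for which event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))


# The event "X < 5" contains the values 1 and -4.
p_less_than_5 = prob(lambda x: x < 5, pmf)
print(f"P(X < 5) = {p_less_than_5:.2f}")
```

Any other event about X, such as X > 0, can be evaluated by passing a different condition to the same helper.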
What are the chances?
Economic Scenario   Profit X ($ Millions)   Probability P
Great               x1 = 10                 p(x1) = 0.20
Good                x2 = 5                  p(x2) = 0.40
OK                  x3 = 1                  p(x3) = 0.25
Lousy               x4 = -4                 p(x4) = 0.15
P(X < 5) = .40
What are the chances that profits will be
less than $5 million in 2009 and less
than $5 million in 2010?
Assuming profits in the two years are independent
and follow the same distribution:
P(X < 5 in 2009 and X < 5 in 2010)
= P(X < 5)·P(X < 5) = .40·.40 = .16
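The two-year answer can be checked by listing every pair of outcomes. This sketch makes the assumption behind the multiplication explicit: the two years are treated as independent, each with the profit distribution from the table.

```python
from itertools import product

# Probability distribution of profit X ($ millions) for a single year.
pmf = {10: 0.20, 5: 0.40, 1: 0.25, -4: 0.15}

# Joint distribution of (2009 profit, 2010 profit), ASSUMING independence:
# the probability of each pair is the product of the two single-year probabilities.
joint = {(a, b): pa * pb for (a, pa), (b, pb) in product(pmf.items(), repeat=2)}

# Add the probabilities of the pairs in which both years fall below $5 million.
p_both_below_5 = sum(p for (a, b), p in joint.items() if a < 5 and b < 5)
print(f"P(X < 5 in both years) = {p_both_below_5:.2f}")
```

If profits in consecutive years were related, the product rule would not apply and the joint probabilities would have to be specified directly.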
Probability Histogram
Economic Scenario   Profit X ($ Millions)   Probability P
Great               x1 = 10                 p(x1) = 0.20
Good                x2 = 5                  p(x2) = 0.40
OK                  x3 = 1                  p(x3) = 0.25
Lousy               x4 = -4                 p(x4) = 0.15
[Figure: probability histogram. Horizontal axis: Profit, from -4 to 12. Vertical axis: Probability, from .05 to .40. Bars are labeled Lousy, OK, Good, and Great.]
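A rough text version of the probability histogram can be produced as follows. This is only a sketch: the function name text_histogram and the choice of one '#' per 0.05 of probability are illustrative assumptions.

```python
# Profit distribution, with the scenario name for each value of X.
pmf = {-4: 0.15, 1: 0.25, 5: 0.40, 10: 0.20}
labels = {-4: "Lousy", 1: "OK", 5: "Good", 10: "Great"}


def text_histogram(pmf, labels, step=0.05):
    """Return one line per value of X; each '#' stands for `step` of probability."""
    lines = []
    for x in sorted(pmf):
        bar = "#" * round(pmf[x] / step)
        lines.append(f"{x:>3} {labels[x]:<6}{bar} {pmf[x]:.2f}")
    return lines


for line in text_histogram(pmf, labels):
    print(line)
```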
Probability distributions:
requirements
Notation: p(x) = Pr(X = x) is the probability that
the random variable X has value x
Requirements
1. 0 ≤ p(x) ≤ 1 for all values x of X
2. Σ p(x) = 1, where the sum is over all values x of X
Example
x      0     1     2
p(x)   .20   .90   -.10
property 1) violated:
p(2) = -.10
x      -2    -1    1     2
p(x)   .3    .3    .3    .3
property 2) violated:
Σ p(x) = 1.2
Example (cont.)
x      -1    0     1
p(x)   .25   .65   .10
OK 1) satisfied: 0 ≤ p(x) ≤ 1 for all x
2) satisfied: Σ p(x) = .25 + .65 + .10 = 1
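The two requirements can be checked mechanically. The sketch below runs all three example tables through a hypothetical helper, check_distribution; a small tolerance is used when testing the sum because decimal fractions are not stored exactly in floating point.

```python
import math


def check_distribution(pmf):
    """Return a list of violated requirements (empty if pmf is a valid distribution)."""
    problems = []
    if any(p < 0 or p > 1 for p in pmf.values()):
        problems.append("requirement 1 violated: some p(x) is outside [0, 1]")
    if not math.isclose(sum(pmf.values()), 1.0, abs_tol=1e-9):
        problems.append("requirement 2 violated: probabilities do not sum to 1")
    return problems


first = {0: 0.20, 1: 0.90, 2: -0.10}          # negative probability
second = {-2: 0.3, -1: 0.3, 1: 0.3, 2: 0.3}   # sums to 1.2
third = {-1: 0.25, 0: 0.65, 1: 0.10}          # valid

for name, pmf in [("first", first), ("second", second), ("third", third)]:
    print(name, check_distribution(pmf) or "OK")
```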
Example
20% of light bulbs last at least 800 hrs; you
have just purchased 2 light bulbs.
X=number of the 2 bulbs that last at least
800 hrs (possible values of x: 0, 1, 2)
Find the probability distribution of X
S: bulb lasts at least 800 hrs
F: bulb fails to last 800 hrs
P(S) = .2; P(F) = .8
Example (cont.)
Possible outcomes   P(outcome)       x
(S,S)               (.2)(.2) = .04   2
(S,F)               (.2)(.8) = .16   1
(F,S)               (.8)(.2) = .16   1
(F,F)               (.8)(.8) = .64   0
probability
distribution of x:
x      0     1     2
p(x)   .64   .32   .04
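The light bulb distribution can be built by enumerating the four outcomes. The sketch below multiplies the two probabilities for each outcome, which assumes that the two bulbs last or fail independently of each other.

```python
from collections import defaultdict
from itertools import product

# S: bulb lasts at least 800 hrs, F: bulb fails to last 800 hrs.
p_outcome = {"S": 0.2, "F": 0.8}

pmf = defaultdict(float)
for bulbs in product("SF", repeat=2):  # (S,S), (S,F), (F,S), (F,F)
    probability = p_outcome[bulbs[0]] * p_outcome[bulbs[1]]  # assumes independence
    x = bulbs.count("S")               # number of the 2 bulbs that last
    pmf[x] += probability              # outcomes with the same x are combined

for x in sorted(pmf):
    print(x, round(pmf[x], 2))
```

The two outcomes (S,F) and (F,S) both give x = 1, which is why p(1) = .16 + .16 = .32.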
Example
3 child family;
X = # of boys
M: child is male
P(M) = 1/2 (0.5121; from .5134)
F: child is female
P(F) = 1/2 (0.4879)
Outcomes   P(outcome)      x
MMM        (1/2)^3 = 1/8   3
MMF        1/8             2
MFM        1/8             2
FMM        1/8             2
MFF        1/8             1
FMF        1/8             1
FFM        1/8             1
FFF        1/8             0
Probability Distribution of
x
x      0     1     2     3
p(x)   1/8   3/8   3/8   1/8
Probability of at least 1 boy:
P(x ≥ 1) = 3/8 + 3/8 + 1/8 = 7/8
Probability of no boys or 1 boy:
p(0) + p(1) = 1/8 + 3/8 = 4/8 = 1/2
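The 3 child family results can be reproduced exactly with fractions. This sketch uses the simplifying value P(M) = P(F) = 1/2 from the example, so each of the 8 outcomes has probability 1/8.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 birth orders: MMM, MMF, ..., FFF.
outcomes = ["".join(kids) for kids in product("MF", repeat=3)]
each = Fraction(1, 2) ** 3  # probability of any single outcome, 1/8

# Count how many outcomes give each value of X = number of boys.
counts = Counter(outcome.count("M") for outcome in outcomes)
pmf = {x: n * each for x, n in sorted(counts.items())}

at_least_one_boy = sum(p for x, p in pmf.items() if x >= 1)
none_or_one_boy = pmf[0] + pmf[1]

print(pmf)
print("P(x >= 1) =", at_least_one_boy)
print("p(0) + p(1) =", none_or_one_boy)
```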
Two More Examples
1. X = # of games played in a randomly
selected World Series
Possible values of X are x=4, 5, 6, 7
2. Y=score on 13th hole (par 5) at Augusta
National golf course for a randomly
selected golfer on day 1 of 2011 Masters
y=3, 4, 5, 6, 7
Probability Distribution Of Number of
Games Played in Randomly Selected
World Series
Estimate based on results from 1946 to
2010.
x      4               5               6               7
p(x)   12/65 = 0.185   12/65 = 0.185   14/65 = 0.215   27/65 = 0.415
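The estimated probabilities are relative frequencies, which can be sketched as below. The counts are the numerators shown in the table, and the total of 65 matches the denominator used there.

```python
# Number of World Series lasting x games, taken from the numerators in the table.
series_counts = {4: 12, 5: 12, 6: 14, 7: 27}

total = sum(series_counts.values())                     # 65, the table's denominator
pmf = {x: n / total for x, n in series_counts.items()}  # relative frequency used as p(x)

for x, p in pmf.items():
    print(f"p({x}) = {series_counts[x]}/{total} = {p:.3f}")
```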
[Figure: probability histogram, Number of Games in Randomly Selected World Series. Bars at x = 4, 5, 6, 7 with heights 0.185, 0.185, 0.215, 0.415; vertical axis from 0 to 0.45.]
Probability Distribution Of Score on
13th hole (par 5) at Augusta
National Golf Course on Day 1 of
2011 Masters
y      3       4       5       6       7
p(y)   0.040   0.414   0.465   0.051   0.030
[Figure: probability histogram, Score on 13th Hole. Bars at y = 3, 4, 5, 6, 7 with heights 0.04, 0.414, 0.465, 0.051, 0.03; vertical axis from 0 to 0.5.]
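As with the profit example, probabilities of events about the 13th hole score follow by adding p(y) over the relevant values. The sketch below checks that the listed probabilities sum to 1 and computes the chance of scoring under or over par; treating "under par" as y < 5 uses the par of 5 stated for the hole.

```python
import math

# Estimated distribution of Y, the score on the par-5 13th hole.
pmf_y = {3: 0.040, 4: 0.414, 5: 0.465, 6: 0.051, 7: 0.030}
PAR = 5

total = sum(pmf_y.values())
p_under_par = sum(p for y, p in pmf_y.items() if y < PAR)
p_over_par = sum(p for y, p in pmf_y.items() if y > PAR)

print("sums to 1:", math.isclose(total, 1.0, abs_tol=1e-9))
print(f"P(Y < {PAR}) = {p_under_par:.3f}")
print(f"P(Y > {PAR}) = {p_over_par:.3f}")
```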