Download Chap. 4: Discrete Random Variables

IV. Random Variables
PBAF 527
Winter 2005
Learning Objectives
1. Distinguish Between the Two Types of Random Variables
Discrete Random Variables
1. Describe Discrete Random Variables
2. Compute the Expected Value & Variance of Discrete Random Variables
Continuous Random Variables
1. Describe Normal Random Variables
2. Introduce the Normal Distribution
3. Calculate Probabilities for Continuous Random Variables
4. Assessing Normality
Random Variables
• A variable defined by the probabilities of each possible value in the population.
Data Types
Data
• Numerical: Discrete, Continuous
• Qualitative
Types of Random Variables
Discrete Random Variable
• Whole Number (0, 1, 2, 3 etc.)
• Countable, Finite Number of Values
• Jump from one value to the next and cannot take any values in between.
Continuous Random Variables
• Whole or Fractional Number
• Obtained by Measuring
• Infinite Number of Values in Interval
• Too Many to List Like Discrete Variable
Discrete Random Variable Examples

Experiment                                Random Variable    Possible Values
Children of One Gender in Family          # Girls            0, 1, 2, ..., 10?
Open Check in Lines                       # Open             0, 1, 2, ..., 8
Answer 33 Questions                       # Correct          0, 1, 2, ..., 33
Count Cars at Toll Between 11:00 & 1:00   # Cars Arriving    0, 1, 2, ...
Discrete Probability Distribution
1. List of All possible [x, p(x)] pairs
• x = Value of Random Variable (Outcome)
• p(x) = Probability Associated with Value
2. Mutually Exclusive (No Overlap)
3. Collectively Exhaustive (Nothing Left Out)
4. 0 ≤ p(x) ≤ 1
5. Σ p(x) = 1
Marilyn says: It may sound strange, but more families of 4 children have 3 of one gender and one of the other than any other combination. Explain this.
Construct a sample space and look at the total number of ways each event can occur out of the total number of combinations that can occur, and calculate frequencies.
Sample Space
BBBB
GBBB  BGBB  BBGB  BBBG
GGBB  GBGB  GBBG  BGGB  BGBG  BBGG
BGGG  GBGG  GGBG  GGGB
GGGG
• Are all 16 combinations equally likely? Is the sex of each child independent of the other three?
P(girl) = 1/2
P(boy) = 1/2
so, P(BBBB) = ½ x ½ x ½ x ½ = 1/16
• If you have a family of four children, what is the probability of…
P(all girls or all boys) = 2/16 = 1/8
P(2 boys, 2 girls) = 6/16 = 3/8 (six different ways to have 2 boys and 2 girls)
P(3 boys, 1 girl or 3 girls, 1 boy) = 8/16 = 4/8 = 1/2 (8 ways to have 3 of one and 1 of the other)
Assume the random variable X represents the number of girls in a family of 4 kids. (Lower case x is a particular value of X, i.e., x = 3 girls in the family.)

Sample Space    Random Variable X
BBBB            x=0
GBBB            x=1
BGBB            x=1
BBGB            x=1
BBBG            x=1
GGBB            x=2
GBGB            x=2
GBBG            x=2
BGGB            x=2
BGBG            x=2
BBGG            x=2
BGGG            x=3
GBGG            x=3
GGBG            x=3
GGGB            x=3
GGGG            x=4

Number of Girls, x    Probability, P(x)
0                     1/16
1                     4/16
2                     6/16
3                     4/16
4                     1/16
Total                 16/16=1.00
What is the probability of exactly 3 girls in 4 kids?
P(X=3) = 4/16
What is the probability of at least 3 girls in 4 kids?
P(X≥3) = 5/16
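The counting argument above can be checked by brute force. The following is a minimal sketch, assuming (as the slides do) that each child is a girl or a boy with probability 1/2, independently of the others; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 2**4 equally likely birth orders, under the slide's assumptions:
# P(girl) = P(boy) = 1/2 and independence between children.
sample_space = ["".join(kids) for kids in product("BG", repeat=4)]

# Probability distribution of X = number of girls in the family.
pmf = {x: Fraction(0) for x in range(5)}
for outcome in sample_space:
    pmf[outcome.count("G")] += Fraction(1, len(sample_space))

p_exactly_3 = pmf[3]               # P(X = 3)
p_at_least_3 = pmf[3] + pmf[4]     # P(X >= 3)
p_three_and_one = pmf[1] + pmf[3]  # 3 of one gender and 1 of the other

for x, p in pmf.items():
    print(x, p)
print(p_exactly_3, p_at_least_3, p_three_and_one)
```

Using exact fractions avoids rounding, so the results can be compared directly with the table.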
Visualizing Discrete Probability Distributions
Listing
{(0, 1/16), (1, .25), (2, 3/8), (3, .25), (4, 1/16)}
Table

Number of Girls, x    Probability, P(x)
0                     1/16
1                     4/16
2                     6/16
3                     4/16
4                     1/16
Total                 16/16=1.00

Graph
[Figure: bar chart of Probability, P(x) (0.00 to 0.40) against Number of Girls, x (0 to 4); bars at 1/16, 4/16, 6/16, 4/16, 1/16]
X is random and x is fixed. We can calculate the probability that different values of X will occur and make a probability distribution.
Probability Distributions
[Figure: probability histogram of Probability, P(x) (0.00 to 0.40) against Number of Girls, x (0 to 4); bars at 1/16, 4/16, 6/16, 4/16, 1/16]
Probability distributions can be written as probability histograms.
Cumulative probabilities: Adding up probabilities of a range of values.
Washington State Population Survey and Random Variables
A telephone survey of households throughout Washington State. But some households don’t have phones.

Number of telephones, x    P(x)
0                          0.03500
1                          0.70553
2                          0.21769
3                          0.02966
4                          0.00775
5                          0.00332
6                          0.00088
7                          0.00002
8                          0.00000
9                          0.00015
Total                      1.00000

[Figure: Probability Histogram of Telephone Coverage in Washington; P(x) against Number of Telephone Lines (x), 0 to 9; bar labels 0.04, 0.71, 0.22, 0.03, 0.01, 0.00]
Probabilities about Telephones in Washington State
• What is the probability that a household will have no telephone?
• What is the probability that a household will have 2 or more telephone lines?
• What is the probability that a household will have 2 to 4 phone lines?
• What is the probability a household will have no phone lines or more than 4 phone lines?
• Who do you think is in that 3.5% of the population?
• What are the implications of this for the quality of the survey?
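The first four questions are sums over parts of the telephone distribution. As a rough sketch, with the P(x) values copied from the survey table and variable names chosen only for illustration:

```python
# P(x) for x = 0..9 telephone lines, copied from the survey table.
p = [0.03500, 0.70553, 0.21769, 0.02966, 0.00775,
     0.00332, 0.00088, 0.00002, 0.00000, 0.00015]

p_none = p[0]                                 # P(X = 0)
p_two_or_more = sum(p[2:])                    # P(X >= 2)
p_two_to_four = sum(p[2:5])                   # P(2 <= X <= 4)
p_none_or_more_than_four = p[0] + sum(p[5:])  # P(X = 0 or X > 4)

print(round(p_none, 5), round(p_two_or_more, 5),
      round(p_two_to_four, 5), round(p_none_or_more_than_four, 5))
```

Because the values of x are mutually exclusive, each answer is a plain sum of the relevant table entries.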
Probability Histogram of Telephone Lines, Washington
[Figure: Probability Histogram of Telephone Coverage in 1998; P(x) against Number of Telephone Lines (x), 0 to 9; bar labels 0.04, 0.71, 0.22, 0.03, 0.01, 0.00]
Summary Measures
1. Expected Value (μ, "mu")
• Mean of Probability Distribution
• Weighted Average of All Possible Values
• $\mu = E(X) = \sum x\,p(x)$
2. Variance (σ², "sigma-squared")
• Weighted Average Squared Deviation about Mean
• $\sigma^2 = V(X) = E[(x - \mu)^2] = \sum (x - \mu)^2\,p(x)$
• $\sigma^2 = V(X) = E(X^2) - [E(X)]^2$
3. Standard Deviation
• $\sigma = \sqrt{\sigma^2} = SD(X)$
What is the average number of telephones in Washington households, and how much does the number vary from the average?

Approach 1 (Variance) uses the (x−μ) columns; Approach 2 (Variance) uses the x² columns.

# of Phones, x   Frequency   P(x)   xP(x)   (x−μ)   (x−μ)²   (x−μ)²P(x)   x²   x²P(x)
0                  198,286   0.04   0.00    -1.3     1.65    0.06          0   0.00
1                4,142,030   0.71   0.71    -0.3     0.08    0.06          1   0.71
2                1,278,026   0.22   0.44     0.7     0.51    0.11          4   0.87
3                  174,110   0.03   0.09     1.7     2.94    0.09          9   0.27
4                   45,499   0.01   0.03     2.7     7.38    0.06         16   0.12
5                   19,473   0.00   0.02     3.7    13.81    0.05         25   0.08
6                    5,170   0.00   0.01     4.7    22.24    0.02         36   0.03
7                      118   0.00   0.00     5.7    32.67    0.00         49   0.00
8                        -   0.00   0.00     6.7    45.10    0.00         64   0.00
9                      897   0.00   0.00     7.7    59.53    0.01         81   0.01
Sum              5,863,609   1.00   1.28    32.16            0.45              2.10

μ = Σ xP(x) = 1.28
Approach 1: Variance σ² = Σ (x−μ)²P(x) = 0.45
Approach 2: Variance from Σ x²P(x) = 2.10, using σ² = E(X²) − [E(X)]²
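Both approaches to the variance can be reproduced in a few lines. This is a sketch only: it uses the five-decimal P(x) values from the earlier survey table rather than the two-decimal values shown in the worksheet, and the variable names are illustrative.

```python
from math import sqrt

# P(x) for x = 0..9 telephone lines (five-decimal values from the survey table).
p = [0.03500, 0.70553, 0.21769, 0.02966, 0.00775,
     0.00332, 0.00088, 0.00002, 0.00000, 0.00015]
values = range(len(p))

# Expected value: weighted average of all possible values.
mu = sum(x * px for x, px in zip(values, p))

# Approach 1: weighted average squared deviation about the mean.
var_def = sum((x - mu) ** 2 * px for x, px in zip(values, p))

# Approach 2: E(X^2) minus the square of E(X).
e_x2 = sum(x ** 2 * px for x, px in zip(values, p))
var_shortcut = e_x2 - mu ** 2

sd = sqrt(var_def)
print(round(mu, 2), round(var_def, 2), round(var_shortcut, 2), round(sd, 2))
```

The two variance calculations agree up to floating-point error, which is a useful check when building the worksheet by hand.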
Chebyshev’s Rule and Empirical Rule for a Discrete Random Variable
Let x be a discrete random variable with a probability distribution p(x), mean μ, and standard deviation σ. Then, depending on the shape of p(x), the following probability statements can be made:

Probability                Chebyshev’s Rule    Empirical Rule
P(μ − σ < x < μ + σ)       ≥ 0                 ≈ .68
P(μ − 2σ < x < μ + 2σ)     ≥ 3/4               ≈ .95
P(μ − 3σ < x < μ + 3σ)     ≥ 8/9               ≈ 1.00

Chebyshev’s Rule: Applies to any probability distribution (e.g., telephones in Washington State)
Empirical Rule: Applies to probability distributions that are mound-shaped and symmetric (e.g., girls born of 4 children)
Data Types
Data
• Numerical: Discrete, Continuous
• Qualitative
Continuous Random Variable
• A variable with many possible values at all intervals
Continuous Random Variable Examples

Experiment                       Random Variable       Possible Values
Weigh 100 People                 Weight                45.1, 78, ...
Measure Part Life                Hours                 900, 875.9, ...
Ask Food Spending                Spending              54.12, 42, ...
Measure Time Between Arrivals    Inter-Arrival Time    0, 1.3, 2.78, ...
Continuous Probability Density Function
1. Mathematical Formula
2. Shows All Values, x, & Frequencies, f(x)
• f(x) Is Not Probability
3. Properties
• Area under curve sums to 1
• Can add up areas of function to get probability less than a specific value
[Figure: density curve of Frequency, f(x), against Value, x, with points a and b marked on the x axis; each point on the curve is a (Value, Frequency) pair]
Continuous Random Variable Probability
Probability Is Area Under Curve!
P(c ≤ x ≤ d)
[Figure: density curve f(x) against X, with the area between c and d shaded]
Continuous Probability Distribution Models
Continuous Probability Distribution
• Uniform
• Normal
• Exponential
Importance of Normal Distribution
1. Describes Many Random Processes or Continuous Phenomena
2. Can Be Used to Approximate Discrete Probability Distributions
• Example: Binomial
3. Basis for Classical Statistical Inference
Normal Distribution
1. ‘Bell-Shaped’ & Symmetrical
2. Mean, Median, Mode Are Equal
3. ‘Middle Spread’ Is 1.33 σ
4. Random Variable Has Infinite Range
[Figure: bell-shaped curve f(X) against X, with Mean, Median, and Mode marked at the center]
Normal Distribution Useful Properties
• About half of “weight” below mean (because symmetrical)
• About 68% of probability within 1 standard deviation of mean (at change in curve)
• About 95% of probability within 2 standard deviations
• More than 99% of probability within 3 standard deviations
[Figure: normal curve f(X) against X, with the axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ and Mean, Median, Mode at μ]
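The 68%, 95%, and 99% figures can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8 or later); the `coverage` dictionary is an illustrative name, not something from the slides.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability within k standard deviations of the mean, for k = 1, 2, 3.
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(k, round(prob, 4))
```

The same proportions hold for any normal distribution, because they depend only on distance from the mean measured in standard deviations.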
Probability Density Function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
x = Value of Random Variable (−∞ < x < ∞)
σ = Population Standard Deviation
π = 3.14159
e = 2.71828
μ = Mean of Random Variable x
Don’t memorize this!
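For readers who want to see the density as a function rather than memorize it, here is a small sketch. The helper name `normal_pdf` and the values μ = 5, σ = 10 are illustrative assumptions; the result is compared with the standard library's `NormalDist.pdf`.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density f(x) of a normal distribution with mean mu and SD sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# Height of the curve at its peak (x equal to the mean), for mu = 5, sigma = 10.
peak = normal_pdf(5, mu=5, sigma=10)
library_peak = NormalDist(mu=5, sigma=10).pdf(5)
print(peak, library_peak)
```

Note that the value returned is a density, not a probability; probabilities come from areas under the curve.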
Notation
X is N(μ,σ)
The random variable X has a normal
distribution (N) with mean μ and standard
deviation σ.
X is N(40,1)
X is N(10,5)
X is N(50,3)
Effect of Varying Parameters (μ & σ)
[Figure: three normal curves labeled A, B, and C, f(X) against X]
Normal Distribution Probability
Probability is area under curve!
$$P(c \le x \le d) = \int_{c}^{d} f(x)\,dx \; ?$$
[Figure: normal curve f(x) against x, with the area between c and d shaded]
Infinite Number of Tables
Normal distributions differ by mean & standard deviation.
Each distribution would require its own table.
That’s an infinite number!
[Figure: normal curves, f(X) against X]
Standardize the Normal Distribution
$$Z = \frac{X - \mu}{\sigma}$$
Z is N(0,1)
Normal Distribution: mean μ, standard deviation σ
Standardized Normal Distribution: μ = 0, σ = 1
One table!
Standardizing Example
$$Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = .12$$
Normal Distribution: μ = 5, σ = 10, X = 6.2
Standardized Normal Distribution: μ = 0, σ = 1, Z = .12
Obtaining the Probability
Standardized Normal Probability Table (Portion)

Z      .00      .01      .02
0.0    .0000    .0040    .0080
0.1    .0398    .0438    .0478
0.2    .0793    .0832    .0871
0.3    .1179    .1217    .1255

Probabilities
[Figure: standardized normal curve, μ = 0, σ = 1, with area .0478 between 0 and Z = .12; shaded area exaggerated]
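Example
P(3.8 ≤ X ≤ 5)
$$Z = \frac{X - \mu}{\sigma} = \frac{3.8 - 5}{10} = -.12$$
Normal Distribution: μ = 5, σ = 10
Standardized Normal Distribution: μ = 0, σ = 1
[Figure: area .0478 between X = 3.8 and μ = 5, equal to the area between Z = −.12 and μ = 0; shaded area exaggerated]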
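Example
P(2.9 ≤ X ≤ 7.1)
$$Z = \frac{X - \mu}{\sigma} = \frac{2.9 - 5}{10} = -.21$$
$$Z = \frac{X - \mu}{\sigma} = \frac{7.1 - 5}{10} = .21$$
Normal Distribution: μ = 5, σ = 10
Standardized Normal Distribution: μ = 0, σ = 1
.0832 + .0832 = .1664
[Figure: area between X = 2.9 and X = 7.1, equal to the area between Z = −.21 and Z = .21; shaded area exaggerated]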
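Example
P(X ≥ 8)
$$Z = \frac{X - \mu}{\sigma} = \frac{8 - 5}{10} = .30$$
Normal Distribution: μ = 5, σ = 10
Standardized Normal Distribution: μ = 0, σ = 1
.5000 − .1179 = .3821
[Figure: area to the right of X = 8, equal to the area to the right of Z = .30; shaded area exaggerated]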
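Example
P(7.1 ≤ X ≤ 8)
$$Z = \frac{X - \mu}{\sigma} = \frac{7.1 - 5}{10} = .21$$
$$Z = \frac{X - \mu}{\sigma} = \frac{8 - 5}{10} = .30$$
Normal Distribution: μ = 5, σ = 10
Standardized Normal Distribution: μ = 0, σ = 1
.1179 − .0832 = .0347
[Figure: area between X = 7.1 and X = 8, equal to the area between Z = .21 and Z = .30; shaded area exaggerated]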
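The four worked examples can be cross-checked without a printed table. This sketch assumes the same distribution as the examples (μ = 5, σ = 10) and uses `statistics.NormalDist`; small differences in the fourth decimal place come from the rounding of Z in the table.

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=10)  # the distribution used in the examples

p_a = X.cdf(5) - X.cdf(3.8)     # P(3.8 <= X <= 5)
p_b = X.cdf(7.1) - X.cdf(2.9)   # P(2.9 <= X <= 7.1)
p_c = 1 - X.cdf(8)              # P(X >= 8)
p_d = X.cdf(8) - X.cdf(7.1)     # P(7.1 <= X <= 8)

for label, prob in [("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)]:
    print(label, round(prob, 4))
```

Working from the cumulative distribution replaces the "area from the mean to Z" bookkeeping of the table with simple differences.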
Travel Time and the Normal Distribution
To help people plan their travel, WSDOT estimates that the average trip from Seattle to Bellevue at 5:40 pm (at peak) takes 11 minutes, with a standard deviation of 10 minutes. They also believe this travel time approximates a normal distribution.
What proportion of trips take less than 27 minutes?
Process
1. Draw a picture and write down the
probability you need.
2. Convert probability to standard scores.
3. Find cumulative probability in the table.
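Applied to the 27-minute question, steps 2 and 3 might look like the following sketch. It assumes the WSDOT figures quoted earlier (mean 11 minutes, standard deviation 10 minutes) and a normal model; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 11, 10               # minutes, from the WSDOT estimate

# Step 2: convert the cutoff of 27 minutes to a standard score.
z = (27 - mu) / sigma

# Step 3: cumulative probability below that standard score.
p_less_27 = NormalDist().cdf(z)

print(z, round(p_less_27, 4))
```

Step 1, drawing the picture, still matters: it shows that the answer is the whole area to the left of the cutoff, not just the area between the mean and Z.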
More Travel Time
Suppose we have only 10-15 minutes to travel to Seattle from Bellevue. What proportion of trips will make it in that time?
$$P(10 \le X \le 15) = P\left(\frac{10 - 11}{10} \le Z \le \frac{15 - 11}{10}\right) = P(-0.1 \le Z \le .4)$$
$$= 1 - P(Z < -0.1) - P(Z > .4)$$
Since normal curves are symmetrical:
$$= 1 - P(Z > .1) - P(Z > .4)$$
$$= 1 - (.5 - .0398) - (.5 - .1554)$$
$$= 1 - (.4602) - (.3446)$$
$$= .1952$$
19.5% of trips will make it in between 10 and 15 minutes.
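The symmetry argument can be confirmed numerically. In this sketch, `direct` takes the difference of two cumulative probabilities and `by_symmetry` follows the tail-area route used above; both names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

z_low = (10 - 11) / 10    # -0.1
z_high = (15 - 11) / 10   #  0.4

# Direct route: area between the two standard scores.
direct = Z.cdf(z_high) - Z.cdf(z_low)

# Symmetry route: 1 - P(Z > .1) - P(Z > .4), since P(Z < -.1) = P(Z > .1).
by_symmetry = 1 - (1 - Z.cdf(0.1)) - (1 - Z.cdf(0.4))

print(round(direct, 4), round(by_symmetry, 4))
```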
Finding Z Values for Known Probabilities
What is Z given P(Z) = .1217?
Standardized Normal Probability Table (Portion)

Z      .00      .01      .02
0.0    .0000    .0040    .0080
0.1    .0398    .0438    .0478
0.2    .0793    .0832    .0871
0.3    .1179    .1217    .1255

The area .1217 sits in row 0.3, column .01, so Z = .31.
[Figure: standardized normal curve, μ = 0, σ = 1, with area .1217 between 0 and Z = .31; shaded area exaggerated]
Finding X Values for Known Probabilities
Normal Distribution: μ = 5, σ = 10, area .1217 between μ = 5 and X = ?
Standardized Normal Distribution: μ = 0, σ = 1, area .1217 between 0 and Z = .31
$$X = \mu + Z\sigma = 5 + (.31)(10) = 8.1$$
[Figure: the two curves side by side; shaded areas exaggerated]
Travel Times Take 3
How much time will the trip take 99% of
the time?
Finding Z Values for Known Probabilities
1. Write down probability statement and draw a picture
P(Z < ____) = .99
2. Look up Z value in table
P(Z < 2.325) = .99
3. Convert Z value (SD units) to variable (X) by using mean and SD.
X = μ + Zσ so X = 11 + (2.325)(10) = 34.25
So, the trip can be made 99% of the time in 34.25 minutes.
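The table lookup in step 2 can be replaced by an inverse cumulative distribution function. This sketch uses `NormalDist.inv_cdf` from the standard library; it returns a Z of about 2.326 rather than the table's 2.325, so the travel time comes out a hundredth of a minute or so higher than 34.25.

```python
from statistics import NormalDist

# Step 2: the Z value with 99% of the area to its left.
z_99 = NormalDist().inv_cdf(0.99)

# Step 3: convert from SD units back to minutes, X = mu + Z * sigma.
x_99 = 11 + z_99 * 10

# Same answer in one call, working on the travel-time distribution directly.
x_direct = NormalDist(mu=11, sigma=10).inv_cdf(0.99)

print(round(z_99, 3), round(x_99, 2), round(x_direct, 2))
```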
Assessing Normality
1. A histogram of the data is mound shaped and symmetrical about the mean.
2. Determine the percentage of measurements falling in each of the intervals x̄ ± s, x̄ ± 2s, and x̄ ± 3s. If the data are approximately normal, the percentages will be approximately equal to 68%, 95%, and 100% respectively.
3. Find the interquartile range, IQR, and standard deviation, s, for the sample, then calculate the ratio IQR/s. If the data are approximately normal, then IQR/s ≈ 1.3.
4. Construct a normal probability plot for the data. If the data are approximately normal, the points will fall (approximately) on a straight line.
Assessing Normality: Is Class Height Normally Distributed?
1. How does the histogram look?
SPSS can produce the line of the normal curve for you. In SPSS select GRAPH, HISTOGRAM. After you choose the variable you want, click on the box “Display Normal Curve” and you’ll get something that looks like this.
[Figure: histogram of Height 527 2005 with normal curve overlaid; Frequency (0 to 7) against height (60 to 72); Mean = 66.52, Std. Dev. = 3.117, N = 23]
Assessing Normality: Is Class Height Normally Distributed?
2. Compute the intervals:

Interval                  Anticipated Percent    Actual Percent
x̄ ± s   [63.40, 69.64]    68%                    65%
x̄ ± 2s  [60.29, 72.75]    95%                    96%
x̄ ± 3s  [57.17, 75.87]    100%                   100%

Height 527 2005

Valid    Frequency    Percent    Valid Percent    Cumulative Percent
60       1            4.3        4.3              4.3
62       1            4.3        4.3              8.7
63       3            13.0       13.0             21.7
64       2            8.7        8.7              30.4
65       1            4.3        4.3              34.8
66       3            13.0       13.0             47.8
67       2            8.7        8.7              56.5
68       2            8.7        8.7              65.2
69       5            21.7       21.7             87.0
70       1            4.3        4.3              91.3
71       1            4.3        4.3              95.7
72       1            4.3        4.3              100.0
Total    23           100.0      100.0

SPSS: ANALYZE, DESCRIPTIVE STATISTICS, FREQUENCIES
Assessing Normality: Is Class Height Normally Distributed?
3. Does IQR/s ≈ 1.3?
IQR = 69 − 64 = 5
IQR/s = 5/3.117 = 1.6

Statistics: Height 527 2005
N Valid           23
N Missing         0
Std. Deviation    3.117
Percentiles 25    64.00
Percentiles 50    67.00
Percentiles 75    69.00

SPSS: ANALYZE, DESCRIPTIVE STATISTICS, FREQUENCIES, then click on STATISTICS and choose the ones you want.
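The interval and IQR/s checks can also be run outside SPSS. The sketch below rebuilds the 23 heights from the frequency table and uses Python's `statistics` module; the helper `share_within` is an illustrative name, and `quantiles` is called with its default "exclusive" method, which happens to match the SPSS percentiles for this data set.

```python
from statistics import mean, quantiles, stdev

# Heights rebuilt from the frequency table (height: count).
freq = {60: 1, 62: 1, 63: 3, 64: 2, 65: 1, 66: 3,
        67: 2, 68: 2, 69: 5, 70: 1, 71: 1, 72: 1}
heights = [h for h, n in freq.items() for _ in range(n)]

m, s = mean(heights), stdev(heights)  # sample mean and sample SD

def share_within(k):
    """Fraction of observations within k sample SDs of the mean."""
    return sum(m - k * s <= h <= m + k * s for h in heights) / len(heights)

# Check 2: heights of 64 through 69 fall inside mean +/- 1 SD.
shares = {k: share_within(k) for k in (1, 2, 3)}

# Check 3: ratio of interquartile range to standard deviation.
q1, q2, q3 = quantiles(heights, n=4)
iqr_over_s = (q3 - q1) / s

print(round(m, 2), round(s, 3), shares, (q1, q2, q3), round(iqr_over_s, 2))
```

With only 23 observations, each height moves these percentages by more than four points, so the comparison with 68%, 95%, and 100% is necessarily rough.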
Assessing Normality: Is Class Height Normally Distributed?
4. What does the normal probability plot look like?
SPSS: Graphs > Q-Q. Set the test distribution to normal and click “estimate distribution parameters from data.”
[Figure: Normal Q-Q Plot of Height 527 2005; Expected Normal Value (60 to 74) against Observed Value (60 to 74)]
Learning Objectives
1. Distinguish Between the Two Types of Random Variables
Discrete Random Variables
1. Describe Discrete Random Variables
2. Compute the Expected Value & Variance of Discrete Random Variables
Continuous Random Variables
1. Describe Normal Random Variables
2. Introduce the Normal Distribution
3. Calculate Probabilities for Continuous Random Variables
4. Assessing Normality