Student Lecture Notes
IV. Random Variables
PBAF 527
Winter 2005
Learning Objectives
1. Distinguish Between the Two Types of Random Variables
2. Discrete Random Variables
   • Describe Discrete Random Variables
   • Compute the Expected Value & Variance of Discrete Random Variables
3. Continuous Random Variables
   • Describe Normal Random Variables
   • Introduce the Normal Distribution
   • Calculate Probabilities for Continuous Random Variables
   • Assessing Normality
Random Variables
• A variable defined by the probabilities of each possible value in the population.
Data Types
[Figure: tree diagram. Data splits into Numerical and Qualitative; Numerical splits into Discrete and Continuous.]
Types of Random Variables
Discrete Random Variable
• Whole Number (0, 1, 2, 3 etc.)
• Countable, Finite Number of Values
  - Jump from one value to the next and cannot take any values in between.
Continuous Random Variables
• Whole or Fractional Number
• Obtained by Measuring
• Infinite Number of Values in Interval
  - Too Many to List Like Discrete Variable
Discrete Random Variable Examples

Experiment                                 Random Variable    Possible Values
Children of One Gender in Family           # Girls            0, 1, 2, ..., 10?
Open Check in Lines                        # Open             0, 1, 2, ..., 8
Answer 33 Questions                        # Correct          0, 1, 2, ..., 33
Count Cars at Toll Between 11:00 & 1:00    # Cars Arriving    0, 1, 2, ..., ∞
Discrete Probability Distribution
1. List of All possible [x, p(x)] pairs
   • x = Value of Random Variable (Outcome)
   • p(x) = Probability Associated with Value
2. Mutually Exclusive (No Overlap)
3. Collectively Exhaustive (Nothing Left Out)
4. 0 ≤ p(x) ≤ 1
5. Σ p(x) = 1
Marilyn says: It may sound strange, but more families of 4 children have 3 of one gender and one of the other than any other combination. Explain this.
Construct a sample space and look at the total number of ways each event can occur out of the total number of combinations that can occur, and calculate frequencies.
Sample Space
BBBB
GBBB  BGBB  BBGB  BBBG
GGBB  GBGB  GBBG  BGGB  BGBG  BBGG
BGGG  GBGG  GGBG  GGGB
GGGG
• Are all 16 combinations equally likely? Is the sex of each child independent of the other three?
P(girl) = 1/2
P(boy) = 1/2
so, P(BBBB) = ½ x ½ x ½ x ½ = 1/16
• If you have a family of four children, what is the probability of…
P(all girls or all boys) = 2/16 = 1/8
P(2 boys, 2 girls) = 6/16 = 3/8 (six different ways to have 2 boys and 2 girls)
P(3 boys, 1 girl or 3 girls, 1 boy) = 8/16 = 4/8 = 1/2 (8 ways to have 3 of one and 1 of the other)
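The counting argument can be checked by brute force. The following is a minimal Python sketch, assuming (as the slide does) that boys and girls are equally likely and that births are independent; the names `sample_space` and `prob` are illustrative, not from the lecture.

```python
from itertools import product
from fractions import Fraction

# Enumerate all 2**4 = 16 equally likely birth orders (B = boy, G = girl).
sample_space = ["".join(kids) for kids in product("BG", repeat=4)]

def prob(event):
    """Probability of an event = favourable outcomes / total outcomes."""
    favourable = [s for s in sample_space if event(s)]
    return Fraction(len(favourable), len(sample_space))

p_all_same = prob(lambda s: s.count("G") in (0, 4))   # all girls or all boys
p_two_two = prob(lambda s: s.count("G") == 2)         # 2 boys, 2 girls
p_three_one = prob(lambda s: s.count("G") in (1, 3))  # 3 of one, 1 of the other

print(p_all_same, p_two_two, p_three_one)
```

The 3-and-1 split wins because it can happen in 8 of the 16 orderings, more than any other combination.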
Assume the random variable X represents the number of girls in a family of 4 kids. (Lower case x is a particular value of X, i.e., x = 3 girls in the family.)

Sample Space    Random Variable X
BBBB            x=0
GBBB            x=1
BGBB            x=1
BBGB            x=1
BBBG            x=1
GGBB            x=2
GBGB            x=2
GBBG            x=2
BGGB            x=2
BGBG            x=2
BBGG            x=2
BGGG            x=3
GBGG            x=3
GGBG            x=3
GGGB            x=3
GGGG            x=4

Number of Girls, x    Probability, P(x)
0                     1/16
1                     4/16
2                     6/16
3                     4/16
4                     1/16
Total                 16/16=1.00

What is the probability of exactly 3 girls in 4 kids?
P(X=3) = 4/16
What is the probability of at least 3 girls in 4 kids?
P(X≥3) = P(X=3) + P(X=4) = 4/16 + 1/16 = 5/16
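The same distribution can be built without listing the sample space, by counting the orderings with x girls. This is a small Python sketch under the assumption that each child is a girl with probability 1/2, independently; `pmf` is an illustrative name.

```python
from fractions import Fraction
from math import comb

# Assumption: each child is a girl with probability 1/2, independently.
n = 4
# comb(n, x) counts the birth orders with exactly x girls out of 2**n orders.
pmf = {x: Fraction(comb(n, x), 2**n) for x in range(n + 1)}

p_exactly_3 = pmf[3]
p_at_least_3 = sum(pmf[x] for x in range(3, n + 1))

print(p_exactly_3, p_at_least_3)
```

Note that "at least 3" adds the probabilities for x = 3 and x = 4, which is why it is larger than "exactly 3".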
Visualizing Discrete Probability Distributions
Listing
{(0, 1/16), (1, .25), (2, 3/8), (3, .25), (4, 1/16)}
Table
Number of Girls, x    Probability, P(x)
0                     1/16
1                     4/16
2                     6/16
3                     4/16
4                     1/16
Total                 16/16=1.00
Graph
[Figure: bar chart of Probability, P(x), against Number of Girls, x, with bars of height 1/16, 4/16, 6/16, 4/16, 1/16 at x = 0, 1, 2, 3, 4]
X is random and x is fixed. We can calculate the probability that different values of X will occur and make a probability distribution.
Probability Distributions
[Figure: the same bar chart of P(x) against Number of Girls, x, shown as a probability histogram]
Probability distributions can be written as probability histograms.
Cumulative probabilities: Adding up probabilities of a range of values.
Washington State Population Survey and Random Variables
A telephone survey of households throughout Washington State.
But some households don’t have phones.

Number of telephones, x    P(x)
0                          0.03500
1                          0.70553
2                          0.21769
3                          0.02966
4                          0.00775
5                          0.00332
6                          0.00088
7                          0.00002
8                          0.00000
9                          0.00015
Total                      1.00000

[Figure: probability histogram of P(x) against Number of Telephone Lines (x), x = 0 to 9; bar labels 0.04, 0.71, 0.22, 0.03, 0.01, 0.00]
Probabilities about Telephones in Washington State
• What is the probability that a household will have no telephone?
• What is the probability that a household will have 2 or more telephone lines?
• What is the probability that a household will have 2 to 4 phone lines?
• What is the probability a household will have no phone lines or more than 4 phone lines?
• Who do you think is in that 3.5% of the population?
• What are the implications of this for the quality of the survey?
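The first four questions are sums over parts of the telephone distribution. A minimal Python sketch, assuming the P(x) values from the Washington State table for x = 0 through 9 (the variable names are illustrative):

```python
# P(x) for x = 0..9 telephone lines, taken from the Washington State table.
p = [0.03500, 0.70553, 0.21769, 0.02966, 0.00775,
     0.00332, 0.00088, 0.00002, 0.00000, 0.00015]

p_none = p[0]                                  # P(X = 0)
p_two_or_more = sum(p[2:])                     # P(X >= 2)
p_two_to_four = sum(p[2:5])                    # P(2 <= X <= 4)
p_none_or_more_than_four = p[0] + sum(p[5:])   # P(X = 0 or X > 4)

print(round(p_none, 5), round(p_two_or_more, 5),
      round(p_two_to_four, 5), round(p_none_or_more_than_four, 5))
```

Because the values of X are mutually exclusive, the probability of any set of values is just the sum of their individual probabilities.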
Probability Histogram of Telephone Lines, 1998
[Figure: probability histogram of P(x) against Number of Telephone Lines (x), x = 0 to 9; bar labels 0.04, 0.71, 0.22, 0.03, 0.01, 0.00]
Summary Measures
1. Expected Value (µ, "mu")
   • Mean of Probability Distribution
   • Weighted Average of All Possible Values
   • µ = E(X) = Σ x p(x)
2. Variance (σ², "sigma-squared")
   • Weighted Average Squared Deviation about Mean
   • σ² = V(X) = E[(x − µ)²] = Σ (x − µ)² p(x)
   • σ² = V(X) = E(X²) − [E(X)]²
3. Standard Deviation
   • σ = √σ² = SD(X)
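As a quick illustration of the three summary measures, here is a Python sketch that applies both variance formulas to the number-of-girls distribution from earlier in the notes. Exact fractions are used so the two formulas can be compared without rounding; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Distribution of X = number of girls in a family of 4 children.
pmf = {0: Fraction(1, 16), 1: Fraction(4, 16), 2: Fraction(6, 16),
       3: Fraction(4, 16), 4: Fraction(1, 16)}

# Expected value: weighted average of all possible values.
mu = sum(x * p for x, p in pmf.items())

# Variance, definition form: weighted average squared deviation about the mean.
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Variance, shortcut form: E(X^2) - [E(X)]^2.
var_short = sum(x**2 * p for x, p in pmf.items()) - mu**2

sigma = sqrt(var_def)
print(mu, var_def, var_short, sigma)
```

Both variance formulas give the same answer; the shortcut form is usually easier to compute by hand.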
What is the average number of telephones in Washington Households and how much does size vary from the average?
Approach 1 (Variance) uses the columns (x−µ), (x−µ)², and (x−µ)²P(x).
Approach 2 (Variance) uses the columns x² and x²P(x).

# of Phones, x   Frequency    P(x)   xP(x)     (x−µ)   (x−µ)²   (x−µ)²P(x)   x²    x²P(x)
0                198,286      0.04   0.00      -1.3    1.65     0.06         0     0.00
1                4,142,030    0.71   0.71      -0.3    0.08     0.06         1     0.71
2                1,278,026    0.22   0.44      0.7     0.51     0.11         4     0.87
3                174,110      0.03   0.09      1.7     2.94     0.09         9     0.27
4                45,499       0.01   0.03      2.7     7.38     0.06         16    0.12
5                19,473       0.00   0.02      3.7     13.81    0.05         25    0.08
6                5,170        0.00   0.01      4.7     22.24    0.02         36    0.03
7                118          0.00   0.00      5.7     32.67    0.00         49    0.00
8                -            0.00   0.00      6.7     45.10    0.00         64    0.00
9                897          0.00   0.00      7.7     59.53    0.01         81    0.01
Sum              5,863,609    1.00   µ=1.28    32.16            σ²=0.45            2.10
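The table can be reproduced from the household frequencies. The sketch below is one way to do it in Python, treating the missing count for 8 phones as 0 (an assumption). Because it works from unrounded relative frequencies, its results (about 1.29 and 0.44) differ slightly from the table's µ = 1.28 and σ² = 0.45, which appear to be based on rounded probabilities.

```python
# Household counts by number of phones; the "-" entry for x = 8 is taken as 0.
freq = {0: 198_286, 1: 4_142_030, 2: 1_278_026, 3: 174_110, 4: 45_499,
        5: 19_473, 6: 5_170, 7: 118, 8: 0, 9: 897}

total = sum(freq.values())
p = {x: f / total for x, f in freq.items()}   # relative frequency as P(x)

mu = sum(x * px for x, px in p.items())

# Approach 1: weighted squared deviations about the mean.
var1 = sum((x - mu) ** 2 * px for x, px in p.items())

# Approach 2: E(X^2) minus the squared mean.
var2 = sum(x**2 * px for x, px in p.items()) - mu**2

print(total, round(mu, 2), round(var1, 2), round(var2, 2))
```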
Chebyshev’s Rule and Empirical Rule for a Discrete Random Variable
Let x be a discrete random variable with a probability distribution p(x), mean µ, and standard deviation σ. Then, depending on the shape of p(x), the following probability statements can be made:

Chebyshev’s Rule: Applies to any probability distribution (e.g., telephones in Washington State)
Empirical Rule: Applies to probability distributions that are mound-shaped and symmetric (e.g., girls born of 4 children)

Probability                 Chebyshev’s Rule    Empirical Rule
P(µ − σ < x < µ + σ)        ≥ 0                 ≈ .68
P(µ − 2σ < x < µ + 2σ)      ≥ 3/4               ≈ .95
P(µ − 3σ < x < µ + 3σ)      ≥ 8/9               ≈ 1.00
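To see Chebyshev's Rule at work, the following Python sketch checks the three bounds against the telephone distribution, assuming the P(x) values from the Washington State table. The helper name `prob_within` is illustrative.

```python
from math import sqrt

# P(x) for x = 0..9 telephone lines, taken from the Washington State table.
p = [0.03500, 0.70553, 0.21769, 0.02966, 0.00775,
     0.00332, 0.00088, 0.00002, 0.00000, 0.00015]

mu = sum(x * px for x, px in enumerate(p))
sigma = sqrt(sum((x - mu) ** 2 * px for x, px in enumerate(p)))

def prob_within(k):
    """P(mu - k*sigma < x < mu + k*sigma) for this discrete distribution."""
    return sum(px for x, px in enumerate(p)
               if mu - k * sigma < x < mu + k * sigma)

# Chebyshev lower bounds for k = 1, 2, 3 standard deviations.
bounds = {1: 0.0, 2: 3 / 4, 3: 8 / 9}
for k, bound in bounds.items():
    print(k, round(prob_within(k), 4), ">=", round(bound, 4))
```

The actual probabilities sit well above the Chebyshev bounds, which is expected: the rule is a guaranteed minimum for any distribution, not an estimate.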
Continuous Random
Variable
• A variable with many possible values at all intervals
Continuous Random Variable Examples

Experiment                       Random Variable       Possible Values
Weigh 100 People                 Weight                45.1, 78, ...
Measure Part Life                Hours                 900, 875.9, ...
Ask Food Spending                Spending              54.12, 42, ...
Measure Time Between Arrivals    Inter-Arrival Time    0, 1.3, 2.78, ...
Continuous Probability Density Function
1. Mathematical Formula
2. Shows All Values, x, & Frequencies, f(x)
   • f(x) Is Not Probability
3. Properties
   • Area under curve sums to 1
   • Can add up areas of function to get probability less than a specific value
[Figure: density curve with Frequency f(x) on the vertical axis and Value x on the horizontal axis; each point on the curve is a (Value, Frequency) pair, and the values a and b are marked on the x axis]
Continuous Random Variable Probability
P(c ≤ x ≤ d)
Probability Is Area Under Curve!
[Figure: density curve f(x) with the area between c and d shaded]
Continuous Probability Distribution Models
[Figure: tree diagram. Continuous Probability Distribution splits into Uniform, Normal, and Exponential.]
Importance of Normal Distribution
1. Describes Many Random Processes or Continuous Phenomena
2. Can Be Used to Approximate Discrete Probability Distributions
   • Example: Binomial
3. Basis for Classical Statistical Inference
Normal Distribution
1. ‘Bell-Shaped’ & Symmetrical
2. Mean, Median, Mode Are Equal
3. ‘Middle Spread’ Is 1.33 σ
4. Random Variable Has Infinite Range
[Figure: bell curve f(X) against X, with the Mean, Median, and Mode together at the center]
Normal Distribution Useful Properties
• About half of “weight” below mean (because symmetrical)
• About 68% of probability within 1 standard deviation of mean (at the inflection points of the curve)
• About 95% of probability within 2 standard deviations
• More than 99% of probability within 3 standard deviations
[Figure: normal curve f(X) with the x axis marked at µ − 3σ, µ − 2σ, µ − σ, µ (Mean, Median, Mode), µ + σ, µ + 2σ, µ + 3σ]
Probability Density Function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
x = Value of Random Variable (−∞ < x < ∞)
σ = Population Standard Deviation
π = 3.14159
e = 2.71828
µ = Mean of Random Variable x
Don’t memorize this!
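The density formula can be turned into a few lines of Python to confirm the "area is probability" idea numerically. This is only a sketch: the function names `normal_pdf` and `area`, the trapezoid rule, and the integration limits of ±8 standard deviations are illustrative choices, not part of the lecture.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f(x) for mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def area(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Trapezoid-rule approximation of the area under f(x) between a and b."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    total += sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
    return total * h

total_area = area(-8, 8)   # essentially the whole curve
within_1 = area(-1, 1)     # within 1 standard deviation of the mean
within_2 = area(-2, 2)
within_3 = area(-3, 3)

print(round(total_area, 4), round(within_1, 4),
      round(within_2, 4), round(within_3, 4))
```

The areas come out close to 1, 0.68, 0.95, and a little over 0.99, matching the useful properties of the normal distribution.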
Notation
X is N(µ, σ)
The random variable X has a normal distribution (N) with mean µ and standard deviation σ.
X is N(40,1)
X is N(10,5)
X is N(50,3)
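Effect of Varying Parameters (µ & σ)
[Figure: three normal curves labeled A, B, and C plotted as f(X) against X, differing in mean and standard deviation]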
Normal Distribution Probability
Probability is area under curve!
$$P(c \le x \le d) = \int_{c}^{d} f(x)\,dx$$
[Figure: normal curve f(x) with the area between c and d shaded]
Infinite Number of Tables
Normal distributions differ by mean & standard deviation.
Each distribution would require its own table.
That’s an infinite number!
[Figure: several normal curves f(X) against X with different means and standard deviations]
Standardize the Normal Distribution
Z = (X − µ)/σ
Z is N(0,1)
[Figure: a Normal Distribution with mean µ and standard deviation σ on the X axis, mapped to the Standardized Normal Distribution with µ = 0 and σ = 1 on the Z axis]
One table!
Standardizing Example
Z = (X − µ)/σ = (6.2 − 5)/10 = .12
[Figure: Normal Distribution with µ = 5, σ = 10, and X = 6.2 marked; Standardized Normal Distribution with µ = 0, σ = 1, and Z = .12 marked]
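Obtaining the Probability
Standardized Normal Probability Table (Portion)

Z      .00      .01      .02
0.0    .0000    .0040    .0080
0.1    .0398    .0438    .0478
0.2    .0793    .0832    .0871
0.3    .1179    .1217    .1255

The entries in the body of the table are probabilities. For Z = .12 (row 0.1, column .02), the probability is .0478.
[Figure: standardized normal curve (µ = 0, σ = 1) with the area .0478 between 0 and Z = .12 shaded; shaded area exaggerated]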
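The standardizing step and the table lookup can be sketched in Python. The helper `phi` below is an illustrative name; it uses the error function from the standard library to compute the cumulative probability P(Z ≤ z), from which the table's "area between 0 and z" is obtained by subtracting 0.5.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, x = 5, 10, 6.2

z = (x - mu) / sigma          # standardize: Z = (X - mu) / sigma
table_value = phi(z) - 0.5    # area between 0 and z, as tabulated

print(round(z, 2), round(table_value, 4))
```

The computed area agrees with the table entry to four decimal places.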
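Example
P(3.8 ≤ X ≤ 5)
Z = (X − µ)/σ = (3.8 − 5)/10 = −.12
P(3.8 ≤ X ≤ 5) = P(−.12 ≤ Z ≤ 0) = .0478
[Figure: Normal Distribution (µ = 5, σ = 10) with the area between 3.8 and 5 shaded; Standardized Normal Distribution (µ = 0, σ = 1) with the area .0478 between −.12 and 0 shaded; shaded area exaggerated]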
Example
P(2.9 ≤ X ≤ 7.1)
Z = (X − µ)/σ = (2.9 − 5)/10 = −.21
Z = (X − µ)/σ = (7.1 − 5)/10 = .21
P(2.9 ≤ X ≤ 7.1) = P(−.21 ≤ Z ≤ .21) = .0832 + .0832 = .1664
[Figure: Normal Distribution (µ = 5, σ = 10) with the area between 2.9 and 7.1 shaded; Standardized Normal Distribution (µ = 0, σ = 1) with the area .1664 between −.21 and .21 shaded; shaded area exaggerated]
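Example
P(X ≥ 8)
Z = (X − µ)/σ = (8 − 5)/10 = .30
P(X ≥ 8) = P(Z ≥ .30) = .5000 − .1179 = .3821
[Figure: Normal Distribution (µ = 5, σ = 10) with the area above 8 shaded; Standardized Normal Distribution (µ = 0, σ = 1) with the area .3821 above .30 shaded; shaded area exaggerated]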
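Example
P(7.1 ≤ X ≤ 8)
Z = (X − µ)/σ = (7.1 − 5)/10 = .21
Z = (X − µ)/σ = (8 − 5)/10 = .30
P(7.1 ≤ X ≤ 8) = P(.21 ≤ Z ≤ .30) = .1179 − .0832 = .0347
[Figure: Normal Distribution (µ = 5, σ = 10) with the area between 7.1 and 8 shaded; Standardized Normal Distribution (µ = 0, σ = 1) with the area .0347 between .21 and .30 shaded; shaded area exaggerated]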
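The four worked examples can be cross-checked with Python's standard library, which provides a normal distribution object with a cumulative distribution function. This is a sketch for checking the table arithmetic; small differences in the fourth decimal place come from the table's rounding.

```python
from statistics import NormalDist

# X is N(5, 10): mean 5, standard deviation 10, as in the examples.
X = NormalDist(mu=5, sigma=10)

p1 = X.cdf(5) - X.cdf(3.8)     # P(3.8 <= X <= 5)
p2 = X.cdf(7.1) - X.cdf(2.9)   # P(2.9 <= X <= 7.1)
p3 = 1 - X.cdf(8)              # P(X >= 8)
p4 = X.cdf(8) - X.cdf(7.1)     # P(7.1 <= X <= 8)

print(round(p1, 4), round(p2, 4), round(p3, 4), round(p4, 4))
```

Here `cdf` works directly on the X scale, so the standardizing step is handled internally.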
Travel Time and the Normal Distribution
To help people plan their travel, WSDOT estimates that the average trip from Seattle to Bellevue at 5:40 pm (at peak) takes 11 minutes, with a standard deviation of 10 minutes. They also believe this travel time approximates a normal distribution.
What proportion of trips take less than 27 minutes?
Process
1. Draw a picture and write down the
probability you need.
2. Convert probability to standard scores.
3. Find cumulative probability in the table.
More Travel Time
Suppose we have only 10-15 minutes to travel to Seattle from Bellevue. What proportion of trips will make it in that time?
P(10 < X < 15) = P((10 − 11)/10 < Z < (15 − 11)/10)
= P(−0.1 < Z < .4)
= 1 − P(Z < −0.1) − P(Z > .4)
Since normal curves are symmetrical:
= 1 − P(Z > .1) − P(Z > .4)
= 1 − (.5 − .0398) − (.5 − .1554)
= 1 − (.4602) − (.3446)
= .1952
19.5% of trips will take between 10 and 15 minutes.
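Both travel-time questions can be checked in Python under the stated assumption that trip time is normal with mean 11 and standard deviation 10 minutes. The notes do not give an answer for the 27-minute question, so the first value below is a computed result under that assumption, not a figure from the lecture.

```python
from statistics import NormalDist

# Assumed model from the notes: trip time is N(11, 10) in minutes.
trip = NormalDist(mu=11, sigma=10)

# Less than 27 minutes: Z = (27 - 11) / 10 = 1.6.
p_under_27 = trip.cdf(27)

# Between 10 and 15 minutes: Z between -0.1 and 0.4.
p_10_to_15 = trip.cdf(15) - trip.cdf(10)

print(round(p_under_27, 4), round(p_10_to_15, 4))
```

Subtracting two cumulative probabilities gives the same area as the symmetry argument in the derivation.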
Finding Z Values for Known Probabilities
What is Z given P(Z) = .1217 (the area between 0 and Z)?
Standardized Normal Probability Table (Portion)

Z      .00      .01      .02
0.0    .0000    .0040    .0080
0.1    .0398    .0438    .0478
0.2    .0793    .0832    .0871
0.3    .1179    .1217    .1255

The probability .1217 sits in row 0.3, column .01, so Z = .31.
[Figure: standardized normal curve (µ = 0, σ = 1) with the area .1217 between 0 and Z = .31 shaded; shaded area exaggerated]
Finding X Values for Known Probabilities
X = µ + Z·σ = 5 + (.31)(10) = 8.1
[Figure: Normal Distribution (µ = 5, σ = 10) with the area .1217 between 5 and the unknown X shaded; Standardized Normal Distribution (µ = 0, σ = 1) with the area .1217 between 0 and Z = .31 shaded; shaded areas exaggerated]
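Working backwards from a probability to a Z value and then to X can also be sketched in Python. The inverse cumulative function expects the total area to the left of Z, so the tabulated area between 0 and Z (.1217) is added to 0.5 first; that conversion is the only step not shown on the slide.

```python
from statistics import NormalDist

area_from_zero = 0.1217          # tabulated area between 0 and Z
mu, sigma = 5, 10

# inv_cdf takes the cumulative probability P(Z <= z), hence the 0.5 offset.
z = NormalDist().inv_cdf(0.5 + area_from_zero)

# Convert from standard-deviation units back to the X scale.
x = mu + z * sigma

print(round(z, 2), round(x, 1))
```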
Travel Times Take 3
How much time will the trip take 99% of
the time?
Finding Z Values for Known Probabilities
1. Write down probability statement and draw a picture
   P(Z < ____) = .99
2. Look up Z value in table
   P(Z < 2.325) = .99
3. Convert Z value (SD units) to variable (X) by using mean and SD.
   X = µ + Zσ, so X = 11 + (2.325)(10) = 34.25
So, 99% of the time the trip can be made in 34.25 minutes or less.
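The 99% travel time can be checked the same way, again assuming trip time is N(11, 10). The exact Z value is slightly above the table's interpolated 2.325, so the computed time differs from 34.25 in the second decimal place.

```python
from statistics import NormalDist

# Z value with 99% of the standard normal curve to its left.
z_99 = NormalDist().inv_cdf(0.99)

# Convert to minutes using the assumed mean 11 and standard deviation 10.
minutes = 11 + z_99 * 10

print(round(z_99, 3), round(minutes, 2))
```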
Assessing Normality
1. A histogram of the data is mound shaped and symmetrical about the mean.
2. Determine the percentage of measurements falling in each of the intervals x̄ ± s, x̄ ± 2s, and x̄ ± 3s. If the data are approximately normal, the percentages will be approximately equal to 68%, 95%, and 100% respectively.
3. Find the interquartile range, IQR, and standard deviation, s, for the sample, then calculate the ratio IQR/s. If the data are approximately normal, then IQR/s ≈ 1.3.
4. Construct a normal probability plot for the data. If the data are approximately normal, the points will fall (approximately) on a straight line.
Assessing Normality: Is Class Height Normally Distributed?
1. How does the histogram look?
SPSS can produce the line of the normal curve for you. In SPSS select GRAPH, HISTOGRAM. After you choose the variable you want, click on the box “Display Normal Curve” and you’ll get something that looks like this.
[Figure: histogram of Height 527 2005 with a fitted normal curve; x axis Height (60 to 72), y axis Frequency (0 to 7); Mean = 66.52, Std. Dev. = 3.117, N = 23]
Assessing Normality: Is Class Height Normally Distributed?
2. Compute the intervals:

Interval                   Anticipated Percent    Actual Percent
x̄ ± s   [63.40, 69.64]     68%                    65%
x̄ ± 2s  [60.29, 72.75]     95%                    96%
x̄ ± 3s  [57.17, 75.87]     100%                   100%

SPSS: ANALYZE, DESCRIPTIVE STATISTICS, FREQUENCIES

Height 527 2005
Valid    Frequency    Percent    Valid Percent    Cumulative Percent
60       1            4.3        4.3              4.3
62       1            4.3        4.3              8.7
63       3            13.0       13.0             21.7
64       2            8.7        8.7              30.4
65       1            4.3        4.3              34.8
66       3            13.0       13.0             47.8
67       2            8.7        8.7              56.5
68       2            8.7        8.7              65.2
69       5            21.7       21.7             87.0
70       1            4.3        4.3              91.3
71       1            4.3        4.3              95.7
72       1            4.3        4.3              100.0
Total    23           100.0      100.0
Assessing Normality: Is Class Height Normally Distributed?
3. Does IQR/s ≈ 1.3?
IQR = 69 − 64 = 5
IQR/s = 5/3.117 = 1.6

Statistics: Height 527 2005
N Valid           23
N Missing         0
Std. Deviation    3.117
Percentile 25     64.00
Percentile 50     67.00
Percentile 75     69.00

SPSS: ANALYZE, DESCRIPTIVE STATISTICS, FREQUENCIES then click on STATISTICS and choose the ones you want.
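Checks 2 and 3 can be reproduced outside SPSS. The sketch below rebuilds the 23 heights from the frequency table and uses Python's `statistics` module; the share within one standard deviation is counted directly from the data (15 of 23 heights lie between 63.40 and 69.64). The default quartile method of `statistics.quantiles` happens to give the same quartiles as the SPSS output here, which is not guaranteed for other data sets.

```python
from statistics import mean, stdev, quantiles

# Heights and their frequencies from the class frequency table (N = 23).
freq = {60: 1, 62: 1, 63: 3, 64: 2, 65: 1, 66: 3,
        67: 2, 68: 2, 69: 5, 70: 1, 71: 1, 72: 1}
heights = [h for h, n in freq.items() for _ in range(n)]

xbar, s = mean(heights), stdev(heights)   # sample mean and standard deviation

def share_within(k):
    """Fraction of heights inside the interval xbar +/- k*s."""
    lo, hi = xbar - k * s, xbar + k * s
    return sum(lo <= h <= hi for h in heights) / len(heights)

q1, q2, q3 = quantiles(heights, n=4)      # quartiles (25th, 50th, 75th)
iqr_ratio = (q3 - q1) / s

print(round(xbar, 2), round(s, 3))
print([round(share_within(k), 3) for k in (1, 2, 3)])
print(q1, q3, round(iqr_ratio, 2))
```

For approximately normal data the three shares should be near 0.68, 0.95, and 1.00, and the ratio near 1.3.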
Assessing Normality: Is Class Height Normally Distributed?
4. What does the normal probability plot look like?
SPSS: Graphs>Q-Q. Test distribution is normal and click estimate distribution parameters from data.
[Figure: Normal Q-Q Plot of Height 527 2005; x axis Observed Value, y axis Expected Normal Value, both running from 60 to 74]