Lecture Unit 5 - NCSU Statistics
Chapter 16 Random Variables
Streamlining Probability:
Probability Distribution,
Expected Value and Standard
Deviation of Random Variable
Graphically and
Numerically Summarize a
Random Experiment
Principal vehicle by which we do this:
random variables
Random Variables
Definition:
A random variable is a numerical-valued
variable whose value is based on the
outcome of a random event.
Denoted by upper-case letters X, Y, etc.
Examples
1. X = # of games played in a randomly
selected World Series
Possible values of X are x=4, 5, 6, 7
2. Y=score on 13th hole (par 5) at Augusta
National golf course for a randomly
selected golfer on day 1 of 2011 Masters
y=3, 4, 5, 6, 7
Random Variables and
Probability Distributions
A probability distribution lists the possible values of a
random variable and the probability that each value will
occur.
Random variables are
unknown chance
outcomes.
Probability distributions
tell us what is likely
to happen.
Data variables are
known outcomes.
Data distributions
tell us what happened.
Probability Distribution Of Number of
Games Played in Randomly Selected
World Series
Estimate based on results from 1946 to
2010.
x      4              5              6              7
p(x)   12/65=0.185    12/65=0.185    14/65=0.215    27/65=0.415
[Probability histogram: Number of Games in Randomly Selected World Series. x-axis: number of games (4, 5, 6, 7); y-axis: probability. Bar heights: 0.185, 0.185, 0.215, 0.415.]
Probability Distribution Of Score on
13th hole (par 5) at Augusta
National Golf Course on Day 1 of
2011 Masters
y      3        4        5        6        7
p(y)   0.040    0.414    0.465    0.051    0.030
[Probability histogram: Score on 13th Hole. x-axis: score (3, 4, 5, 6, 7); y-axis: probability. Bar heights: 0.04, 0.414, 0.465, 0.051, 0.03.]
Probability distributions:
requirements
1. \( 0 \le p(x) \le 1 \) for all values x of X
2. \( \sum_{\text{all } x} p(x) = 1 \)
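The two requirements can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function name `is_valid_distribution` and the tolerance are illustrative assumptions. It tests both conditions on the two distributions given earlier.

```python
from fractions import Fraction
import math


def is_valid_distribution(probs, tol=1e-9):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    values = [float(p) for p in probs]
    in_range = all(0.0 <= p <= 1.0 for p in values)           # requirement 1
    sums_to_one = math.isclose(sum(values), 1.0, abs_tol=tol)  # requirement 2
    return in_range and sums_to_one


# World Series distribution from the slides (exact fractions out of 65).
world_series = {4: Fraction(12, 65), 5: Fraction(12, 65),
                6: Fraction(14, 65), 7: Fraction(27, 65)}
# 13th-hole scores from the slides (rounded to three decimals).
hole_13 = {3: 0.040, 4: 0.414, 5: 0.465, 6: 0.051, 7: 0.030}

print(is_valid_distribution(world_series.values()))
print(is_valid_distribution(hole_13.values()))
```

A tolerance is used for the sum because probabilities rounded to three decimals, or stored as floating-point numbers, rarely add to exactly 1.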
Expected Value of a
Random Variable
A measure of the “middle”
of the values of a random
variable
[Probability histograms: Number of Games in Randomly Selected World Series (x = 4 to 7) and Score on 13th Hole (y = 3 to 7); y-axis: probability.]
The mean of the probability distribution is
the expected value of X, denoted E(X)
E(X) is also denoted by the Greek letter µ
(mu)
Mean or
Expected
Value
k = the number of possible values of the random variable
\[ E(X) = \mu = \sum_{i=1}^{k} x_i \cdot P(X = x_i) \]
E(X) = µ = x1·p(x1) + x2·p(x2) + x3·p(x3) + ... + xk·p(xk)
Weighted mean
Sample Mean
\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{1}{n}x_1 + \frac{1}{n}x_2 + \frac{1}{n}x_3 + \dots + \frac{1}{n}x_n \]
Mean or
Expected
Value
k = the number of outcomes
\[ E(X) = \mu = \sum_{i=1}^{k} x_i \cdot P(X = x_i) \]
µ = x1·p(x1) + x2·p(x2) + x3·p(x3) + ... + xk·p(xk)
Weighted mean
Each outcome is weighted by its probability
Other Weighted Means
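GPA: A=4, B=3, C=2, D=1, F=0
Five 3-hour courses: 2 A's (6 hrs), 1 B (3 hrs), 2 C's (6 hrs)
\[ \text{GPA} = \frac{4 \cdot 6 + 3 \cdot 3 + 2 \cdot 6}{15} = \frac{45}{15} = 3.0 \]
Baseball slugging percentage SLG (HR=4, 3B=3, 2B=2, 1B=1)
\[ \text{SLG} = \frac{4 \cdot \text{HR} + 3 \cdot \text{3B} + 2 \cdot \text{2B} + 1 \cdot \text{1B}}{\text{AB}} \]
Babe Ruth 1920 (80 yrs): 458 AB; 54 HR, 9 3B, 36 2B, 73 1B
\[ \text{SLG} = \frac{4 \cdot 54 + 3 \cdot 9 + 2 \cdot 36 + 1 \cdot 73}{458} = \frac{388}{458} = .847 \]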
Baseball ticket prices Football ticket prices
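The GPA and slugging-percentage calculations are both weighted means, which a short Python sketch can make explicit. The helper name `weighted_mean` is an assumption for illustration. For SLG, the at-bats that did not produce a hit are treated as outcomes worth 0, so the weights add up to the 458 at-bats.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the total weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# GPA: grade points weighted by credit hours (2 A's, 1 B, 2 C's).
gpa = weighted_mean([4, 3, 2], [6, 3, 6])

# Babe Ruth 1920: bases per outcome weighted by how often each occurred.
at_bats = 458
hits = {4: 54, 3: 9, 2: 36, 1: 73}       # bases -> count
no_hit = at_bats - sum(hits.values())     # remaining at-bats count as 0 bases
slg = weighted_mean(list(hits.keys()) + [0], list(hits.values()) + [no_hit])

print(gpa)            # 45 / 15
print(round(slg, 3))  # 388 / 458
```

The expected value of a random variable has the same form, with probabilities as the weights and a total weight of 1.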
Mean or
Expected
Value
\[ E(X) = \mu = \sum_{i=1}^{k} x_i \cdot P(X = x_i) \]
E(X) = µ = 4(0.185)+5(0.185)+6(0.215)+7(0.415)
= 5.86 games
E(Y) = µ = 3(0.040)+4(0.414)+5(0.465)+6(0.051)+7(0.030)
= 4.617 strokes
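As a check on the arithmetic, the expected-value formula can be written as a one-line sum. This is a minimal Python sketch; the function name `expected_value` and the dictionary layout (value mapped to probability) are illustrative choices, not part of the slides.

```python
def expected_value(dist):
    """E(X) = sum of x * P(X = x) over all possible values x."""
    return sum(x * p for x, p in dist.items())


# Number of games in a randomly selected World Series.
world_series = {4: 12 / 65, 5: 12 / 65, 6: 14 / 65, 7: 27 / 65}
# Score on the 13th hole at Augusta National.
hole_13 = {3: 0.040, 4: 0.414, 5: 0.465, 6: 0.051, 7: 0.030}

print(round(expected_value(world_series), 2))  # about 5.86 games
print(round(expected_value(hole_13), 3))       # about 4.617 strokes
```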
Mean or
Expected
Value
[Probability histogram: Number of Games in Randomly Selected World Series, with the mean µ = 5.86 marked on the x-axis.]
E(X) = µ
= 4(0.185)+5(0.185)+6(0.215)+7(0.415)
= 5.86 games
Interpretation
E(x) is not the value of the random
variable x that you “expect” to observe if
you perform the experiment once
Interpretation of E(X)
E(X) is a “long run” average.
The expected value of a random variable
is equal to the average value of the
random variable if the chance process were
repeated an infinite number of times. In
reality, if the chance process is continually
repeated, the average of the observed values
will get closer to E(X) as you observe more
and more values of the random variable X.
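The long-run interpretation can be illustrated with a small simulation. The sketch below is an illustration only: it draws from the World Series distribution with Python's standard `random` module, and the seed, sample sizes, and function name `running_average` are arbitrary assumptions.

```python
import random

values = [4, 5, 6, 7]
probs = [12 / 65, 12 / 65, 14 / 65, 27 / 65]
mu = sum(x * p for x, p in zip(values, probs))  # E(X), about 5.86


def running_average(n, seed=0):
    """Average of n simulated observations of X."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    draws = rng.choices(values, weights=probs, k=n)
    return sum(draws) / n


for n in (10, 1_000, 100_000):
    print(n, round(running_average(n), 3), "vs E(X) =", round(mu, 3))
```

Any single draw is 4, 5, 6, or 7, never 5.86; it is the average over many draws that settles near E(X).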
Example: Green Mountain
Lottery
State of Vermont
choose 3 digits from 0 through 9; repeats
allowed
win $500
x      $0      $500
p(x)   .999    .001
E(x)=$0(.999) + $500(.001) = $.50
Example (cont.)
E(x)=$.50
On average, each ticket wins $.50.
Important for Vermont to know
E(x) is not necessarily a possible value of
the random variable (values of x are $0
and $500)
Expected Value, Surprise Onside Kicks
http://www.advancednflstats.com/ The change in
expected points for the kicking team: successful
1.9; fail -1.4.
X=change in expected points for kicking team
when attempting surprise onside kick
x      1.9    -1.4
p(x)   p      1-p
What values of p make surprise onside kicks a
good strategy?
Expected change should be greater than 0:
\[ (1.9)p - (1.4)(1 - p) > 0 \]
\[ 3.3p - 1.4 > 0 \]
\[ 3.3p > 1.4 \]
\[ p > 0.424 \]
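The break-even probability can also be computed directly. This Python sketch uses the gain (1.9) and loss (-1.4) from the slide as defaults; the function names `expected_change` and `break_even` are assumptions for illustration.

```python
def expected_change(p, gain=1.9, loss=-1.4):
    """Expected change in points when the kick succeeds with probability p."""
    return gain * p + loss * (1 - p)


def break_even(gain=1.9, loss=-1.4):
    """Solve gain*p + loss*(1 - p) = 0 for p."""
    return -loss / (gain - loss)


p_star = break_even()
print(round(p_star, 3))       # 1.4 / 3.3
print(expected_change(0.40))  # below break-even: negative
print(expected_change(0.50))  # above break-even: positive
```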
US Roulette Wheel
and Table
• The roulette wheel has alternating black and red slots numbered 1 through 36.
• There are also 2 green slots numbered 0 and 00.
• A bet on any one of the 38 numbers (1-36, 0, or 00) pays odds of 35:1; that is . . .
• If you bet $1 on the winning number, you receive $36, so your winnings are $35
American Roulette 0 - 00
(The European version has
only one 0.)
US Roulette Wheel: Expected Value of a
$1 bet on a single number
Let x be your winnings resulting from a $1 bet
on a single number; x has 2 possible values
x      -1       35
p(x)   37/38    1/38
E(x)= -1(37/38)+35(1/38)= -.05
So on average the house wins 5 cents on every
such bet. A “fair” game would have E(x)=0.
The roulette wheels are spinning 24/7, winning
big $$ for the house, resulting in …
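For gambling examples it can help to keep the probabilities as exact fractions, so the house edge is not blurred by rounding. The following Python sketch uses the standard `fractions` module; the variable names are illustrative assumptions.

```python
from fractions import Fraction

# Winnings on a $1 single-number roulette bet: lose $1 or win $35.
roulette = {-1: Fraction(37, 38), 35: Fraction(1, 38)}
roulette_ev = sum(x * p for x, p in roulette.items())

# Green Mountain Lottery ticket: win $0 or $500.
lottery = {0: Fraction(999, 1000), 500: Fraction(1, 1000)}
lottery_ev = sum(x * p for x, p in lottery.items())

print(roulette_ev, float(roulette_ev))  # exact value -1/19, about -0.05
print(lottery_ev)                       # exact value 1/2, i.e. $.50
```

The exact roulette value, -1/19 of a dollar per $1 bet, is slightly more than the 5 cents quoted after rounding.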
Standard Deviation of a
Random Variable
First center (expected value)
Now - spread
Standard Deviation of a
Random Variable
Measures how “spread out”
the random variable is
Summarizing data and
probability
Data
• Histogram
• measure of the center: sample mean x̄
• measure of spread: sample standard deviation s
Random variable
• Probability Histogram
• measure of the center: population mean µ
• measure of spread: population standard deviation σ
Example
x      0      100
p(x)   1/2    1/2
E(x) = 0(1/2) + 100(1/2) = 50
y      49     51
p(y)   1/2    1/2
E(y) = 49(1/2) + 51(1/2) = 50
Variance
Variation
Sample variance (data):
\[ s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{1805.703}{34} = 53.1089 \]
The deviations of the outcomes from the
mean of the probability distribution: x_i − µ (for data: X_i − X̄)
σ² (sigma squared) is the variance of the
probability distribution
Variance of random variable X
\[ \sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 \cdot P(X = x_i) \]
Variation
\[ \sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 \cdot P(X = x_i) \]
Example (µ = 5.86)
x      4              5              6              7
p(x)   12/65=0.185    12/65=0.185    14/65=0.215    27/65=0.415
σ² = (x1-µ)² · P(X=x1) + (x2-µ)² · P(X=x2) + (x3-µ)² · P(X=x3) + (x4-µ)² · P(X=x4)
= (4-5.86)² · 0.185 + (5-5.86)² · 0.185 + (6-5.86)² · 0.215 + (7-5.86)² · 0.415
= 1.3204
P. 207, Handout 4.1, P. 4
Standard Deviation: of
More Interest than the
Variance
The population standard deviation is the square root
of the population variance
\[ \sigma = \sqrt{\sigma^2} \]
Standard Deviation
Standard Deviation (σ) =
Positive Square Root of the Variance
σ² = 1.3204
σ, or SD, is the standard deviation of the
probability distribution
σ (or SD) = √σ² = √1.3204 ≈ 1.1491 games
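The variance and standard deviation of the World Series distribution can be reproduced with a few lines of Python. This is a sketch under one stated assumption: it uses the rounded probabilities (0.185, 0.185, 0.215, 0.415) that the slides use, rather than the exact fractions out of 65. The function names are illustrative.

```python
import math


def mean(dist):
    """mu = sum of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """sigma^2 = sum of (x - mu)^2 * P(X = x)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def sd(dist):
    """sigma = positive square root of the variance."""
    return math.sqrt(variance(dist))


# Rounded probabilities, as used in the slide calculation.
world_series = {4: 0.185, 5: 0.185, 6: 0.215, 7: 0.415}

print(round(variance(world_series), 4))  # about 1.3204
print(round(sd(world_series), 4))        # about 1.1491 games
```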
Expected Value of a Random Variable
Example: The probability model for a particular life insurance
policy is shown. Find the expected annual payout on a policy.
We expect that the insurance company will pay out $200 per policy
per year.
© 2010 Pearson Education
Standard Deviation of a Random Variable
Example: The probability model for a particular life insurance
policy is shown. Find the standard deviation of the annual payout.
68-95-99.7 Rule for
Random Variables
For random variables x whose probability
histograms are approximately mound-shaped:
\[ P(\mu - \sigma \le x \le \mu + \sigma) \approx .68 \]
\[ P(\mu - 2\sigma \le x \le \mu + 2\sigma) \approx .95 \]
\[ P(\mu - 3\sigma \le x \le \mu + 3\sigma) \approx .997 \]
Example: (µ − 1σ, µ + 1σ) = (50 − 5, 50 + 5) = (45, 55)
P(µ − σ ≤ X ≤ µ + σ) = P(45 ≤ X ≤ 55)
= .048+.057+.066+.073+.078+.08+.078+.073+.066+.057+.048 = .724
Rules for E(X), Var(X) and SD(X):
adding a constant a
If X is a rv and a is a constant:
• E(X+a) = E(X)+a
Example: a = -1
• E(X+a)=E(X-1)=E(X)-1
Rules for E(X), Var(X) and SD(X):
adding constant a (cont.)
• Var(X+a) = Var(X)
• SD(X+a) = SD(X)
Example: a = -1
• Var(X+a)=Var(X-1)=Var(X)
• SD(X+a)=SD(X-1)=SD(X)
Carolina Panthers Next Season’s Profit
Economy    Profit X ($ Millions)    Probability
Great      10                       0.20
Good       5                        0.40
OK         1                        0.25
Lousy      -4                       0.15
E(X)=10(0.20) + 5(0.40) + 1(0.25) – 4(0.15)
=3.65
SD(X)=4.4
Economic Scenario    Profit X ($ Millions)    Profit X+2 ($ Millions)    Probability
Great                x1 = 10                  x1+2 = 10+2                0.20
Good                 x2 = 5                   x2+2 = 5+2                 0.40
OK                   x3 = 1                   x3+2 = 1+2                 0.25
Lousy                x4 = -4                  x4+2 = -4+2                0.15
E(X + a) = E(X) + a; SD(X + a)=SD(X); let a = 2
[Probability histograms: Profit X (mean 3.65, σ = 4.40) and Profit X+2 (mean 5.65, σ = 4.40). x-axis: Profit, from -4 to 14; y-axis: Probability.]
New Expected Value
Long (UNC-CH) way:
E(X+2)=12(.20)+7(.40)+3(.25)+(-2)(.15)
= 5.65
Smart (NCSU) way:
a=2; E(X+2) =E(X) + 2 = 3.65 + 2 = 5.65
New Variance and SD
Long (UNC-CH) way: (compute from
“scratch”)
Var(X+2) = (12-5.65)²(0.20) + … + (-2-5.65)²(0.15) = 19.3275
SD(X+2) = √19.3275 = 4.40
Smart (NCSU) way:
Var(X+2) = Var(X) = 19.3275
SD(X+2) = SD(X) = 4.40
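Both routes can be compared numerically. The Python sketch below recomputes the mean and variance of X+2 from scratch and checks them against the shift rules; the helper names `mean`, `variance`, and `shift` are assumptions for illustration.

```python
import math


def mean(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def shift(dist, a):
    """Distribution of X + a: same probabilities, every value moved by a."""
    return {x + a: p for x, p in dist.items()}


# Panthers profit ($ millions) by economic scenario.
profit = {10: 0.20, 5: 0.40, 1: 0.25, -4: 0.15}
shifted = shift(profit, 2)

print(round(mean(profit), 2), round(mean(shifted), 2))          # 3.65 and 5.65
print(round(variance(profit), 4), round(variance(shifted), 4))  # both 19.3275
print(round(math.sqrt(variance(shifted)), 2))                   # about 4.40
```

Adding a constant moves every value and the mean by the same amount, so each deviation from the mean is unchanged, which is why the variance and SD stay the same.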
Rules for E(X), Var(X) and SD(X):
multiplying by constant b
• E(bX) = bE(X)
• Var(bX) = b²Var(X)
• SD(bX) = |b|SD(X)
  |b| denotes the absolute value of b
Example: b = -1
• E(bX) = E(-X) = -E(X)
• Var(bX) = Var(-1X) = (-1)²Var(X) = Var(X)
• SD(bX) = SD(-1X) = |-1|SD(X) = SD(X)
Expected Value and SD of Linear
Transformation a + bx
Let the random variable X= season field goal shooting percentage
for an NBA team. Suppose E(X)= 45.31 and SD(X)=1.67
The relationship between X and points scored per game for an NBA
team can be described by 14.49 + 1.85X.
What are the mean and standard deviation of the points scored per
game?
Points per game (ppg) = 14.49 + 1.85X
E(ppg) = E(14.49+1.85X) = 14.49+1.85E(X) = 14.49+1.85*45.31
= 14.49+83.82 = 98.31
SD(ppg) = SD(14.49+1.85X) = SD(1.85X) = 1.85*SD(X) = 1.85*1.67
= 3.09
Note that the shift of 14.49 does NOT affect the
standard deviation.
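The rules for a + bX combine the shift and scale rules, and can be sketched as one small function. The name `linear_transform_stats` is a hypothetical helper, not something from the slides.

```python
def linear_transform_stats(mean_x, sd_x, a, b):
    """Return (E(a + bX), SD(a + bX)) given E(X) and SD(X)."""
    new_mean = a + b * mean_x  # the shift a and the scale b both affect the mean
    new_sd = abs(b) * sd_x     # only the scale affects the SD
    return new_mean, new_sd


# NBA example: ppg = 14.49 + 1.85 X, with E(X) = 45.31 and SD(X) = 1.67.
mean_ppg, sd_ppg = linear_transform_stats(45.31, 1.67, a=14.49, b=1.85)
print(round(mean_ppg, 2))  # about 98.31
print(round(sd_ppg, 2))    # about 3.09
```

Using `abs(b)` keeps the standard deviation non-negative when b is negative, matching SD(bX) = |b|SD(X).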
Addition and Subtraction Rules
for Random Variables
• E(X+Y) = E(X) + E(Y);
• E(X-Y) = E(X) - E(Y)
• When X and Y are independent random variables:
1. Var(X+Y) = Var(X) + Var(Y)
2. SD(X+Y) = √(Var(X) + Var(Y))
   SD’s do not add: SD(X+Y) ≠ SD(X) + SD(Y)
3. Var(X−Y) = Var(X) + Var(Y)
4. SD(X−Y) = √(Var(X) + Var(Y))
   SD’s do not subtract: SD(X−Y) ≠ SD(X) − SD(Y)
   SD(X−Y) ≠ SD(X) + SD(Y)
Motivation for
Var(X-Y)=Var(X)+Var(Y)
• Let X = amount automatic dispensing machine puts into your 16 oz drink (say at McD’s)
• A thirsty, broke friend shows up. Let Y = amount you pour into friend’s 8 oz cup
• Let Z = amount left in your cup; Z = ?
• Z = X-Y
• Var(Z) = Var(X-Y) = Var(X) + Var(Y) (the variation in Z has 2 components)
Example: rv’s NOT independent
• X = number of hours a randomly selected student from our class slept between noon yesterday and noon today.
• Y = number of hours the same randomly selected student from our class was awake between noon yesterday and noon today. Y = 24 – X.
• What are the expected value and variance of the total hours that a student is asleep and awake between noon yesterday and noon today?
• Total hours that a student is asleep and awake between noon yesterday and noon today = X+Y
• E(X+Y) = E(X+24-X) = E(24) = 24
• Var(X+Y) = Var(X+24-X) = Var(24) = 0.
• We don't add Var(X) and Var(Y) since X and Y are not independent.
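A small numerical illustration shows how far off the addition rule is when it is applied to dependent variables. The sleep distribution below is made up purely for illustration (it is not from the slides), and exact fractions are used so the results are not affected by rounding.

```python
from fractions import Fraction


def mean(dist):
    return sum(x * p for x, p in dist.items())


def variance(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


# Hypothetical distribution of X = hours slept (an assumption, not slide data).
sleep = {6: Fraction(1, 5), 7: Fraction(1, 2), 8: Fraction(3, 10)}
# Y = 24 - X is completely determined by X.
awake = {24 - x: p for x, p in sleep.items()}

# Distribution of X + Y: every outcome x contributes x + (24 - x) = 24.
total = {}
for x, p in sleep.items():
    total[x + (24 - x)] = total.get(x + (24 - x), 0) + p

print(mean(total), variance(total))        # 24 and 0
print(variance(sleep) + variance(awake))   # what the independence rule would give
```

Here Var(X) + Var(Y) is positive while Var(X+Y) is exactly 0, so the independence rule must not be used for these two variables.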
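Pythagorean Theorem of Statistics
for Independent X and Y
[Figure: right triangle with legs a = SD(X) and b = SD(Y) and hypotenuse c = SD(X+Y); the squares on the sides have areas a² = Var(X), b² = Var(Y), c² = Var(X+Y).]
a² + b² = c²
Var(X) + Var(Y) = Var(X+Y)
a + b ≠ c
SD(X) + SD(Y) ≠ SD(X+Y)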
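Pythagorean Theorem of Statistics
for Independent X and Y
[Figure: right triangle with legs SD(X) = 3 and SD(Y) = 4 and hypotenuse SD(X+Y) = 5; squares of areas Var(X) = 9, Var(Y) = 16, Var(X+Y) = 25.]
3² + 4² = 5²
Var(X) + Var(Y) = Var(X+Y)
9 + 16 = 25
3 + 4 ≠ 5
SD(X) + SD(Y) ≠ SD(X+Y)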
Example: meal plans
Regular plan: X = daily amount spent
E(X) = $13.50, SD(X) = $7
Expected value and stan. dev. of total spent in
2 consecutive days?
E(X1 + X2) = E(X1) + E(X2) = $13.50 + $13.50 = $27
SD(X1 + X2) ≠ SD(X1) + SD(X2) = $7 + $7 = $14
SD(X1 + X2) = √Var(X1 + X2) = √(Var(X1) + Var(X2))
= √(($7)² + ($7)²) = √($²49 + $²49) = √($²98) ≈ $9.90
Example: meal plans (cont.)
Jumbo plan for football players Y=daily
amount spent
E(Y) = $24.75, SD(Y) = $9.50
Amount by which football player’s spending
exceeds regular student spending is Y-X
E(Y-X)=E(Y)–E(X)=$24.75-$13.50=$11.25
SD(Y − X) ≠ SD(Y) − SD(X) = $9.50 − $7 = $2.50
SD(Y − X) = √Var(Y − X) = √(Var(Y) + Var(X))
= √(($9.50)² + ($7)²) = √($²90.25 + $²49) = √($²139.25) ≈ $11.80
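Both meal-plan standard deviations follow the same pattern: square the SDs, add, and take the square root. Here is a minimal Python sketch, assuming the amounts are independent as the rules require; the function name `sd_of_sum_or_difference` is illustrative.

```python
import math


def sd_of_sum_or_difference(sd_a, sd_b):
    """SD of A+B (or A-B) for independent A and B: the variances add."""
    return math.sqrt(sd_a ** 2 + sd_b ** 2)


# Regular plan, total over 2 consecutive days: SD(X1 + X2).
two_days = sd_of_sum_or_difference(7.00, 7.00)
# Jumbo plan minus regular plan: SD(Y - X).
excess = sd_of_sum_or_difference(9.50, 7.00)

print(round(two_days, 2))  # sqrt(98), about 9.90
print(round(excess, 2))    # sqrt(139.25), about 11.80
```

The same function serves for the sum and the difference because subtracting an independent amount adds variability rather than cancelling it.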
For random variables, X+X≠2X
• Let X be the annual payout on a life insurance policy. From mortality tables E(X)=$200 and SD(X)=$3,867.
1) If the payout amounts are doubled, what are the new expected value and standard deviation?
• Double payout is 2X. E(2X)=2E(X)=2*$200=$400
• SD(2X)=2SD(X)=2*$3,867=$7,734
2) Suppose insurance policies are sold to 2 people. The annual payouts are X1 and X2. Assume the 2 people behave independently. What are the expected value and standard deviation of the total payout?
• E(X1 + X2)=E(X1) + E(X2) = $200 + $200 = $400
SD(X1 + X2) = √Var(X1 + X2) = √(Var(X1) + Var(X2))
= √((3867)² + (3867)²) = √(14,953,689 + 14,953,689)
= √29,907,378 ≈ $5,468.76
The risk to the insurance co. when doubling the payout (2X) is not the same as the risk when selling policies to 2 people.