Lecture Unit 5 - NCSU Statistics

Chapter 16: Random Variables

Streamlining Probability: Probability Distribution, Expected Value and Standard Deviation of a Random Variable

Goal: graphically and numerically summarize a random experiment.
Principal vehicle by which we do this: random variables.

Random Variables

Definition: A random variable is a numerical-valued variable whose value is based on the outcome of a random event. Random variables are denoted by upper-case letters X, Y, etc.

Examples
1. X = number of games played in a randomly selected World Series. Possible values of X are x = 4, 5, 6, 7.
2. Y = score on the 13th hole (par 5) at Augusta National golf course for a randomly selected golfer on day 1 of the 2011 Masters. Possible values of Y are y = 3, 4, 5, 6, 7.

Random Variables and Probability Distributions

A probability distribution lists the possible values of a random variable and the probability that each value will occur.
- Random variables are unknown chance outcomes. Probability distributions tell us what is likely to happen.
- Data variables are known outcomes. Data distributions tell us what happened.

Probability Distribution of Number of Games Played in a Randomly Selected World Series

Estimate based on results from 1946 to 2010.

x       4               5               6               7
p(x)    12/65 = 0.185   12/65 = 0.185   14/65 = 0.215   27/65 = 0.415

[Figure: probability histogram, number of games in a randomly selected World Series; bar heights 0.185, 0.185, 0.215, 0.415 at x = 4, 5, 6, 7]

Probability Distribution of Score on the 13th Hole (par 5) at Augusta National Golf Course on Day 1 of the 2011 Masters

y       3       4       5       6       7
p(y)    0.040   0.414   0.465   0.051   0.030

[Figure: probability histogram, score on 13th hole; bar heights 0.040, 0.414, 0.465, 0.051, 0.030 at y = 3, 4, 5, 6, 7]

Probability Distributions: Requirements
1. $0 \le p(x) \le 1$ for all values x of X
2. $\sum_{\text{all } x} p(x) = 1$

Expected Value of a Random Variable

A measure of the “middle” of the values of a random variable.

[Figure: the two probability histograms above, shown side by side]

The mean of the probability distribution is the expected value of X, denoted E(X). E(X) is also denoted by the Greek letter µ (mu).

Mean or Expected Value

k = the number of possible values of the random variable

$$E(X) = \mu = \sum_{i=1}^{k} x_i \cdot P(X = x_i)$$

E(X) = µ = x1·p(x1) + x2·p(x2) + x3·p(x3) + ... + xk·p(xk)

This is a weighted mean: each outcome is weighted by its probability. Compare the sample mean, in which each of the n observations gets weight 1/n:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{1}{n}x_1 + \frac{1}{n}x_2 + \frac{1}{n}x_3 + \dots + \frac{1}{n}x_n$$

Other Weighted Means
- GPA: A=4, B=3, C=2, D=1, F=0. Five 3-hour courses: 2 A's (6 hrs), 1 B (3 hrs), 2 C's (6 hrs).
  $$\text{GPA} = \frac{4 \cdot 6 + 3 \cdot 3 + 2 \cdot 6}{15} = \frac{45}{15} = 3.0$$
- Baseball slugging percentage SLG (HR=4, 3B=3, 2B=2, 1B=1):
  $$\text{SLG} = \frac{4 \cdot HR + 3 \cdot 3B + 2 \cdot 2B + 1 \cdot 1B}{AB}$$
  Babe Ruth 1920 (80 yrs): 458 AB; 54 HR, 9 3B, 36 2B, 73 1B
  $$\text{SLG} = \frac{4 \cdot 54 + 3 \cdot 9 + 2 \cdot 36 + 1 \cdot 73}{458} = \frac{388}{458} = .847$$
- Baseball ticket prices
- Football ticket prices

Mean or Expected Value: Examples

E(X) = µ = 4(0.185) + 5(0.185) + 6(0.215) + 7(0.415) = 5.86 games
E(Y) = µ = 3(0.040) + 4(0.414) + 5(0.465) + 6(0.051) + 7(0.030) = 4.617 strokes

[Figure: probability histogram of the number of games in a randomly selected World Series, with µ = 5.86 marked on the horizontal axis]

Interpretation of E(X)

E(X) is not the value of the random variable X that you “expect” to observe if you perform the experiment once.

E(X) is a “long run” average. The expected value of a random variable is equal to the average value of the random variable if the chance process were repeated an infinite number of times. In reality, if the chance process is continually repeated, the average of the observed values will get closer to E(X) as you observe more and more values of the random variable X.

Example: Green Mountain Lottery

State of Vermont: choose 3 digits from 0 through 9; repeats allowed; win $500.

x       $0      $500
p(x)    .999    .001

E(x) = $0(.999) + $500(.001) = $.50

On average, each ticket wins $.50. This is important for Vermont to know.
E(x) is not necessarily a possible value of the random variable (the values of x are $0 and $500).

Expected Value, Surprise Onside Kicks

http://www.advancednflstats.com/

The change in expected points for the kicking team: successful 1.9; fail -1.4.
X = change in expected points for the kicking team when attempting a surprise onside kick

x       1.9     -1.4
p(x)    p       1-p

What values of p make surprise onside kicks a good strategy? The expected change should be greater than 0:

(1.9)p + (-1.4)(1 - p) > 0
3.3p - 1.4 > 0
3.3p > 1.4
p > 0.424

US Roulette Wheel and Table
- The roulette wheel has alternating black and red slots numbered 1 through 36.
- There are also 2 green slots numbered 0 and 00.
- A bet on any one of the 38 numbers (1-36, 0, or 00) pays odds of 35:1; that is, if you bet $1 on the winning number, you receive $36, so your winnings are $35.

[Figure: American roulette wheel with 0 and 00. (The European version has only one 0.)]

US Roulette Wheel: Expected Value of a $1 Bet on a Single Number

Let x be your winnings resulting from a $1 bet on a single number; x has 2 possible values.

x       -1      35
p(x)    37/38   1/38

E(x) = -1(37/38) + 35(1/38) = -.05

So on average the house wins 5 cents on every such bet. A “fair” game would have E(x) = 0.
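The expected-value calculations above (World Series, lottery, roulette, onside kick) all apply the same weighted sum. The following is a minimal Python sketch of that sum, not part of the original slides; the function name `expected_value` and the dictionary layout are illustrative assumptions. Exact fractions are used so the requirement that probabilities sum to 1 can be checked without rounding error.

```python
from fractions import Fraction


def expected_value(dist):
    """Return E(X) = sum of x * p(x) for a dict mapping value -> probability.

    Hypothetical helper for illustration; expects exact Fraction probabilities.
    """
    # Requirement 1: every probability lies between 0 and 1.
    if any(p < 0 or p > 1 for p in dist.values()):
        raise ValueError("each probability must lie between 0 and 1")
    # Requirement 2: the probabilities sum to 1.
    if sum(dist.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in dist.items())


# Distributions taken from the slides.
world_series = {4: Fraction(12, 65), 5: Fraction(12, 65),
                6: Fraction(14, 65), 7: Fraction(27, 65)}
lottery = {0: Fraction(999, 1000), 500: Fraction(1, 1000)}
roulette = {-1: Fraction(37, 38), 35: Fraction(1, 38)}

print(float(expected_value(world_series)))  # games per World Series
print(float(expected_value(lottery)))       # dollars won per ticket
print(float(expected_value(roulette)))      # dollars won per $1 bet

# Onside kick: 1.9p - 1.4(1 - p) > 0 exactly when p > 1.4 / (1.9 + 1.4).
break_even_p = Fraction(14, 10) / (Fraction(19, 10) + Fraction(14, 10))
print(float(break_even_p))
```

The roulette result is exactly -1/19, which the slides round to -.05.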
The roulette wheels are spinning 24/7, winning big $$ for the house, resulting in …

Standard Deviation of a Random Variable

First: center (expected value). Now: spread.

The standard deviation of a random variable measures how “spread out” the random variable is.

Summarizing Data and Probability

Data
- Histogram
- Measure of the center: sample mean $\bar{x}$
- Measure of spread: sample standard deviation s

Random variable
- Probability histogram
- Measure of the center: population mean µ
- Measure of spread: population standard deviation σ

Example

x       0       100
p(x)    1/2     1/2

E(x) = 0(1/2) + 100(1/2) = 50

y       49      51
p(y)    1/2     1/2

E(y) = 49(1/2) + 51(1/2) = 50

Both random variables have expected value 50, but x is far more spread out than y.

Variance

Sample variance of data:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{1805.703}{34} = 53.1089$$

For a random variable, the deviations $X_i - \bar{X}$ are replaced by the deviations of the outcomes from the mean of the probability distribution, $x_i - \mu$.

σ² (sigma squared) is the variance of the probability distribution.

Variance of random variable X:

$$\sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 \cdot P(X = x_i)$$

Example

x       4               5               6               7
p(x)    12/65 = 0.185   12/65 = 0.185   14/65 = 0.215   27/65 = 0.415

With µ = 5.86:

σ² = (x1-µ)² · P(X=x1) + (x2-µ)² · P(X=x2) + (x3-µ)² · P(X=x3) + (x4-µ)² · P(X=x4)
   = (4-5.86)² · 0.185 + (5-5.86)² · 0.185 + (6-5.86)² · 0.215 + (7-5.86)² · 0.415
   = 1.3204

P. 207, Handout 4.1, P. 4

Standard Deviation: of More Interest than the Variance

The population standard deviation is the square root of the population variance:

$$\sigma = \sqrt{\sigma^2}$$

Standard deviation (σ) = positive square root of the variance. σ, or SD, is the standard deviation of the probability distribution.

σ² = 1.3204
σ (or SD) = $\sqrt{1.3204} \approx 1.1491$ games

Expected Value of a Random Variable

Example: The probability model for a particular life insurance policy is shown. Find the expected annual payout on a policy.

We expect that the insurance company will pay out $200 per policy per year.
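The variance and standard deviation of the World Series distribution, and the contrast between the two 50-centered examples, can be checked numerically. This is a minimal sketch rather than anything from the slides; the helper names `mean`, `variance`, and `standard_deviation` are hypothetical, and the rounded probabilities printed on the slides are used as inputs.

```python
import math


def mean(dist):
    """E(X): each value weighted by its probability."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X): squared deviations from the mean, weighted by probability."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


def standard_deviation(dist):
    """SD(X): positive square root of the variance."""
    return math.sqrt(variance(dist))


# Rounded World Series probabilities as printed on the slides.
world_series = {4: 0.185, 5: 0.185, 6: 0.215, 7: 0.415}
print(variance(world_series))            # close to 1.3204
print(standard_deviation(world_series))  # close to 1.1491 games

# Same center (50), very different spread.
wide = {0: 0.5, 100: 0.5}
narrow = {49: 0.5, 51: 0.5}
print(mean(wide), standard_deviation(wide))
print(mean(narrow), standard_deviation(narrow))
```

The two-point examples show why the mean alone is not enough: both are centered at 50, but their standard deviations are 50 and 1.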
© 2010 Pearson Education

Standard Deviation of a Random Variable

Example: The probability model for a particular life insurance policy is shown. Find the standard deviation of the annual payout.

68-95-99.7 Rule for Random Variables

For random variables x whose probability histograms are approximately mound-shaped:

P(µ - σ ≤ x ≤ µ + σ) ≈ .68
P(µ - 2σ ≤ x ≤ µ + 2σ) ≈ .95
P(µ - 3σ ≤ x ≤ µ + 3σ) ≈ .997

Example with µ = 50 and σ = 5:

(µ - 1σ, µ + 1σ) = (50-5, 50+5) = (45, 55)
P(µ - σ ≤ X ≤ µ + σ) = P(45 ≤ X ≤ 55)
  = .048+.057+.066+.073+.078+.08+.078+.073+.066+.057+.048 = .724

Rules for E(X), Var(X) and SD(X): adding a constant a

If X is a random variable and a is a constant:
- E(X+a) = E(X) + a
- Var(X+a) = Var(X)
- SD(X+a) = SD(X)

Example: a = -1
- E(X+a) = E(X-1) = E(X) - 1
- Var(X+a) = Var(X-1) = Var(X)
- SD(X+a) = SD(X-1) = SD(X)

Carolina Panthers Next Season’s Profit

Economy   Profit X ($ millions)   Probability
Great     10                      0.20
Good      5                       0.40
OK        1                       0.25
Lousy     -4                      0.15

E(X) = 10(0.20) + 5(0.40) + 1(0.25) - 4(0.15) = 3.65
SD(X) = 4.4

Now add a = 2 to every outcome:

Economic scenario   Profit X ($ millions)   Profit X+2 ($ millions)   Probability
Great               x1 = 10                 x1+2 = 12                 0.20
Good                x2 = 5                  x2+2 = 7                  0.40
OK                  x3 = 1                  x3+2 = 3                  0.25
Lousy               x4 = -4                 x4+2 = -2                 0.15

E(X + a) = E(X) + a; SD(X + a) = SD(X); let a = 2.

[Figure: two probability histograms of profit on the same scale (-4 to 14); the first is centered at 3.65, the second is the same shape shifted to 5.65; σ = 4.40 in both]

New Expected Value
- Long (UNC-CH) way: E(X+2) = 12(.20) + 7(.40) + 3(.25) + (-2)(.15) = 5.65
- Smart (NCSU) way: a = 2; E(X+2) = E(X) + 2 = 3.65 + 2 = 5.65

New Variance and SD
- Long (UNC-CH) way (compute from “scratch”): Var(X+2) = (12-5.65)²(0.20) + … + (-2-5.65)²(0.15) = 19.3275; SD(X+2) = √19.3275 = 4.40
- Smart (NCSU) way: Var(X+2) = Var(X) = 19.3275; SD(X+2) = SD(X) = 4.40

Rules for E(X), Var(X) and SD(X): multiplying by a constant b

- E(bX) = bE(X)
- Var(bX) = b²Var(X)
- SD(bX) = |b|SD(X), where |b| denotes the absolute value of b

Example: b = -1
- E(bX) = E(-X) = -E(X)
- Var(bX) = Var(-1X) = (-1)²Var(X) = Var(X)
- SD(bX) = SD(-1X) = |-1|SD(X) = SD(X)

Expected Value and SD of Linear Transformation a + bX

Let the random variable X = season field goal shooting percentage for an NBA team. Suppose E(X) = 45.31 and SD(X) = 1.67.

The relationship between X and points scored per game for an NBA team can be described by 14.49 + 1.85X. What are the mean and standard deviation of the points scored per game?

Points per game (ppg) = 14.49 + 1.85X
E(ppg) = E(14.49 + 1.85X) = 14.49 + 1.85E(X) = 14.49 + 1.85*45.31 = 14.49 + 83.82 = 98.31
SD(ppg) = SD(14.49 + 1.85X) = SD(1.85X) = 1.85*SD(X) = 1.85*1.67 = 3.09

Note that the shift of 14.49 does NOT affect the standard deviation.

Addition and Subtraction Rules for Random Variables

- E(X+Y) = E(X) + E(Y)
- E(X-Y) = E(X) - E(Y)

When X and Y are independent random variables:
1. Var(X+Y) = Var(X) + Var(Y)
2. $SD(X+Y) = \sqrt{Var(X) + Var(Y)}$
   SD’s do not add: SD(X+Y) ≠ SD(X) + SD(Y)
3. Var(X-Y) = Var(X) + Var(Y)
4. $SD(X-Y) = \sqrt{Var(X) + Var(Y)}$
   SD’s do not subtract: SD(X-Y) ≠ SD(X) - SD(Y), and SD(X-Y) ≠ SD(X) + SD(Y)

Motivation for Var(X-Y) = Var(X) + Var(Y)

- Let X = amount an automatic dispensing machine puts into your 16 oz drink (say at McD’s).
- A thirsty, broke friend shows up. Let Y = amount you pour into your friend’s 8 oz cup.
- Let Z = amount left in your cup; Z = ?
- Z = X-Y has 2 components, and the variability of each one adds to the variability of Z.
- Var(Z) = Var(X-Y) = Var(X) + Var(Y)

Example: rv’s NOT independent

- X = number of hours a randomly selected student from our class slept between noon yesterday and noon today.
- Y = number of hours the same randomly selected student from our class was awake between noon yesterday and noon today. Y = 24 - X.
- What are the expected value and variance of the total hours that a student is asleep and awake between noon yesterday and noon today?
- Total hours that a student is asleep and awake between noon yesterday and noon today = X+Y
- E(X+Y) = E(X+24-X) = E(24) = 24
- Var(X+Y) = Var(X+24-X) = Var(24) = 0
- We don't add Var(X) and Var(Y) since X and Y are not independent.

Pythagorean Theorem of Statistics for Independent X and Y

[Figure: right triangle with legs a = SD(X) and b = SD(Y) and hypotenuse c = SD(X+Y); the squares on the sides are Var(X), Var(Y) and Var(X+Y)]

a² + b² = c²        Var(X) + Var(Y) = Var(X+Y)
a + b ≠ c           SD(X) + SD(Y) ≠ SD(X+Y)

Numerical example: 3² + 4² = 5²

Var(X) = 9, Var(Y) = 16, Var(X+Y) = 9 + 16 = 25
SD(X) = 3, SD(Y) = 4, SD(X+Y) = 5
3 + 4 ≠ 5, so SD(X) + SD(Y) ≠ SD(X+Y)

Example: meal plans

Regular plan: X = daily amount spent; E(X) = $13.50, SD(X) = $7.

Expected value and standard deviation of the total spent in 2 consecutive days (treating the two days as independent)?

E(X1 + X2) = E(X1) + E(X2) = $13.50 + $13.50 = $27
SD(X1 + X2) ≠ SD(X1) + SD(X2) = $7 + $7 = $14

$$SD(X_1 + X_2) = \sqrt{Var(X_1 + X_2)} = \sqrt{Var(X_1) + Var(X_2)} = \sqrt{7^2 + 7^2} = \sqrt{49 + 49} = \sqrt{98} \approx 9.90 \text{ dollars}$$

Example: meal plans (cont.)

Jumbo plan for football players: Y = daily amount spent; E(Y) = $24.75, SD(Y) = $9.50.

The amount by which a football player’s spending exceeds regular student spending is Y-X.

E(Y-X) = E(Y) - E(X) = $24.75 - $13.50 = $11.25
SD(Y-X) ≠ SD(Y) - SD(X) = $9.50 - $7 = $2.50

$$SD(Y - X) = \sqrt{Var(Y - X)} = \sqrt{Var(Y) + Var(X)} = \sqrt{9.50^2 + 7^2} = \sqrt{90.25 + 49} = \sqrt{139.25} \approx 11.80 \text{ dollars}$$

For random variables, X+X ≠ 2X

Let X be the annual payout on a life insurance policy. From mortality tables, E(X) = $200 and SD(X) = $3,867.

1) If the payout amounts are doubled, what are the new expected value and standard deviation?
- The double payout is 2X.
- E(2X) = 2E(X) = 2*$200 = $400
- SD(2X) = 2SD(X) = 2*$3,867 = $7,734

2) Suppose insurance policies are sold to 2 people. The annual payouts are X1 and X2. Assume the 2 people behave independently. What are the expected value and standard deviation of the total payout?
- E(X1 + X2) = E(X1) + E(X2) = $200 + $200 = $400

$$SD(X_1 + X_2) = \sqrt{Var(X_1 + X_2)} = \sqrt{Var(X_1) + Var(X_2)} = \sqrt{(3867)^2 + (3867)^2} = \sqrt{14{,}953{,}689 + 14{,}953{,}689} = \sqrt{29{,}907{,}378} \approx 5{,}468.76 \text{ dollars}$$

The risk to the insurance co. when doubling the payout (2X) is not the same as the risk when selling policies to 2 people.
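The difference between doubling one payout (2X) and adding two independent payouts (X1 + X2) can be checked with a short Python sketch. It is an illustration only: the helper names `sd_scaled` and `sd_independent_sum` are hypothetical, and the second helper assumes the variables are independent, as the rules above require.

```python
import math


def sd_scaled(b, sd):
    """SD(bX) = |b| * SD(X)."""
    return abs(b) * sd


def sd_independent_sum(*sds):
    """SD of a sum or difference of independent random variables.

    Variances add, so the result is the square root of the summed squares.
    Assumes independence; it is not valid otherwise.
    """
    return math.sqrt(sum(sd ** 2 for sd in sds))


# Life insurance: doubling one policy versus selling two policies.
sd_payout = 3867
print(sd_scaled(2, sd_payout))                               # SD(2X)
print(round(sd_independent_sum(sd_payout, sd_payout), 2))    # SD(X1 + X2)

# Meal plans: two days on the regular plan, then jumbo minus regular.
print(round(sd_independent_sum(7, 7), 2))      # SD(X1 + X2)
print(round(sd_independent_sum(9.50, 7), 2))   # SD(Y - X)
```

Both insurance scenarios have the same expected total payout of $400, but the standard deviation for two independent policies is smaller by a factor of √2 than for one doubled policy.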
