Random Variables
Streamlining Probability: Probability Distribution, Expected Value and Standard Deviation of a Random Variable

Graphically and Numerically Summarize a Random Experiment
Principal vehicle by which we do this: random variables.
A random variable assigns a number to each outcome of an experiment.

Random Variables
Definition: A random variable is a numerical-valued variable whose value is based on the outcome of a random event. Random variables are denoted by upper-case letters X, Y, etc.
When the number of possible values of X is finite (number of heads in 3 tosses of a coin) or countably infinite (number of tosses until you get 3 heads in a row), the random variable is discrete. (We will study continuous rv’s later.)

Examples: Discrete rv’s
1. X = number of games played in a randomly selected World Series. Possible values of X are x = 4, 5, 6, 7.
2. Y = score on the 13th hole (par 5) at Augusta National golf course for a randomly selected golfer on day 1 of the 2015 Masters. Possible values of Y are y = 3, 4, 5, 6, 7.

Examples: Discrete rv’s (cont.)
Number of girls in a 5-child family.
Number of customers that use an ATM in a 1-hour period.
Number of tosses of a fair coin required until you get 3 heads in a row (note that this discrete random variable has a countably infinite number of possible values: x = 3, 4, 5, 6, 7, . . .).
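As a minimal sketch of the idea that a random variable assigns a number to each outcome, the Python snippet below lists the 8 outcomes of 3 coin tosses and maps each one to its number of heads. The names outcomes, number_of_heads and distribution are illustrative only, and treating the 8 outcomes as equally likely (a fair coin) is an assumption of this sketch.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 outcomes of 3 tosses, e.g. ('H', 'T', 'H').
# Assumption for this sketch: the coin is fair, so outcomes are equally likely.
outcomes = list(product("HT", repeat=3))

def number_of_heads(outcome):
    """The random variable X: assigns a number to a single outcome."""
    return outcome.count("H")

# Count how many outcomes lead to each value of X, then convert to probabilities.
counts = Counter(number_of_heads(o) for o in outcomes)
distribution = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")
```

X takes only the values 0, 1, 2, 3, so it is a discrete random variable with a finite number of possible values.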
Data Variables and Data Distributions

CUSIP     IND  CONAME                        PE     NPM
60855410  4    MOLEX INC                     24.7    8.7
40262810  5    GULFMARK INTL INC             21.4    8.1
81180410  4    SEAGATE TECHNOLOGY            21.3    2.2
46489010  9    ISOMEDIX INC                  25.2   21.1
69318010  9    PCA INTERNATIONAL INC         21.4    4.7
26157010  7    DRESS BARN INC                24.5    4.5
90249410  4    TYSON FOODS INC               20.9    3.9
4886910   5    ATLANTIC SOUTHEAST AIRLINES   20.1   15.7
87183910  9    SYSTEM SOFTWARE ASSOC INC     23.7   11.6
62475210  4    MUELLER (PAUL) CO             14.5    3.9
36473510  7    GANTOS INC                    15.7    1.8
00755P10  9    ADVANTAGE HEALTH CORP         23.3    5.3
23935910  2    DAWSON GEOPHYSICAL CO         14.9    9.3
68555910  4    ORBIT INTERNATIONAL CP        15.0    3.0
16278010  4    CHECK TECHNOLOGY CORP         17.1    3.2
51460610  4    LANCE INC                     19.0    8.5
4523710   4    ASPECT TELECOMMUNICATIONS     25.7    8.2
74555310  4    PULASKI FURNITURE CORP        22.0    2.1
80819410  4    SCHULMAN (A.) INC             19.4    6.0
19770920  9    COLUMBIA HOSPITAL CORP        18.3    3.1
23790310  4    DATA MEASUREMENT CORP         11.3    2.6
11457710  4    BROOKTREE CORP                13.8   13.6
00431L10  9    ACCESS HEALTH MARKETING INC   22.4   11.0
29605610  4    ESCALADE INC                  10.8    2.0
23303110  4    DBA SYSTEMS INC                6.3    5.0
64124610  4    NEUTROGENA CORP               27.2    9.0
59492810  6    MICROAGE INC                   9.0    0.5
22821010  7    CROWN BOOKS CORP              24.4    1.8
190710    4    AST RESEARCH INC               9.7    7.3
46978310  6    JACO ELECTRONICS INC          31.9    0.4
531320    4    ADAC LABORATORIES             18.5   10.6
49766010  4    KIRSCHNER MEDICAL CORP        33.0    0.8
30205210  4    EXIDE ELECTRS GROUP INC       29.0    2.4
46065P10  5    INTERPROVINCIAL PIPE LN       11.9   19.2
19247910  4    COHERENT INC                  40.2    1.2

Data variables are known outcomes.
Data Variables and Data Distributions
Data variables are known outcomes.

DATA DISTRIBUTION: Price-Earnings Ratios (PE column of the 35 companies in the data set)

Class (bin)  Class Boundary  Tally            Frequency  Relative Frequency
1             6.00-12.99     |||| |            6         6/35 = 0.171
2            13.00-19.99     |||| ||||        10         10/35 = 0.286
3            20.00-26.99     |||| |||| ||||   14         14/35 = 0.400
4            27.00-33.99     ||||              4         4/35 = 0.114
5            34.00-40.99     |                 1         1/35 = 0.029
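The frequency and relative-frequency columns of the price-earnings data distribution can be reproduced with a short Python sketch. The PE values are copied from the data set; the variable names (pe, LOWER, WIDTH, frequency, relative) are illustrative, and the binning rule (classes of width 7 starting at 6.00) is read off the class boundaries rather than stated explicitly in the slides.

```python
# PE ratios of the 35 companies, in the order they appear in the data set.
pe = [24.7, 21.4, 21.3, 25.2, 21.4, 24.5, 20.9, 20.1, 23.7, 14.5,
      15.7, 23.3, 14.9, 15.0, 17.1, 19.0, 25.7, 22.0, 19.4, 18.3,
      11.3, 13.8, 22.4, 10.8, 6.3, 27.2, 9.0, 24.4, 9.7, 31.9,
      18.5, 33.0, 29.0, 11.9, 40.2]

# Assumed binning: 5 classes of width 7 starting at 6.00
# (6.00-12.99, 13.00-19.99, ..., 34.00-40.99).
LOWER, WIDTH, N_CLASSES = 6.0, 7.0, 5

frequency = [0] * N_CLASSES
for value in pe:
    k = int((value - LOWER) // WIDTH)  # index of the class that contains value
    frequency[k] += 1

# Relative frequency = class frequency divided by the number of observations.
relative = [f / len(pe) for f in frequency]

for k, (f, r) in enumerate(zip(frequency, relative), start=1):
    lo = LOWER + (k - 1) * WIDTH
    print(f"class {k}: {lo:5.2f}-{lo + WIDTH - 0.01:5.2f}  "
          f"frequency {f:2d}  relative frequency {r:.3f}")
```

The relative frequencies sum to 1, which is the data analogue of the requirement that the probabilities in a probability distribution sum to 1.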
Data distributions tell us what happened. (Handout 2.1, p. 10)

Random Variables and Probability Distributions
Random variables are unknown chance outcomes. Probability distributions tell us what is likely to happen.
Data variables are known outcomes. Data distributions tell us what happened.

Notation
Economic Scenario   Profit X ($ Millions)   Probability P
Great               x1 = 10                 Pr(X = x1) = 0.20
Good                x2 = 5                  Pr(X = x2) = 0.40
OK                  x3 = 1                  Pr(X = x3) = 0.25
Lousy               x4 = -4                 Pr(X = x4) = 0.15

X = the random variable (profits)
xi = outcome i: x1 = 10, x2 = 5, x3 = 1, x4 = -4
P is the probability: p(xi) = Pr(X = xi) is the probability of X being outcome xi
p(x1) = Pr(X = 10) = .20
p(x2) = Pr(X = 5) = .40
p(x3) = Pr(X = 1) = .25
p(x4) = Pr(X = -4) = .15

Probability Histogram
[Figure: probability histogram of profit X. Horizontal axis: Profit ($ millions), -4 to 12. Vertical axis: Probability, .05 to .40. Bars: Lousy at -4 (.15), OK at 1 (.25), Good at 5 (.40), Great at 10 (.20).]

Probability Distribution of Number of Games Played in Randomly Selected World Series
Estimate based on results from 1946 to 2014.
x      4               5               6               7
p(x)   12/65 = 0.185   12/65 = 0.185   14/65 = 0.215   27/65 = 0.415

Probability Histogram
[Figure: probability histogram of the number of games in a randomly selected World Series. Bars: 4 (0.185), 5 (0.185), 6 (0.215), 7 (0.415).]

Probability distributions: requirements
Notation: p(x) = Pr(X = x) is the probability that the random variable X has value x.
Requirements:
1. 0 ≤ p(x) ≤ 1 for all values x of X
2. Σ p(x) = 1, where the sum is over all values x of X

Expected Value of a Discrete Random Variable
A measure of the “middle” of the values of a random variable.

Sample Mean
X̄ = (Σ Xi) / n, summed over i = 1, ..., n
X̄ = (x1 + x2 + x3 + ... + xn) / n = (1/n)·x1 + (1/n)·x2 + (1/n)·x3 + ... + (1/n)·xn

Mean or Expected Value
k = the number of outcomes
E(X) = Σ xi · P(X = xi), summed over i = 1, ..., k
µ = x1·p(x1) + x2·p(x2) + x3·p(x3) + ... + xk·p(xk)
Weighted mean: each outcome is weighted by its probability.

Other Weighted Means
1. Stock Market: The Dow Jones Industrial Average
The “Dow” consists of 30 companies (the 30 companies in the “Dow” change periodically). To compute the Dow Jones Industrial Average, a weight proportional to the company’s “size” is assigned to each company’s stock price.
2. GPA
A = 4, B = 3, C = 2, D = 1, F = 0
Five 3-hour courses: 2 A's (6 hrs), 1 B (3 hrs), 2 C's (6 hrs)
GPA = (4·6 + 3·3 + 2·6) / 15 = 45 / 15 = 3.0

Mean
Economic Scenario   Profit X ($ Millions)   Probability P
Great               x1 = 10                 P(X = x1) = 0.20
Good                x2 = 5                  P(X = x2) = 0.40
OK                  x3 = 1                  P(X = x3) = 0.25
Lousy               x4 = -4                 P(X = x4) = 0.15

k = the number of outcomes (k = 4)
E(X) = Σ xi · P(X = xi), summed over i = 1, ..., k
µ = x1·p(x1) + x2·p(x2) + x3·p(x3) + ... + xk·p(xk)
EXAMPLE: µ = 10·.20 + 5·.40 + 1·.25 - 4·.15 = 3.65 ($ mil)
[Figure: the probability histogram of profit X with the mean µ = 3.65 marked on the Profit axis.]

Interpretation
E(x) is not the value of the random variable x that you “expect” to observe if you perform the experiment once.
E(x) is a “long run” average: if you perform the experiment many times and observe the random variable x each time, then the average x̄ of these observed x-values will get closer to E(x) as you observe more and more values of the random variable x.

Example: Green Mountain Lottery
State of Vermont: choose 3 digits from 0 through 9; repeats allowed; win $500.
x      $0     $500
p(x)   .999   .001
E(x) = $0(.999) + $500(.001) = $.50

Example (cont.)
E(x) = $.50. On average, each ticket wins $.50. This is important for Vermont to know.
E(x) is not necessarily a possible value of the random variable (the values of x are $0 and $500).
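The weighted-mean formula for the expected value can be checked numerically. The following is a small Python sketch, not part of the original slides: the function name expected_value and the (value, probability) list layout are assumptions chosen for illustration, and the numbers come from the economic scenario and Green Mountain Lottery examples.

```python
def expected_value(distribution):
    """Return E(X) = sum of x * p(x) for a list of (x, p(x)) pairs."""
    # Requirement 1: every probability lies between 0 and 1.
    if any(p < 0 or p > 1 for _, p in distribution):
        raise ValueError("each probability must be between 0 and 1")
    # Requirement 2: the probabilities sum to 1 (up to rounding error).
    if abs(sum(p for _, p in distribution) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Each outcome is weighted by its probability.
    return sum(x * p for x, p in distribution)

profit = [(10, 0.20), (5, 0.40), (1, 0.25), (-4, 0.15)]  # $ millions
lottery = [(0, 0.999), (500, 0.001)]                     # dollars won per ticket

print(f"E(profit)  = {expected_value(profit):.2f}")   # 3.65
print(f"E(lottery) = {expected_value(lottery):.2f}")  # 0.50
```

Note that neither 3.65 nor 0.50 is a possible value of its random variable, which matches the point that E(x) is a long-run average rather than a value you expect to see on a single trial.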
Example
Let X = number of heads in 3 tosses of a fair coin. The probability distribution of X is:
x      0     1     2     3
p(x)   1/8   3/8   3/8   1/8
E(X) (or µ) is
E(X) = Σ xi · p(xi), summed over i = 1, ..., 4
     = (0 · 1/8) + (1 · 3/8) + (2 · 3/8) + (3 · 1/8) = 12/8 = 1.5

US Roulette Wheel and Table
The roulette wheel has alternating black and red slots numbered 1 through 36. There are also 2 green slots numbered 0 and 00. A bet on any one of the 38 numbers (1-36, 0, or 00) pays odds of 35:1; that is, if you bet $1 on the winning number, you receive $36, so your winnings are $35.
American Roulette: 0 - 00 (The European version has only one 0.)

US Roulette Wheel: Expected Value of a $1 bet on a single number
Let x be your winnings resulting from a $1 bet on a single number; x has 2 possible values.
x      -1      35
p(x)   37/38   1/38
E(x) = -1(37/38) + 35(1/38) = -2/38 ≈ -.05
So on average the house wins about 5 cents on every such bet. A “fair” game would have E(x) = 0. The roulette wheels are spinning 24/7, winning big $$ for the house, resulting in …

Summarizing data and probability
Data: histogram; measure of the center: sample mean x̄; measure of spread: sample standard deviation s.
Random variable: probability histogram; measure of the center: population mean µ; measure of spread: population standard deviation σ.

Standard Deviation of a Discrete Random Variable
Measures how “spread out” the random variable is.

Variance
Sample variance (data):
s² = Σ (Xi - X̄)² / (n - 1), summed over i = 1, ..., n
s² = 1805.703 / 34 = 53.1089
For a random variable, the deviations Xi - X̄ are replaced by the deviations xi - µ of the individual x's from the mean (expected value) of their probability distribution.
Var(X) = σ² (sigma squared) is the variance of the probability distribution.
Variance of discrete random variable X:
Var(X) = σ² = Σ (xi - µ)² · P(X = xi), summed over i = 1, ..., k

Variation
Economic Scenario   Profit X ($ Millions)   Probability P
Great               x1 = 10                 P(X = x1) = 0.20
Good                x2 = 5                  P(X = x2) = 0.40
OK                  x3 = 1                  P(X = x3) = 0.25
Lousy               x4 = -4                 P(X = x4) = 0.15

σ² = Σ (xi - µ)² · P(X = xi), summed over i = 1, ..., k
Example (µ = 3.65):
σ² = (x1-µ)² · P(X=x1) + (x2-µ)² · P(X=x2) + (x3-µ)² · P(X=x3) + (x4-µ)² · P(X=x4)
   = (10-3.65)² · 0.20 + (5-3.65)² · 0.40 + (1-3.65)² · 0.25 + (-4-3.65)² · 0.15
   = 19.3275
(P. 207; Handout 4.1, p. 4)

Standard Deviation: of More Interest than the Variance
The population standard deviation is the square root of the population variance: σ = √σ².

Standard Deviation
Standard Deviation (σ) = Positive Square Root of the Variance
σ, or SD, is the standard deviation of the probability distribution.
σ² = 19.3275
σ (or SD) = √19.3275 ≈ 4.40 ($ mil.)

Finance and Investment Interpretation
X = return on an investment (stock, portfolio, etc.)
E(X) = expected return on this investment
σ is a measure of the risk of the investment

Example
σ² = Σ (xi - E(X))² · P(X = xi), summed over i = 1, ..., k
A basketball player shoots 3 free throws. P(make) = P(miss) = 0.5. Let X = number of free throws made.
x      0     1     2     3
p(x)   1/8   3/8   3/8   1/8
Compute the variance, using E(X) = 1.5:
σ² = (0-1.5)²·(1/8) + (1-1.5)²·(3/8) + (2-1.5)²·(3/8) + (3-1.5)²·(1/8)
   = 2.25·(1/8) + .25·(3/8) + .25·(3/8) + 2.25·(1/8) = .75
σ = √.75 ≈ .866

Expected Value of a Random Variable
Example: The probability model for a particular life insurance policy is shown. Find the expected annual payout on a policy.
We expect that the insurance company will pay out $200 per policy per year.
© 2010 Pearson Education

Standard Deviation of a Random Variable
Example: The probability model for a particular life insurance policy is shown. Find the standard deviation of the annual payout.
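The variance and standard deviation formulas for a discrete random variable can be verified with a few lines of Python. This is a sketch under stated assumptions: the helper names mean, variance and standard_deviation are illustrative (they are not from the slides), each distribution is given as a list of (value, probability) pairs, and the numbers are taken from the economic scenario and free-throw examples.

```python
import math

def mean(distribution):
    """E(X): each outcome weighted by its probability."""
    return sum(x * p for x, p in distribution)

def variance(distribution):
    """Var(X): squared deviations from the mean, weighted by probability."""
    mu = mean(distribution)
    return sum((x - mu) ** 2 * p for x, p in distribution)

def standard_deviation(distribution):
    """SD(X): positive square root of the variance."""
    return math.sqrt(variance(distribution))

profit = [(10, 0.20), (5, 0.40), (1, 0.25), (-4, 0.15)]     # $ millions
free_throws = [(0, 1 / 8), (1, 3 / 8), (2, 3 / 8), (3, 1 / 8)]

print(f"profit:      Var = {variance(profit):.4f}, SD = {standard_deviation(profit):.2f}")
# profit:      Var = 19.3275, SD = 4.40
print(f"free throws: Var = {variance(free_throws):.2f}, SD = {standard_deviation(free_throws):.3f}")
# free throws: Var = 0.75, SD = 0.866
```

Unlike the sample variance, which divides by n - 1, the variance of a random variable weights each squared deviation by its probability, so no divisor appears.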
68-95-99.7 Rule for Random Variables
For random variables x whose probability histograms are approximately mound-shaped:
P(µ - σ ≤ x ≤ µ + σ) ≈ .68
P(µ - 2σ ≤ x ≤ µ + 2σ) ≈ .95
P(µ - 3σ ≤ x ≤ µ + 3σ) ≈ .997

Example with µ = 50 and σ = 5:
(µ - σ, µ + σ) = (50-5, 50+5) = (45, 55)
P(µ - σ ≤ X ≤ µ + σ) = P(45 ≤ X ≤ 55)
 = .048+.057+.066+.073+.078+.08+.078+.073+.066+.057+.048 = .724

Rules for E(X), Var(X) and SD(X): adding a constant a
If X is a rv and a is a constant: E(X+a) = E(X) + a
Example: a = -1
E(X+a) = E(X-1) = E(X) - 1

Rules for E(X), Var(X) and SD(X): adding constant a (cont.)
Var(X+a) = Var(X)
SD(X+a) = SD(X)
Example: a = -1
Var(X+a) = Var(X-1) = Var(X)
SD(X+a) = SD(X-1) = SD(X)

Economic Scenario   Profit X ($ Millions)   Profit X+2 ($ Millions)   Probability P
Great               x1 = 10                 x1+2 = 10+2               P(X = x1) = 0.20
Good                x2 = 5                  x2+2 = 5+2                P(X = x2) = 0.40
OK                  x3 = 1                  x3+2 = 1+2                P(X = x3) = 0.25
Lousy               x4 = -4                 x4+2 = -4+2               P(X = x4) = 0.15

E(x + a) = E(x) + a; SD(x + a) = SD(x); let a = 2
[Figure: two probability histograms with Profit on the horizontal axis (-4 to 14) and Probability on the vertical axis (0 to 0.5). First panel: X, mean 3.65, σ = 4.40. Second panel: X+2, mean 5.65, σ = 4.40.]

New Expected Value
Long (UNC-CH) way: E(x+2) = 12(.20) + 7(.40) + 3(.25) + (-2)(.15) = 5.65
Smart (NCSU) way: a = 2; E(x+2) = E(x) + 2 = 3.65 + 2 = 5.65

New Variance and SD
Long (UNC-CH) way (compute from “scratch”):
Var(X+2) = (12-5.65)²(0.20) + … + (-2-5.65)²(0.15) = 19.3275
SD(X+2) = √19.3275 = 4.40
Smart (NCSU) way:
Var(X+2) = Var(X) = 19.3275
SD(X+2) = SD(X) = 4.40

Rules for E(X), Var(X) and SD(X): multiplying by constant b
E(bX) = b·E(X)
Var(bX) = b²·Var(X)
SD(bX) = |b|·SD(X)
Example: b = -1
E(bX) = E(-X) = -E(X)
Var(bX) = Var(-1X) = (-1)²·Var(X) = Var(X)
SD(bX) = SD(-1X) = |-1|·SD(X) = SD(X)

Expected Value and SD of Linear Transformation a + bX
Let X = number of repairs a new computer needs each year. Suppose E(X) = 0.20 and SD(X) = 0.55.
The service contract for the computer offers unlimited repairs for $100 per year plus a $25 service charge for each repair.
What are the mean and standard deviation of the yearly cost of the service contract?
Cost = $100 + $25X
E(cost) = E($100 + $25X) = $100 + $25·E(X) = $100 + $25·0.20 = $100 + $5 = $105
SD(cost) = SD($100 + $25X) = SD($25X) = $25·SD(X) = $25·0.55 = $13.75

Addition and Subtraction Rules for Random Variables
E(X+Y) = E(X) + E(Y); E(X-Y) = E(X) - E(Y)
When X and Y are independent random variables:
1. Var(X+Y) = Var(X) + Var(Y)
2. SD(X+Y) = √(Var(X) + Var(Y))
   SD’s do not add: SD(X+Y) ≠ SD(X) + SD(Y)
3. Var(X-Y) = Var(X) + Var(Y)
4. SD(X-Y) = √(Var(X) + Var(Y))
   SD’s do not subtract: SD(X-Y) ≠ SD(X) - SD(Y), and SD(X-Y) ≠ SD(X) + SD(Y)

Motivation for Var(X-Y) = Var(X) + Var(Y)
Let X = amount automatic dispensing machine puts into your 16 oz drink (say at McD’s).
A thirsty, broke friend shows up. Let Y = amount you pour into friend’s 8 oz cup.
Let Z = amount left in your cup; Z = ?
Z = X - Y has 2 components: Var(Z) = Var(X-Y) = Var(X) + Var(Y)

Example: rv’s NOT independent
X = number of hours a randomly selected student from our class slept between 9 am yesterday and 9 am today.
Y = number of hours a randomly selected student from our class was awake between 9 am yesterday and 9 am today.
Y = 24 - X.
What are the expected value and variance of the total hours that a student is asleep and awake between 9 am yesterday and 9 am today?
Total hours that a student is asleep and awake between 9 am yesterday and 9 am today = X+Y
E(X+Y) = E(X+24-X) = E(24) = 24
Var(X+Y) = Var(X+24-X) = Var(24) = 0.
We don't add Var(X) and Var(Y) since X and Y are not independent.

Pythagorean Theorem of Statistics for Independent X and Y
a² + b² = c² corresponds to Var(X) + Var(Y) = Var(X+Y)
[Figure: right triangle with legs a = SD(X) and b = SD(Y) and hypotenuse c = SD(X+Y); the squares on the sides have areas a² = Var(X), b² = Var(Y) and c² = Var(X+Y).]
a + b ≠ c: SD(X) + SD(Y) ≠ SD(X+Y)

Pythagorean Theorem of Statistics for Independent X and Y (numerical version)
3² + 4² = 5², that is, 9 + 16 = 25
SD(X) = 3, Var(X) = 9; SD(Y) = 4, Var(Y) = 16; Var(X+Y) = 25, SD(X+Y) = 5
3 + 4 ≠ 5: SD(X) + SD(Y) ≠ SD(X+Y)

Example: meal plans
Regular plan: X = daily amount spent; E(X) = $13.50, SD(X) = $7
Expected value and standard deviation of total spent in 2 consecutive days? (The calculation treats the two days' spending X1 and X2 as independent.)
E(X1 + X2) = E(X1) + E(X2) = $13.50 + $13.50 = $27
SD(X1 + X2) ≠ SD(X1) + SD(X2) = $7 + $7 = $14
SD(X1 + X2) = √Var(X1 + X2) = √(Var(X1) + Var(X2)) = √(($7)² + ($7)²) = √(49 + 49) = √98 ≈ $9.90

Example: meal plans (cont.)
Jumbo plan for football players: Y = daily amount spent; E(Y) = $24.75, SD(Y) = $9.50
Amount by which football player’s spending exceeds regular student spending is Y-X.
E(Y-X) = E(Y) - E(X) = $24.75 - $13.50 = $11.25
SD(Y-X) ≠ SD(Y) - SD(X) = $9.50 - $7 = $2.50
SD(Y-X) = √Var(Y-X) = √(Var(Y) + Var(X)) = √(($9.50)² + ($7)²) = √(90.25 + 49) = √139.25 ≈ $11.80

For random variables, X+X ≠ 2X
Let X be the annual payout on a life insurance policy. From mortality tables E(X) = $200 and SD(X) = $3,867.
1) If the payout amounts are doubled, what are the new expected value and standard deviation?
Double payout is 2X.
E(2X) = 2E(X) = 2·$200 = $400
SD(2X) = 2SD(X) = 2·$3,867 = $7,734
2) Suppose insurance policies are sold to 2 people. The annual payouts are X1 and X2. Assume the 2 people behave independently. What are the expected value and standard deviation of the total payout?
E(X1 + X2) = E(X1) + E(X2) = $200 + $200 = $400
SD(X1 + X2) = √Var(X1 + X2) = √(Var(X1) + Var(X2)) = √((3867)² + (3867)²) = √(14,953,689 + 14,953,689) = √29,907,378 ≈ $5,468.76
The risk to the insurance co. when doubling the payout (2X) is not the same as the risk when selling policies to 2 people.
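The rules for linear transformations and for sums and differences of independent random variables can be collected into two small helper functions. This is a hedged Python sketch: the function names linear_transform and combine_independent are hypothetical, chosen here for illustration, and the second function is only valid under the independence assumption stated in the rules.

```python
import math

def linear_transform(a, b, mean_x, sd_x):
    """Mean and SD of a + bX: E(a + bX) = a + b*E(X), SD(a + bX) = |b|*SD(X)."""
    return a + b * mean_x, abs(b) * sd_x

def combine_independent(mean_x, sd_x, mean_y, sd_y, subtract=False):
    """Mean and SD of X + Y (or X - Y) for INDEPENDENT X and Y.

    Means add (or subtract); variances always add, so the SD is
    sqrt(SD(X)^2 + SD(Y)^2) in both cases.
    """
    mean = mean_x - mean_y if subtract else mean_x + mean_y
    return mean, math.sqrt(sd_x ** 2 + sd_y ** 2)

# Service contract: Cost = 100 + 25X with E(X) = 0.20, SD(X) = 0.55.
print(linear_transform(100, 25, 0.20, 0.55))

# Meal plans: two days on the regular plan, then jumbo minus regular.
print(combine_independent(13.50, 7, 13.50, 7))
print(combine_independent(24.75, 9.50, 13.50, 7, subtract=True))

# Insurance: doubling one payout (2X) versus two independent policies (X1 + X2).
print(linear_transform(0, 2, 200, 3867))
print(combine_independent(200, 3867, 200, 3867))
```

Comparing the last two results shows the point of the X+X ≠ 2X slide: both have mean $400, but the doubled payout has SD $7,734 while the two independent policies have SD of about $5,468.76.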