CHAPTER 16 Random Variables and Probability Distributions

Streamline Treatment of Probability
Sample spaces and events are good starting points for probability, but they become quite cumbersome when applied to real-life, business-related processes. Random variables allow us to apply probability, risk, and uncertainty to meaningful business-related situations.

Bring Together Numerical Summaries of Data and Probability
In previous chapters we saw that data can be summarized graphically and numerically in terms of midpoints, spreads, outliers, etc. In basic probability we saw how probabilities can be assigned to the outcomes of an experiment. Now we bring the two together.

First: Two Quick Examples

1. Hardee's vs. The Colonel
Out of 100 taste-testers, 63 preferred Hardee's fried chicken and 37 preferred KFC. Is this evidence that Hardee's is better? A landslide? What if there is no difference in the chicken (p = 1/2, like flipping a fair coin)? Is 63 heads out of 100 tosses that unusual?

2. Mothers Identify Newborns
After spending 1 hour with their newborns, mothers who were blindfolded and had their noses covered were asked to choose their own child from 3 sleeping babies by feeling the backs of the babies' hands. 22 of 32 women (69%) selected their own newborn, "far better than 33% one would expect…" Is it possible the mothers are guessing? Can we quantify "far better"?
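One way to quantify "that unusual" and "far better" is to ask how likely a result at least as extreme as the observed one would be if everyone were just guessing. The sketch below is a preview under stated assumptions: it assumes each taster (or mother) chooses independently with the same chance of success (1/2 for the chicken test, 1/3 for three babies), so the number of successes follows a binomial distribution. The helper name `upper_tail` is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def upper_tail(n: int, k: int, p: Fraction) -> Fraction:
    """Exact P(X >= k) when X ~ Binomial(n, p) (assumes independent trials)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Hardee's vs. KFC: chance that 63 or more of 100 tasters pick Hardee's
# if each taster is effectively flipping a fair coin (p = 1/2)
chicken = upper_tail(100, 63, Fraction(1, 2))

# Newborns: chance that 22 or more of 32 mothers pick their own baby
# if each mother guesses among 3 babies (p = 1/3)
mothers = upper_tail(32, 22, Fraction(1, 3))

print(f"Chicken: P(X >= 63) = {float(chicken):.4f}")
print(f"Mothers: P(X >= 22) = {float(mothers):.2e}")
```

Both tail probabilities come out well under 1%, which is one way to make "unusual" and "far better" precise, provided the guessing model is the right benchmark.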
Graphically and Numerically Summarize a Random Experiment
The principal vehicle by which we do this is the random variable. A random variable assigns a number to each outcome of an experiment.

Random Variables
Definition: A random variable is a numerical-valued function defined on the outcomes of an experiment.
[Figure: a random variable maps each outcome in the sample space S to a point on the number line]
Example: S = {HH, TH, HT, TT}; the random variable x = number of heads in 2 tosses of a coin. Possible values of x: 0, 1, 2.

Two Types of Random Variables
Discrete: random variables that have a finite or countably infinite number of possible values.
Test: for any given value of the random variable, you can designate the next larger or next smaller possible value.
Examples: discrete rv's
Number of girls in a 5-child family
Number of customers who use an ATM in a 1-hour period
Number of tosses of a fair coin required until you get 3 heads in a row (note that this discrete random variable has a countably infinite number of possible values: x = 3, 4, 5, 6, 7, ...)

Two Types (cont.)
Continuous: a random variable that can take on all possible values in an interval of numbers.
Test: given a particular value of the random variable, you cannot designate the next larger or next smaller value.

Which Is It, Discrete or Continuous?
Discrete random variables "count." Continuous random variables "measure" (length, width, height, area, volume, distance, time, etc.).

Examples: continuous rv's
The time it takes to run the 100-yard dash (measure)
The time between arrivals at an ATM (measure)
Time spent waiting in line at the "express" checkout at the grocery store (the probability is 1 that the person in front of you is buying a loaf of bread with a third-party check drawn on a Hungarian bank) (measure)

Examples: continuous rv's (cont.)
The length of a precision-engineered magnesium rod (measure)
The area of a silicon wafer for a computer chip coming off a production line (measure)

Classify as Discrete or Continuous
a. x = the number of customers who enter a particular bank during the noon hour on a particular day
Answer: discrete; x = {0, 1, 2, 3, ...}
b. x = time (in seconds) required for a teller to serve a bank customer
Answer: continuous; x > 0
c. x = the distance (in miles) between a randomly selected home in a community and the nearest pharmacy
Answer: continuous; x > 0
d. x = the diameter of precision-engineered "5-inch diameter" ball bearings coming off an assembly line
Answer: continuous; the range could be 4.5 < x < 5.5
e. x = the number of tosses of a fair coin required to observe at least 3 heads in succession
Answer: discrete; x = 3, 4, 5, ...

Data Variables and Data Distributions
CUSIP      IND  CONAME                         PE    NPM
60855410   4    MOLEX INC                      24.7   8.7
40262810   5    GULFMARK INTL INC              21.4   8.1
81180410   4    SEAGATE TECHNOLOGY             21.3   2.2
46489010   9    ISOMEDIX INC                   25.2  21.1
69318010   9    PCA INTERNATIONAL INC          21.4   4.7
26157010   7    DRESS BARN INC                 24.5   4.5
90249410   4    TYSON FOODS INC                20.9   3.9
4886910    5    ATLANTIC SOUTHEAST AIRLINES    20.1  15.7
87183910   9    SYSTEM SOFTWARE ASSOC INC      23.7  11.6
62475210   4    MUELLER (PAUL) CO              14.5   3.9
36473510   7    GANTOS INC                     15.7   1.8
00755P10   9    ADVANTAGE HEALTH CORP          23.3   5.3
23935910   2    DAWSON GEOPHYSICAL CO          14.9   9.3
68555910   4    ORBIT INTERNATIONAL CP         15.0   3.0
16278010   4    CHECK TECHNOLOGY CORP          17.1   3.2
51460610   4    LANCE INC                      19.0   8.5
4523710    4    ASPECT TELECOMMUNICATIONS      25.7   8.2
74555310   4    PULASKI FURNITURE CORP         22.0   2.1
80819410   4    SCHULMAN (A.) INC              19.4   6.0
19770920   9    COLUMBIA HOSPITAL CORP         18.3   3.1
23790310   4    DATA MEASUREMENT CORP          11.3   2.6
11457710   4    BROOKTREE CORP                 13.8  13.6
00431L10   9    ACCESS HEALTH MARKETING INC    22.4  11.0
29605610   4    ESCALADE INC                   10.8   2.0
23303110   4    DBA SYSTEMS INC                 6.3   5.0
64124610   4    NEUTROGENA CORP                27.2   9.0
59492810   6    MICROAGE INC                    9.0   0.5
22821010   7    CROWN BOOKS CORP               24.4   1.8
190710     4    AST RESEARCH INC                9.7   7.3
46978310   6    JACO ELECTRONICS INC           31.9   0.4
531320     4    ADAC LABORATORIES              18.5  10.6
49766010   4    KIRSCHNER MEDICAL CORP         33.0   0.8
30205210   4    EXIDE ELECTRS GROUP INC        29.0   2.4
46065P10   5    INTERPROVINCIAL PIPE LN        11.9  19.2
19247910   4    COHERENT INC                   40.2   1.2
Data variables are known outcomes.

DATA DISTRIBUTION: Price-Earnings Ratios
Class (bin)   Boundary       Frequency   Relative Frequency
1             6.00-12.99     6           6/35 = 0.171
2             13.00-19.99    10          10/35 = 0.286
3             20.00-26.99    14          14/35 = 0.400
4             27.00-33.99    4           4/35 = 0.114
5             34.00-40.99    1           1/35 = 0.029
Data variables are known outcomes. Data distributions tell us what happened. (Handout 2.1, p. 10)

Random Variables and Probability Distributions
Random variables are unknown chance outcomes. Probability distributions tell us what is likely to happen. (Handout 4.1, p. 3)

Profit Scenarios
Economic Scenario   Profit ($ Millions)   Probability
Great               10                    0.20
Good                5                     0.40
OK                  1                     0.25
Lousy               -4                    0.15
Probability: the proportion of the time an outcome is expected to happen.
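The price-earnings frequency table can be reproduced directly from the 35 PE values in the company table, which makes concrete the idea that a data distribution simply records what happened. The sketch below assumes the class boundaries shown (6.00-12.99 through 34.00-40.99) and counts a value in a class when it lies between the two boundaries; the names `PE`, `CLASSES`, and `frequency_table` are illustrative, not from the handout.

```python
from fractions import Fraction

# PE ratios of the 35 firms, in the order listed in the company table
PE = [24.7, 21.4, 21.3, 25.2, 21.4, 24.5, 20.9, 20.1, 23.7, 14.5,
      15.7, 23.3, 14.9, 15.0, 17.1, 19.0, 25.7, 22.0, 19.4, 18.3,
      11.3, 13.8, 22.4, 10.8, 6.3, 27.2, 9.0, 24.4, 9.7, 31.9,
      18.5, 33.0, 29.0, 11.9, 40.2]

# Class boundaries (lower, upper); values have one decimal place,
# so none can fall in the small gaps between classes (e.g. 12.99-13.00)
CLASSES = [(6.00, 12.99), (13.00, 19.99), (20.00, 26.99),
           (27.00, 33.99), (34.00, 40.99)]

def frequency_table(values, classes):
    """Return (boundaries, frequency, relative frequency) for each class."""
    n = len(values)
    rows = []
    for lo, hi in classes:
        freq = sum(1 for v in values if lo <= v <= hi)
        rows.append(((lo, hi), freq, Fraction(freq, n)))
    return rows

for (lo, hi), freq, rel in frequency_table(PE, CLASSES):
    print(f"{lo:5.2f}-{hi:5.2f}  {freq:2d}  {rel} = {float(rel):.3f}")
```

Keeping the relative frequencies as exact fractions makes it easy to confirm that they sum to exactly 1, just as the probabilities of a probability distribution must.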
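Once each profit value has a probability, the chance of an event is the sum of the probabilities of the values that make up the event. The sketch below stores the profit distribution as a mapping from profit ($ millions) to probability, computes P(X < 5), and then the chance that profits are below $5 million in both 2009 and 2010. That last step assumes, as an added assumption, that profits in the two years are independent and follow this same distribution. The helper name `prob` is hypothetical.

```python
from fractions import Fraction

# Profit X ($ millions) -> probability, from the Profit Scenarios table
profit_pmf = {
    10: Fraction(20, 100),  # Great
    5: Fraction(40, 100),   # Good
    1: Fraction(25, 100),   # OK
    -4: Fraction(15, 100),  # Lousy
}

def prob(pmf, event):
    """P(event): add p(x) over every value x for which event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))

# P(X < 5) = P(X = 1) + P(X = -4); the values are mutually exclusive
p_below_5 = prob(profit_pmf, lambda x: x < 5)

# Both years below $5 million, assuming independent years
p_both_years = p_below_5 * p_below_5

print(p_below_5, float(p_below_5))        # 2/5 0.4
print(p_both_years, float(p_both_years))  # 4/25 0.16
```

Exact fractions avoid the rounding noise that floating-point sums such as .25 + .15 can introduce.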
Probability Distribution
A probability distribution shows all possible values of a random variable and the probability associated with each value.

Notation
X = the random variable (profit, $ millions); \(x_i\) = outcome i; \(p(x_i) = \Pr(X = x_i)\) is the probability that X takes the value \(x_i\).
Economic Scenario   Outcome     Profit X ($ Millions)   Probability
Great               \(x_1\)     10                      \(p(x_1) = \Pr(X = 10) = .20\)
Good                \(x_2\)     5                       \(p(x_2) = \Pr(X = 5) = .40\)
OK                  \(x_3\)     1                       \(p(x_3) = \Pr(X = 1) = .25\)
Lousy               \(x_4\)     -4                      \(p(x_4) = \Pr(X = -4) = .15\)

What Are the Chances?
What are the chances that profits will be less than $5 million in 2009?
P(X < 5) = P(X = 1 or X = -4) = P(X = 1) + P(X = -4) = .25 + .15 = .40
(The two probabilities can be added because X cannot equal 1 and -4 at the same time.)
What are the chances that profits will be less than $5 million in 2009 and less than $5 million in 2010?
Assuming profits in the two years are independent and follow the same distribution each year:
P(X < 5 in 2009 and X < 5 in 2010) = P(X < 5)·P(X < 5) = .40·.40 = .16

Probability Histogram
[Figure: probability histogram of profit X ($ millions); bar heights .15 at -4 (Lousy), .25 at 1 (OK), .40 at 5 (Good), .20 at 10 (Great)]

Probability Distributions: Requirements
Notation: p(x) = Pr(X = x) is the probability that the random variable X has value x.
Requirements:
1. \(0 \le p(x) \le 1\) for all values x of X
2. \(\sum_{\text{all } x} p(x) = 1\)

Examples
x       0      1      2
p(x)    .20    .90    -.10
Property 1 violated: p(2) = -.10

x       -2     -1     1      2
p(x)    .3     .3     .3     .3
Property 2 violated: \(\sum_{\text{all } x} p(x) = 1.2\)

Example (cont.)
x       -1     0      1
p(x)    .25    .65    .10
OK: property 1 is satisfied (\(0 \le p(x) \le 1\) for all x), and property 2 is satisfied (\(\sum_{\text{all } x} p(x) = .25 + .65 + .10 = 1\)).

Example: Light Bulbs
20% of light bulbs last at least 800 hrs; you have just purchased 2 light bulbs. X = number of the 2 bulbs that last at least 800 hrs (possible values of x: 0, 1, 2). Find the probability distribution of X.
S: bulb lasts at least 800 hrs; F: bulb fails to last 800 hrs. P(S) = .2; P(F) = .8.

Example (cont.)
Treating the two bulbs as independent, multiply the probabilities for each outcome:
Outcome   P(outcome)           x
(S,S)     (.2)(.2) = .04       2
(S,F)     (.2)(.8) = .16       1
(F,S)     (.8)(.2) = .16       1
(F,F)     (.8)(.8) = .64       0
Probability distribution of x (two outcomes give x = 1, so p(1) = .16 + .16 = .32):
x       0      1      2
p(x)    .64    .32    .04

Example: 3-Child Family
X = number of boys. M: child is male, P(M) = 1/2 (0.5121; from .5134). F: child is female, P(F) = 1/2 (0.4879). Births are treated as independent.
Outcome   P(outcome)           x
MMM       (1/2)^3 = 1/8        3
MMF       1/8                  2
MFM       1/8                  2
FMM       1/8                  2
MFF       1/8                  1
FMF       1/8                  1
FFM       1/8                  1
FFF       1/8                  0
Probability Distribution of x
x       0      1      2      3
p(x)    1/8    3/8    3/8    1/8
Probability of at least 1 boy: \(P(x \ge 1) = 3/8 + 3/8 + 1/8 = 7/8\)
Probability of no boys or 1 boy: p(0) + p(1) = 1/8 + 3/8 = 4/8 = 1/2

Two More Examples
1. World Series:
X = the number of games played in a randomly selected World Series. Possible values of X: x = 4, 5, 6, 7.
2. Golf:
Y = score on the 13th hole (par 5) at Augusta National Golf Course for a randomly selected golfer on day 1 of the 2011 Masters. Possible values of Y: y = 3, 4, 5, 6, 7.

Probability Distribution of Number of Games Played in a Randomly Selected World Series
Estimate based on results from 1946 to 2010.
x       4                5                6                7
p(x)    12/65 = 0.185    12/65 = 0.185    14/65 = 0.215    27/65 = 0.415
[Figure: probability histogram, number of games in a randomly selected World Series; bar heights 0.185, 0.185, 0.215, 0.415 at x = 4, 5, 6, 7]

Probability Distribution of Score on 13th Hole (Par 5) at Augusta National Golf Course on Day 1 of 2011 Masters
y       3        4        5        6        7
p(y)    0.040    0.414    0.465    0.051    0.030
[Figure: probability histogram, score on 13th hole; bar heights 0.040, 0.414, 0.465, 0.051, 0.030 at y = 3, 4, 5, 6, 7]
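The two requirements for a discrete probability distribution can be checked mechanically. The sketch below uses a hypothetical helper `check_pmf` and applies it to the three examples from the requirements slide, to the World Series distribution built from the 1946-2010 counts (12, 12, 14, and 27 series out of 65), and to the golf distribution. Exact fractions are used where the source gives counts; decimal probabilities are compared with a small tolerance (an assumed 1e-9) to absorb floating-point error.

```python
import math
from fractions import Fraction

def check_pmf(pmf, tol=1e-9):
    """Return a list of violated requirements (empty if pmf is valid)."""
    problems = []
    # Requirement 1: 0 <= p(x) <= 1 for all x
    for x, p in pmf.items():
        if not 0 <= p <= 1:
            problems.append(f"requirement 1 violated: p({x}) = {p}")
    # Requirement 2: the probabilities sum to 1
    total = sum(pmf.values())
    if not math.isclose(total, 1, abs_tol=tol):
        problems.append(f"requirement 2 violated: sum of p(x) = {total}")
    return problems

# Examples from the requirements slide
bad_1 = {0: 0.20, 1: 0.90, 2: -0.10}
bad_2 = {-2: 0.3, -1: 0.3, 1: 0.3, 2: 0.3}
ok = {-1: 0.25, 0: 0.65, 1: 0.10}

# World Series, 1946-2010: number of series that lasted x games
series_counts = {4: 12, 5: 12, 6: 14, 7: 27}
n_series = sum(series_counts.values())  # 65 series in total
world_series = {x: Fraction(c, n_series) for x, c in series_counts.items()}

# Score on the 13th hole (par 5), day 1 of the 2011 Masters
golf = {3: 0.040, 4: 0.414, 5: 0.465, 6: 0.051, 7: 0.030}

for name, pmf in [("bad_1", bad_1), ("bad_2", bad_2), ("ok", ok),
                  ("world_series", world_series), ("golf", golf)]:
    print(name, check_pmf(pmf) or "valid")
```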
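The light-bulb and 3-child-family examples build a distribution the same way: list every outcome of the experiment, give each outcome its probability by multiplying across trials (which assumes the bulbs or births are independent), apply the random variable to each outcome, and add the probabilities of outcomes that land on the same value. A minimal sketch with a hypothetical helper `distribution_of`; applied to two coin tosses it also reproduces x = number of heads on S = {HH, TH, HT, TT}.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def distribution_of(trial_probs, n_trials, random_variable):
    """Distribution of random_variable over n independent, identical trials.

    trial_probs maps each single-trial result (e.g. "S", "F") to its probability;
    random_variable maps an outcome (a tuple of results) to a number.
    """
    pmf = defaultdict(Fraction)  # missing values start at probability 0
    for outcome in product(trial_probs, repeat=n_trials):
        p_outcome = Fraction(1)
        for result in outcome:
            p_outcome *= trial_probs[result]  # independence: multiply
        pmf[random_variable(outcome)] += p_outcome
    return dict(sorted(pmf.items()))

# Two bulbs: S = lasts at least 800 hrs (P = .2), F = fails to (P = .8)
bulbs = distribution_of({"S": Fraction(2, 10), "F": Fraction(8, 10)}, 2,
                        lambda o: o.count("S"))

# Three children: M and F each with probability 1/2; x = number of boys
boys = distribution_of({"M": Fraction(1, 2), "F": Fraction(1, 2)}, 3,
                       lambda o: o.count("M"))

# Two coin tosses: x = number of heads
heads = distribution_of({"H": Fraction(1, 2), "T": Fraction(1, 2)}, 2,
                        lambda o: o.count("H"))

print({x: float(p) for x, p in bulbs.items()})  # {0: 0.64, 1: 0.32, 2: 0.04}
print(boys)
print(sum(p for x, p in boys.items() if x >= 1))  # P(at least 1 boy) = 7/8
```

Using the 1/2 approximation for P(M) reproduces the 1/8, 3/8, 3/8, 1/8 table; passing the observed proportions (0.5121 and 0.4879) as fractions instead would give a slightly different distribution.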