Notes on Conditional Probability

1) Let A1, A2, …, An be n mutually exclusive and collectively exhaustive events, so that P(A1)+P(A2)+…+P(An)=1. Then
P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + … + P(B|An)P(An)

2) The joint probability is P(A and B) = P(B|A)P(A) = P(A|B)P(B), so the conditional probability becomes
P(A|B) = P(B|A)P(A)/P(B)

Example (problem 35, chapter 5): A team plays 70% of its games at night and 30% of its games during the day. The team wins 50% of its night games and 90% of its day games. What is the probability that a randomly chosen game is a win?
Let W be the event that the chosen game is a win, N the event that the game is a night game, and D the event that the game is a day game.
P(W) = P(W|N)P(N) + P(W|D)P(D) = (0.5)(0.7) + (0.9)(0.3) = 0.62

Example (problem 35, chapter 5): If yesterday's game was a win, what is the probability that it was played at night? With W, N, and D defined as above,
P(N|W) = P(W and N)/P(W) = P(W|N)P(N)/P(W) = (0.5)(0.7)/0.62 = 0.5645

Bayes' Theorem

P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + … + P(B|An)P(An) and P(Ai|B) = P(B|Ai)P(Ai)/P(B), so

$$P(A_i \mid B) = \frac{P(A_i)\,P(B \mid A_i)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + \dots + P(A_n)\,P(B \mid A_n)}$$

for i = 1, 2, …, n. Double-check that A1, …, An are collectively exhaustive by verifying that P(A1)+P(A2)+…+P(An)=1.

Examples on Discrete Probability Distributions

Example: Toss a coin three times and let X be the number of heads. What are the PDF and CDF of X?

Outcome   Prob.   X
HHH       1/8     3
HHT       1/8     2
HTH       1/8     2
HTT       1/8     1
THH       1/8     2
THT       1/8     1
TTH       1/8     1
TTT       1/8     0

x   P(X = x)   F(x) = P(X ≤ x)
0   1/8        1/8
1   3/8        1/8 + 3/8 = 1/2
2   3/8        1/2 + 3/8 = 7/8
3   1/8        1

Example: Let X be the number of times we roll a die until we get a 1. What is the PDF of X?
P(X=1) = 1/6
P(X=2) = (5/6)(1/6)
P(X=3) = (5/6)(5/6)(1/6)
P(X=4) = (5/6)^3 (1/6)
…
P(X=i) = (5/6)^(i-1) (1/6) for i = 1, 2, 3, …

A probability distribution is summarized by its mean (the long-run average value of the random variable) and its standard deviation.

Expected Value (Mean)

Mathematically, the expected value (or mean) of a RV X is

$$\mu = E(X) = \sum_{\text{all } x} x\,p(x)$$

Sometimes we write µ_X.
Additivity: E(X + Y) = E(X) + E(Y)

How many heads "on average" will we get when tossing a coin three times? If we repeat this many times, we expect that
o In 1/8 of the times, we'll get 0
o In 3/8 of the times, we'll get 1
o In 3/8 of the times, we'll get 2
o In 1/8 of the times, we'll get 3
So "on average" we'll get

$$\text{(Average number of heads)} = \sum_{\text{all } x} x\,p(x) = 0\cdot\frac{1}{8} + 1\cdot\frac{3}{8} + 2\cdot\frac{3}{8} + 3\cdot\frac{1}{8} = \frac{3}{2}$$

Variance and Standard Deviation

A measure of the variability of a RV is its variance. To compute the variance of a discrete RV X:
o Compute µ
o For each possible x, compute (x - µ)^2 p(x)
o Add up these values
It helps to construct a table. In a formula:

$$\sigma^2 = \mathrm{Var}(X) = \sum_{\text{all } x} (x - \mu)^2\,p(x) \quad\text{or}\quad \sigma^2 = \mathrm{Var}(X) = \sum_{\text{all } x} x^2\,p(x) - \mu^2$$

Standard Deviation (SD): $\sigma = \sqrt{\mathrm{Var}(X)}$

In the three-tosses example:

x      0     1     2     3
p(x)   1/8   3/8   3/8   1/8

We had µ = 3/2.
Using the first formula, Var(X) = (0 - 3/2)^2 (1/8) + (1 - 3/2)^2 (3/8) + (2 - 3/2)^2 (3/8) + (3 - 3/2)^2 (1/8) = 3/4
Using the second formula, Var(X) = 0^2 (1/8) + 1^2 (3/8) + 2^2 (3/8) + 3^2 (1/8) - (3/2)^2 = 3/4

Example: Let X be the number of cars sold. The PDF of X is given in the table below. Find the mean and variance of X.

When X and Y are independent, Var(X + Y) = Var(X) + Var(Y)
Note: it is the variances that add up, not the SDs: in general $\sigma_{X+Y} \neq \sigma_X + \sigma_Y$.
Variances and SDs are never negative.

The Binomial Distribution

Let X be the number of "successes" in n independent "trials," each with success probability p. Such an X is a Binomial RV with parameters n and p:

$$p(x) = P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \quad\text{where}\quad \binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$
o n is the number of trials
o x is the number of observed successes, x = 0, …, n
o p is the probability of success on each trial

In the book, the probability p is denoted by π:

$$p(x) = {}_{n}C_{x}\,\pi^{x} (1 - \pi)^{n - x}, \quad\text{where}\quad {}_{n}C_{x} = \frac{n!}{x!\,(n - x)!}$$

Example: Let X = number of 1's after rolling a die 5 times. What is P(X = 2)?
X is a Binomial RV with 5 independent trials (n = 5) and a probability of success of 1/6 (p = 1/6). The PDF of X is:

$$P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x} = \binom{5}{x} \left(\frac{1}{6}\right)^{x} \left(1 - \frac{1}{6}\right)^{5 - x} \quad\text{for } x = 0, 1, 2, 3, 4, 5$$

$$P(X = 2) = \binom{5}{2} \left(\frac{1}{6}\right)^{2} \left(\frac{5}{6}\right)^{3} = 0.161$$

Example: A study concluded that 76.2 percent of drivers used seat belts. A sample of 12 vehicles is selected. What is the probability that the drivers in at least 7 of the 12 vehicles are wearing seat belts?
If X is the number of drivers wearing seat belts, then X is a binomial RV with parameters n = 12 and success probability p = 0.762.
P(X=7) = 12C7 (0.762)^7 (1-0.762)^(12-7) = 0.0902
P(X≥7) = P(X=7) + P(X=8) + P(X=9) + P(X=10) + P(X=11) + P(X=12)
= 0.0902 + 0.1805 + 0.2569 + 0.2467 + 0.1436 + 0.0383 ≈ 0.9563

An important part of understanding probability/statistics is recognizing a "binomial situation." Binomial examples:
o Number of defective products in a sample of items: n = number of items, p = probability of a product being defective.
o Number of students in this class who are in senior year: n = number of students in this class, p = probability of a student being a senior.
o Number of no-shows for a flight: n = number of passengers, p = probability that a passenger is a no-show.
o Number of times next week I'll get stuck in traffic on my way to school: n = number of work days per week, p = probability I get stuck in traffic.

We often define and use q = 1 - p. When X is a Binomial RV with parameters n and p,
E(X) = np
Var(X) = npq
In the "rolling a die 5 times" example, E(X) = 5(1/6) = 5/6.

Example: A salesperson is about to visit 25 potential customers; the probability of a successful visit (a "sale") is 0.3, independent of other visits. Let X be the number of sales. X is a binomial RV with parameters n = 25 and p = 0.3.
1. p(8) = P(X = 8) = ?
2. F(8) = P(X ≤ 8) = ?
3. E(X) = µ = ?
4. SD(X) = σ = ?
If Y is the number of unsuccessful visits, then what is the distribution of Y?

The Poisson Distribution

If X is a Poisson RV with mean µ, then the PDF of X is:

$$p(x) = P(X = x) = \frac{\mu^{x} e^{-\mu}}{x!}$$

Approximating a Binomial distribution with a Poisson distribution
o Consider a Binomial RV X with parameters n and p, X ~ Bin(n, p). If n is large enough and p is small, we can approximate X with a Poisson RV with parameter µ = np, X ~ Pois(µ).
o Recall that the mean of a Binomial RV is np and the mean of the Poisson RV is µ.
o If n is large and p is small, then use a Poisson random variable by setting µ = np.
o The mean of the Poisson process becomes µ = np, and the variance is also equal to µ = np.

Example: Assume baggage is rarely lost by the airlines. Suppose a random sample of 1,000 flights shows that a total of 300 bags were lost. Assume that the number of lost bags per flight follows a Poisson distribution. What is the PDF? What is the probability that 0 bags are lost per flight? That 3 bags are lost per flight?
Let X be the number of bags lost per flight. Assume X is Poisson. The mean of X is 300/1000 = 0.3 bags. We know that µ is the mean of the Poisson process, so µ = 0.3.
o The number of lost bags per flight follows a Poisson distribution with mean 0.3.
o If X is the number of lost bags, then the PDF of X is:

$$P(X = x) = \frac{0.3^{x} e^{-0.3}}{x!}$$

$$P(X = 0) = \frac{0.3^{0} e^{-0.3}}{0!} = 0.741 \qquad P(X = 3) = \frac{0.3^{3} e^{-0.3}}{3!} = 0.003$$

Example: An Emergency Room (ER) is located in a town with a population of n = 50,000. The probability that a town resident will need to enter the ER during any chosen day is p = 0.0001. What is the distribution of the number of ER patients per day? What is the probability that on a certain day the ER is empty? What is the probability that on a certain day the ER has more than three patients?
Let X be the number of patients entering the ER during a day. Since n is large and p is small, X is approximately Poisson with mean µ = np = (50000)(0.0001) = 5 patients per day. The PDF of X is:

$$P(X = x) = \frac{5^{x} e^{-5}}{x!}$$

$$P(X = 0) = \frac{5^{0} e^{-5}}{0!} = 0.006738$$

$$P(X > 3) = 1 - \big(P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)\big) = 1 - (0.006738 + 0.03369 + 0.084224 + 0.140374) = 0.735$$

The Poisson probability distribution describes the number of times some event occurs during a specified interval. The interval may be time, distance, area, or volume.
Examples:
o The number of patients that enter the ER from 1pm to 2pm.
o The number of defects on a cable within 1 meter.
Assumptions of the Poisson Distribution:
o The probability that the event occurs is proportional to the length of the interval.
o The intervals are independent.

Continuous Probability Distributions

For a discrete RV X, the pdf is given by p(x) = P(X = x) for all possible values of x.
For a continuous RV X, P(X = x) = 0 for all values of x. Example: If X is the amount of time you wait in line at Starbucks, then P(X = 30.567… seconds) = 0.
The pdf of a continuous RV is represented by a function p(x), defined for all values of x, where the total area under p(x) is 1.
The Uniform and Normal distributions are commonly used continuous distributions.

Uniform Distribution

o The simplest distribution for a continuous random variable.
o Rectangular in shape, with constant (uniform) height.
o Defined by minimum and maximum values a and b.
o Areas within the distribution represent probabilities.
Example: Time to fly on MEA from Beirut to Paris ranges from 4 hrs to 5 hrs. The random variable is flight time; it is continuous.

[Figure: a continuous uniform distribution. Horizontal axis x from a to b; constant height P(x) = 1/(b-a).]

Mean: $\mu = \dfrac{a + b}{2}$

SD: $\sigma = \sqrt{\dfrac{(b - a)^2}{12}}$

Height: $P(x) = \dfrac{1}{b - a}$ if a ≤ x ≤ b, and 0 elsewhere.

Area = (height)(base) = $\dfrac{1}{(b - a)}\,(b - a) = 1$

Example 1: If a uniform distribution ranges from 10 to 15, the height is .20, found by 1/(15-10). The base is 5, found by 15-10. The total area is [1/(15-10)](15-10) = 1.00.

Example 2: The Logan Transit Department (LTD) provides free bus service to Logan residents. A bus arrives at the Transit Center every 30 minutes between 6 A.M. and 9:30 P.M. during weekdays. People arrive at the Transit Center at random times. The time that a person waits is uniformly distributed from 0 to 30 minutes.

QUESTIONS:
o Draw a graph for this distribution.
o Show that the area is 1.00.
o How long will a person "typically" have to wait for a bus? In other words, what is the mean waiting time?
o What is the SD of the waiting times?
o What is the probability a person will wait more than 25 minutes?
o What is the probability a person will wait between 10 and 20 minutes?

The random variable is the time a person must wait. Time is measured on a continuous scale, and wait time ranges from 0 minutes to 30 minutes.
To draw the uniform distribution, we start by finding the height P(x): P(x) = 1/(30-0) = .033

Area calculation: the time people must wait for the bus is uniform over the interval from 0 to 30 minutes, so a = 0 and b = 30.
Area = (height)(base) = [1/(30-0)](30-0) = 1.00

[Figure: uniform distribution with height .033 over 0 to 30. Horizontal axis: length of wait time (minutes); vertical axis: probability.]

Mean calculation: µ = (a+b)/2 = (0+30)/2 = 15. The typical wait time for an LTD bus is 15 minutes.
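The LTD questions above can be checked numerically. The following is a minimal Python sketch, assuming only the uniform-distribution formulas stated in these notes; the function names `uniform_summary` and `uniform_prob` are illustrative and are not part of the original material.

```python
from math import sqrt


def uniform_summary(a, b):
    """Return (height, mean, sd) of a uniform distribution on [a, b]."""
    height = 1.0 / (b - a)          # constant height of the rectangle
    mean = (a + b) / 2.0            # midpoint of the interval
    sd = sqrt((b - a) ** 2 / 12.0)  # SD formula from the notes
    return height, mean, sd


def uniform_prob(a, b, lo, hi):
    """P(lo < X < hi) for X uniform on [a, b], computed as height times base."""
    lo, hi = max(lo, a), min(hi, b)  # clip the interval to [a, b]
    return max(hi - lo, 0.0) / (b - a)


# LTD bus example: wait time is uniform on 0 to 30 minutes.
height, mean, sd = uniform_summary(0, 30)
total_area = height * (30 - 0)
p_more_than_25 = uniform_prob(0, 30, 25, 30)
p_10_to_20 = uniform_prob(0, 30, 10, 20)

print(f"height={height:.3f} area={total_area:.2f} mean={mean:.0f} sd={sd:.2f}")
print(f"P(wait > 25)={p_more_than_25:.4f}  P(10 < wait < 20)={p_10_to_20:.4f}")
```

Because every probability here is a rectangle's area, the same two helpers apply to any interval inside [a, b].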
SD calculation:

$$\sigma = \sqrt{\frac{(b - a)^2}{12}} = \sqrt{\frac{(30 - 0)^2}{12}} = 8.66$$

Probability a person waits more than 25 minutes: This probability is graphically represented by the area within the distribution for the interval 25 to 30. From the area formula:
P(25 < wait time < 30) = (height)(base) = [1/(30-0)](5) = .1667
The probability a person waits more than 25 minutes is .1667.

[Figure: uniform distribution with height .033 over 0 to 30, µ = 15; shaded area from 25 to 30 equals .1667.]

Probability a person waits between 10 and 20 minutes:
P(10 < wait time < 20) = (height)(base) = [1/(30-0)](10) = .3333

[Figure: uniform distribution with height .033 over 0 to 30, µ = 15; shaded area from 10 to 20 equals .333. Horizontal axis: length of wait time (minutes); vertical axis: probability.]

The Normal Distribution

A distribution defined by its mean and its standard deviation:

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

o Is bell-shaped and has a single peak at the center of the distribution.
o Is symmetrical about the mean.
o Is asymptotic. That is, the curve gets closer and closer to the X-axis but never actually touches it.
o Has its mean, µ, to determine its location and its standard deviation, σ, to determine its dispersion.

[Figure: normal curve. The curve is symmetrical; theoretically, it extends to infinity in both directions; the mean, median, and mode are equal.]

The number of normal probability distributions is unlimited: each has a different mean µ and a different SD σ, so providing tables for each is impossible.
The standard normal distribution is used to determine the probabilities for all the normal families. With one table for the standard normal distribution, values can be located easily.
Any normal distribution can be converted into a standard normal distribution. The next step is to obtain z-values (z-scores, standard normal values, etc.).

The Standard Normal Distribution

The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. It is also called the z distribution.
A z-value is the signed distance between a selected value, designated X, and the mean µ, divided by the standard deviation, σ.
The formula is:

$$z = \frac{X - \mu}{\sigma}$$

A standard normal distribution has:
o A mean of 0
o An SD of 1
o z values on the horizontal scale
The standard normal distribution table (Appendix B.1, page 784) lists the probabilities for the distribution.

Example: If Z ~ Normal(0,1), then P(0 < Z < 1.5) = P(-1.5 < Z < 0) = 0.4332, read from the table row for z = 1.5:

z      0.00     0.01     0.02     …
…
1.5    0.4332   0.4345   0.4357

Example: The bi-monthly starting salaries of recent MBA graduates follow the normal distribution with a mean of $2,000 and a standard deviation of $200.
What is the z-value for a salary of $2,200? z = ($2,200 - $2,000)/$200 = 1
What is the z-value for a salary of $1,700? z = ($1,700 - $2,000)/$200 = -1.5
A z-value of 1 indicates that the value of $2,200 is one standard deviation above the mean of $2,000. A z-value of -1.50 indicates that $1,700 is 1.5 standard deviations below the mean of $2,000.

The Normal Distribution

Example: The m&m's factory supervisor wants to know the probability that m&m's bags weigh between 283 and 285.4 grams. The bag weights of m&m's follow a normal probability distribution with a mean of 283 grams and a SD of 1.6 grams.
We want to know the area under the curve between the mean, 283, and 285.4 grams, that is, P(283 < weight < 285.4). We convert the x values into z values:
z value for 283: z = (x-μ)/σ = (283-283)/1.6 = 0
z value for 285.4: z = (285.4-283)/1.6 = 1.5
Using Appendix B.1, the value in the standardized normal probability table corresponding to our z values is 0.4332. This means that the area under the curve between 0.00 and 1.5 is 0.4332. It also means that the probability that a bag selected at random weighs between 283 and 285.4 grams is 0.4332.

Empirical Rule

o About 68% of the area under the normal curve is within 1 SD of the mean; i.e., μ ± σ.
o About 95% of the area under the normal curve is within 2 SDs of the mean; i.e., μ ± 2σ.
o Practically all of the area under the normal curve is within 3 SDs of the mean; i.e., μ ± 3σ.
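The table lookup and the empirical rule can both be reproduced without a printed table. The sketch below is one way to do it in Python, assuming the standard identity that the standard normal CDF equals 0.5(1 + erf(z/√2)); the helper names `phi` and `normal_prob_between` are illustrative and do not come from the notes.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal CDF, P(Z <= z), computed from the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def normal_prob_between(lo, hi, mu, sigma):
    """P(lo < X < hi) for X normal with mean mu and SD sigma."""
    z_lo = (lo - mu) / sigma  # convert each x value to a z value
    z_hi = (hi - mu) / sigma
    return phi(z_hi) - phi(z_lo)


# m&m's example: area between the mean (283 g) and 285.4 g, SD 1.6 g.
p_bags = normal_prob_between(283, 285.4, 283, 1.6)

# Empirical rule: area within 1, 2 and 3 SDs of the mean.
within = {k: phi(k) - phi(-k) for k in (1, 2, 3)}

print(f"P(283 < weight < 285.4) = {p_bags:.4f}")
for k, area in within.items():
    print(f"within {k} SD: {area:.4f}")
```

The printed areas should agree with the table value .4332 and with the "about 68%, about 95%, practically all" statements of the empirical rule to the precision the notes use.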
To translate the empirical rule into standard normal deviate terms (z values):
o About 68% lies between z = -1 (or X = µ - 1σ) and z = +1 (or X = µ + 1σ)
o About 95% lies between z = -2 (or X = µ - 2σ) and z = +2 (or X = µ + 2σ)
o Practically all lies between z = -3 and z = +3

The Normal Distribution

Example: Suppose the distribution of the annual incomes of a group of middle-management employees at HSBC approximates a normal distribution with a mean of $47,200 and a SD of $800.
o About 68% of the incomes lie between what two amounts?
o About 95% of the incomes lie between what two amounts?
o Virtually all the incomes lie between what two amounts?
o What are the median and the modal values?
o Is the distribution of incomes symmetrical?

About 68% of the incomes lie between $47,200 - 1($800) = $46,400 and $47,200 + 1($800) = $48,000.
About 95% of the incomes lie between $47,200 - 2($800) = $45,600 and $47,200 + 2($800) = $48,800.
Almost all of the incomes lie between $47,200 - 3($800) = $44,800 and $47,200 + 3($800) = $49,600.
The mean, the median, and the mode are equal for normal distributions; in this example, $47,200.
The distribution is symmetrical.

Example: The distribution of weekly income of workers in a computer assembly company is normal with a mean of $1,000 and a SD of $100. What is the probability of selecting a worker whose income is less than $790, that is, P(weekly income < $790)?
z value for $790 = (790-1000)/100 = -2.1
From Appendix B.1, the area between 0 and -2.1 is .4821.
The area between 0 and -∞ is .5.
The area beyond -2.1 is .5 - .4821 = .0179.

Example (continued): What is the probability of selecting, at random, a worker whose income is between $840 and $1,200, that is, P($840 < weekly income < $1,200)?
z value of $840 = (840-1000)/100 = -1.6. Using Appendix B.1, the area between 0 and -1.6 is .4452.
z value of $1,200 = (1200-1000)/100 = 2.00. Using Appendix B.1, the area between 0 and 2.00 is .4772.
P($840 < weekly income < $1,200) = .4452 + .4772 = .9224; i.e., 92.24% of workers have weekly incomes between $840 and $1,200.

Example: A tire company wishes to set a minimum mileage guarantee on its new line of tires. Tests reveal that the mean mileage is 67,900 and the SD is 2,050 miles. The distribution of miles is normal. They want to set the minimum guaranteed mileage so that no more than 4% of tires are replaced. What minimum guaranteed mileage should they announce?
Let x be the minimum guaranteed mileage. P(mileage < x), the area under the normal curve to the left of x, is 4% or .0400.
The area between the mean (z = 0) and x is .5000 - .0400 = .4600.
We look for .4600 in Appendix B.1; the closest value is .4599 and its corresponding z value is 1.75. Because x is to the left of the mean, z is -1.75.
z = (x-µ)/σ
-1.75 = (x-67,900)/2,050
x = 64,312
The tire company should announce that it will replace for free any tire that wears out before it reaches 64,312 miles. Under this plan only 4% of the tires will be replaced.

EXAMPLES

Doug scored a 57 on the Miller Analogies Test, which has a mean of 50 and a SD of 5. Jennifer scored 120 on the WISC (intelligence test), which has a mean of 100 and a SD of 15. Compare their scores. Who had a better score?
To compare the 2 scores, first convert each to a standard z score.
In Doug's distribution: z = (x - µ)/σ = (57 - 50)/5 = 1.4
In Jennifer's distribution: z = (x - µ)/σ = (120 - 100)/15 = 1.33
Doug's z score is higher, so his score is better than Jennifer's relative to the other test takers.

If x is a normally distributed variable with a mean of 30 and a SD of 6, what is the probability of x falling above 30; i.e., P(x > 30)?
The z score that corresponds to x = 30 is z = 0. Before starting your calculations, notice that any value of x falling above 30 will have a positive z score.
P(x > 30) = P(z > 0) = .5000

The assembly times required for the manufacturing of a certain product are normally distributed with a mean of 400 seconds and a SD of 50 seconds.
An item is selected at random. Find the probability that its assembly time is between 360 and 440 seconds.
For x = 360: z = (x - µ)/σ = (360 - 400)/50 = -.80
For x = 440: z = (x - µ)/σ = (440 - 400)/50 = .80
P(360 < x < 440) = P(-.8 < z < .8) = .2881 + .2881 = .5762

In a large section of a Western Civilization course, test grades are normally distributed with a mean of 70 and a SD of 7. Grades are to be assigned according to the following rule. Find the numerical limits for each letter grade.

A   top 10%
B   between the top 10% and the top 30%
C   between the top 30% and the bottom 30%
D   between the bottom 10% and the bottom 30%
F   bottom 10%

Construct a standard normal curve.
The top 10% begins where the area between the mean and z is .4000, i.e., at a z score of 1.28.
The top 30% begins where the area between the mean and z is .2000, i.e., at a z value of .52.
The bottom 30% ends where the area between z and the mean is .2000, i.e., at a z value of -.52.
The bottom 10% ends where the area between z and the mean is .4000, i.e., at a z value of -1.28.
Using the formula x = µ + zσ, we convert each z score into an x score.
x = 70 + 7(1.28) = 78.96 or 79
x = 70 + 7(.52) = 73.64 or 74
x = 70 + 7(-.52) = 66.36 or 66
x = 70 + 7(-1.28) = 61.04 or 61
Therefore,

A   79 or above
B   74 to 78
C   66 to 73
D   61 to 65
F   60 and below

Quick Start Company makes 12-volt car batteries. After many years of product testing, the company knows that the life of a Quick Start battery is normally distributed, with a mean of 45 months and a SD of 8 months.
a) If Quick Start guarantees a full refund on any battery that fails within the 36-month period after purchase, what percentage of its batteries will the company expect to replace?
b) If Quick Start does not want to replace more than 10% of its batteries under the full-refund guarantee policy, for how long should the company guarantee the batteries (to the nearest month)?

a) Find the z value that corresponds to x = 36.
z = (36-45)/8 = -1.125
We need to find P(z < -1.125). Since -1.125 is midway between -1.12 and -1.13, we take the average of the 2 areas as our estimate.
The areas are .3686 and .3708; their average is .3697.
P(z < -1.125) = .5000 - .3697 = .1303
They will replace about 13% of their batteries.

b) x is the life span of the battery. Find the value of x so that 10% will no longer work and 90% still work.
Find the z score with 10% of the area under the curve falling to its left. This means that we need to find the z score for which the area between z and the mean is .4000: z = -1.28
Convert z to x: x = µ + zσ = 45 + (-1.28)8 = 34.76 or 35 months

The resting heart rate for an adult horse should average about µ = 46 beats per minute, with a (95% of data) range from 22 to 70 beats per minute, based on information from the Merck Veterinary Manual (a classic vet. reference). Let x be a random variable that represents the resting heart rate for an adult horse. Assume that x has a distribution that is approximately normal.

a) Estimate the SD of the x distribution.
A good estimate: since about 95% of the data fall between µ - 2σ and µ + 2σ, we can get 1σ by dividing the range by 4.
(70-22)/4 = 12, so σ = 12 beats per minute.

b) What is the probability that the heart rate is less than 25 beats per minute?
Find the z score that corresponds to x = 25: z = (25-46)/12 = -1.75
Use the table to find the area under the curve between 0 and z = -1.75. It is .4599.
P(z < -1.75) = .5000 - .4599 = .0401

c) What is the probability that the heart rate is greater than 60 beats per minute?
Find the z score that corresponds to x = 60: z = (60-46)/12 = 1.17
Use the table to find the area under the curve between 0 and z = 1.17. It is .3790.
P(z > 1.17) = .5000 - .3790 = .1210

d) What is the probability that the heart rate is between 25 and 60 beats per minute?
Add .4599 and .3790. The probability is .8389.

e) A horse whose resting heart rate is in the upper 10% of the probability distribution of heart rates may have a secondary infection or illness that needs to be treated. What is the heart rate corresponding to the upper 10% cutoff point of the probability distribution?
First, find the z score that separates the lower 90% from the upper 10% of the data. From the table, find the z score that corresponds to an area of .4000 between the mean and z (the closest entry is .3997): z = 1.28.
So x = 46 + (1.28)12 = 61.36
A horse with a heart rate of 61 or more should be further examined for illness.
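The horse heart-rate answers can be cross-checked in Python with the standard library's `statistics.NormalDist` (available in Python 3.8 and later). This is a minimal sketch that assumes the values µ = 46 and σ = 12 from the example; the variable names are illustrative. Results may differ slightly from the table-based answers (.0401, .1210, .8389, 61.36) because the table rounds z to two decimals.

```python
from statistics import NormalDist

# Resting heart rate model from the example: mean 46, SD 12 beats per minute.
heart_rate = NormalDist(mu=46, sigma=12)

# b) P(x < 25): area to the left of 25.
p_below_25 = heart_rate.cdf(25)

# c) P(x > 60): complement of the area to the left of 60.
p_above_60 = 1 - heart_rate.cdf(60)

# d) P(25 < x < 60): difference of the two cumulative areas.
p_between = heart_rate.cdf(60) - heart_rate.cdf(25)

# e) Upper 10% cutoff: the value with 90% of the area to its left.
cutoff_90 = heart_rate.inv_cdf(0.90)

print(f"P(x < 25)      = {p_below_25:.4f}")
print(f"P(x > 60)      = {p_above_60:.4f}")
print(f"P(25 < x < 60) = {p_between:.4f}")
print(f"upper 10% cutoff = {cutoff_90:.2f} beats per minute")
```

The same pattern (`cdf` for "less than" questions, `inv_cdf` for cutoff questions) applies to the tire-mileage and battery-guarantee examples with their own means and SDs.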