1/16/15

Probability Distributions

Section 1: How Can We Summarize Possible Outcomes and Their Probabilities?

Learning Objectives
1. Random variable
2. Probability distributions for discrete random variables
3. Mean of a probability distribution
4. Summarizing the spread of a probability distribution
5. Probability distribution for continuous random variables

Learning Objective 1: Randomness
• Suppose that the numerical values that a variable assumes are the result of some random phenomenon, e.g.,
  - selecting a random sample from a population, or
  - performing a randomized experiment.

Learning Objective 1: Random Variable
• A random variable is a numerical measurement of the outcome of a random phenomenon.
• Use letters near the end of the alphabet, such as x, to symbolize a particular value of the random variable.
• Use a capital letter, such as X, to refer to the random variable itself.
• Example: Flip a coin three times.
  - X = number of heads in the 3 flips; defines the random variable
  - x = 2; represents a possible value of the random variable

Learning Objective 2: Probability Distribution
• The probability distribution of a random variable specifies its possible values and their probabilities.
• Note: In spite of the randomness of the variable, there is a pattern of randomness that allows us to specify probabilities for the outcomes.

Learning Objective 2: Example
• What is the estimated probability of at least three home runs?
  P(3) + P(4) + P(5) = 0.13 + 0.03 + 0.01 = 0.17

Learning Objective 2: Probability Distribution of a Discrete Random Variable
• A discrete random variable X has separate values (such as 0, 1, 2, …) as its possible outcomes.
• Its probability distribution assigns a probability P(x) to each possible value x.
• For each x, the probability P(x) falls between 0 and 1.
• The sum of the probabilities for all the possible x values equals 1.

Learning Objective 3: The Mean of a Discrete Probability Distribution
• The mean of a probability distribution for a discrete random variable is
  µ = Σ x · P(x),
  where the sum is taken over all possible values of x.
• The mean of a probability distribution is denoted by the parameter µ.
• The mean is a weighted average; values of x that are more likely receive greater weight P(x).

Learning Objective 3: Expected Value of X
• The mean of a probability distribution of a random variable X is also called the expected value of X.
• The expected value reflects not what we'll observe in a single observation, but rather what we expect for the average in a long run of observations.
• It is not unusual for the expected value of a random variable to equal a number that is NOT a possible outcome.

Learning Objective 3: Example
• Find the mean of this probability distribution.
  µ = Σ x · P(x) = 0(0.23) + 1(0.38) + 2(0.22) + 3(0.13) + 4(0.03) + 5(0.01) = 1.38

Learning Objective 4: The Standard Deviation of a Probability Distribution
• The standard deviation of a probability distribution, denoted by the parameter σ, measures its spread.
• Larger values of σ correspond to greater spread.
• Roughly, σ describes how far the random variable falls, on the average, from the mean of its distribution.

Learning Objective 5: Continuous Random Variable
• A random variable is continuous if its set of possible values forms an interval, so it has an infinite continuum of possible values.
• Examples are: time, age, and size measures such as height and weight.
• Continuous variables are measured in a discrete manner because of rounding.
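The discrete-distribution calculations above (probabilities summing to 1, the mean as a weighted average, and the probability of at least three home runs) can be checked with a short Python sketch. The dictionary `home_runs` and the helper names `mean` and `prob_at_least` are illustrative assumptions; the probabilities are the ones used in the mean example.

```python
# Home-run distribution from the worked example: value x -> probability P(x).
home_runs = {0: 0.23, 1: 0.38, 2: 0.22, 3: 0.13, 4: 0.03, 5: 0.01}


def mean(dist):
    """Weighted average of the values: mu = sum of x * P(x)."""
    return sum(x * p for x, p in dist.items())


def prob_at_least(dist, k):
    """P(X >= k): add the probabilities of every value that is k or more."""
    return sum(p for x, p in dist.items() if x >= k)


# A valid probability distribution has probabilities that sum to 1.
total = sum(home_runs.values())

print(round(total, 2))                        # 1.0
print(round(mean(home_runs), 2))              # 1.38
print(round(prob_at_least(home_runs, 3), 2))  # 0.17
```

Note that the mean, 1.38, is not itself a possible number of home runs, which illustrates the remark about expected values.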
Learning Objective 5: Probability Distribution of a Continuous Random Variable
• The probability distribution of a continuous random variable is specified by a density curve: the probability of any interval is the area under the curve and above that interval.
• Each interval has probability between 0 and 1.
• The interval containing all possible values has probability equal to 1.

Section 2: How Can We Find Probabilities for Bell-Shaped Distributions?

Learning Objectives
1. Normal Distribution
2. 68-95-99.7 Rule for normal distributions
3. Z-Scores and the Standard Normal Distribution
4. The Standard Normal Table: Finding Probabilities
5. Using the TI-calculator: find probabilities
6. Using the Standard Normal Table in Reverse
7. Using the TI-calculator: find z-scores
8. Probabilities for Normally Distributed Random Variables
9. Percentiles for Normally Distributed Random Variables
10. Using Z-scores to Compare Distributions

Learning Objective 1: Normal Distribution
• The normal distribution is symmetric, bell-shaped, and characterized by its mean µ and standard deviation σ.
• The normal distribution is the most important distribution in statistics:
  - Many distributions are approximately normal.
  - It approximates many discrete distributions well when there are a large number of possible outcomes.
  - Many statistical methods use it even when the data are not bell shaped.
• Normal distributions are
  - bell shaped
  - symmetric around the mean
• The mean (µ) and the standard deviation (σ) completely describe the density curve:
  - Increasing/decreasing µ moves the curve along the horizontal axis.
  - Increasing/decreasing σ controls the spread of the curve.
• Within what interval do almost all of the men's heights fall? Women's heights?

Learning Objective 2: 68-95-99.7 Rule for Any Normal Curve
• 68% of the observations fall within one standard deviation of the mean.
• 95% of the observations fall within two standard deviations of the mean.
• 99.7% of the observations fall within three standard deviations of the mean.

Learning Objective 2: Example: 68-95-99.7% Rule
• Heights of adult women can be approximated by a normal distribution with µ = 65 inches and σ = 3.5 inches.
• 68-95-99.7 Rule for women's heights:
  - 68% are between 61.5 and 68.5 inches [µ ± σ = 65 ± 3.5]
  - 95% are between 58 and 72 inches [µ ± 2σ = 65 ± 2(3.5) = 65 ± 7]
  - 99.7% are between 54.5 and 75.5 inches [µ ± 3σ = 65 ± 3(3.5) = 65 ± 10.5]
• What proportion of women are less than 69 inches tall?
  - Treat 69 inches as roughly µ + σ = 68.5 inches (z = +1).
  - By the 68-95-99.7 Rule, 68% of heights lie between z = -1 and z = +1, which leaves 16% in each tail.
  - So about 16% + 68% = 84% of women are shorter than this height.

Learning Objective 3: Z-Scores and the Standard Normal Distribution
• The z-score for a value x of a random variable is the number of standard deviations that x falls from the mean:
  z = (x - µ) / σ
• A negative (positive) z-score indicates that the value is below (above) the mean.
• z-scores can be used to calculate the probabilities of a normal random variable using the normal tables in the back of the book.
• A standard normal distribution has mean µ = 0 and standard deviation σ = 1.
• When a random variable X has a normal distribution and its values are converted to z-scores (by subtracting the mean and dividing by the standard deviation), the new random variable Z whose values are these z-scores has the standard normal distribution.

Learning Objective 4: Table A: Standard Normal Probabilities
• Table A enables us to find normal probabilities.
• It tabulates the normal cumulative probabilities falling below the point µ + zσ.
• To use the table:
  - Find the corresponding z-score.
  - Look up the closest standardized score (z) in the table.
  - The first column gives z to the first decimal place.
  - The first row gives the second decimal place of z.
  - The corresponding probability found in the body of Table A gives the probability of falling below the z-score.

Learning Objective 4: Example: Using Table A
• Find the probability that a normal random variable takes a value less than 1.43 standard deviations above µ:
  P(z < 1.43) = .9236
  TI Calculator: normalcdf(-1E99, 1.43, 0, 1) = .9236
• Find the probability that a normal random variable takes a value greater than 1.43 standard deviations above µ:
  P(z > 1.43) = 1 - .9236 = .0764
  TI Calculator: normalcdf(1.43, 1E99, 0, 1) = .0764
• Find the probability that a normal random variable assumes a value within 1.43 standard deviations of µ:
  - Probability below µ + 1.43σ = .9236
  - Probability below µ - 1.43σ = .0764 (1 - .9236)
  - P(-1.43 < z < 1.43) = .9236 - .0764 = .8472
  TI Calculator: normalcdf(-1.43, 1.43, 0, 1) ≈ .8473 (unrounded; the rounded Table A entries give .8472)

Learning Objective 5: Using the TI Calculator
• To calculate the cumulative probability: 2nd DISTR; 2:normalcdf(lower bound, upper bound, mean, sd)
• Use -1E99 for negative infinity and 1E99 for positive infinity.

Learning Objective 5: Find Probabilities Using TI Calculator
• Find the probability to the left of -1.64:
  P(z < -1.64) = normalcdf(-1E99, -1.64, 0, 1) = .0505
• Find the probability to the right of 1.56:
  P(z > 1.56) = normalcdf(1.56, 1E99, 0, 1) = .0594
• Find the probability between -.50 and 2.25:
  P(-.5 < z < 2.25) = normalcdf(-.5, 2.25, 0, 1) = .6793

Learning Objective 6: How Can We Find the Value of z for a Certain Cumulative Probability?
• To solve some of our problems, we will need to find the value of z that corresponds to a certain normal cumulative probability.
• To do so, we use Table A in reverse: rather than finding z using the first column (value of z up to one decimal) and the first row (second decimal of z), we start from the probability.
• To use Table A in reverse, find the probability in the body of the table. The z-score is given by the corresponding values in the first column and row.
• Example: Find the value of z for a cumulative probability of 0.025.
  - Look up the cumulative probability of 0.025 in the body of Table A.
  - A cumulative probability of 0.025 corresponds to z = -1.96.
  - Thus, the probability that a normal random variable falls at least 1.96 standard deviations below the mean is 0.025.
• Example: Find the value of z for a cumulative probability of 0.975.
  - Look up the cumulative probability of 0.975 in the body of Table A.
  - A cumulative probability of 0.975 corresponds to z = 1.96.
  - Thus, the probability that a normal random variable takes a value no more than 1.96 standard deviations above the mean is 0.975.

Learning Objective 7: Using the TI Calculator to Find Z-Scores for a Given Probability
• 2nd DISTR; 3:invNorm; Enter
• invNorm(percentile, mean, sd); Enter
• Percentile is the probability under the curve from negative infinity to the z-score.

Learning Objective 7: Examples
• The probability that a standard normal random variable assumes a value that is ≤ z is 0.975. What is z?
  invNorm(.975, 0, 1) = 1.96
• The probability that a standard normal random variable assumes a value that is > z is 0.025. What is z?
  invNorm(1 - .025, 0, 1) = invNorm(.975, 0, 1) = 1.96
• The probability that a standard normal random variable assumes a value that is ≥ z is 0.881. What is z?
  invNorm(1 - .881, 0, 1) = -1.18
• The probability that a standard normal random variable assumes a value that is < z is 0.119. What is z?
  invNorm(.119, 0, 1) = -1.18

Learning Objective 7: Example
• Find the z-score z such that the probability within z standard deviations of the mean is 0.50.
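The Table A lookups and the normalcdf / invNorm keystrokes above can be reproduced in Python as a rough cross-check. This is a sketch that assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; its `cdf` method plays the role of a forward Table A lookup and `inv_cdf` the role of the reverse lookup. The variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal distribution

# Forward lookup (Table A / normalcdf): cumulative probability below z.
below = Z.cdf(1.43)                    # P(Z < 1.43)
above = 1 - Z.cdf(1.43)                # P(Z > 1.43)
within = Z.cdf(1.43) - Z.cdf(-1.43)    # P(-1.43 < Z < 1.43)

# Reverse lookup (Table A in reverse / invNorm): z for a cumulative probability.
z_left_975 = Z.inv_cdf(0.975)          # area 0.975 to the left of z
z_left_025 = Z.inv_cdf(0.025)          # area 0.025 to the left of z
z_right_881 = Z.inv_cdf(1 - 0.881)     # area 0.881 to the right of z
z_middle_50 = Z.inv_cdf(0.75)          # central 50%, so 25% in each tail

print(round(below, 4), round(above, 4), round(within, 3))
print(round(z_left_975, 2), round(z_left_025, 2))
print(round(z_right_881, 2), round(z_middle_50, 2))
```

Because the software does not round intermediate values, the last digit can differ slightly from answers built out of rounded table entries.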
• For the probability within z standard deviations of the mean to be 0.50, 25% must lie in each tail:
  - invNorm(.75, 0, 1) = .67
  - invNorm(.25, 0, 1) = -.67
  - Probability = P(-.67 < Z < .67) = .5

Learning Objective 8: Finding Probabilities for Normally Distributed Random Variables
1. State the problem in terms of the observed random variable X, i.e., P(X < x).
2. Standardize X to restate the problem in terms of a standard normal variable Z:
   P(X < x) = P(Z < z), where z = (x - µ) / σ
3. Draw a picture to show the desired probability under the standard normal curve.
4. Find the area under the standard normal curve using Table A.

Learning Objective 8: P(X<x)
• Adult systolic blood pressure is normally distributed with µ = 120 and σ = 20. What percentage of adults have systolic blood pressure less than 100?
• P(X < 100) = P(Z < (100 - 120)/20) = P(Z < -1.00) = .1587
• normalcdf(-1E99, 100, 120, 20) = .1587
• 15.9% of adults have systolic blood pressure less than 100.

Learning Objective 8: P(X>x)
• With the same distribution (µ = 120, σ = 20), what percentage of adults have systolic blood pressure greater than 100?
• P(X > 100) = 1 - P(X < 100)
• P(X < 100) = P(Z < (100 - 120)/20) = P(Z < -1.00) = .1587
• P(X > 100) = 1 - .1587 = .8413
• normalcdf(100, 1E99, 120, 20) = .8413
• 84.1% of adults have systolic blood pressure greater than 100.

Learning Objective 8: P(X>x)
• With the same distribution, what percentage of adults have systolic blood pressure greater than 133?
• P(X > 133) = 1 - P(X < 133)
• P(X < 133) = P(Z < (133 - 120)/20) = P(Z < .65) = .7422
• P(X > 133) = 1 - .7422 = .2578
• normalcdf(133, 1E99, 120, 20) = .2578
• 25.8% of adults have systolic blood pressure greater than 133.

Learning Objective 8: P(a<X<b)
• With the same distribution, what percentage of adults have systolic blood pressure between 100 and 133?
• P(100 < X < 133) = P(X < 133) - P(X < 100)
  = P(Z < (133 - 120)/20) - P(Z < (100 - 120)/20)
  = P(Z < .65) - P(Z < -1.00) = .7422 - .1587 = .5835
• normalcdf(100, 133, 120, 20) = .5835
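The blood pressure probabilities above can be sketched in Python in the same spirit. The model Normal(µ = 120, σ = 20) comes from the slides; the helper `z_score` and the variable names are illustrative assumptions, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist

# Model from the slides: systolic blood pressure ~ Normal(mu=120, sigma=20).
bp = NormalDist(mu=120, sigma=20)


def z_score(x, mu, sigma):
    """Number of standard deviations that x falls from the mean."""
    return (x - mu) / sigma


p_below_100 = bp.cdf(100)                # P(X < 100)
p_above_100 = 1 - bp.cdf(100)            # P(X > 100)
p_above_133 = 1 - bp.cdf(133)            # P(X > 133)
p_between = bp.cdf(133) - bp.cdf(100)    # P(100 < X < 133)

# Standardizing first gives the same probability as working with X directly.
p_below_100_via_z = NormalDist().cdf(z_score(100, 120, 20))

for label, p in [
    ("P(X < 100)", p_below_100),
    ("P(X > 100)", p_above_100),
    ("P(X > 133)", p_above_133),
    ("P(100 < X < 133)", p_between),
]:
    print(f"{label} = {p:.4f}")
```

Working with X directly corresponds to passing the mean and standard deviation to normalcdf; standardizing first corresponds to the Table A route.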
• 58% of adults have systolic blood pressure between 100 and 133.

Learning Objective 9: Find X Value Given Area to Left
• Adult systolic blood pressure is normally distributed with µ = 120 and σ = 20. What is the 1st quartile?
• P(X < x) = .25, find x:
  - Look up .25 in the body of Table A to find z = -0.67.
  - Solve the equation to find x: x = µ + zσ = 120 + (-0.67)(20) = 106.6
• Check: P(X < 106.6) = P(Z < -0.67) = 0.25
• TI Calculator: invNorm(.25, 120, 20) ≈ 106.5 (the calculator uses the unrounded z ≈ -0.6745; Table A's z = -0.67 gives 106.6)

Learning Objective 9: Find X Value Given Area to Right
• Adult systolic blood pressure is normally distributed with µ = 120 and σ = 20. 10% of adults have systolic blood pressure above what level?
• P(X > x) = .10, find x:
  - P(X > x) = 1 - P(X < x)
  - Look up 1 - 0.1 = 0.9 in the body of Table A to find z = 1.28.
  - Solve the equation to find x: x = µ + zσ = 120 + (1.28)(20) = 145.6
• Check: P(X > 145.6) = P(Z > 1.28) = 0.10
• TI Calculator: invNorm(.9, 120, 20) = 145.6

Learning Objective 10: Using Z-scores to Compare Distributions
• Z-scores can be used to compare observations from different normal distributions.
• Example: You score 650 on the SAT, which has µ = 500 and σ = 100, and 30 on the ACT, which has µ = 21.0 and σ = 4.7. On which test did you perform better?
• Compare z-scores:
  - SAT: z = (650 - 500) / 100 = 1.5
  - ACT: z = (30 - 21) / 4.7 = 1.91
• Since your z-score is greater for the ACT, you performed better on this exam.

Section 3: How Can We Find Probabilities When Each Observation Has Two Possible Outcomes?

Learning Objectives
1. The Binomial Distribution
2. Conditions for a Binomial Distribution
3. Probabilities for a Binomial Distribution
4. Factorials
5. Examples using Binomial Distribution
6. Do the Binomial Conditions Apply?
7. Mean and Standard Deviation of the Binomial Distribution
8. Normal Approximation to the Binomial

Learning Objective 1: The Binomial Distribution
• Each observation is binary: it has one of two possible outcomes.
• Examples:
  - Accept, or decline an offer from a bank for a credit card.
  - Have, or do not have, health insurance.
  - Vote yes or no on a referendum.

Learning Objective 2: Conditions for the Binomial Distribution
• Each of n trials has two possible outcomes: "success" or "failure".
• Each trial has the same probability of success, denoted by p.
• The n trials are independent.
• The binomial random variable X is the number of successes in the n trials.

Learning Objective 3: Probabilities for a Binomial Distribution
• Denote the probability of success on a trial by p.
• For n independent trials, the probability of x successes equals:
  P(x) = [n! / (x!(n - x)!)] p^x (1 - p)^(n - x), x = 0, 1, 2, ..., n

Learning Objective 4: Factorials
• Rules for factorials:
  - n! = n*(n-1)*(n-2)…2*1
  - 1! = 1
  - 0! = 1
• For example, 4! = 4*3*2*1 = 24

Learning Objective 5: Example: Finding Binomial Probabilities
• John Doe claims to possess ESP.
• An experiment is conducted:
  - A person in one room picks one of the integers 1, 2, 3, 4, 5 at random.
  - In another room, John Doe identifies the number he believes was picked.
  - Three trials are performed for the experiment.
  - Doe got the correct answer twice.

Learning Objective 5: Example 1
• If John Doe does not actually have ESP and is actually guessing the number, what is the probability that he'd make a correct guess on two of the three trials?
  - The three ways John Doe could make two correct guesses in three trials are: SSF, SFS, and FSS.
  - Each of these has probability (0.2)^2 (0.8) = 0.032.
  - The total probability of two correct guesses is 3(0.032) = 0.096.
• The probability of exactly 2 correct guesses is the binomial probability with n = 3 trials, x = 2 correct guesses, and p = 0.2 probability of a correct guess:
  P(2) = [3! / (2! 1!)] (0.2)^2 (0.8)^1 = 3(0.04)(0.8) = 0.096
• TI Calculator: 2nd VARS; 0:binompdf(n, p, x); binompdf(3, .2, 2) = 0.096

Learning Objective 5: Binomial Example 2
• 1000 employees, 50% female.
• None of the 10 employees chosen for management training were female.
• The probability that no females are chosen is:
  P(0) = [10! / (0! 10!)] (0.50)^0 (0.50)^10 = 0.001
• binompdf(10, .5, 0) = 9.765625E-4
• It is very unlikely (one chance in a thousand) that none of the 10 selected for management training would be female if the employees were chosen randomly.

Learning Objective 6: Do the Binomial Conditions Apply?
• Before using the binomial distribution, check that its three conditions apply:
  - Binary data (success or failure).
  - The same probability of success for each trial (denoted by p).
  - Independent trials.

Learning Objective 6: Do the Binomial Conditions Apply to Example 2?
• The data are binary (male, female).
• If employees are selected randomly, the probability of selecting a female on a given trial is 0.50.
• With random sampling of 10 employees from a large population, the outcome of one trial does not depend on the outcome of another trial.

Learning Objective 7: Binomial Mean and Standard Deviation
• The binomial probability distribution for n trials with probability p of success on each trial has mean µ and standard deviation σ given by:
  µ = np, σ = √(np(1 - p))

Learning Objective 7: Example: Racial Profiling?
• Data:
  - 262 police car stops in Philadelphia in 1997.
  - 207 of the drivers stopped were African-American.
  - In 1997, Philadelphia's population was 42.2% African-American.
• Does the number of African-Americans stopped suggest possible bias, being higher than we would expect (other things being equal, such as the rate of violating traffic laws)?
• Assume:
  - 262 car stops represent n = 262 trials.
  - Successive police car stops are independent.
  - P(driver is African-American) is p = 0.422.
• Calculate the mean and standard deviation of this binomial distribution:
  µ = 262(0.422) ≈ 111
  σ = √(262(0.422)(0.578)) ≈ 8
• Recall the Empirical Rule: when a distribution is bell-shaped, close to 100% of the observations fall within 3 standard deviations of the mean.
  µ - 3σ = 111 - 3(8) = 87
  µ + 3σ = 111 + 3(8) = 135
• If there is no racial profiling, we would not be surprised if between about 87 and 135 of the 262 drivers stopped were African-American.
• The actual number stopped (207) is well above these values.
• The number of African-Americans stopped is too high, even taking into account random variation.
• Limitation of the analysis: Different people do different amounts of driving, so we don't really know that 42.2% of the potential stops were African-American.

Learning Objective 8: Approximating the Binomial Distribution with the Normal Distribution
• The binomial distribution can be well approximated by the normal distribution when the expected number of successes, np, and the expected number of failures, n(1 - p), are both at least 15.
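The binomial calculations in this section (the ESP and management-training probabilities, the mean and standard deviation for the car-stop example, and the "at least 15" check for the normal approximation) can be reproduced with a minimal Python sketch. The function names `binom_pmf`, `binom_mean_sd`, and `normal_approx_ok` are illustrative assumptions, and `math.comb` requires Python 3.8 or later.

```python
from math import comb, sqrt


def binom_pmf(n, p, x):
    """P(x) = n! / (x! (n - x)!) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p)) of a binomial."""
    return n * p, sqrt(n * p * (1 - p))


def normal_approx_ok(n, p, threshold=15):
    """True when expected successes and expected failures are both >= threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


# Example 1 (ESP): exactly 2 correct guesses in 3 trials with p = 0.2.
esp = binom_pmf(3, 0.2, 2)

# Example 2: no females among 10 employees chosen when p = 0.5.
none_female = binom_pmf(10, 0.5, 0)

# Car-stop example: n = 262 stops, p = 0.422.
mu, sigma = binom_mean_sd(262, 0.422)
low, high = mu - 3 * sigma, mu + 3 * sigma

print(round(esp, 3))                  # 0.096
print(none_female)                    # 0.0009765625
print(round(mu), round(sigma))        # 111 8
print(round(low), round(high))        # 87 135
print(normal_approx_ok(262, 0.422))   # True
print(normal_approx_ok(3, 0.2))       # False
```

The observed count of 207 lies far above the upper bound of about 135, which matches the conclusion drawn from the Empirical Rule. The three-trial ESP experiment fails the "at least 15" check, so its probabilities should come from the binomial formula rather than a normal approximation.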