Unit 2: Probability and distributions
Lecture 2: Binomial and Normal distribution

Statistics 101
Gary Larson
July 7, 2015

Today’s agenda

Tree diagrams recap (from the reading 2.2.6)
The binomial distribution and its parameters
The normal distribution and its parameters
The normal approximation to the binomial
Relevant reading: 3.4 (thru p142), 3.1.

Milgram experiment

Stanley Milgram, a Yale University psychologist, conducted a series of experiments on obedience to authority starting in 1963.
Experimenter (E) orders the teacher (T), the subject of the experiment, to give severe electric shocks to a learner (L) each time the learner answers a question incorrectly.
The learner is an actor and the shocks are not real. But a prerecorded sound is played each time the teacher administers a shock.

Milgram experiment (cont.)

These experiments measured the willingness of participants to obey an authority who instructed them to perform acts conflicting with their personal conscience.
About 65% of people would obey authority and give such shocks, and only 35% refused.
Over the years, additional research suggested this number is approximately consistent across communities and time.

Binary outcomes

Observing the behavior of each “teacher” in Milgram’s experiment can be thought of as a trial which has two possible outcomes (binary outcome).

Binomial distribution

Suppose we randomly select four individuals to participate in this experiment. What is the probability that exactly 1 of them will refuse to administer the shock?
Let’s call these people Allen (A), Brittany (B), Caroline (C), and Damian (D).
A trial is labeled a success if that teacher refuses to administer a severe shock, and a failure if that teacher does administer a shock.
Since only 35% of people refused to administer a shock, the probability of success is p = 0.35.
When an individual trial has a binary outcome of “success”/“failure” and an associated probability of success, the trial is also called a Bernoulli random variable.
Today we are studying sequences of Bernoulli trials, using the binomial distribution.

Each one of the four scenarios below will satisfy the condition of “exactly 1 of them refuses to administer the shock”:

Scenario 1: (A) refuse 0.35 × (B) shock 0.65 × (C) shock 0.65 × (D) shock 0.65 = 0.0961
Scenario 2: (A) shock 0.65 × (B) refuse 0.35 × (C) shock 0.65 × (D) shock 0.65 = 0.0961
Scenario 3: (A) shock 0.65 × (B) shock 0.65 × (C) refuse 0.35 × (D) shock 0.65 = 0.0961
Scenario 4: (A) shock 0.65 × (B) shock 0.65 × (C) shock 0.65 × (D) refuse 0.35 = 0.0961

The probability of exactly 1 of 4 people refusing to administer the shock is the sum of all of these probabilities.

0.0961 + 0.0961 + 0.0961 + 0.0961 = 4 × 0.0961 = 0.3844

Counting the # of scenarios

The question from the prior slide asked for the probability of a given number of successes, k, in a given number of trials, n (k = 1 success in n = 4 trials), and we calculated this probability as

# of scenarios × P(single scenario)

# of scenarios: there is a less tedious way to figure this out; we’ll get to that shortly...

Earlier we wrote out all possible scenarios that fit the condition of exactly one person refusing to administer the shock. If n was larger and/or k was different than 1, for example, n = 9 and k = 2:

RRSSSSSSS
SRRSSSSSS
SSRRSSSSS
···
SSRSSRSSS
···
SSSSSSSRR

Considering many scenarios: writing out all possible scenarios would be incredibly tedious and prone to errors.

P(single scenario):

\[ P(\text{single scenario}) = p^{k} (1 - p)^{n - k} \]

probability of success to the power of number of successes, probability of failure to the power of number of failures

The binomial distribution describes the probability of having exactly k successes in n independent Bernoulli trials with probability of success p.

Calculating the # of scenarios

Choose function: the choose function is useful for calculating the number of ways to choose k successes in n trials.

\[ \binom{n}{k} = \frac{n!}{k!\,(n - k)!} \]

k = 1, n = 4:
\[ \binom{4}{1} = \frac{4!}{1!\,(4 - 1)!} = \frac{4 \times 3 \times 2 \times 1}{1 \times (3 \times 2 \times 1)} = 4 \]

k = 2, n = 9:
\[ \binom{9}{2} = \frac{9!}{2!\,(9 - 2)!} = \frac{9 \times 8 \times 7!}{2 \times 1 \times 7!} = \frac{72}{2} = 36 \]

Note: You can also use R for these calculations:

    > choose(9,2)
    [1] 36

Conditions for the Binomial Distribution

1. The number of trials n is fixed.
2. The trials are independent.
3. Each trial outcome can be classified as a success or failure.
4. The probability of success p is the same for each trial.

The parameters of the binomial distribution are n and p.

Binomial distribution (cont.)

Binomial probabilities: if p represents probability of success, (1 − p) represents probability of failure, n represents number of independent trials, and k represents number of successes, then

\[ P(k \text{ successes in } n \text{ trials}) = \binom{n}{k} p^{k} (1 - p)^{n - k} \]

Example

At Duke University, 82% of students live in university owned or affiliated housing.
A group of 12 students was randomly chosen to speak to incoming students at orientation. What is the probability that (1) zero, (2) one, (3) two of these students live in student housing?

P(Zero UH) =
P(One UH) =
P(Two UH) =

(Each probability is # of scenarios × P(single scenario).)

Example

Now suppose we are going to randomly select 5 of these students to be on a panel to speak to the parents. What is the probability that at least four (i.e., four or more) of these selected students live in student housing?

P(At least four) =
P(At least one) =

Participation question

Which of the following is not a condition that needs to be met for the binomial distribution to be applicable?

(a) the trials must be independent
(b) the number of trials, n, must be fixed
(c) each trial outcome must be classifiable as a success or a failure
(d) the number of desired successes, k, must be greater than the number of trials
(e) the probability of success, p, must be the same for each trial

Participation question

A 2012 Gallup survey suggests that 26.2% of Americans are obese. Among a random sample of 10 Americans, what is the probability that exactly 8 are obese?

(a) pretty high
(b) pretty low

Participation question

A 2012 Gallup survey suggests that 26.2% of Americans are obese. Among a random sample of 10 Americans, what is the probability that exactly 8 are obese?

(a) \( 0.262^{8} \times 0.738^{2} \)
(b) \( \binom{8}{10} \times 0.262^{8} \times 0.738^{2} \)
(c) \( \binom{10}{8} \times 0.262^{8} \times 0.738^{2} \)
(d) \( \binom{10}{8} \times 0.262^{2} \times 0.738^{8} \)

Gallup: http://www.gallup.com/poll/160061/obesity-rate-stable-2012.aspx, January 23, 2013.
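As a rough check on the binomial formula used in the examples above, the sketch below evaluates it in R, the language these slides already use for `choose`. The helper name `binom_prob` is made up for illustration and is not part of the lecture; `dbinom` is R's built-in binomial probability function.

```r
# Hypothetical helper: probability of exactly k successes in n trials,
# written as (# of scenarios) x P(single scenario).
binom_prob <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

# Milgram example: exactly 1 refusal among 4 people, p = 0.35.
milgram <- binom_prob(1, 4, 0.35)
print(milgram)

# Gallup example: exactly 8 obese among 10 Americans, p = 0.262.
by_hand  <- binom_prob(8, 10, 0.262)
built_in <- dbinom(8, size = 10, prob = 0.262)
print(c(by_hand = by_hand, built_in = built_in))

# Duke example: 0, 1, or 2 of 12 students live in university housing, p = 0.82.
print(dbinom(0:2, size = 12, prob = 0.82))

# Panel of 5: at least four live in university housing.
print(sum(dbinom(4:5, size = 5, prob = 0.82)))
```

The hand-written formula and `dbinom` should agree up to floating-point error, which is one way to confirm that the exponents and the choose term are in the right places.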
Expected value

A 2012 Gallup survey suggests that 26.2% of Americans are obese.
Among a random sample of 100 Americans, how many would you expect to be obese?

Easy enough, 100 × 0.262 = 26.2.
Or more formally, µ = np = 100 × 0.262 = 26.2.

But this doesn’t mean in every random sample of 100 people exactly 26.2 will be obese. In fact, that’s not even possible. In some samples this value will be less, and in others more. How much would we expect this value to vary?

Expected value and its variability

Mean and standard deviation of binomial distribution:

\[ \mu = np, \qquad \sigma = \sqrt{np(1 - p)} \]

Going back to the obesity rate:

\[ \sigma = \sqrt{np(1 - p)} = \sqrt{100 \times 0.262 \times 0.738} \approx 4.4 \]

We would expect 26.2 out of 100 randomly sampled Americans to be obese, give or take 4.4.

Note: Mean and standard deviation of a binomial might not always be whole numbers, and that is alright; these values represent what we would expect to see on average.

Unusual observations

Using the notion that observations that are more than 2 standard deviations away from the mean are considered unusual, and the mean and the standard deviation we just computed, we can calculate a range for the plausible number of obese Americans in random samples of 100.

Participation question

An August 2012 Gallup poll suggests that 13% of Americans think home schooling provides an excellent education for children.
Would a random sample of 1,000 Americans where only 100 share this opinion be considered unusual?

(a) No
(b) Yes

Source: http://www.gallup.com/poll/156974/private-schools-top-marks-educating-children.aspx

For the obesity example (mean 26.2, standard deviation 4.4), the range of plausible values is

26.2 ± (2 × 4.4) = (17.4, 35)

26.2: a measure of center
2: a multiple
4.4: a measure of spread
result: a range of values

This procedure (and variants thereof) will reappear many times and be studied much more in-depth in this course.

An analysis of Facebook users

A recent study found that “Facebook users get more than they give”. For example:

40% of Facebook users in our sample made a friend request, but 63% received at least one request
Users in our sample pressed the like button next to friends’ content an average of 14 times, but had their content “liked” an average of 20 times
Users sent 9 personal messages, but received 12
12% of users tagged a friend in a photo, but 35% were themselves tagged in a photo

Any guesses for how this pattern can be explained?

A recent study found that approximately 25% of all Facebook users are power users. For a Facebook user with 245 friends, what is the probability that he/she has 70 or more friends who are power users?

We are given that n = 245, p = 0.25, and we are asked for the probability P(K ≥ 70).

P(K ≥ 70) = P(K = 70 or K = 71 or K = 72 or ··· or K = 245)
          = P(K = 70) + P(K = 71) + P(K = 72) + ··· + P(K = 245)

This seems like an awful lot of work...
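The sum P(K = 70) + ··· + P(K = 245) is tedious by hand but short in software. A minimal sketch in R, assuming the binomial model with n = 245 and p = 0.25 stated above; the variable names are illustrative only.

```r
n <- 245   # number of friends (trials)
p <- 0.25  # probability that a friend is a power user

# Add up the binomial probabilities for K = 70, 71, ..., 245.
tail_by_sum <- sum(dbinom(70:n, size = n, prob = p))

# Same quantity from the upper tail: P(K >= 70) = P(K > 69).
tail_by_cdf <- pbinom(69, size = n, prob = p, lower.tail = FALSE)

print(c(by_sum = tail_by_sum, by_cdf = tail_by_cdf))
```

Both lines compute the same exact binomial tail, so they should match up to floating-point error. Note the 69 in `pbinom`: with `lower.tail = FALSE` it returns P(K > q), so q must be one less than 70.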
Source (Facebook study): http://www.pewinternet.org/Reports/2012/Facebook-users/Summary.aspx

Histograms of number of successes

Hollow histograms of samples from the binomial model where p = 0.10 and n = 10, 30, 100, and 300. What happens as n increases?

[Figure: four hollow histograms of the number of successes, one panel each for n = 10, n = 30, n = 100, and n = 300]

We need some more tools...

Ideally, we would like to be able to avoid adding up the heights of all those little bars.
So, instead of doing all that tedious work, we are going to drape a density curve over the histogram. (What’s a density curve, you ask? Hang on.)
This turns our discrete distribution (binomial) into a continuous distribution. A very special continuous distribution indeed.

But first...what’s a density curve?

A density curve is a smoothed histogram where the total area under the curve is 1.
Measuring areas under a density curve corresponds to measuring probabilities.
To draw a density curve from a histogram, simply connect the peaks of the histogram with a smooth line, and normalize (i.e. adjust) the values of the y-axis such that the area under the curve is 1.

Imagining Density Curves

We have a very large sample size. What does our histogram look like? What does this mean the density curve will look like?
[Figure: hollow histograms of binomial samples, one panel each for n = 10, n = 30, n = 100, and n = 300]

This distribution is very important, so we pause in our binomial discussion to explore it a little.

Introducing the Normal distribution

Normal distribution

Denoted as N(µ, σ) → Normal with two parameters: mean µ and standard deviation σ (or presented using variance σ²)
Unimodal and symmetric, bell shaped curve, that also follows well-known rules about how the data are distributed around µ
While many variables in the real world are distributed nearly normal, virtually none are exactly normal
Arguably the most important distribution in the history of statistics

Heights of males

“The male heights on OkCupid very nearly follow the expected normal distribution – except the whole thing is shifted to the right of where it should be. Almost universally guys like to add a couple inches.”

“You can also see a more subtle vanity at work: starting at roughly 5’ 8”, the top of the dotted curve tilts even further rightward.
This means that guys as they get closer to six feet round up a bit more than usual, stretching for that coveted psychological benchmark.”

Heights of females

“When we looked into the data for women, we were surprised to see height exaggeration was just as widespread, though without the lurch towards a benchmark height.”

Source (both height quotes): http://blog.okcupid.com/index.php/the-biggest-lies-in-online-dating/

68-95-99.7 Rule

For nearly normally distributed data,
about 68% falls within 1 SD of the mean,
about 95% falls within 2 SD of the mean,
about 99.7% falls within 3 SD of the mean.

[Figure: normal curve with the central 68%, 95%, and 99.7% regions marked; x-axis from µ − 3σ to µ + 3σ]

It is possible for observations to fall 4, 5, or more standard deviations away from the mean, but these occurrences are very rare if the data are nearly normal.

Standardizing with Z scores

SAT scores are distributed nearly normally with mean 1500 and standard deviation 300. ACT scores are distributed nearly normally with mean 21 and standard deviation 5. A college admissions officer wants to determine which of the two applicants scored better on their standardized test with respect to the other test takers: Pam, who earned an 1800 on her SAT, or Jim, who scored a 24 on his ACT?

Describing variability using the 68-95-99.7 Rule

SAT scores are distributed nearly normally with mean 1500 and standard deviation 300.
∼68% of students score between 1200 and 1800 on the SAT.
∼95% of students score between 900 and 2100 on the SAT.
∼99.7% of students score between 600 and 2400 on the SAT.

[Figure: normal curve of SAT scores with the 68%, 95%, and 99.7% regions marked; x-axis from 600 to 2400]
[Figure: Pam's score marked on the SAT curve (x-axis from 600 to 2400) and Jim's score marked on the ACT curve (x-axis from 6 to 36)]

Standardizing with Z scores (cont.)

Since we cannot just compare these two raw scores, we instead compare how many standard deviations beyond the mean each observation is.

Pam’s score is
\[ \frac{1800 - 1500}{300} = 1 \]
standard deviation above the mean.

Jim’s score is
\[ \frac{24 - 21}{5} = 0.6 \]
standard deviations above the mean.

[Figure: standard normal curve with Jim and Pam marked; x-axis from −2 to 2]

Standardizing with Z scores

These are called standardized scores, or Z scores.
Z score of an observation is the number of standard deviations it falls above or below the mean.

\[ Z = \frac{\text{observation} - \text{mean}}{\text{SD}} \]

Z scores are defined for distributions of any shape, but (BONUS!) when the distribution is normal, we can use Z scores to calculate percentiles / areas under the normal distribution density curve (i.e. probabilities!).
Observations that are more than 2 SD away from the mean (|Z| > 2) are usually considered unusual.

Percentiles

Percentile is the percentage of observations that fall below a given data point.
Graphically, percentile is the area below the probability distribution curve to the left of that observation.

[Figure: normal curve of SAT scores with the area to the left of an observation shaded; x-axis from 600 to 2400]

Approximately what percent of students score below 1800 on the SAT?
Approximating percentiles: the mean SAT score is 1500, with a standard deviation of 300. (Hint: Use the 68-95-99.7% rule.)

[Figure: normal curve of SAT scores; x-axis from 600 to 2400]

Z-Scores, N(µ, σ²), and N(0, 1)

The standard normal distribution is defined as N(0, 1): the normal distribution with µ = 0 and σ = 1.

Very useful theoretical result: if a random variable X ∼ N(µ, σ²), then the random variable Z = (X − µ)/σ is distributed N(0, 1).

The above implies that if your original data is (approximately) normally distributed, the Z scores are distributed (approximately) N(0, 1).

“A z-score puts values from any normal distribution N(µ, σ²) onto a common scale” ... by “converting” the N(µ, σ²) values to values from the standard normal distribution N(0, 1).

This means, if we had N(0, 1) percentiles, we could use them to calculate percentiles for any N(µ, σ²) distribution.

Calculating percentiles - using computation

There are many ways to compute percentiles/areas under the curve:

R:

    > pnorm(1800, mean = 1500, sd = 300)
    [1] 0.8413447

Applet: http://www.socr.ucla.edu/htmls/SOCR_Distributions.html

Calculating percentiles - using tables: look up the Z score in a standard normal probability table.

Participation question

Which of the following is false?

(a) Z scores are helpful for determining how unusual a data point is compared to the rest of the data in the distribution.
(b) Majority of Z scores in a right skewed distribution are negative.
(c) Regardless of the shape of the distribution (symmetric vs. skewed) the Z score of the mean is always 0.
(d) In a normal distribution, Q1 and Q3 are more than one SD away from the mean.

Standard normal probability table (excerpt; the entry for Z-score = 1 is 0.8413)

       Second decimal place of Z
Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0    0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1    0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2    0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3    0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4    0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5    0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6    0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7    0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8    0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9    0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0    0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1    0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2    0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015

Example

The average daily temperature in June in LA is 77 F, with a standard deviation of 5 degrees. Suppose the temperatures in June closely follow a normal distribution. What is the probability of observing a temperature of at most 83 F on a randomly chosen day in June?

Participation question

What percent of the standard normal distribution is above Z = 0.82? Choose the closest answer.

(a) 79.4%
(b) 20.6%
(c) 82%
(d) 18%
(e) Need to be provided the mean and the standard deviation of the distribution in order to be able to solve this problem.

For the temperature example (at most 83 F):

T ∼ N(mean = 77, sd = 5)

\[ P(T \le 83) = P\left(Z \le \frac{83 - 77}{5}\right) = P(Z \le 1.2) \approx 0.885 \]

The probability of observing a temperature of at most 83 F on a randomly chosen day in June is approximately 0.885, or 88.5%.

Example (cont.)

The average daily temperature in June in LA is 77 F, with a standard deviation of 5 degrees. Suppose the temperatures in June closely follow a normal distribution. What is the probability of observing a temperature of at least 83 F on a randomly chosen day in June?

T ∼ N(mean = 77, sd = 5)

P(T ≥ 83) = 1 − P(T ≤ 83) ≈ 1 − 0.885 = 0.115

The probability of observing a temperature of at least 83 F on a randomly chosen day in June is approximately 0.115, or 11.5%.
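The two temperature probabilities can also be computed directly with `pnorm`, the same R function the slides use for the SAT percentile. This is a sketch under the slide's assumption that T ∼ N(mean = 77, sd = 5); the variable names are illustrative.

```r
mu    <- 77  # mean June temperature in LA (F)
sigma <- 5   # standard deviation (F)

# P(T <= 83): area to the left of 83.
p_at_most <- pnorm(83, mean = mu, sd = sigma)

# P(T >= 83): area to the right of 83.
p_at_least <- pnorm(83, mean = mu, sd = sigma, lower.tail = FALSE)

# Same left-tail area via the Z score and the standard normal N(0, 1).
z <- (83 - mu) / sigma
p_via_z <- pnorm(z)

print(round(c(at_most = p_at_most, at_least = p_at_least, via_z = p_via_z), 4))
```

Standardizing first and calling `pnorm(z)` gives the same area as passing `mean` and `sd` directly, which mirrors the table lookup for Z = 1.2.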
Normal approximation to the binomial

When the sample size is large enough, the binomial distribution with parameters n and p can be approximated by the normal model with parameters

\[ \mu = np, \qquad \sigma = \sqrt{np(1 - p)} \]

In the case of the Facebook power users, n = 245 and p = 0.25.

\[ \mu = 245 \times 0.25 = 61.25 \]
\[ \sigma = \sqrt{245 \times 0.25 \times 0.75} = 6.78 \]

Bin(n = 245, p = 0.25) ≈ N(µ = 61.25, σ = 6.78).

[Figure: Bin(245, 0.25) probabilities with the N(61.25, 6.78) density curve overlaid; x-axis k from 20 to 100]

How large is large enough?

To use the normal approximation instead of the binomial distribution, the sample size must be large enough; n is large enough if the expected number of successes (np) and failures (n(1 − p)) are both at least 10.

np ≥ 10 and n(1 − p) ≥ 10

Participation question

Below are the parameters for four different binomial distributions. Which one can be approximated by the normal distribution?
(a) n = 100, p = 0.95
(b) n = 25, p = 0.45
(c) n = 150, p = 0.05
(d) n = 500, p = 0.015

What is the probability that the average Facebook user with 245 friends has 70 or more friends who would be considered power users?

(a) 0.0251
(b) 0.0985
(c) 0.1128
(d) 0.9015

Application exercises

Finding probabilities // Quality control

At the Heinz ketchup factory the amounts which go into bottles of ketchup are supposed to be normally distributed with mean 36 oz. and standard deviation 0.11 oz. Once every 30 minutes a bottle is selected from the production line, and its contents are noted precisely. If the amount of the bottle goes below 35.8 oz. or above 36.2 oz., then the bottle fails the quality control inspection. What percent of bottles pass the quality control inspection?

Finding cutoff points // Hot bodies

Body temperatures of healthy humans are distributed nearly normally with mean 98.2°F and standard deviation 0.73°F. What is the cutoff for the highest 10% of human body temperatures?

Mackowiak, Wasserman, and Levine (1992), A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body
Temperature, and Other Legacies of Carl Reinhold August Wunderlich.

Conditional probability // SAT scores

SAT scores (out of 2400) are distributed normally with mean 1500 and standard deviation 300. Suppose a school council awards a certificate of excellence to all students who score at least 1900 on the SAT. What percent of the students who received this certificate scored above 2100?

\[ P(\text{SAT} > 2100 \mid \text{SAT} > 1900) = \frac{P(\text{SAT} > 2100 \text{ and } \text{SAT} > 1900)}{P(\text{SAT} > 1900)} = \frac{P(\text{SAT} > 2100)}{P(\text{SAT} > 1900)} \]

\[ P(\text{SAT} > 2100) = P\left(Z > \frac{2100 - 1500}{300}\right) = P(Z > 2) = 1 - 0.9772 = 0.0228 \]

\[ P(\text{SAT} > 1900) = P(Z > 1.33) = 1 - 0.9082 = 0.0918 \]

\[ P(\text{SAT} > 2100 \mid \text{SAT} > 1900) = \frac{0.0228}{0.0918} \approx 0.25 \]

→ 25% of students

Finding missing parameters // Auto insurance premiums

Suppose a newspaper article states that the distribution of auto insurance premiums for residents of California is approximately normal with a mean of $1,650. The article also states that 25% of California residents pay more than $1,800.

1. What is the standard deviation of this distribution?
2. What is the IQR of this distribution?

To Do

PS 3 due tomorrow in class
Reading assignment, by Thursday: Chapter 4 Sections 4.1 - 4.2.3 (A sampling distribution for the mean)
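As a check on the SAT certificate exercise earlier in this section, the conditional probability can be reproduced with `pnorm`. This is only a sketch, assuming SAT ∼ N(mean = 1500, sd = 300) as stated in the exercise; variable names are illustrative. Because R uses the unrounded Z score for 1900 rather than the table value at Z = 1.33, the intermediate numbers may differ slightly from the table-based ones.

```r
mu    <- 1500
sigma <- 300

# Upper-tail areas: P(SAT > 2100) and P(SAT > 1900).
p_above_2100 <- pnorm(2100, mean = mu, sd = sigma, lower.tail = FALSE)
p_above_1900 <- pnorm(1900, mean = mu, sd = sigma, lower.tail = FALSE)

# Scoring above 2100 implies scoring above 1900, so the joint probability
# in the numerator is just P(SAT > 2100).
p_cond <- p_above_2100 / p_above_1900

print(round(c(above_2100 = p_above_2100,
              above_1900 = p_above_1900,
              conditional = p_cond), 4))
```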