Class 04. Wunderdog and the Normal Distribution
EMBS Section 6.2

Class 03 Assignment
• Answers are posted on the course website
• My office hours are IN THE CLASSROOM.
  – 3 to 4:30 on class days
  – Or email me for an appointment [email protected]
• TA office hours
  – Sunday and Tuesday nights
  – MCoB 266
  – 7 to 8:30 pm

What we learned last class
• Hypothesis Testing
  – H0: She is guessing
  – Randomized double blind experiment
  – Test statistic: number correct
  – Specify α=0.05 (level of significance)
  – Observe 7 correct
  – P(X≥7│H0) = 1-BINOMDIST(6,10,.5,true) = 0.17
  – Since this p-value > α, the result is NOT statistically significant.

Case: Wunderdog Sports Picks

Wunderdog is just like LTT?
• H0: He is guessing (p=.5, independent events)
• Ha: He is skillful (p>.5)
• Test statistic: Number correct = 87.
• P( X≥87 │H0 ) = 1 – BINOMDIST(86,149,.5,true) = 0.024
• Conclusion: Statistically significant at the α=0.05 level.

Wunderdog
• X is number correct
• X is binomial, n=149, p=0.5, if H0 is true.
• Mean = E(X) = n*p = 74.5
• Variance = Var(X) = n*p*(1-p) = 37.25
• Standard deviation = 37.25^.5 = 6.1

Binomial pmf with n=149, p=0.5
Each possible outcome x has a mass of probability calculated as BINOM.DIST(x,149,.5,false)
[Figure: bar chart of P(x) against x = number of successes]

As n gets big, the binomial “looks like” the normal (bell-shaped curve)
• So if n is big, we sometimes use the normal distribution to approximate the binomial.
  – X is actually binomial.
  – It would be better to use BINOMDIST
  – But the probabilities we calculate come out pretty much the same if we use the appropriate normal distribution.
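The two hypothesis tests above can also be checked outside Excel. The following is a minimal sketch in Python, which is an assumption on our part (the class itself uses BINOMDIST); the helper names `binom_pmf` and `binom_upper_tail` are made up for illustration.

```python
from math import comb, sqrt


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p); plays the role of BINOM.DIST(x, n, p, false)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_upper_tail(x, n, p):
    """P(X >= x); the same quantity as 1 - BINOMDIST(x - 1, n, p, true)."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))


alpha = 0.05  # level of significance from the slides

# Last class: 7 correct out of 10 if H0 (p = 0.5) is true.
p_ltt = binom_upper_tail(7, 10, 0.5)

# Wunderdog: 87 correct out of 149 if H0 (p = 0.5) is true.
n, p = 149, 0.5
p_wunderdog = binom_upper_tail(87, n, p)

# Mean, variance, and standard deviation of the binomial under H0.
mean = n * p
variance = n * p * (1 - p)
std_dev = sqrt(variance)

print(f"LTT p-value = {p_ltt:.3f}, significant: {p_ltt < alpha}")
print(f"Wunderdog p-value = {p_wunderdog:.3f}, significant: {p_wunderdog < alpha}")
print(f"mean = {mean}, variance = {variance}, sd = {std_dev:.1f}")
```

Summing the pmf from x up to n gives the upper tail directly, which avoids the easy off-by-one mistake of passing 87 instead of 86 to the cumulative function.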
Binomial distributions for n=149
All three are “bell-shaped curves”
[Figure: three panels of P(x) against x = number correct, for P=0.5, P=0.2, and P=0.8]
[Figure: binomial pmf P(x) against x = number correct, shown above a normal density f(x) against x]

The Normal Distribution
• X is continuous
• Applies to LOTS of random variables
• Parameters are the mean μ and the standard deviation σ.
  – Mean or E(X) = μ
  – Variance = σ²
  – Standard deviation = σ
  – Symmetric: mean = median = mode (all = μ)
EMBS Fig 6.4, p 249

To calculate probabilities
• P(X=x) = 0
• P(X≤x) = NORMDIST(x,μ,σ,true)
• P(X<x) = NORMDIST(x,μ,σ,true)

[Figure: binomial pmf P(x) against x = number correct, with Mean=E(X)=74.5 and Standard deviation = 6.1, shown above the normal density f(x) with μ=74.5, σ=6.1]

Just like the binomial, the normal is a FAMILY of distributions. The member of the Normal family we want to use is the one with the mean and standard deviation that match our binomial.
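Picking the matching member of the normal family can be sketched as follows. Python's `statistics.NormalDist` stands in for NORMDIST here; that substitution and the helper name `normdist_cdf` are assumptions for illustration, not part of the course materials.

```python
from math import sqrt
from statistics import NormalDist

# Binomial under H0: n = 149, p = 0.5.
n, p = 149, 0.5

# Match the normal's mean and standard deviation to the binomial's.
mu = n * p                     # 74.5
sigma = sqrt(n * p * (1 - p))  # about 6.1


def normdist_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma); plays the role of NORMDIST(x, mu, sigma, true)."""
    return NormalDist(mu, sigma).cdf(x)


# X is continuous, so P(X <= x) and P(X < x) are the same number.
p_at_most_80 = normdist_cdf(80, mu, sigma)

# Symmetry: half of the probability lies below the mean (mean = median).
p_below_mean = normdist_cdf(mu, mu, sigma)

print(f"mu = {mu}, sigma = {sigma:.2f}")
print(f"P(X <= 80) = {p_at_most_80:.3f}")
print(f"P(X <= mu) = {p_below_mean:.3f}")
```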
[Figure: binomial pmf P(x) against x = number correct]
X is discrete
• P(x≥87) = 1-BINOMDIST(86,149,.5,true) = 0.024
• P(x=87) = 0.008

[Figure: normal density f(x) against x]
X is continuous
• P(x≥87) = P(x>87) = 1-NORMDIST(87,74.5,6.1,true) = 0.020
• P(x=87) = 0

To calculate probabilities
• P(X=x) = 0
• P(X≤x) = NORMDIST(x,μ,σ,true)
• P(X<x) = NORMDIST(x,μ,σ,true)
• P(X>x) = 1 – NORMDIST(x,μ,σ,true)
• P(x1<X<x2) = NORMDIST(x2,μ,σ,true)-NORMDIST(x1,μ,σ,true)

Weights of CEOs are normally distributed with μ = 155 and σ = 25. What percentage of CEOs do we expect to weigh between 160 and 200?
=NORMDIST(200,155,25,true)-NORMDIST(160,155,25,true) = 0.964 – 0.579 = 0.385

To go backwards from a p to an x
• To find the x value such that P(X<x) = p, use =NORMINV(p,μ,σ)

EMBS problem 21, page 260
A person must score in the top 2% of the population on an IQ test to qualify for membership in MENSA (U.S. Airways Attache, September 2000). If the population of IQ scores is normal with a mean of 100 and a standard deviation of 15, what score qualifies one for MENSA?
We want the score, x, such that P(X<x) is 0.98.
=NORMINV(.98,100,15) = 130.8

Fun facts about the normal distribution
• Let X be normal with mean μ and standard deviation σ.
  – X ~ N(μ,σ)
• If Y = a + b*X
• Then Y will be normal with mean a+b*μ and standard deviation |b|*σ
  – Y ~ N(a+b*μ,|b|*σ)
• So if weight in pounds is normal, weight in kilograms will also be normal.
• If temperature in degrees F is normal, temperature in degrees C will also be normal.
• If I add 10 points to all exams, I add ten points to the mean but do not change the standard deviation.
• If I multiply all scores by 1.5, I multiply the mean and the standard deviation by 1.5.

More Fun Facts
• There are a multitude of normal distributions…one for each possible pair of μ and σ values.
• But…they all follow the same “curve” and have identical properties so that, in that sense, there is only ONE normal distribution.
EMBS Fig 6.4, p 249

Before there was NORMDIST
• We asked everyone to convert their probability question about x into a probability question about z, because then we needed only ONE table of normal probabilities: those that applied to z.
• $z = \frac{x-\mu}{\sigma}$
• z tells us where x is on its normal curve.
• z is how far x is above/below the mean in units of standard deviation.
• z is all we need to answer a probability question.

A changing world…
• We can use =NORMDIST(x,μ,σ,true) to answer our probability questions.
• We used to have to use =NORMSDIST([x-μ]/σ)
  – The standard normal distribution. Uses z as the input.
  – We needed to calculate the z in order to answer probability questions.

There is math and calculus behind all this…
=NORMDIST(x1,μ,σ,true) is the area under the normal density to the left of x1:
$$\int_{-\infty}^{x_1} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\, dx$$
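The normal calculations from this class (an interval probability, going backwards from p to x, standardizing to z, and the linear-transformation fact) can be sketched in one place. This uses Python's `statistics.NormalDist` as a stand-in for NORMDIST, NORMINV, and NORMSDIST, which is an assumption for illustration; the variable names are made up, and the pounds-to-kilograms factor is the standard conversion rather than a number from the slides.

```python
from statistics import NormalDist

# CEO weights: P(160 < X < 200), like NORMDIST(200,...) - NORMDIST(160,...).
ceo = NormalDist(mu=155, sigma=25)
p_between = ceo.cdf(200) - ceo.cdf(160)

# MENSA cutoff: the x with P(X < x) = 0.98, like NORMINV(.98, 100, 15).
iq = NormalDist(mu=100, sigma=15)
mensa_cutoff = iq.inv_cdf(0.98)

# Standardizing: NORMSDIST((x - mu) / sigma) gives the same answer
# as NORMDIST(x, mu, sigma, true).
x = 87
wunderdog = NormalDist(mu=74.5, sigma=6.1)
z = (x - wunderdog.mean) / wunderdog.stdev
via_z = NormalDist().cdf(z)  # standard normal: mean 0, sd 1
direct = wunderdog.cdf(x)

# Linear transformation Y = a + b*X with a = 0 and b = pounds-to-kilograms.
LB_TO_KG = 0.45359237
ceo_kg = ceo * LB_TO_KG  # still normal; mean and sd are both scaled by b

print(f"P(160 < weight < 200) = {p_between:.3f}")
print(f"MENSA cutoff = {mensa_cutoff:.1f}")
print(f"z = {z:.2f}, P(X <= 87) via z = {via_z:.4f}, direct = {direct:.4f}")
print(f"CEO weight in kg: mean = {ceo_kg.mean:.1f}, sd = {ceo_kg.stdev:.1f}")
```

The `via_z` and `direct` values agree because z carries all the information about where x sits on its curve, which is why a single table of standard normal probabilities used to be enough.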