Week 7 - Massey University

Week 7: Sample Means & Proportions

Variability of Summary Statistics
• Variability in shape of distn of sample
• Variability in summary statistics: mean, median, st devn, upper quartile, …
• Summary statistics have distributions

Parameters and statistics
• Parameter: describes underlying population
  • Constant
  • Greek letter (e.g. $\mu$, $\sigma$, $\pi$, …)
  • Unknown value in practice
• Summary statistic
  • Random
  • Roman letter (e.g. m, s, p, …)
• We hope statistic will tell us about corresponding parameter

Distn of sample vs Sampling distn of statistic
• Values in a single random sample have a distribution
• Single sample --> single value for statistic
• Sample-to-sample variability of statistic is its sampling distribution

Means
• Unknown population mean, $\mu$
• Sample mean, $\bar{X}$, has a distribution: its sampling distribution
• Usually $\bar{x} \ne \mu$
• A single sample mean, $\bar{x}$, gives us information about $\mu$

Sampling distribution of mean
If sample size, n, increases:
• Spread of distn of sample is (approx) same.
• Spread of sampling distn of mean gets smaller.
  • $\bar{x}$ is likely to be closer to $\mu$
  • $\bar{x}$ becomes a better estimate of $\mu$

Sampling distribution of mean
Population with mean $\mu$, st devn $\sigma$
Random sample (n independent values)
Sample mean, $\bar{X}$, has sampling distn with:
• Mean, $\mu_{\bar{X}} = \mu$
• St devn, $\sigma_{\bar{X}} = \sigma / \sqrt{n}$
(We will deal later with the problem that $\mu$ and $\sigma$ are unknown in practice.)

Weight loss
Estimate mean weight loss for those attending clinic for 10 weeks
• Random sample of n = 25 people
• Sample mean, $\bar{x}$
How accurate? Let's see, if the population distn of weight loss is:
X ~ normal ($\mu = 8$ lb, $\sigma = 5$ lb)

Some samples
Four random samples of n = 25 people:
1. Mean = 8.32 pounds, st devn = 4.74 pounds
2. Mean = 8.32 pounds, st devn = 4.74 pounds
3. Mean = 8.48 pounds, st devn = 5.27 pounds
4. Mean = 7.16 pounds, st devn = 5.93 pounds
N.B. In all samples, $\bar{x} \ne \mu$

Sampling distribution
Means from simulation of 400 samples
Theory: mean = $\mu$ = 8 lb, s.d.($\bar{x}$) = $\sigma / \sqrt{n} = 5 / \sqrt{25} = 1$ lb
(How does this compare to simulation? To popn distn?)

Errors in estimation
Population: X ~ normal ($\mu = 8$ lb, $\sigma = 5$ lb)
Sampling distribution of mean:
• mean = $\mu$ = 8 lb
• s.d.($\bar{x}$) = $\sigma / \sqrt{n} = 5 / \sqrt{25} = 1$ lb
From 70-95-100 rule:
• $\bar{x}$ will be almost certainly within 8 ± 3 lb
• $\bar{x}$ is unlikely to be more than 3 lb in error
Even if we didn't know $\mu$:
• $\bar{x}$ is unlikely to be more than 3 lb in error

Increasing sample size, n
If we sample n = 100 people instead of 25:
s.d.($\bar{x}$) = $\sigma / \sqrt{n} = 5 / \sqrt{100} = 0.5$ lb
Larger samples --> more accurate estimates

Central Limit Theorem
• If population is normal ($\mu$, $\sigma$): $\bar{X}$ ~ normal ($\mu$, $\sigma / \sqrt{n}$)
• If popn is non-normal with ($\mu$, $\sigma$) but n is large: $\bar{X}$ approx ~ normal ($\mu$, $\sigma / \sqrt{n}$)
• Guideline: n > 30 even if very non-normal

Other summary statistics
E.g. lower quartile, proportion, correlation
• Usually not normal distns
• A formula for the standard devn of the sampling distn is sometimes available
• Sampling distn usually close to normal if n is large

Lottery problem
Pennsylvania Cash 5 lottery
• 5 numbers selected from 1-39
• Pick birthdays of family members (none 32-39)
• P(highest selected is 32 or over)?
Statistic: H = highest of 5 random numbers (without replacement)

Lottery simulation
Theory? Fairly hard.
Simulation: generated 5 numbers (without replacement) 1560 times
Highest number > 31 in about 72% of repetitions

Normal distributions
• Family of distributions (populations)
• Shape depends only on parameters $\mu$ (mean) & $\sigma$ (st devn)
• All have same symmetric 'bell shape'
E.g. $\mu = 65$ inches, $\sigma = 2.7$ inches

Importance of normal distn
• A reasonable model for many data sets
• Transformed data often approx normal
• Sample means (and many other statistics) are approx normal
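
The weight-loss simulation and the lottery question above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function names and the fixed seed are illustrative assumptions, not part of the original slides. It also computes the lottery probability exactly through the complement (all five numbers fall in 1-31), which the slides leave to simulation.

```python
import math
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Return the means of `reps` random samples of size n from normal(mu, sigma)."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatability
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]


def sd_of_sample_mean(sigma, n):
    """Theoretical st devn of the sampling distribution of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def prob_highest_at_least(threshold, pool=39, picks=5):
    """Exact P(highest of `picks` numbers drawn without replacement from 1..pool >= threshold)."""
    # Complement: every one of the picks is below the threshold.
    return 1 - math.comb(threshold - 1, picks) / math.comb(pool, picks)


def simulate_highest_at_least(threshold, reps, pool=39, picks=5, seed=0):
    """Estimate the same probability by repeated draws without replacement."""
    rng = random.Random(seed)
    hits = sum(
        max(rng.sample(range(1, pool + 1), picks)) >= threshold
        for _ in range(reps)
    )
    return hits / reps


# Weight loss: 400 samples of n = 25 from normal(mu = 8 lb, sigma = 5 lb)
means = simulate_sample_means(mu=8, sigma=5, n=25, reps=400)
print("simulated mean of sample means:", statistics.fmean(means))
print("simulated st devn of sample means:", statistics.stdev(means))
print("theoretical st devn:", sd_of_sample_mean(sigma=5, n=25))

# Lottery: P(highest of 5 numbers from 1-39 is 32 or over)
print("exact probability:", prob_highest_at_least(32))
print("simulated probability:", simulate_highest_at_least(32, reps=1560))
```

The exact lottery probability works out to about 0.705, so the 72% seen in 1560 simulated draws is within ordinary simulation error.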
Standard normal distribution
Z ~ Normal ($\mu = 0$, $\sigma = 1$)
[Figure: standard normal density, axis from -3 to 3, shaded area = Prob (Z < z*)]

Probabilities for normal (0, 1)
Check from tables:
• $P(Z \le -3.00) = 0.0013$
• $P(Z \le -2.59) = 0.0048$
• $P(Z \le 1.31) = 0.9049$
• $P(Z \le 2.00) = 0.9772$
• $P(Z \le -4.75) = 0.000001$

Probability Z > 1.31
$P(Z > 1.31) = 1 - P(Z \le 1.31) = 1 - 0.9049 = 0.0951$

Prob (Z between -2.59 and 1.31)
$P(-2.59 \le Z \le 1.31) = P(Z \le 1.31) - P(Z \le -2.59) = 0.9049 - 0.0048 = 0.9001$

Standard devns from mean
Normal ($\mu$, $\sigma$)
[Figure: normal density with axis marked in standard deviations from $\mu$]

Heights of students
$\mu = 65$ inches, $\sigma = 2.7$ inches

Probability and area
X ~ normal ($\mu = 65$, $\sigma = 2.7$)
$P(X \le 67.7)$ = area under the density to the left of 67.7

Probability and area (cont.)
Normal ($\mu$, $\sigma$)
• P(X within $\sigma$ of $\mu$) = 0.683 exactly; approx 70% by the 70-95-100 rule
• P(X within $2\sigma$ of $\mu$) = 0.954 exactly; approx 95%
• P(X within $3\sigma$ of $\mu$) = 0.997 exactly; approx 100%

Finding approx probabilities
Ht of college woman, X ~ normal ($\mu = 65$, $\sigma = 2.7$)
Prob ($X \le 62$)?
1. Sketch normal density
2. Estimate area
$P(X \le 62)$ = area: about 1/8

Translate question from X to Z
X ~ Normal ($\mu$, $\sigma$)
Find $P(X \le x^*)$
Translate to z-score: $Z = \frac{X - \mu}{\sigma}$, so $z^* = \frac{x^* - \mu}{\sigma}$
Z ~ Normal ($\mu = 0$, $\sigma = 1$)

Finding probabilities
Prob (height of randomly selected college woman $\le 62$)?
$P(X \le 62) = P\left(Z \le \frac{62 - 65}{2.7}\right) = P(Z \le -1.11) = 0.1335$
About 13%.

Prob (X > value)
Ht of college woman, X ~ normal ($\mu = 65$, $\sigma = 2.7$)
Prob (X > 68 inches)?
$P(X > 68) = P\left(Z > \frac{68 - 65}{2.7}\right) = P(Z > 1.11) = 1 - P(Z \le 1.11) = 1 - 0.8665 = 0.1335$

Finding upper quartile
Blood pressures are normal with mean 120 and standard deviation 10. What is the 75th percentile?
Step 1: Solve for z-score. Closest z* with area of 0.7500 (tables): z* = 0.67
Step 2: Calculate $x = z^* \sigma + \mu$: x = (0.67)(10) + 120 = 126.7, or about 127.

Probabilities about means
• Blood pressure ~ normal ($\mu = 120$, $\sigma = 10$)
• 8 people given drug
• If drug does not affect blood pressure, find P(average blood pressure > 130), i.e. $P(\bar{X} > 130)$
X ~ normal ($\mu = 120$, $\sigma = 10$), n = 8
$\bar{X}$ ~ normal ($\mu_{\bar{X}} = 120$, $\sigma_{\bar{X}} = 10 / \sqrt{8} = 3.54$)
$z = \frac{130 - 120}{3.54} = 2.83$
prob = 0.0023. Very little chance!

Distribution of sum
• X ~ distn with ($\mu$, $\sigma$)
• aX ~ distn with ($a\mu$, $a\sigma$), e.g. miles to kilometers
• $\bar{X}$ ~ distn with ($\mu$, $\sigma / \sqrt{n}$)
• $\sum X = n\bar{X}$ ~ distn with ($n\mu$, $\sqrt{n}\,\sigma$)
• Central Limit Theorem implies approx normal

Probabilities about sum
• Profit in 1 day ~ normal ($\mu = 300$, $\sigma = 200$), in dollars
• Prob (total profit in week < 1,000 dollars)?
• Total = $\sum X$ ~ normal ($\mu = 7 \times 300 = 2100$, $\sigma = \sqrt{7} \times 200 = 529$)
$z = \frac{1000 - 2100}{529} = -2.08$
Prob = 0.0188
Assumes independence

Categorical data
• Most important parameter is $\pi$ = Prob (success)
• Corresponding summary statistic is p = Proportion (success)
N.B. Textbook uses p and $\hat{p}$

Number of successes
Easiest to deal with count of successes before proportion.
If…
1. n "trials" (fixed beforehand).
2. Only "success" or "failure" possible for each trial.
3. Outcomes are independent.
4. Prob (success), $\pi$, remains same for all trials.
   • Prob (failure) is $1 - \pi$.
…then X = number of successes ~ binomial (n, $\pi$)

Binomial Probabilities
$P(X = k) = \frac{n!}{k!\,(n-k)!}\,\pi^k (1-\pi)^{n-k}$ for k = 0, 1, 2, …, n
You won't need to use this!!

Prob (win game) = 0.2. Plays of game are independent.
What is Prob (wins 2 out of 3 games)? What is P(X = 2)?
$P(X = 2) = \frac{3!}{2!\,(3-2)!}\,(0.2)^2 (1 - 0.2)^{3-2} = 3(0.2)^2(0.8)^1 = 0.096$

Mean & st devn of Binomial
For a binomial (n, $\pi$):
• Mean $\mu = n\pi$
• Standard deviation $\sigma = \sqrt{n\pi(1-\pi)}$

Extraterrestrial Life?
50% of large population would say "yes" if asked, "Do you believe there is extraterrestrial life?"
Sample of n = 100
X = # "yes" ~ binomial (n = 100, $\pi = 0.5$)
• Mean $\mu = E(X) = 100(0.5) = 50$
• Standard deviation $\sigma = \sqrt{100(0.5)(0.5)} = 5$
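
The table look-ups in the normal and binomial examples above can be checked numerically. The following is a rough sketch using `statistics.NormalDist` and `math.comb` from the Python standard library; the variable and function names are illustrative choices, not something the slides prescribe.

```python
from math import comb, sqrt
from statistics import NormalDist

# Heights of college women: X ~ normal(mu = 65, sigma = 2.7)
heights = NormalDist(mu=65, sigma=2.7)
p_at_most_62 = heights.cdf(62)       # P(X <= 62)
p_above_68 = 1 - heights.cdf(68)     # P(X > 68)

# Blood pressure: 75th percentile of normal(mu = 120, sigma = 10)
blood_pressure = NormalDist(mu=120, sigma=10)
upper_quartile = blood_pressure.inv_cdf(0.75)

# Mean of n = 8 readings has st devn sigma / sqrt(n)
mean_of_8 = NormalDist(mu=120, sigma=10 / sqrt(8))
p_mean_above_130 = 1 - mean_of_8.cdf(130)

# Total of 7 independent daily profits: mean n * mu, st devn sqrt(n) * sigma
weekly_total = NormalDist(mu=7 * 300, sigma=sqrt(7) * 200)
p_total_below_1000 = weekly_total.cdf(1000)


def binomial_pmf(k, n, pi):
    """P(X = k) for X ~ binomial(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi) ** (n - k)


# Wins 2 out of 3 independent games with Prob(win) = 0.2
p_two_wins_in_three = binomial_pmf(2, 3, 0.2)

print("P(X <= 62):", round(p_at_most_62, 4))
print("P(X > 68):", round(p_above_68, 4))
print("75th percentile:", round(upper_quartile, 1))
print("P(mean > 130):", round(p_mean_above_130, 4))
print("P(total < 1000):", round(p_total_below_1000, 4))
print("P(2 wins in 3):", round(p_two_wins_in_three, 3))
```

The computed values may differ slightly in the later decimal places from the table-based answers, because the tables round z to two decimal places.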
For the sample of n = 100, X = # "yes" ~ binomial (n = 100, $\pi = 0.5$), with $\mu = E(X) = 100(0.5) = 50$ and $\sigma = \sqrt{100(0.5)(0.5)} = 5$.
70-95-100 rule of thumb for # "yes":
• About 95% chance of between 40 & 60
• Almost certainly between 35 & 65

Normal approx to binomial
If X is binomial (n, $\pi$), and n is large, then X is also approximately normal, with
• Mean $\mu = E(X) = n\pi$
• Standard deviation $\sigma = \sqrt{n\pi(1-\pi)}$
Conditions: Both $n\pi$ and $n(1-\pi)$ are at least 10.
(Justified by Central Limit Theorem)

Number of H in 30 Flips
X = # heads in n = 30 flips of fair coin
X ~ binomial (n = 30, $\pi = 0.5$)
Bell-shaped & approx normal.
• $\mu = E(X) = 30(0.5) = 15$
• $\sigma = \sqrt{30(0.5)(0.5)} = 2.74$

Opinion poll
n = 500 adults; 240 agreed with statement
If $\pi = 0.5$ of all adults agree, what is $P(X \le 240)$?
X is approx normal with
• $\mu = E(X) = 500(0.5) = 250$
• $\sigma = \sqrt{500(0.5)(0.5)} = 11.2$
$P(X \le 240) = P\left(Z \le \frac{240 - 250}{11.2}\right) = P(Z \le -0.89) = 0.1867$
Not unlikely to see 48% or less, even if 50% in population agree.

Sample Proportion
• Suppose (unknown to us) 40% of a population carry the gene for a disease ($\pi = 0.40$).
• Random sample of 25 people; X = # with gene.
• X ~ binomial (n = 25, $\pi = 0.4$)
• p = proportion with gene: $p = X / n$

Distn of sample proportion
• X ~ binomial (n, $\pi$): $\mu_X = n\pi$, $\sigma_X = \sqrt{n\pi(1-\pi)}$
• $p = X / n$: $\mu_p = \pi$, $\sigma_p = \sqrt{\pi(1-\pi)/n}$
• Large n: p is approx normal ($n\pi \ge 10$ & $n(1-\pi) \ge 10$)

Examples
• Election Polls: to estimate proportion who favor a candidate; units = all voters.
• Television Ratings: to estimate proportion of households watching TV program; units = all households with TV.
• Consumer Preferences: to estimate proportion of consumers who prefer new recipe compared with old; units = all consumers.
• Testing ESP: to estimate probability a person can successfully guess which of 5 symbols on a hidden card; repeatable situation = a guess.

Public opinion poll
Suppose 40% of all voters favor Candidate A. Pollsters sample n = 2400 voters.
Propn voting for A is approx normal:
• $\mu_p = \pi = 0.4$
• $\sigma_p = \sqrt{\pi(1-\pi)/n} = \sqrt{(0.4)(0.6)/2400} = 0.01$
Simulation 400 times & theory.

Probability from normal approx
If 40% of voters favor Candidate A, and n = 2400 sampled: $\mu_p = 0.4$, $\sigma_p = 0.01$
• Sample proportion, p, is almost certain to be between 0.37 and 0.43
• Prob 0.95 of p being between 0.38 and 0.42
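
As a rough check on the normal approximation used in the opinion-poll and sample-proportion examples above, the sketch below compares it with the exact binomial sum. It assumes only the Python standard library; the helper names (`binomial_cdf`, `normal_approx_cdf`, `sd_of_proportion`, `conditions_met`) are hypothetical and chosen for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k, n, pi):
    """Exact P(X <= k) for X ~ binomial(n, pi), summing the binomial formula."""
    return sum(comb(n, j) * pi**j * (1 - pi) ** (n - j) for j in range(k + 1))


def normal_approx_cdf(k, n, pi):
    """Normal approximation to P(X <= k): mean n*pi, st devn sqrt(n*pi*(1-pi))."""
    return NormalDist(mu=n * pi, sigma=sqrt(n * pi * (1 - pi))).cdf(k)


def sd_of_proportion(pi, n):
    """St devn of the sample proportion p = X / n."""
    return sqrt(pi * (1 - pi) / n)


def conditions_met(n, pi, minimum=10):
    """Rule of thumb from the slides: n*pi and n*(1 - pi) both at least 10."""
    return n * pi >= minimum and n * (1 - pi) >= minimum


# Opinion poll: n = 500, pi = 0.5, P(X <= 240)
exact = binomial_cdf(240, 500, 0.5)
approx = normal_approx_cdf(240, 500, 0.5)
print("conditions met:", conditions_met(500, 0.5))
print("exact binomial:", round(exact, 4))
print("normal approx:", round(approx, 4))

# Public opinion poll: pi = 0.4, n = 2400
sd_p = sd_of_proportion(0.4, 2400)
p_dist = NormalDist(mu=0.4, sigma=sd_p)
p_within_2_sd = p_dist.cdf(0.42) - p_dist.cdf(0.38)
print("st devn of p:", round(sd_p, 4))
print("P(0.38 <= p <= 0.42):", round(p_within_2_sd, 4))
```

The exact binomial probability comes out a little larger than the plain normal approximation. A continuity correction (evaluating the normal curve at 240.5) is a common refinement, though the slides do not cover it.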
