Survey
Elementary Survey Sampling, 7E, Scheaffer, Mendenhall, Ott and Gerow
STT 350: SURVEY SAMPLING, Dr. Cuixian Chen
Chapter 3: Some basic concepts of statistics

Population versus sample
• Population: the entire group of individuals in which we are interested but usually cannot assess directly. A parameter is a number describing a characteristic of the population.
• Sample: the part of the population we actually examine and for which we do have data. A statistic is a number describing a characteristic of a sample.

Population versus Sample
Population: numbers that describe the population are called _________________. The population mean is represented by ________. The population variance is represented by ________.
Sample: numbers that describe the sample are called __________________. The sample mean is represented by ________. The sample variance is represented by ________.

Sample mean and variance
Use the following data set: 5, 9, 8, 7, 6, 5, 8, 4, 1, 3
Calculate the sample mean:
Calculate the sample variance:
Sample standard deviation:

Population mean and standard deviation
Population mean: μ = E(Y) = Σ yᵢ p(yᵢ)
Population variance: V(Y) = σ² = Σ (yᵢ − μ)² p(yᵢ)
Population standard deviation: σ(Y) = √V(Y)
E.g.: Use the following information to calculate the population mean, variance and standard deviation:
Y:     1    2    3    4
P(Y):  0.1  0.6  0.2  0.1

Sampling distribution from infinite populations
For randomly selected samples from infinite populations, mathematical properties of expected value can be used to derive the facts that E(ȳ) = μ and V(ȳ) = σ²/n.
It can also be shown that the variance of the sample mean can be estimated unbiasedly by V̂(ȳ) = s²/n.

Section 3.3 Summarizing Information in Populations and Samples: The
Finite Population Case
If the population is infinitely large, sampling without replacement behaves like sampling with replacement: the probability of selecting any particular value does not change from draw to draw, so the observations are independent.
However, if the population is finite, the selection probabilities change as more elements are selected (example: rolling a die versus drawing cards from a standard 52-card deck).

Estimating the population total
We will represent the total of a population as τ and its estimator (a statistic) as τ̂. More to come on this in the next few chapters.

Sampling without replacement
The same idea can be used with sampling without replacement, but the probabilities become more difficult to find (STT 315 helps to understand how to calculate these).

3.4 Sampling distribution (class activity)
In your introductory statistics class, you discovered that the sampling distribution of ȳ was approximately normal (if n was large enough) with mean μ and standard deviation σ/√n.

Tchebysheff's theorem
If n is NOT large enough to rely on the CLT and the population distribution is NOT normal, we can still use Tchebysheff's theorem to get a lower bound: for any k ≥ 1, at least 1 − 1/k² of the measurements in any set will lie within k standard deviations of the mean (this is a LOWER BOUND!). E.g.: within 1 SD, at least 0% of the measurements (not very useful); within 2 SDs, at least 75%; within 3 SDs, at least 88.89% (that is, 8/9).
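The Chapter 3 exercises above (the sample mean and variance of the data set 5, 9, 8, 7, 6, 5, 8, 4, 1, 3, the population parameters from the Y, P(Y) table, and Tchebysheff's bound) can be checked with a short base-R sketch. By hand: ȳ = 5.6, s² = 56.4/9 ≈ 6.267, s ≈ 2.503; μ = 2.3, σ² = 0.61, σ ≈ 0.781. The skewed gamma data set (shape 0.5, scale 9) and the seed used to compare observed proportions with Tchebysheff's bound are illustrative assumptions, not part of the original notes.

```r
# Chapter 3 checks: sample statistics, population parameters, Tchebysheff's bound

# Sample mean, variance and SD for the data set in the notes
y <- c(5, 9, 8, 7, 6, 5, 8, 4, 1, 3)
n <- length(y)
ybar <- sum(y) / n                  # sample mean
s2 <- sum((y - ybar)^2) / (n - 1)   # sample variance uses divisor n - 1
s <- sqrt(s2)                       # sample standard deviation
cat("Sample:     ybar =", ybar, " s^2 =", round(s2, 4), " s =", round(s, 4), "\n")

# Population mean, variance and SD for the discrete distribution of Y
yvals <- c(1, 2, 3, 4)
py <- c(0.1, 0.6, 0.2, 0.1)
mu <- sum(yvals * py)               # mu = E(Y) = sum of y_i * p(y_i)
sigma2 <- sum((yvals - mu)^2 * py)  # V(Y) = sum of (y_i - mu)^2 * p(y_i)
sigma <- sqrt(sigma2)
cat("Population: mu =", mu, " sigma^2 =", round(sigma2, 4), " sigma =", round(sigma, 4), "\n")

# Tchebysheff: at least 1 - 1/k^2 of the values lie within k SDs of the mean
k <- c(1, 2, 3)
cheb_bound <- 1 - 1 / k^2

# Compare with observed proportions for one skewed, simulated data set (illustrative)
set.seed(350)
x <- rgamma(10000, shape = 0.5, scale = 9)
observed <- sapply(k, function(kk) mean(abs(x - mean(x)) < kk * sd(x)))
print(rbind(k = k, bound = round(cheb_bound, 4), observed = round(observed, 4)))
```

Because Tchebysheff's theorem holds for any set of measurements, every observed proportion should be at least the corresponding bound, however skewed the data are.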
Finite population size
All the theory in the introductory statistics class (and so far in this class) assumes INDEPENDENT observations (an infinite population, or one so large that we can treat it as infinite). What happens when this is not true?
R code (sampling distribution of the mean of n observations from a skewed gamma population):
# Simulate 1000 sample means, each computed from a sample of size n
x.bar.dist1 <- function(n) {
  xbar <- numeric(1000)                        # storage for the 1000 sample means
  for (i in 1:1000) {
    temp <- rgamma(n, shape = 0.5, scale = 9)  # skewed population with mean 0.5 * 9 = 4.5
    xbar[i] <- mean(temp)
  }
  return(xbar)
}
hist(x.bar.dist1(80))                          # histogram of the simulated sample means, n = 80

3.5 Covariance and Correlation
Relationship between two random variables: covariance.
• The covariance indicates how two variables "covary".
• Positive covariance indicates a positive association.
• Negative covariance indicates a negative association.
• Zero covariance indicates no linear association (NOT necessarily independence!).

More on Covariance
We calculate covariance as cov(y₁, y₂) = E[(y₁ − μ₁)(y₂ − μ₂)]. Look at graphs to discuss covariance (a measure of LINEAR dependence).
However, covariance depends on the scale of the two variables, so correlation "standardizes" the covariance:
correlation = ρ = cov(y₁, y₂)/(σ₁σ₂). Note that −1 ≤ ρ ≤ 1.

3.6 Estimation
Since we do not know the parameters, we estimate them with statistics! If θ is the parameter of interest, then θ̂ is the estimator of θ. We want the following properties to hold:
1. E(θ̂) = θ (the estimator is unbiased);
2. V(θ̂) = σ²(θ̂) is small.

Error of Estimation and Bounds
The error of estimation is defined as |θ̂ − θ|. Set a bound B on this error of estimation such that P(|θ̂ − θ| < B) = 1 − α. The value of B (bound) can be thought of as the margin of error.
In fact, this is how confidence intervals are constructed when the sampling distribution of the statistic is normal: the interval is θ̂ ± B, with B = z·σ(θ̂), where z is the standard normal value with upper-tail area α/2.

Review of STT 215, Chapter 5: Sampling Distributions (Dr. Cuixian Chen)
Chapter 5.1 Sampling distribution of the sample mean x̄
Simple random sample (SRS): data are summarized by statistics (mean, standard deviation, median, quartiles, correlation, etc.).
Concerns: 1) Is the sample mean related to the population mean? 2) If yes, what is the relationship? In other words, how close to (or how far from) the population mean is a sample mean?

Population distribution for 10 random digits
Population distribution of the random digits 0–9:
X:     0     1     2     3     4     5     6     7     8     9
Prob:  1/10  1/10  1/10  1/10  1/10  1/10  1/10  1/10  1/10  1/10
Population mean: μ = E(Y) = Σ yᵢ p(yᵢ)
Population variance: V(Y) = σ² = Σ (yᵢ − μ)² p(yᵢ); population standard deviation: σ = √V(Y)

Sampling distribution of the sample mean of 10 random digits
(1) Select 10 random digits from Table B, and then take the sample mean; (2) repeat this process 4 times for each student in Dr. Chen's class.
More details with illustration:
1. Based on Table B (random digit table), randomly select a line, for example line 106.
2. Take the sample average of the random digits (6, 8, 4, 1, 7, 3, 5, 0, 1, 3). This gives sample mean #1 = (6+8+4+1+7+3+5+0+1+3)/10 = 3.8; sample SD #1 = sd(c(6, 8, 4, 1, 7, 3, 5, 0, 1, 3)) = 2.699794.
Now move forward to the next set of 10 random digits, (1, 5, 5, 2, 9, 7, 2, 7, 6, 5). This gives sample mean #2 = (1+5+5+2+9+7+2+7+6+5)/10 = 4.9; sample SD #2 = sd(c(1, 5, 5, 2, 9, 7, 2, 7, 6, 5)) = 2.558211.
Repeat this procedure until you get sample mean #4.
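The population parameters for the random digits and the two illustrated samples can be reproduced in R, using the same sd() call that the notes quote. The digits 0–9, each with probability 1/10, give μ = 4.5 and σ² = 8.25, so σ ≈ 2.872 and σ/√10 ≈ 0.908, which matches the SD of x̄ ≈ 0.9 reported for the class histogram. The helper sample_summary is a hypothetical name introduced only for this sketch.

```r
# Population of random digits 0-9, each with probability 1/10
digits <- 0:9
p <- rep(1 / 10, 10)
mu <- sum(digits * p)                     # population mean
sigma <- sqrt(sum((digits - mu)^2 * p))   # population SD (weights p, no n - 1 divisor)
cat("Population: mu =", mu, " sigma =", round(sigma, 4), "\n")

# Hypothetical helper: sample mean and sample SD of one set of 10 digits
sample_summary <- function(d) c(mean = mean(d), sd = sd(d))

sample1 <- c(6, 8, 4, 1, 7, 3, 5, 0, 1, 3)  # digits from line 106 of Table B
sample2 <- c(1, 5, 5, 2, 9, 7, 2, 7, 6, 5)  # the next 10 digits
print(rbind(sample1 = sample_summary(sample1), sample2 = sample_summary(sample2)))

# Theoretical SD of the sample mean of n = 10 digits
cat("SD of x-bar for n = 10:", round(sigma / sqrt(10), 4), "\n")
```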
Sampling distribution of the sample mean of 10 random digits (class results; the number of values in each class is shown in parentheses)
(2) 2.2 2.2
(7) 2.9 2.9 3.0 3.0 3.0 3.0 3.0
(8) 3.1 3.1 3.2 3.2 3.3 3.4 3.4 3.5
(12) 3.6 3.6 3.7 3.8 3.8 3.8 3.8 3.8 3.8 3.8 3.9 4.0
(16) 4.1 4.1 4.2 4.2 4.2 4.3 4.3 4.3 4.3 4.4 4.4 4.4 4.5 4.5 4.5 4.5
(10) 4.6 4.7 4.7 4.7 4.8 4.9 4.9 4.9 4.9 4.9
(8) 5.2 5.2 5.3 5.3 5.3 5.3 5.5 5.5
(4) 5.9 6.0 6.0 6.0
(4) 6.2 6.2 6.3 6.4
(1) 6.8
Q: Draw a histogram with the following classes.

Class:   (2, 2.5]  (2.5, 3]  (3, 3.5]  (3.5, 4]  (4, 4.5]  (4.5, 5]  (5, 5.5]  (5.5, 6]  (6, 6.5]  (6.5, 7]
Counts:     2         7         8        12        16        10         8         4         4         1

Sampling distribution of x̄: histogram of the sample averages.
Q: Write a journal about how we obtained the sampling distribution of the sample mean x̄ in today's in-class exercise, by answering the following questions:
1) How did each student obtain x̄'s from Table B?
2) How many x̄'s did we have in total in the class?
3) How do you make a histogram of x̄? What is the name of the histogram?
4) What did the smooth curve represent?
5) For the smooth curve, what did the horizontal axis and the vertical axis represent?

Figure: samples of 10 random digits drawn from Table B (1st sample, 2nd sample, ..., 25th sample), each summarized by its sample mean. There is some variability in values of a statistic over different samples.
Sampling distribution of the sample mean of 10 random digits
(1) Select 10 random digits from Table B, and then take the sample mean; (2) repeat this process 25 times for each student (Spring 2012); (3) make a histogram of the 1098 x̄'s from the class. The distribution looks like a Normal distribution.
Sampling distribution of x̄: histogram of the sample averages. The probability distribution of a statistic is called its sampling distribution.
For the histogram: center of x̄ = 4.541, SD of x̄ = 0.9.
Summary of the x̄'s: Min. 1.400, 1st Qu. 3.600, Median 4.400, Mean 4.451, 3rd Qu. 5.400, Max. 7.800.
Figure: the population of random digits and the sampling distribution of x̄ (center of x̄ = 4.5, SD of x̄ = 0.9).

Mean and standard deviation of a sample mean
For any population with mean μ and standard deviation σ:
• The mean, or center, of the sampling distribution of x̄ is equal to the population mean: mean of x̄ = μ.
• The standard deviation of the sampling distribution is SD of x̄ = σ/√n, where n is the sample size.
Sample means are less variable than individual observations.

For normally distributed populations
When a variable in a population is normally distributed, the sampling distribution of x̄ for all possible samples of size n is also normally distributed: if the population is N(μ, σ), then the distribution of the sample mean is N(μ, σ/√n).

Sampling distribution of a sample mean x̄ (summary diagram)
• Population distribution of X (n = 1) is exactly N(μ, σ): the sampling distribution of x̄ (n > 1) is exactly N(μ, σ/√n).
• Population is not exactly Normal, but has mean μ and SD σ: for n ≥ 30, x̄ is approximately N(μ, σ/√n) (by the Central Limit Theorem).
Standardize: Z = (x̄ − μ)/(σ/√n); reverse: x̄ = μ + Z·σ/√n.

Example: Soda Drink
Let X denote the actual volume of soda in a randomly selected can. Suppose X ~ N(12 oz, 0.4 oz), and 16 cans are to be selected.
a) The average volume is normally distributed with mean ____ and standard deviation ____.
b) Find the probability that the sample average is greater than 12.1 oz.
Answer: mean of x̄ = 12 oz; SD of x̄ = 0.4/√16 = 0.1 oz; Z = (12.1 − 12)/0.1 = 1, so P(x̄ > 12.1) = P(Z > 1) = 1 − 0.8413 = 0.1587.

Exercise 5.21, page 310
Diabetes during pregnancy. A patient is classified as having gestational diabetes if her glucose level is above 140 mg/dl one hour after a sugary drink. Patient Sheila's measured glucose level follows a Normal distribution with μ = 125 mg/dl and σ = 10 mg/dl.
(a) If a single glucose measurement is made, what is the probability that Sheila is diagnosed as having gestational diabetes?
(b) If measurements are made instead on three separate days and the mean result is compared with the criterion 140 mg/dl, what is the probability that Sheila is diagnosed as having gestational diabetes?
(a) n = 1: Let X be Sheila's measured glucose level. P(X > 140) = P(Z > (140 − 125)/10) = P(Z > 1.5) = 0.0668.
(b) n = 3: If x̄ is the mean of three measurements, then x̄ has a N(125, 10/√3) = N(125 mg/dl, 5.7735 mg/dl) distribution, and P(x̄ > 140) = P(Z > 2.60) = 0.0047.

Concern: What will happen when the sample size gets bigger and bigger?
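The class histogram, the simulated sampling distribution of x̄, and the two Normal-probability examples above can be reproduced in base R as a minimal sketch. The 72 class means are typed in from the listing; cut() with its default right-closed intervals matches the classes (a, b] used in the notes; pnorm() plays the role of the calculator's normalcdf. The 10,000 simulated samples and the seed are illustrative assumptions, so the simulated center and SD will only be close to the theoretical 4.5 and √(8.25/10) ≈ 0.908.

```r
# 1) Class results: bin the 72 sample means into the histogram classes (a, b]
xbar_class <- c(2.2, 2.2,
                2.9, 2.9, 3.0, 3.0, 3.0, 3.0, 3.0,
                3.1, 3.1, 3.2, 3.2, 3.3, 3.4, 3.4, 3.5,
                3.6, 3.6, 3.7, rep(3.8, 7), 3.9, 4.0,
                4.1, 4.1, 4.2, 4.2, 4.2, rep(4.3, 4), rep(4.4, 3), rep(4.5, 4),
                4.6, 4.7, 4.7, 4.7, 4.8, rep(4.9, 5),
                5.2, 5.2, rep(5.3, 4), 5.5, 5.5,
                5.9, 6.0, 6.0, 6.0,
                6.2, 6.2, 6.3, 6.4,
                6.8)
breaks <- seq(2, 7, by = 0.5)
counts <- table(cut(xbar_class, breaks = breaks))  # right-closed classes by default
print(counts)

# 2) Simulated sampling distribution of x-bar for n = 10 random digits
set.seed(215)                                      # illustrative seed
xbar_sim <- replicate(10000, mean(sample(0:9, 10, replace = TRUE)))
cat("Center of x-bar:", round(mean(xbar_sim), 3),
    " SD of x-bar:", round(sd(xbar_sim), 3),
    " (theory: 4.5 and", round(sqrt(8.25 / 10), 3), ")\n")
hist(xbar_sim, main = "Simulated sampling distribution of x-bar (n = 10)")

# 3) Normal probabilities for x-bar, using pnorm() in place of normalcdf
p_soda <- 1 - pnorm(12.1, mean = 12, sd = 0.4 / sqrt(16))   # P(x-bar > 12.1)
p_one <- 1 - pnorm(140, mean = 125, sd = 10)                 # single measurement
p_three <- 1 - pnorm(140, mean = 125, sd = 10 / sqrt(3))     # mean of 3 days
print(round(c(soda = p_soda, sheila_n1 = p_one, sheila_n3 = p_three), 4))
```

Working from the unrounded Z = 2.598 in part (b) still gives about 0.0047, so rounding Z to 2.60 does not change the answer at four decimals.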
Central Limit Theorem (CLT)
Figure: a population with a strongly skewed distribution (mean μ, SD σ) and the sampling distributions of x̄ for n = 2, n = 10, and n = 25 observations.
For non-Normal populations, the CLT says: even if the population is NOT Normal, as long as it has mean μ and SD σ, when the sample size is large enough the distribution of the sample mean is approximately N(μ, σ/√n).

IQ scores: population vs. sample
In a large population of adults, the mean IQ is 112 with standard deviation 20. Suppose 200 adults are randomly selected for a market research campaign. The distribution of the sample mean IQ is:
A) Exactly normal, mean 112, standard deviation 20
B) Approximately normal, mean 112, standard deviation 20
C) Approximately normal, mean 112, standard deviation 1.414
D) Approximately normal, mean 112, standard deviation 0.1
Answer: C) Approximately normal, mean 112, standard deviation 20/√200 = 1.414.
Population distribution: mean 112, SD 20; sampling distribution of x̄ for n = 200: approximately N(112, 1.414).

Example 5.12, page 309
Songs on an iPod. An iPod has about 10,000 songs. The distribution of the play time for these songs is highly skewed. Assume that the standard deviation for the population is 280 seconds.
(a) What is the standard deviation of the average time when you take an SRS of 10 songs from this population?
(b) How many songs would you need to sample if you wanted the standard deviation of x̄ to be 15 seconds?
(a) The standard deviation is σ/√10 = 280/√10 ≈ 88.5438 seconds.
(b) For σ/√n = 280/√n = 15 seconds, we need √n = 280/15 ≈ 18.667, so n = (280/15)² ≈ 348.44; round up to n = 349.

Example: children's attitudes toward reading
In the journal Knowledge Quest (Jan/Feb 2002), education professors at the University of Southern California investigated children's attitudes toward reading. One study measured third through sixth graders' attitudes toward recreational reading on a 140-point scale. The mean score for this population of children was 106 with a standard deviation of 16.4. For a random sample of 36 children from this population:
a) What is the sampling distribution of x̄?
b) Find P(x̄ < 100).
Answer: x̄ is approximately N(μ, σ/√n) = N(106, 16.4/√36) = N(106, 2.7333).
Standardize: Z = (x̄ − μ)/(σ/√n) = (100 − 106)/2.7333 = −2.20.
Probability = normalcdf(−E99, −2.20, 0, 1) = 0.0139.

More exercises on Chapter 5.1:
1. You were told that the weight of a newborn baby follows a normal distribution with mean 7 pounds and SD 0.5 pounds. The average weight of the next 16 newborns in your local hospital is around ______, with SD _____. What is the probability that the average is between 7.2 and 7.5 pounds?
2. The carbon monoxide in a certain brand of cigarette (in milligrams) follows a normal distribution with mean 12 and SD 1.8. For 40 randomly selected cigarettes, a) what is the sampling distribution of the sample mean? b) Find the probability that the average carbon monoxide is between 10 and 13.
3. The amount of time that a drive-through bank teller spends on a customer follows a normal distribution with mean 4 minutes and SD 1.5 minutes. For the next 50 customers, find the probability that the average time spent is more than 5 minutes.
4. The rate of water usage per hour (in thousands of gallons) by a community follows a normal distribution with mean 5 and SD 2.
For the next 30 hours, a) what is the sampling distribution of the sample mean? b) Find the probability that the average rate of usage per hour is less than 4.
Answers:
1. New SD = 0.5/√16 = 0.125; Z(7.2) = 1.6, Z(7.5) = 4; area = 1 − 0.9452 = 0.0548 (the area above Z = 4 is negligible).
2. New SD = 1.8/√40 = 0.285; Z(10) = −7.02, Z(13) = 3.5; the area is almost 100%.
3. New SD = 1.5/√50 = 0.212; Z(5) = 4.72; the area is almost zero.
4. New SD = 2/√30 = 0.365; Z(4) = −2.74; area = 1 − 0.9969 = 0.0031.
EX: 5.7, 5.8, 5.18(a–c), 5.24, 5.21, 5.12

Chapter 5.2 Sampling distribution of the sample proportion p̂
Review: sample proportion p̂
Sample proportion (p-hat, or relative frequency): p̂ = (count in the sample)/(total sample size). Population proportion: p.
Reminder from Chapter 3: sampling variability. Each time we take a random sample from a population, we are likely to get a different set of individuals and calculate a different statistic. This is called sampling variability. If we take a lot of random samples of the same size from a given population, the variation from sample to sample (the sampling distribution) will follow a predictable pattern.

Sampling distribution of the sample proportion of 10 random digits
(1) Select 10 random digits from Table B, and then take the sample proportion of EVEN numbers; (2) repeat this process 4 times for each student in Dr. Chen's class.
More details with illustration:
1. Based on Table B (random digit table), randomly select a line, for example line 106.
2. Take the sample proportion of EVEN numbers among the random digits (6, 8, 4, 1, 7, 3, 5, 0, 1, 3). The even digits are 6, 8, 4 and 0, so sample proportion #1 = 4/10 = 0.4.
Now move forward to the next set of 10 random digits, (1, 5, 5, 2, 9, 7, 2, 7, 6, 5); the even digits are 2, 2 and 6, so sample proportion #2 = 3/10 = 0.3.
Repeat this procedure until you get sample proportion #4.
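The Chapter 5.1 answers above (IQ, iPod, reading attitudes, and exercises 1 to 4) all follow the same two steps: compute SD of x̄ = σ/√n, then take a Normal area. A minimal base-R sketch, with pnorm() standing in for normalcdf; sd_xbar is a hypothetical helper name. Probabilities computed from unrounded Z values can differ from the notes in the fourth decimal (for example, the reading example gives about 0.0141 instead of 0.0139).

```r
# Hypothetical helper: SD of the sample mean, sigma / sqrt(n)
sd_xbar <- function(sigma, n) sigma / sqrt(n)

iq_sd <- sd_xbar(20, 200)                                  # IQ example: about 1.414
ipod_sd <- sd_xbar(280, 10)                                # iPod (a): about 88.54 seconds
ipod_n <- ceiling((280 / 15)^2)                            # iPod (b): smallest n with sigma/sqrt(n) <= 15
p_read <- pnorm(100, mean = 106, sd = sd_xbar(16.4, 36))   # reading: P(x-bar < 100)

# Exercises 1-4 (Normal populations, so x-bar is exactly Normal)
ex1 <- diff(pnorm(c(7.2, 7.5), mean = 7, sd = sd_xbar(0.5, 16)))   # P(7.2 < x-bar < 7.5)
ex2 <- diff(pnorm(c(10, 13), mean = 12, sd = sd_xbar(1.8, 40)))    # P(10 < x-bar < 13)
ex3 <- 1 - pnorm(5, mean = 4, sd = sd_xbar(1.5, 50))               # P(x-bar > 5)
ex4 <- pnorm(4, mean = 5, sd = sd_xbar(2, 30))                     # P(x-bar < 4)

print(round(c(iq_sd = iq_sd, ipod_sd = ipod_sd, ipod_n = ipod_n, p_read = p_read), 4))
print(signif(c(ex1 = ex1, ex2 = ex2, ex3 = ex3, ex4 = ex4), 4))
```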
Sampling distribution of the sample proportion of even digits among 10 random digits (class results; the number of values in each class is shown in parentheses)
(1) 0.1
(2) 0.2 0.2
(5) 0.3 0.3 0.3 0.3 0.3
(21) 0.4, repeated 21 times
(17) 0.5, repeated 17 times
(17) 0.6, repeated 17 times
(7) 0.7 0.7 0.7 0.7 0.7 0.7 0.7
(5) 0.8 0.8 0.8 0.8 0.8
(1) 0.9
Q: Draw a histogram with the following classes (for lines 101–120 in Table B).

Class:   (0, 0.1]  (0.1, 0.2]  (0.2, 0.3]  (0.3, 0.4]  (0.4, 0.5]  (0.5, 0.6]  (0.6, 0.7]  (0.7, 0.8]  (0.8, 0.9]
Counts:     1          2           5          21          17          17           7           5           1

Sampling distribution of p̂: histogram of the sample proportions.
Q: Write a journal about how we obtained the sampling distribution of the sample proportion p̂ today, by answering the following questions:
1) How did each student obtain p̂'s from Table B?
2) How many p̂'s did we have in total in the class?
3) How do you make a histogram of p̂? What is the name of the histogram?
4) What did the smooth curve represent?
5) For the smooth curve, what did the horizontal axis and the vertical axis represent?

Figure: samples of 10 random digits drawn from Table B (1st sample, 2nd sample, ..., 25th sample), each summarized by its sample proportion of even digits. There is some variability in values of a statistic over different samples.

Sampling distribution of the sample proportion of even digits among 10 random digits
(1) Select 10 random digits from Table B, and then take the sample proportion of even digits; (2) repeat this process many times, say 10,000 times; (3) make a histogram of these 10,000 sample proportions. The distribution looks like a Normal distribution.
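The class counts and the 10,000-sample experiment for p̂ can be sketched in base R. Each class value is entered as a count of even digits (so p̂ = count/10), which avoids floating-point trouble when tabulating values such as 0.3. The seed and the use of sample() with replacement to mimic Table B are illustrative assumptions; the simulated center and SD should land near p = 0.5 and √(p(1 − p)/n) = √(0.25/10) ≈ 0.158.

```r
# Class results: number of even digits in each sample of n = 10 digits
even_count <- c(1, rep(2, 2), rep(3, 5), rep(4, 21), rep(5, 17),
                rep(6, 17), rep(7, 7), rep(8, 5), 9)
phat_class <- even_count / 10                      # sample proportions
counts_p <- table(factor(even_count, levels = 1:9,
                         labels = paste0("p-hat = ", (1:9) / 10)))
print(counts_p)
cat("Class: center =", round(mean(phat_class), 4), " SD =", round(sd(phat_class), 4), "\n")

# Simulation: 10,000 samples of 10 random digits, proportion of even digits
set.seed(2012)                                     # illustrative seed
phat_sim <- replicate(10000, mean(sample(0:9, 10, replace = TRUE) %% 2 == 0))
p <- 0.5
n <- 10
cat("Simulated: center =", round(mean(phat_sim), 4), " SD =", round(sd(phat_sim), 4),
    "; theory: p =", p, " sqrt(p(1-p)/n) =", round(sqrt(p * (1 - p) / n), 4), "\n")
```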
Sampling distribution of p̂: histogram of the sample proportions. The probability distribution of a statistic is called its sampling distribution.
Center of p̂ = 0.5018; SD of p̂ = 0.1598. Note: n = 10, and the theoretical SD of p̂ is √(p(1 − p)/n) = √(0.5 × 0.5/10) ≈ 0.158.

Sampling distribution of the sample proportion
The sampling distribution of p̂ is never exactly normal. But as the sample size increases, the sampling distribution of p̂ becomes approximately normal. For any fixed n, the normal approximation is most accurate when p is close to 0.5, and least accurate when p is near 0 or near 1.

Sampling distribution of p̂
If the data are obtained from an SRS and np > 10 and n(1 − p) > 10, then the sampling distribution of p̂ has the following form: p̂ is approximately normal with mean p and standard deviation √(p(1 − p)/n).
Summary: p̂ follows approximately N(p, √(p(1 − p)/n)).
Standardize: Z = (p̂ − p)/√(p(1 − p)/n); reverse: p̂ = p + Z·√(p(1 − p)/n).

Example 1
Maureen Webster, who is running for mayor in a large city, claims that she is favored by 53% of all eligible voters of that city. Assume that this claim is true, and consider a random sample of 400 registered voters taken from this city. Find the population proportion p = _________.
a) What is the sampling distribution of p̂?
b) What is the probability that the sample proportion of voters who favor Maureen Webster is less than 49%?
c) Find the probability of getting a sample proportion between 50% and 55%.
(b) SD of p̂ = √(0.53 × 0.47/400) = 0.02495; Z = (0.49 − 0.53)/0.02495 = −1.60; Pr(Z < −1.60) = normalcdf(−E99, −1.6, 0, 1) = 0.0548.
(c) Z = (0.50 − 0.53)/0.02495 = −1.20; Z = (0.55 − 0.53)/0.02495 = 0.80; Pr(−1.20 < Z < 0.80) = normalcdf(−1.20, 0.80, 0, 1) = 0.673.

Example 2
The Gallup Organization surveyed 1,252 debit cardholders in the U.S. and found that 180 had used the debit card to purchase a product or service on the Internet (Card Fax, November 12, 1999). Suppose the true percent of debit cardholders in the U.S.
that have used their debit cards to purchase a product or service on the Internet is 15%. Calculate p̂ (the sample proportion). The sample proportion p̂ is approximately normal with mean = ______ and standard deviation = ______. Find the probability of getting a sample proportion smaller than 14.4%.
ANS: Z = (0.144 − 0.15)/0.01 = −0.6; Pr(Z < −0.6) = normalcdf(−E99, −0.6, 0, 1) = 0.2743.

More exercises on Chapter 5.2:
1. 30% of all autos undergoing an emissions inspection in a city fail the inspection. Among 200 cars randomly selected in the city, the percentage of cars that fail the inspection is around _____, with SD ______. Find the probability that the percentage is between 31% and 35%.
2. 60% of all residents in a big city are Democrats. Among 400 residents randomly selected in the city, a) what is the sampling distribution of p̂? b) Find Pr(sample percentage < 58%).
3. In airport luggage screening it is known that 3% of people have questionable objects in their luggage. For the next 1600 people, use the normal approximation to find the probability that at least 4% of the people have questionable objects.
4. It is known that 60% of mice inoculated with a serum are protected from a certain disease. If 80 mice are inoculated, a) what is the sampling distribution of p̂? b) Find the probability that at least 70% are protected from the disease.
HWQ: 5.22, 5.23(a, b), 5.73
Summary to Chapter 5
1. If X ~ N(μ, σ) exactly, then a) what is the mean of x̄? b) what is the SD of x̄? c) what is the sampling distribution of x̄? (You need to specify: What does the curve look like? What is the center/mean? What is the spread/SD? Is it EXACT, or approximate by the Central Limit Theorem?)
2. If X is NOT normal, but has population mean μ and population SD σ, and the sample size is big enough, a) what is the mean of x̄? b) what is the SD of x̄? c) what is the sampling distribution of x̄? (You need to specify: What does the curve look like? What is the center/mean? What is the spread/SD? Is it EXACT, or approximate by the Central Limit Theorem?)
3. With population proportion p and sample size n, a) what is the mean of p̂? b) what is the SD of p̂? c) what is the sampling distribution of p̂? (You need to specify: What does the curve look like? What is the center/mean? What is the spread/SD? Is it EXACT, or approximate by the Central Limit Theorem?)

Summary to Chapter 5 (Pop-up Quiz)
1. If X ~ N(μ, σ) exactly, then a) what is the mean of x̄? b) what is the SD of x̄? c) what is the sampling distribution of x̄? Is it EXACT, or approximate by the Central Limit Theorem?
2. If X is NOT normal, but has population mean μ and population SD σ, and the sample size is big enough, a) what is the mean of x̄? b) what is the SD of x̄? c) what is the sampling distribution of x̄? Is it EXACT, or approximate by the Central Limit Theorem?
3. With population proportion p and sample size n, a) what is the mean of p̂? b) what is the SD of p̂? c) what is the sampling distribution of p̂? Is it EXACT, or approximate by the Central Limit Theorem?
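The two p̂ examples from Chapter 5.2 (Maureen Webster and the Gallup debit-card survey) can be checked with a small base-R sketch that follows the normal approximation N(p, √(p(1 − p)/n)). The helper names sd_phat and cond_ok are hypothetical; pnorm() stands in for normalcdf. Because the notes round Z to two decimals (and, in Example 2, the SD to 0.01), the unrounded probabilities differ slightly: about 0.0545, 0.674 and 0.276 instead of 0.0548, 0.673 and 0.2743.

```r
# Hypothetical helper: SD of p-hat, sqrt(p(1 - p)/n)
sd_phat <- function(p, n) sqrt(p * (1 - p) / n)

# Hypothetical helper: large-sample condition used in the notes, np > 10 and n(1 - p) > 10
cond_ok <- function(p, n) n * p > 10 && n * (1 - p) > 10

# Example 1 (Maureen Webster): p = 0.53, n = 400
sd1 <- sd_phat(0.53, 400)                                  # about 0.02495
p_b <- pnorm(0.49, mean = 0.53, sd = sd1)                  # P(p-hat < 0.49)
p_c <- diff(pnorm(c(0.50, 0.55), mean = 0.53, sd = sd1))   # P(0.50 < p-hat < 0.55)

# Example 2 (Gallup debit cards): p = 0.15, n = 1252
phat_gallup <- 180 / 1252                                  # observed sample proportion
sd2 <- sd_phat(0.15, 1252)                                 # about 0.0101
p_gallup <- pnorm(0.144, mean = 0.15, sd = sd2)            # P(p-hat < 0.144)

print(c(example1_ok = cond_ok(0.53, 400), example2_ok = cond_ok(0.15, 1252)))
print(round(c(sd1 = sd1, p_b = p_b, p_c = p_c,
              phat = phat_gallup, sd2 = sd2, p_gallup = p_gallup), 4))
```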