North Carolina State University
STAT 370: Probability and Statistics for Engineers [Section 002]
Instructor: Hua Zhou
Harrelson Hall 210, 11:45AM-1:00PM, Apr 18, 2012

Announcements
• HW 11 (sampling distribution and C.I., 10 pt) due Apr 27 @ 11:59PM
• HW 12 (hypothesis testing, 10 pt) due Apr 27 @ 11:59PM
• Final (100 pt): Mon, May 7 @ 8AM-11AM, comprehensive

Plan
• Last time: sampling distribution, CLT
• Today: continue with sampling distribution, CLT, confidence interval

Sampling distribution for the sample mean
• The mean of the sampling distribution of X̄ is the population mean µ, regardless of the size of the sample or the shape of the population.
• This result tells us that X̄ is unbiased for µ, i.e. it doesn't systematically overestimate or underestimate, but gives the right answer on average.
• The variance of the sampling distribution of X̄ is σ²/n.
  – Means are less variable than individual observations.
  – Means of larger samples are more precise than means of small samples.

Sampling distribution for the sample mean (cont'd)
• If the population is normally distributed, then the sampling distribution of X̄ is also normally distributed.
  – The sampling distribution is a tighter bell curve than the population's bell curve.
• If the population is not normally distributed, what is the sampling distribution of X̄?

Central Limit Theorem (CLT)
• Let X1, X2, ..., Xn be a sample from X, which has mean µ and variance σ². If the sample size n is large, then the sampling distribution of X̄ is at least approximately normal with mean µ and variance σ²/n, regardless of the population distribution.
• Equivalently: Z = (X̄ - µ) / (σ/√n) has approximately a standard normal distribution.
• This is called the Central Limit Theorem (CLT).
• Note: n ≥ 30 is considered large enough for the CLT to apply.
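The CLT statement above can be illustrated with a small simulation. The sketch below is not part of the original slides: the Exponential(1) population (µ = 1, σ² = 1), the sample size n = 30, the number of repetitions, and all function names are assumptions chosen only for illustration. It checks that the simulated sample means have mean close to µ, variance close to σ²/n, and that about 95% of them fall within 1.96 σ/√n of µ, as a normal approximation would predict.

```python
import math
import random


def simulate_sample_means(n, reps, seed=0):
    """Return `reps` sample means, each from n draws of an Exponential(1)
    population (mean 1, variance 1; an illustrative, clearly non-normal choice)."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]


def mean_and_variance(values):
    """Mean and variance (dividing by the count) of a list of numbers."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v


def fraction_within(values, center, half_width):
    """Fraction of values lying within `half_width` of `center`."""
    inside = sum(1 for x in values if abs(x - center) < half_width)
    return inside / len(values)


if __name__ == "__main__":
    n, reps = 30, 20000
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    means = simulate_sample_means(n, reps)
    m, v = mean_and_variance(means)
    coverage = fraction_within(means, mu, 1.96 * sigma / math.sqrt(n))
    print("mean of sample means:", m, "(theory:", mu, ")")
    print("variance of sample means:", v, "(theory:", sigma ** 2 / n, ")")
    print("fraction within 1.96 sd of mu:", coverage, "(normal theory: 0.95)")
```

Changing n in the call shows the other two bullet points as well: the variance of the sample means shrinks like σ²/n as the sample size grows.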
In class exercise
The serum HDL cholesterol level of females 20-29 years old is normally distributed with a mean of 53 and a standard deviation of 13.4.
(a) What is the probability that a randomly selected female 20-29 years of age will have a serum HDL cholesterol level above 60?
(b) What is the probability that a random sample of 16 females 20-29 years old will have an average serum HDL cholesterol level above 60?
(c) What is the probability that a random sample of 100 females 20-29 years old will have an average serum HDL cholesterol level above 60?

Solution
• (a) P(X > 60) = P(Z > 7/13.4) = P(Z < -7/13.4) = 0.3015
• (b) P(X̄ > 60) = P(Z > 7/(13.4/√16)) = 0.0183
• (c) P(X̄ > 60) = P(Z > 7/(13.4/√100)) = 8.76E-8

In class exercise
The serum HDL cholesterol level of females 20-29 years old has a mean of 53 and a standard deviation of 13.4 (the shape of the distribution is not given).
(a) What is the probability that a randomly selected female 20-29 years of age will have a serum HDL cholesterol level above 60?
(b) What is the probability that a random sample of 16 females 20-29 years old will have an average serum HDL cholesterol level above 60?
(c) What is the probability that a random sample of 100 females 20-29 years old will have an average serum HDL cholesterol level above 60?

Roulette
Players bet $1 that the ball will land in a red slot and win $1 if it does. Let Xi be the net winnings on the i-th day.
(a) What are the distribution, mean and variance of Xi?
(b) Suppose you play once per day for 365 days. What does the CLT say about your average winnings?
(c) What is the probability that your average winnings are positive?
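The exercises above can be checked numerically with the standard library. The sketch below is an illustration, not part of the original slides; the helper names are hypothetical. For roulette it assumes an American wheel with 18 red slots out of 38 and a $1 loss on any non-red outcome, which the slides do not state but which is consistent with the posted mean of -0.0526.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def prob_mean_exceeds(threshold, mu, sigma, n=1):
    """P(X-bar > threshold) when X-bar is (approximately) N(mu, sigma^2 / n)."""
    z = (threshold - mu) / (sigma / math.sqrt(n))
    return upper_tail(z)


# HDL exercise with a normal population: mu = 53, sigma = 13.4, threshold 60.
hdl = {n: prob_mean_exceeds(60, 53, 13.4, n) for n in (1, 16, 100)}

# Roulette, ASSUMING 18 red slots out of 38 (not stated in the slides).
p_red = 18 / 38
mean_x = (+1) * p_red + (-1) * (1 - p_red)  # E(Xi)
var_x = 1.0 - mean_x ** 2                   # Var(Xi) = E(Xi^2) - E(Xi)^2, with Xi^2 = 1
sd_avg = math.sqrt(var_x / 365)             # standard deviation of the 365-day average
p_positive = prob_mean_exceeds(0.0, mean_x, math.sqrt(var_x), 365)

if __name__ == "__main__":
    for n, p in hdl.items():
        print(f"HDL, n = {n}: P(mean > 60) = {p:.3g}")
    print(f"Roulette: E = {mean_x:.4f}, Var = {var_x:.4f}, sd of average = {sd_avg:.4f}")
    print(f"Roulette: P(average > 0) = {p_positive:.4f}")
```

Small differences from table-based answers are expected, because a printed normal table rounds z to two decimals before the lookup.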
Solution (in class exercise, population distribution not given)
• (a) Cannot compute, since the population distribution is unknown.
• (b) Cannot compute, because the sample size is too small for the CLT to apply.
• (c) Approximately 8.76E-8.

Solution (Roulette)
• (a) E(Xi) = -0.0526, Var(Xi) = 0.9972
• (b) The average payoff after 365 days is approximately normal with mean -0.0526 and standard deviation 0.0523.
• (c) P(X̄ > 0) = 0.1562

Take home exercise
• The scores of high school seniors on the ACT college entrance examination in 2003 had mean µ = 20.8 and standard deviation σ = 4.8. The distribution of the scores is normal.
a) What is the approximate probability that a single student randomly chosen from all those taking the test scores 23 or higher?
b) Take an SRS of 25 students who took the test. What are the mean and standard deviation of the sample mean score of these 25 students?
c) What is the approximate probability that the mean score of these students is 23 or higher?

Take home exercise
• A $1 bet in a state lottery game pays $500 if the 3-digit number you choose exactly matches the winning number, which is drawn at random. Here's the distribution of the payoff X:

  Payoff X      $0      $500
  Probability   0.999   0.001

a) What are the mean and standard deviation of X?
b) Joe buys a lottery ticket every day for 60 days. What does the CLT say about the distribution of Joe's average payoff after 60 days?

Take home exercise
• An automatic grinding machine in an auto parts plant prepares axles with a target diameter µ = 40.125 mm. The machine has some variability, so the standard deviation of the diameters is σ = 0.002 mm. A sample of 40 axles is inspected and the sample mean diameter is recorded. Find the probability that the sample mean diameter differs from the target value by 0.004 or more.

Take home exercise
• The scores of high school seniors on the ACT college entrance examination in 2003 had a mean µ = 20.8 and standard deviation σ = 4.8.
a) What is the probability that a single student randomly chosen from all those taking the test scores 23 or higher?
b) Now take a sample of 35 students who took the test. What are the mean and standard deviation of the sample mean score of these 35 students?
c) What is the probability that the sample mean score of these students is 23 or higher?

Sampling Distributions (cont'd)
Population distribution has mean µ and standard deviation σ
• The sample mean X̄ has mean µ and standard deviation σ/√n.
• If the population distribution is normal with mean µ and standard deviation σ, then X̄ is also normal.
• If the population distribution is not normal, then X̄ is approximately normal when n ≥ 30.
• If the population distribution is normal with mean µ and with unknown standard deviation, then X̄ ???
• If the population distribution is not normal and with unknown standard deviation, then X̄ ???

T-distribution
• Challenge: What happens to the sampling distribution of the mean of a sample if we don't know the true σ (the population standard deviation)?
• The sampling distribution will be different.

T-distribution
• Suppose that a simple random sample (SRS) of size n is drawn from a N(µ, σ²) population. Then the sampling distribution of the statistic
    t = (X̄ - µ) / (S/√n)
  is a T-distribution with (n-1) degrees of freedom (Tn-1), where S is the sample standard deviation.
Remarks:
1. Note that the true standard deviation σ was estimated by the sample standard deviation S.
2. If the underlying population is not normal, then the distribution of t is approximately Tn-1 for large n (n ≥ 30).

T-distribution
• Density [figure not recovered]

T-distribution
• The T-distribution was discovered by William S. Gosset in 1908.
• It has the following characteristics:
  – Symmetric / bell-shaped
  – Mean = 0
  – The width (flatness) of the distribution is determined by the degrees of freedom (df)
  – Flatter than the standard normal distribution
  – As the degrees of freedom (df) increase, the T-distribution looks more like a normal distribution
  – Cutoff points are larger than those for the normal distribution
  – For example, for the 97.5th percentile: t20 = 2.086, t30 = 2.042, t50 = 2.009, t∞ = 1.96

Summary on Sampling Distributions
Population distribution has mean µ and standard deviation σ
• The sample mean X̄ has mean µ and standard deviation σ/√n.
• If the population distribution is normal with mean µ and standard deviation σ, then X̄ is also normal.
• If the population distribution is normal with mean µ and unknown standard deviation, then t = (X̄ - µ) / (S/√n) has a T-distribution with df = n-1.
• If the population distribution is not normal, then when n ≥ 30, X̄ is approximately normal (Central Limit Theorem); if σ is unknown and replaced by S, the statistic t is approximately T-distributed with df = n-1.

Remarks
• The sampling distribution of the sample mean is especially applicable to problems on confidence intervals and hypothesis testing.
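The 97.5th-percentile cutoffs quoted for the T-distribution can be reproduced numerically. The sketch below is an illustration rather than the method used to build the printed tables: it integrates the standard Student's t density with Simpson's rule and finds the quantile by bisection, using only the standard library. The function names, step count, and search bounds are assumptions made for this example.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=400):
    """P(T <= x): Simpson's rule on [0, x] (steps must be even) plus symmetry about 0."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df, lo=0.0, hi=50.0):
    """Quantile of the t distribution for 0.5 <= p < 1, found by bisection on the CDF."""
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for df in (20, 30, 50, 1000):
        print(f"df = {df}: 97.5th percentile = {t_quantile(0.975, df):.3f}")
```

Running it with increasing df shows the cutoff shrinking toward the normal value 1.96, which is the "looks more like a normal distribution" characteristic stated in the list.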