• t distributions
• t confidence intervals for a population mean µ
• Sample size required to estimate µ
• Hypothesis tests for µ

In the 2012-2013 NFL season, Adrian Peterson of the Minnesota Vikings rushed for 2,097 yards. The all-time single-season rushing record is 2,105 yards (Eric Dickerson, 1984 LA Rams). Shown below are Peterson's rushing yards in each game:

84 60 86 102 88 79 153 123 182 171 108 210 154 212 86 199

We would like to estimate Adrian Peterson's mean rushing ABILITY during the 2012-2013 season with a confidence interval.

When we select simple random samples of size n, the sample means we find will vary from sample to sample. We can model the distribution of these sample means with a probability model that is \(N(\mu, \sigma/\sqrt{n})\):

\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]

Note that \(SD(\bar{x}) = \sigma/\sqrt{n}\).

The sample standard deviation s provides an estimate of the population standard deviation σ. For a sample of size n, the sample standard deviation s is:

\[ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} \]

n − 1 is the "degrees of freedom."
The value s/√n is called the standard error of \(\bar{x}\), denoted \(SE(\bar{x})\):

\[ SE(\bar{x}) = \frac{s}{\sqrt{n}} \]

Substitute s (sample standard deviation) for σ:

\[ z = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]

Not quite correct. Not knowing σ means using z is no longer correct.

Suppose that a simple random sample of size n is drawn from a population whose distribution can be approximated by a N(µ, σ) model. When σ is known, the sampling model for the mean \(\bar{x}\) is N(µ, σ/√n). When σ is estimated from the sample standard deviation s, the sampling model for the mean \(\bar{x}\) follows a t distribution t(µ, s/√n) with degrees of freedom n − 1.

\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]

is the 1-sample t statistic.

CONFIDENCE INTERVAL for µ

\[ \bar{x} \pm t^{*}\frac{s}{\sqrt{n}} \]

where:
t* = critical value from the t distribution with n − 1 degrees of freedom
\(\bar{x}\) = sample mean
s = sample standard deviation
n = sample size

For very small samples (n < 15), the data should follow a Normal model very closely. For moderate sample sizes (n between 15 and 40), t methods will work well as long as the data are unimodal and reasonably symmetric.
For sample sizes larger than 40, t methods are safe to use unless the data are extremely skewed. If outliers are present, the analysis can be performed twice, with the outliers and without.

The t distribution:
• Very similar to z ~ N(0, 1)
• Sometimes called Student's t distribution (Gosset, a brewery employee)
• Properties:
  i) symmetric around 0 (like z)
  ii) indexed by its degrees of freedom ν: if ν > 1, \(E(t_\nu) = 0\); if ν > 2, \(\sigma_{t_\nu} = \sqrt{\nu/(\nu - 2)}\), which is always bigger than 1.

\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}, \qquad t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \qquad s = \sqrt{s^2}, \quad s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} \]

[Figure 11.3, page 372. Degrees of Freedom: density curves of Z and of the t distributions with 1 and 7 degrees of freedom (t1, t7), horizontal axis from −3 to 3.]

90% confidence interval; df = n − 1 = 10

Degrees of    Confidence level
Freedom       0.80     0.90     0.95     0.98     0.99
1             3.0777   6.314    12.706   31.821   63.657
2             1.8856   2.9200   4.3027   6.9645   9.9250
.             .        .        .        .        .
10            1.3722   1.8125   2.2281   2.7638   3.1693
.             .        .        .        .        .
100           1.2901   1.6604   1.9840   2.3642   2.6259
∞ (z)         1.282    1.6449   1.9600   2.3263   2.5758

90% confidence interval: \(\bar{x} \pm 1.8125\,\dfrac{s}{\sqrt{11}}\)

P(t > 1.8125) = .05
P(t < −1.8125) = .05

[Figure: t10 density with central area .90 between −1.8125 and 1.8125 and area .05 in each tail.]

Critical values: z* versus t* for n = 30 (29 df)

Conf. level   z*      t* (n = 30)
90%           1.645   1.6991
95%           1.96    2.0452
98%           2.33    2.4620
99%           2.58    2.7564

Example: Peterson's rushing yards in each game of the 2012-2013 season:

84 60 86 102 88 79 153 123 182 171 108 210 154 212 86 199

Construct a 95% confidence interval for Peterson's mean rushing ABILITY during the 2012-2013 season.

\(\bar{x} = 131.06\), \(s = 51.59\). The interval is \(\bar{x} \pm t^{*}\dfrac{s}{\sqrt{n}}\) with d.f. = n − 1.
\(\bar{x} = 131.06\), \(s = 51.59\), degrees of freedom = 16 − 1 = 15
From the t-table, for 95% confidence, t* = 2.1314.

\[ \bar{x} \pm t^{*}\frac{s}{\sqrt{n}} = 131.06 \pm 2.1314 \cdot \frac{51.59}{\sqrt{16}} = 131.06 \pm 27.49 = (103.57,\ 158.55) \]

We are 95% confident that the interval (103.57, 158.55) contains Peterson's mean rushing ABILITY per game.

Because cardiac deaths increase after heavy snowfalls, a study was conducted to measure the cardiac demands of shoveling snow by hand. The maximum heart rates for 10 adult males were recorded while shoveling snow. The sample mean and sample standard deviation were \(\bar{x} = 175\) and \(s = 15\). Find a 90% CI for the population mean µ, the mean maximum heart rate for those who shovel snow.

\(\bar{x} \pm t^{*}\dfrac{s}{\sqrt{n}}\), d.f. = n − 1
\(\bar{x} = 175\), \(s = 15\), n = 10
From the t-table (9 df), t* = 1.8331.

\[ 175 \pm 1.8331 \cdot \frac{15}{\sqrt{10}} = 175 \pm 8.70 = (166.30,\ 183.70) \]

We are 90% confident that the interval (166.30, 183.70) contains the mean maximum heart rate for snow shovelers.

Determining Sample Size to Estimate µ

Required Sample Size To Estimate a Population Mean µ
• If you desire a C% confidence interval for a population mean µ with an accuracy specified by you, how large does the sample size need to be?
• We will denote the accuracy by ME, which stands for Margin of Error.

Example: Sample Size to Estimate a Population Mean µ
• Suppose we want to estimate the unknown mean height µ of male undergrad students at NC State with a confidence interval.
• We want to be 95% confident that our estimate is within .5 inch of µ.
• How large does our sample size need to be?

Confidence Interval for µ
In terms of the margin of error ME, the CI for µ can be expressed as \(\bar{x} \pm ME\).
The confidence interval for µ is \(\bar{x} \pm t^{*}_{n-1}\dfrac{s}{\sqrt{n}}\), so

\[ ME = t^{*}_{n-1}\frac{s}{\sqrt{n}} \]

So we can find the sample size by solving this equation for n, which gives

\[ n = \left(\frac{t^{*}_{n-1}\, s}{ME}\right)^{2} \]

• Good news: we have an equation
• Bad news:
  1. Need to know s
  2.
     We don't know n, so we don't know the degrees of freedom needed to find \(t^{*}_{n-1}\)

A Way Around this Problem: Use the Standard Normal
Use the corresponding z* from the standard normal to form the equation

\[ ME = z^{*}\frac{s}{\sqrt{n}} \]

Solve for n:

\[ n = \left(\frac{z^{*} s}{ME}\right)^{2} \]

[Figure: sampling distribution of \(\bar{x}\), confidence level .95. The central area .95 lies between \(\mu - 1.96\,\sigma/\sqrt{n}\) and \(\mu + 1.96\,\sigma/\sqrt{n}\), a distance ME on either side of µ.]

Set \(ME = 1.96\,\dfrac{\sigma}{\sqrt{n}}\) and solve for n: \(n = \left(\dfrac{1.96\,\sigma}{ME}\right)^{2}\)

Estimating s
• Previously collected data or prior knowledge of the population
• If the population is normal or near-normal, then s can be conservatively estimated by s ≈ range/6
• 99.7% of observations lie within 3σ of the mean

Example: sample size to estimate mean height µ of NCSU undergrad male students

\[ n = \left(\frac{z^{*} s}{ME}\right)^{2} \]

• We want to be 95% confident that we are within .5 inch of µ, so ME = .5; z* = 1.96
• Suppose previous data indicates that s is about 2 inches.
• n = [(1.96)(2)/(.5)]² = 61.47
• We should sample 62 male students

Example: Sample Size to Estimate a Population Mean - Textbooks
• Suppose the financial aid office wants to estimate the mean NCSU semester textbook cost µ within ME = $25 with 98% confidence. How many students should be sampled? Previous data shows σ is about $85.

\[ n = \left(\frac{z^{*}\sigma}{ME}\right)^{2} = \left(\frac{(2.33)(85)}{25}\right)^{2} = 62.76 \]

Round up to n = 63.

Example: Sample Size to Estimate a Population Mean - NFL footballs
• The manufacturer of NFL footballs uses a machine to inflate new footballs
• The mean inflation pressure is 13.5 psi, but uncontrollable factors cause the pressures of individual footballs to vary from 13.3 psi to 13.7 psi
• After throwing 6 interceptions in a game, Peyton Manning complains that the balls are not properly inflated. The manufacturer wishes to estimate the mean inflation pressure to within .025 psi with a 99% confidence interval. How many footballs should be sampled?

\[ n = \left(\frac{z^{*}\sigma}{ME}\right)^{2} \]

• 99% confidence, so z* = 2.58; ME = .025
• σ = ?
  Inflation pressures range from 13.3 to 13.7 psi
• So range = 13.7 − 13.3 = .4; σ ≈ range/6 = .4/6 = .067

\[ n = \left(\frac{(2.58)(.067)}{.025}\right)^{2} = 47.8, \text{ so round up to } n = 48 \]

Chapter 23
Hypothesis Tests for a Population Mean µ

25 pitchers with highest average fastball velocity:
2007: \(\bar{y}_1 = 95.92\) mph
2013: \(\bar{y}_2 = 96.33\) mph, \(s_2 = .794\) mph
Was the ABILITY of the top 25 pitchers in 2013 to throw hard greater than the ABILITY of the top 25 pitchers in 2007 to throw hard?

As in any hypothesis test, a hypothesis test for µ requires a few steps:
1. State the null and alternative hypotheses (H0 versus HA)
   a) Decide on a one-sided or two-sided test
2. Calculate the test statistic t and determine its degrees of freedom
3. Find the area under the t distribution with the t-table or technology
4. Determine the P-value with technology (or find bounds on the P-value) and interpret the result

Step 1: State the null and alternative hypotheses (H0 versus HA)
Decide on a one-sided or two-sided test
H0: µ = µ0 versus HA: µ > µ0 (1-tail test)
H0: µ = µ0 versus HA: µ < µ0 (1-tail test)
H0: µ = µ0 versus HA: µ ≠ µ0 (2-tail test)

Step 2: Obtain data and calculate \(\bar{y}\) and s
We perform a hypothesis test with null hypothesis H0: µ = µ0 using the test statistic

\[ t = \frac{\bar{y} - \mu_0}{SE(\bar{y})} \]

where the standard error of \(\bar{y}\) is

\[ SE(\bar{y}) = \frac{s}{\sqrt{n}} \]

When the null hypothesis is true, the test statistic follows a t distribution with n − 1 degrees of freedom. We use that model to obtain a P-value.

The one-sample t-test; P-Values
Recall: The P-value is the probability, calculated assuming the null hypothesis H0 is true, of observing a value of the test statistic at least as extreme as the value we actually observed.
The calculation of the P-value depends on whether the hypothesis test is 1-tailed (that is, the alternative hypothesis is HA: µ < µ0 or HA: µ > µ0) or 2-tailed (that is, the alternative hypothesis is HA: µ ≠ µ0).
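As a rough illustration of Step 2 and the P-value definition above, the sketch below computes the one-sample t statistic from summary statistics and approximates the tail area by numerically integrating the t density. It is a minimal sketch under stated assumptions, not the method used in the slides: the function names (`t_pdf`, `t_upper_tail`, `one_sample_t`) are hypothetical, the tail area comes from Simpson's rule rather than a t-table, and the 2007 average of 95.92 mph is treated as the null value µ0 for illustration.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t0: float, df: int, steps: int = 2000) -> float:
    """Approximate P(T > t0) with Simpson's rule on [0, |t0|] and symmetry about 0.

    steps must be even for Simpson's rule.
    """
    a = abs(t0)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 < T < |t0|)
    upper = 0.5 - area            # P(T > |t0|)
    return upper if t0 >= 0 else 1 - upper


def one_sample_t(ybar, s, n, mu0, alternative="greater"):
    """Return (t statistic, degrees of freedom, P-value) from summary statistics."""
    se = s / math.sqrt(n)         # standard error of the sample mean
    t0 = (ybar - mu0) / se
    df = n - 1
    if alternative == "greater":      # HA: mu > mu0
        p = t_upper_tail(t0, df)
    elif alternative == "less":       # HA: mu < mu0
        p = 1 - t_upper_tail(t0, df)
    elif alternative == "two-sided":  # HA: mu != mu0
        p = 2 * t_upper_tail(abs(t0), df)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return t0, df, p


# Pitchers example: n = 25, ybar = 96.33, s = .794, assumed mu0 = 95.92
t0, df, p = one_sample_t(96.33, 0.794, 25, 95.92, alternative="greater")
print(f"t = {t0:.2f}, df = {df}, P-value = {p:.4f}")
```

Statistical software would normally supply the tail area directly; the integration here only keeps the sketch free of third-party dependencies.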
P-Values
Assume the value of the test statistic t is t0.
If HA: µ > µ0, then P-value = P(t > t0)
If HA: µ < µ0, then P-value = P(t < t0)
If HA: µ ≠ µ0, then P-value = 2P(t > |t0|)

25 pitchers with highest average fastball velocity:
2007: \(\bar{y}_1 = 95.92\) mph
2013: \(\bar{y}_2 = 96.33\) mph, \(s_2 = .794\) mph
Was the ABILITY of the top 25 pitchers in 2013 to throw hard greater than the ABILITY of the top 25 pitchers in 2007?

H0: µ = 95.92
HA: µ > 95.92
where µ is the average fastball velocity of the top 25 2013 pitchers

n = 25; df = 24
\(\bar{y} = 96.33\), \(s = .794\)

\[ t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{96.33 - 95.92}{.794/\sqrt{25}} = 2.58 \]

P-value = P(t > 2.58) = .008

[Figure: t distribution with 24 df; the area to the right of t = 2.58 is .008.]

Reject H0: since the P-value < .05, there is sufficient evidence that the top 25 pitchers in 2013 on average throw harder.

Values of t
Conf. Level   0.1      0.3      0.5      0.7      0.8      0.9      0.95     0.98     0.99
Two Tail      0.9      0.7      0.5      0.3      0.2      0.1      0.05     0.02     0.01
One Tail      0.45     0.35     0.25     0.15     0.1      0.05     0.025    0.01     0.005
df 24         0.1270   0.3900   0.6848   1.0593   1.3178   1.7109   2.0639   2.4922   2.7969

2.4922 < t = 2.58 < 2.7969; thus 0.005 < P-value < 0.01.

A popcorn maker wants a combination of microwave time and power that delivers high-quality popped corn with less than 10% unpopped kernels, on average. After testing, the research department determines that power 9 at 4 minutes is optimum. The company president tests 8 bags in his office microwave and finds the following percentages of unpopped kernels: 7, 13.2, 10, 6, 7.8, 2.8, 2.2, 5.2. Do the data provide evidence that the mean percentage of unpopped kernels is less than 10%?

H0: µ = 10
HA: µ < 10
where µ is the true unknown mean percentage of unpopped kernels

n = 8; df = 7
\(\bar{y} = 6.775\), \(s = 3.64\)

\[ t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{6.775 - 10}{3.64/\sqrt{8}} = -2.51 \]

P-value = P(t < −2.51); exact P-value = .02

[Figure: t distribution with 7 df; the area to the left of t = −2.51 is .02.]

Reject H0: there is sufficient evidence that the true mean percentage of unpopped kernels is less than 10%.
Values of t
Conf. Level   0.1      0.3      0.5      0.7      0.8      0.9      0.95     0.98     0.99
Two Tail      0.9      0.7      0.5      0.3      0.2      0.1      0.05     0.02     0.01
One Tail      0.45     0.35     0.25     0.15     0.1      0.05     0.025    0.01     0.005
df 7          0.1303   0.4015   0.7111   1.1192   1.4149   1.8946   2.3646   2.9980   3.4995

2.3646 < |t| = 2.51 < 2.9980, so .01 < P-value < .025
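The table lookup above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper name `p_value_bounds` is hypothetical, the critical values are copied from the df 7 and df 24 rows shown in these slides, and the function assumes the observed statistic falls on the side of the alternative hypothesis (so the one-tail P-value is the area beyond |t|).

```python
import math
import statistics
from bisect import bisect_right

# One-tail areas for the table columns, and the table rows from the slides
ONE_TAIL = [0.45, 0.35, 0.25, 0.15, 0.10, 0.05, 0.025, 0.01, 0.005]
T_ROW_DF7 = [0.1303, 0.4015, 0.7111, 1.1192, 1.4149, 1.8946, 2.3646, 2.9980, 3.4995]
T_ROW_DF24 = [0.1270, 0.3900, 0.6848, 1.0593, 1.3178, 1.7109, 2.0639, 2.4922, 2.7969]


def p_value_bounds(t0, row, tails=1):
    """Bracket the P-value using one row of the t-table.

    Returns (lower, upper) with lower < P-value < upper.
    Use tails=2 for a two-sided alternative.
    """
    k = bisect_right(row, abs(t0))                # critical values <= |t0|
    upper = ONE_TAIL[k - 1] if k > 0 else 0.5     # area beyond the nearest smaller value
    lower = ONE_TAIL[k] if k < len(row) else 0.0  # area beyond the nearest larger value
    return tails * lower, tails * upper


# Popcorn example: H0: mu = 10 versus HA: mu < 10
unpopped = [7, 13.2, 10, 6, 7.8, 2.8, 2.2, 5.2]
ybar = statistics.mean(unpopped)
s = statistics.stdev(unpopped)                    # divides by n - 1
t_popcorn = (ybar - 10) / (s / math.sqrt(len(unpopped)))

print(f"t = {t_popcorn:.2f}")
print("popcorn P-value bounds:", p_value_bounds(t_popcorn, T_ROW_DF7))
print("pitchers P-value bounds:", p_value_bounds(2.58, T_ROW_DF24))
```

Bracketing only shows between which two tabulated areas the P-value lies; an exact value such as the .02 reported for the popcorn data still needs software or a finer table.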