Transcript

Chapter 18 & 20: Inferences Involving One Population

[Figure: Student’s t-distribution curves for df = 25, df = 15, and df = 5, centered at 0 on the t axis]

Chapter Goals
• Learned about confidence intervals
• Consider inference about $p$, the probability of success
• Assumed $\sigma$ was known
• Consider inference about $\mu$ when $\sigma$ is unknown

Inferences About the Probability of Success
• Possibly the most common inference of all
• Many examples of situations in which we are concerned about something either happening or not happening
• Two possible outcomes, and multiple independent trials

Background
1. $p$: the binomial parameter, the probability of success on a single trial
2. $p'$: the observed or sample binomial probability, $p' = x/n$, where $x$ represents the number of successes that occur in a sample consisting of $n$ trials
3. For the binomial random variable $x$: $\mu = np$, $\sigma = \sqrt{npq}$, where $q = 1 - p$
4. The distribution of $x$ is approximately normal if $n$ is larger than 20 and if $np$ and $nq$ are both larger than 5

Sampling Distribution of p'
Sampling Distribution of $p'$: If a sample of size $n$ is randomly selected from a large population with $p = P(\text{success})$, then the sampling distribution of $p'$ has:
1. a mean $\mu_{p'}$ equal to $p$,
2. a standard error $\sigma_{p'}$ equal to $\sqrt{pq/n}$, and
3. an approximately normal distribution if $n$ is sufficiently large

In practice, use of the following guidelines will ensure normality:
1. The sample size is greater than 20
2. The sample consists of less than 10% of the population
3. The products $np$ and $nq$ are both larger than 5

The Assumptions...
The assumptions for inferences about the binomial parameter $p$: The $n$ random observations forming the sample are selected independently from a population that is not changing during the sampling

Confidence Interval Procedure:
The unbiased sample statistic $p'$ is used to estimate the population proportion $p$
The formula for the $1 - \alpha$ confidence interval for $p$ is:
$$p' - z(\alpha/2)\sqrt{\frac{p'q'}{n}} \quad\text{to}\quad p' + z(\alpha/2)\sqrt{\frac{p'q'}{n}}$$
where $p' = x/n$ and $q' = 1 - p'$

Example
Example: A recent survey of 300 randomly selected fourth graders showed 210 participate in at least one organized sport during one calendar year. Find a 95% confidence interval for the proportion of fourth graders who participate in an organized sport during the year.

Solution:
1. Describe the population parameter of concern
   The parameter of interest is the proportion of fourth graders who participate in an organized sport during the year
2. Specify the confidence interval criteria
   a. Check the assumptions
      The sample was randomly selected
      Each subject’s response was independent
   b. Identify the probability distribution
      $z$ is the test statistic
      $p'$ is approximately normal:
      $n = 300 > 20$
      $np' = 300(210/300) = 210 > 5$
      $nq' = 300(90/300) = 90 > 5$
   c. Determine the level of confidence: $1 - \alpha = 0.95$
3. Collect and present sample evidence
   Sample information: $n = 300$, and $x = 210$
   The point estimate: $p' = x/n = 210/300 = 0.70$
4. Determine the confidence interval
   a. Determine the confidence coefficients: Using Table A: $z(\alpha/2) = z(0.025) = 1.96$
   b. The maximum error of estimate:
      $$E = z(\alpha/2)\sqrt{\frac{p'q'}{n}} = 1.96\sqrt{\frac{(0.70)(0.30)}{300}} = (1.96)\sqrt{0.0007} = (1.96)(0.0265) = 0.0519$$
   c. Find the lower and upper confidence limits:
      $p' - E$ to $p' + E$
      $0.70 - 0.0519$ to $0.70 + 0.0519$
      $0.6481$ to $0.7519$
   d.
The Results: 0.6481 to 0.7519 is a 95% confidence interval for the true proportion of fourth graders who participate in an organized sport during the year

Sample Size Determination:
$$n = \frac{[z(\alpha/2)]^2 \, p^* q^*}{E^2}$$
$E$: maximum error of estimate
$1 - \alpha$: confidence level
$p^*$: provisional value of $p$ ($q^* = 1 - p^*$)
If no provisional values for $p$ and $q$ are given, use $p^* = q^* = 0.5$ (Always round up)

Example
Example: Determine the sample size necessary to estimate the true proportion of laboratory mice with a certain genetic defect. We would like the estimate to be within 0.015 with 95% confidence.

Solution:
1. Level of confidence: $1 - \alpha = 0.95$, $z(\alpha/2) = z(0.025) = 1.96$
2. Desired maximum error is $E = 0.015$.
3. No estimate of $p$ given, use $p^* = q^* = 0.5$
4. Use the formula for $n$:
   $$n = \frac{[z(\alpha/2)]^2 \, p^* q^*}{E^2} = \frac{(1.96)^2 (0.5)(0.5)}{(0.015)^2} = \frac{0.9604}{0.000225} = 4268.44 \;\Rightarrow\; n = 4269$$

Note
Note: Suppose we know the genetic defect occurs in approximately 1 of 80 animals. Use $p^* = 1/80 = 0.0125$:
$$n = \frac{[z(\alpha/2)]^2 \, p^* q^*}{E^2} = \frac{(1.96)^2 (0.0125)(0.9875)}{(0.015)^2} = \frac{0.0474}{0.000225} = 210.75 \;\Rightarrow\; n = 211$$
As illustrated here, it is an advantage to have some indication of the value expected for $p$, especially as $p$ moves further from 0.5

Inference About mean $\mu$ ($\sigma$ unknown)
• Inferences about $\mu$ are based on the sample mean $\bar{x}$
• If the sample size is large or the sampled population is normal: $z^* = (\bar{x} - \mu)/(\sigma/\sqrt{n})$ has a standard normal distribution
• If $\sigma$ is unknown, use $s$ as a point estimate for $\sigma$
• Estimated standard error of the mean: $s/\sqrt{n}$

Student’s t-Statistic
1. When $s$ is used as an estimate for $\sigma$, the test statistic has two sources of variation: $\bar{x}$ and $s$
2. The resulting test statistic, known as the Student’s t-statistic:
   $$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
3. Assumption: samples are taken from normal populations
4.
The population standard deviation, $\sigma$, is almost never known in real-world problems
   The standard error will almost always be estimated using $s/\sqrt{n}$
   Almost all real-world inference about the population mean will be completed using the Student’s t-statistic

Properties of the t-Distribution (df > 2)
1. $t$ is distributed with a mean of 0
2. $t$ is distributed symmetrically about its mean
3. $t$ is distributed so as to form a family of distributions, a separate distribution for each different number of degrees of freedom ($df \ge 1$)
4. The t-distribution approaches the normal distribution as the number of degrees of freedom increases
5. $t$ is distributed with a variance greater than 1, but as the degrees of freedom increase, the variance approaches 1
6. $t$ is distributed so as to be less peaked at the mean and thicker at the tails than the normal distribution

Student’s t-Distributions
[Figure: the normal distribution compared with Student’s t for df = 15 and df = 5, centered at 0 on the t axis]

Degrees of Freedom, df: A parameter that identifies each different distribution of Student’s t-distribution. For the methods presented in this chapter, the value of df will be the sample size minus 1, $df = n - 1$.

Notes
1. The number of degrees of freedom associated with $s^2$ is the divisor $(n - 1)$ used to define the sample variance $s^2$. Thus: $df = n - 1$
2. The number of degrees of freedom is the number of unrelated deviations available for use in estimating $\sigma^2$
3. Table for Student’s t-distribution (Table C) is a table of critical values. Left column = df. When df > 100, critical values of the t-distribution are the same as the corresponding critical values of the standard normal distribution.
4. Notation: $t(df, \alpha)$. Read as: t of df, $\alpha$

t-Distribution Showing $t(df, \alpha)$
[Figure: t-distribution centered at 0 with area $\alpha$ in the right tail, to the right of $t(df, \alpha)$]

Example
Example: Find the value of $t(12, 0.025)$
[Figure: t-distribution with area 0.025 in each tail, bounded by $-t(12, 0.025) = -2.18$ and $t(12, 0.025) = 2.18$]

Portion of Table 6

         Amount of α in one tail
df       ...     0.025     ...
...
12               2.18

$t(12, 0.025) = 2.18$

Notes
1.
If the df is not listed in the left-hand column of Table C, use the next smaller value of df that is listed
2. Most computer software packages will calculate either the area related to a specified t-value or the t-value that bounds a specified area
3. The cumulative distribution function (CDF) is often used to find area from $-\infty$ to $t$
4. If the area from $-\infty$ to $t$ is known and the value of $t$ is wanted, then the inverse cumulative distribution function (INVCDF) is used
[Figure: cumulative probability shown as the area under the t-distribution to the left of t]

The Assumption...
The assumption for inferences about mean $\mu$ when $\sigma$ is unknown: The sampled population is normally distributed

Confidence Interval Procedure:
1. Procedure for constructing confidence intervals similar to that used when $\sigma$ is known
2. Use $t$ in place of $z$, use $s$ in place of $\sigma$
3. The formula for the $1 - \alpha$ confidence interval for $\mu$ is:
   $$\bar{x} - t(df, \alpha/2)\frac{s}{\sqrt{n}} \quad\text{to}\quad \bar{x} + t(df, \alpha/2)\frac{s}{\sqrt{n}}, \quad\text{where } df = n - 1$$

Example
Example: A study is conducted to learn how long it takes the typical taxpayer to complete their federal income tax return. A random sample of 17 income tax filers showed a mean time (in hours) of 7.8 and a standard deviation of 2.3. Find a 95% confidence interval for the true mean time required to complete a federal income tax return. Assume the time to complete the return is normally distributed.

Solution:
1. Parameter of Interest
   The mean time required to complete a federal income tax return
2. Confidence Interval Criteria
   a. Assumptions: Sampled population assumed normal, $\sigma$ unknown
   b. Test statistic: $t$ will be used
   c. Confidence level: $1 - \alpha = 0.95$
3. The Sample Evidence: $n = 17$, $\bar{x} = 7.8$, and $s = 2.3$
4. The Confidence Interval
   a. Confidence coefficients: $t(df, \alpha/2) = t(16, 0.025) = 2.12$
   b. Maximum error:
      $$E = t(16, 0.025)\frac{s}{\sqrt{n}} = (2.12)\frac{2.3}{\sqrt{17}} = (2.12)(0.5578) = 1.18$$
   c. Confidence limits:
      $\bar{x} - E$ to $\bar{x} + E$
      $7.8 - 1.18$ to $7.8 + 1.18$
      $6.62$ to $8.98$
5. The Results: 6.62 to 8.98 is the 95% confidence interval for $\mu$
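
The three worked examples in this transcript (the proportion interval, the sample size determination, and the t-based interval for the mean) can be checked with a short Python sketch. This is only an illustration, not part of the original slides: the function names `proportion_interval`, `sample_size`, and `t_interval` are hypothetical, and the critical values 1.96 and 2.12 are typed in from the tables quoted above rather than computed.

```python
import math

# Critical values copied from the tables quoted in the slides (assumed inputs).
Z_95 = 1.96      # z(0.025) from Table A
T_16_95 = 2.12   # t(16, 0.025) from the t-table


def proportion_interval(x, n, z):
    """1 - alpha confidence interval for p: p' -/+ z * sqrt(p'q'/n)."""
    p_hat = x / n
    q_hat = 1 - p_hat
    error = z * math.sqrt(p_hat * q_hat / n)  # maximum error of estimate E
    return p_hat - error, p_hat + error


def sample_size(error, z, p_star=0.5):
    """n = z^2 * p* * q* / E^2, always rounded up."""
    q_star = 1 - p_star
    return math.ceil(z ** 2 * p_star * q_star / error ** 2)


def t_interval(mean, s, n, t_crit):
    """1 - alpha confidence interval for mu: x-bar -/+ t * s / sqrt(n)."""
    error = t_crit * s / math.sqrt(n)  # maximum error of estimate E
    return mean - error, mean + error


# Fourth graders: 210 of 300 participate in an organized sport.
low, high = proportion_interval(210, 300, Z_95)
print(f"{low:.4f} to {high:.4f}")   # 0.6481 to 0.7519

# Laboratory mice: E = 0.015, first with no provisional p, then p* = 1/80.
print(sample_size(0.015, Z_95))          # 4269
print(sample_size(0.015, Z_95, 1 / 80))  # 211

# Tax returns: n = 17, x-bar = 7.8, s = 2.3.
low, high = t_interval(7.8, 2.3, 17, T_16_95)
print(f"{low:.2f} to {high:.2f}")   # 6.62 to 8.98
```

Passing the critical value in as an argument keeps the sketch aligned with the table lookups in the slides; software with an inverse CDF, as mentioned in the notes, could supply the same values directly.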