STAT 111 Introductory Statistics
Lecture 10: Confidence Intervals and Hypothesis Tests
June 8, 2004

Today’s Topics
• Confidence intervals revisited
• Margin of error for confidence intervals
• Introduction to hypothesis testing

Confidence Intervals Revisited
• A level C confidence interval for some population parameter θ is an interval [L, U] computed from sample data by a method that has probability C of producing an interval containing the true value of the parameter.
• In other words, P(L ≤ θ ≤ U) = C, where C can be 90%, 95%, 99%, etc.

Confidence Intervals
• The general form of a confidence interval is
  estimate ± margin of error
• The estimate is our guess for the value of the unknown population parameter θ.
• The margin of error shows how accurate we believe our guess is, based on the variability of the estimate.

Confidence Interval for a Population Mean
• Suppose we choose a simple random sample (SRS) of size n from a population with unknown mean µ and known standard deviation σ. Then a level C confidence interval for µ is
  $\bar{x} \pm z^* \dfrac{\sigma}{\sqrt{n}}$
• z* satisfies
  – P(-z* ≤ Z ≤ z*) = C
  – P(Z < -z*) = P(Z > z*) = (1 – C)/2

Confidence Interval for a Parameter
[Figure: standard normal density curve for Z, centered at 0. The central area between -z* and z* is P(-z* ≤ Z ≤ z*) = C; each tail area is P(Z < -z*) = P(Z > z*) = (1 – C)/2.]

Confidence Interval for a Population Mean
• Recall the Central Limit Theorem.
• Suppose we have any population whose distribution has mean µ and standard deviation σ. If we draw a large enough SRS from this population, then $\bar{X}$ is approximately $N\left(\mu, \sigma/\sqrt{n}\right)$.
• This is true regardless of what the actual population distribution is.

Confidence Interval for a Population Mean
• Hence, if the population follows a normal distribution, or the sample size is sufficiently large, we have
  $P\left(\mu - z^* \dfrac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + z^* \dfrac{\sigma}{\sqrt{n}}\right) = P(-z^* \le Z \le z^*) = C$
• This leads to
  $P\left(\bar{X} - z^* \dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z^* \dfrac{\sigma}{\sqrt{n}}\right) = C$

Confidence Intervals for a Population Mean
• For any confidence interval, there are two possibilities:
  – The interval contains the true value of the parameter (in this case, µ).
  – Our SRS was one of the few samples for which µ is not contained in the interval.
• It is incorrect to say that there is probability C that the unknown population parameter (µ) lies within our particular confidence interval.

Confidence Interval for a Population Mean
• A confidence level of C means that if we repeatedly sample from the population, then the true population mean µ will be covered by the constructed confidence intervals (100C)% of the time.
• Remember! It is incorrect to say that the probability that the true population mean µ lies within the confidence interval is C.
• JAVA Applet for demonstrating confidence intervals

Confidence Interval for a Population Mean
• Lower confidence limit: $\bar{x} - z^* \dfrac{\sigma}{\sqrt{n}}$
• Upper confidence limit: $\bar{x} + z^* \dfrac{\sigma}{\sqrt{n}}$
• Width on each side (margin of error): $m = z^* \dfrac{\sigma}{\sqrt{n}}$
• Width of the CI: $2 z^* \dfrac{\sigma}{\sqrt{n}}$

Commonly Used Confidence Levels
  Confidence level (C)   1 - C   (1 - C)/2   z* = z_{(1-C)/2}
  99%                    .01     0.005       2.575
  98%                    .02     0.01        2.33
  95%                    .05     0.025       1.96
  90%                    .10     0.05        1.645
  80%                    .20     0.1         1.28

Example 1
• The number and types of television programs and commercials targeted at children are affected by the amount of time children watch TV.
• A survey was conducted among 100 American children, in which they were asked to record the number of hours they watched TV per week.
• The sample mean is 27.191.
• The known population standard deviation is 8.
• Estimate the average watch time at a 95% confidence level.

Example 2
• A study of preferred height for an experimental keyboard with large forearm-wrist support was conducted. 31 trained typists were selected, and the preferred keyboard height was determined for each of them.
• The resulting sample average height was 80 cm.
• Assume the preferred height is normally distributed with σ = 2 cm.
• Calculate a 90% confidence interval for µ, the true average preferred height for the population.
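The interval x̄ ± z*·σ/√n used in Examples 1 and 2 can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later (for `statistics.NormalDist`). The function names `z_star` and `mean_confidence_interval` are illustrative and do not come from the lecture. The code uses the exact normal quantile, so z* may differ in the third decimal from the rounded table values (for example 2.576 rather than 2.575 at 99%).

```python
from math import sqrt
from statistics import NormalDist


def z_star(confidence):
    """Critical value z* with P(-z* <= Z <= z*) = confidence for standard normal Z."""
    # The upper tail area is (1 - C)/2, so z* is the 1 - (1 - C)/2 quantile.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_confidence_interval(xbar, sigma, n, confidence):
    """Level-C interval for the mean when sigma is known: xbar +/- z* * sigma / sqrt(n)."""
    margin = z_star(confidence) * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Example 1: n = 100 children, sample mean 27.191 hours, sigma = 8, C = 95%
tv_low, tv_high = mean_confidence_interval(27.191, 8, 100, 0.95)
print(f"Example 1: ({tv_low:.2f}, {tv_high:.2f})")

# Example 2: n = 31 typists, sample mean 80 cm, sigma = 2 cm, C = 90%
kb_low, kb_high = mean_confidence_interval(80, 2, 31, 0.90)
print(f"Example 2: ({kb_low:.2f}, {kb_high:.2f})")
```

Under these assumptions the intervals come out to roughly (25.62, 28.76) hours for Example 1 and (79.41, 80.59) cm for Example 2.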
Example 3
• Suppose we desire a confidence interval for the true average stray-load loss µ (in watts) for a certain type of induction motor when the line current is held at 10 amps for a speed of 1500 rpm. Assume that stray-load loss is normally distributed with σ = 3.0.
• If a sample of size 100 produces a mean stray-load loss of 58.3, compute a 99% confidence interval for µ.

Example 4
• The yield point of a particular type of mild steel reinforcing bar is known to be normally distributed with σ = 100.
• The composition of the bar has been slightly modified without affecting either the normality or the value of σ.
• If a sample of 25 modified bars results in a sample average yield point of 8439 lb, compute a 92% confidence interval for the true average yield point of the modified bar.

Confidence Intervals (cont.)
• Confidence intervals for other parameters in a population can also be constructed.
• In particular, confidence intervals can be constructed for the standard deviation/variance of a population whose distribution has known mean µ.
• They can also be constructed for the proportion p with which some event occurs in a population. (More on this one later on.)

Margin of Error of a Confidence Interval
• The margin of error m is
  $m = z^* \dfrac{\sigma}{\sqrt{n}}$
• The margin of error measures the precision of our estimate, but covers only random sampling errors.
• The size of the margin of error depends on
  – Confidence level
  – Sample size
  – Population standard deviation

Confidence Interval
• The length (width) of a confidence interval is
  $\text{width} = 2 z^* \dfrac{\sigma}{\sqrt{n}}$
• The length (width) of a confidence interval increases if the margin of error increases.
• The width of a confidence interval increases if
  – Confidence level increases
  – Sample size decreases
  – Population standard deviation increases

Choosing the Sample Size
• Fixing the confidence level, a confidence interval for a population mean will have a specified margin of error m when the sample size is
  $n = \left(\dfrac{z^* \sigma}{m}\right)^2$
• By achieving a specified margin of error m, we can estimate the mean to within m units.

Example 1
• To estimate the amount of lumber that can be harvested in a tract of land, the mean diameter of trees in the tract must be estimated to within one inch with 99% confidence. What sample size should be taken? (Assume diameters are normally distributed with σ = 6 inches.)

Example 2
• Suppose that the standard deviation of the salaries of a population of individuals is 30K. How many individuals do we need to sample so that the 90% CI has a margin of error of no more than 5K?

Example 3
• Monitoring of a computer time-sharing system has suggested that response time to a particular command is normally distributed with σ = 25 ms.
• A new operating system is installed, and we wish to estimate the true average response time µ for the new environment.
• Assuming that response times are still normally distributed with σ = 25, what sample size is necessary to ensure that the resulting 95% confidence interval has a width of at most 10?

Cautions on CI for Population Mean
• The data must be an SRS from the population.
• The formula is incorrect for more complex probability sampling designs.
• The formula requires carefully produced data.
• The confidence interval is not resistant to outliers.
• When the sample size is small, examine the data for skewness and other signs of non-normality.
• The formula requires the standard deviation of the population to be known, which is not realistic in practice.

Introduction: Hypothesis Testing
• Confidence intervals are one of the two most common types of formal statistical inference.
• We prefer confidence intervals when our goal is to estimate a population parameter.
• The second common type of inference is used when we want to assess the evidence provided by the data in favor of some claim (hypothesis) about the population.

Hypothesis Testing
• Examples of claims to which hypothesis testing can be applied:
  – Are less than 10% of all circuit boards produced by a particular manufacturer defective?
  – Is the true average inside diameter of a certain type of pipe 0.75 cm?
  – Does one type of twine have a higher average breaking strength than a second type of twine?
  – For a pharmaceutical company, is a new drug effective for a certain disease?

Hypothesis Testing
• The hypothesis is a statement about the parameters in a population or model.
• The results of a test are expressed in terms of a probability that measures how well the data and the hypothesis agree.
• In hypothesis testing, we need to set up two hypotheses:
  – The null hypothesis H0
  – The alternative hypothesis Ha (sometimes denoted H1)

Hypothesis Testing
• The null hypothesis is the claim which is initially favored or believed to be true.
• The null hypothesis is also the claim that we will try to find evidence against.
• Usually the null hypothesis is a statement of “no effect” or “no difference.”
• The test of significance is designed to assess the strength of the evidence against the null hypothesis.

Hypothesis Testing
• The alternative hypothesis is the claim that we hope or suspect is true instead of H0.
• We often begin with the alternative hypothesis Ha and then set up H0 as the statement that the hoped-for effect is not present.
• Stating Ha is often a difficult task.
• Hypotheses in general refer to some population or model and not to any particular outcome.

Hypothesis Testing
• The alternative hypothesis Ha can be either one-sided or two-sided.
• One-sided alternative hypotheses:
  – μ > 0
  – p ≤ 0.5
  – σ < 2
• Two-sided alternative hypotheses:
  – μ ≠ 0
  – p ≠ 0.5
  – σ ≠ 2

Example
• Experiments on learning in animals sometimes measure how long it takes a mouse to find its way through a maze. The mean time is 18 seconds for one particular maze. A researcher thinks that a loud noise will cause the mice to complete the maze faster. She measures how long each of 10 mice takes with a noise as stimulus.
• Let μ be the mean time of mice to find their way through a particular maze when noise is presented as a stimulus.
  – H0: μ = 18
  – Ha: μ < 18 (one-sided Ha)

Example
• Does more than half of the American population have faith in the economy? 100,000 Americans are sampled.
• Let p be the population proportion of people who have faith in the economy.
  – H0: p ≤ 0.5
  – Ha: p > 0.5 (one-sided Ha)

Example
• The Census Bureau reports that households spend an average of 31% of their total spending on housing. A homebuilders association in Cleveland wonders if the national finding applies in their area. They interview a sample of 40 households in the Cleveland metropolitan area to learn what percent of their spending goes toward housing.
• Let μ be the mean percent of spending of households in Cleveland on housing.
  – H0: μ = 0.31
  – Ha: μ ≠ 0.31 (two-sided Ha)

Example
• Does one type of twine have a higher average breaking strength than a second type of twine?
• Let μ1 be the average breaking strength of the first type of twine, and let μ2 be the average breaking strength of the second type.
  – H0: μ1 = μ2
  – Ha: μ1 ≠ μ2 (two-sided Ha)

Hypothesis Testing
• The alternative hypothesis in general should express the hopes or suspicions we bring to the data.
• We should not, however, look first at the data and then frame Ha to fit what the data show.
• Use a two-sided alternative unless you have a specific direction firmly in mind beforehand.
• In some circles, it is argued that the two-sided alternative should always be used in testing.
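Returning to the formula n = (z*σ/m)² from the Choosing the Sample Size slides, the three sample-size examples there can be worked numerically. This is a minimal sketch, assuming Python 3.8 or later. The function name `required_sample_size` is illustrative and not from the lecture, and rounding up to the next whole observation is an assumption made here so that the margin of error does not exceed m.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence):
    """Smallest n for which z* * sigma / sqrt(n) <= margin (sigma assumed known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # n = (z* sigma / m)^2, rounded up because n must be a whole number.
    return ceil((z * sigma / margin) ** 2)


# Example 1: tree diameters, sigma = 6 inches, m = 1 inch, C = 99%
n_trees = required_sample_size(6, 1, 0.99)

# Example 2: salaries, sigma = 30K, m = 5K, C = 90%
n_salaries = required_sample_size(30, 5, 0.90)

# Example 3: response times, sigma = 25 ms, total width at most 10, so m = 5, C = 95%
n_response = required_sample_size(25, 10 / 2, 0.95)

print(n_trees, n_salaries, n_response)
```

Under these assumptions the required sample sizes are 239 trees, 98 individuals, and 97 response-time measurements. Note that Example 3 specifies the full width of the interval, so the margin of error passed in is half of it.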