Survey

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia, lookup

Student's t-test wikipedia, lookup

Taylor's law wikipedia, lookup

Bootstrapping (statistics) wikipedia, lookup

Resampling (statistics) wikipedia, lookup

Misuse of statistics wikipedia, lookup

Psychometrics wikipedia, lookup

Transcript

Lesson 8: One-Sample Tests of Hypothesis Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-1 Outline Review of CLT and CI What is a hypothesis? Type I and type II errors? One-Tailed Tests Two-Tailed Tests p-Value in Hypothesis Testing Tests Concerning Proportion Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-2 Central Limit Theorem #1 5 balls in the bag: 0 1 2 3 4 Draw 50 ball 1000 times with replacement. Compute the sample mean. Plot a relative frequency histogram (empirical probability histogram) of the 1000 sample means. The Central Limit Theorem says 1. The empirical histogram looks like a normal density. 2. Expected value (mean of the normal distribution) = 2. 3. Variance of the sample means = 2/50=0.04. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-3 Confidence Interval #1 Five numbered balls in the bag: ? ? ? ? ? Draw one sample of 50 balls with replacement. Compute the sample mean and sample standard deviation. Suppose the sample mean is 10 and the sample standard deviation is 0.04. Can you tell us the range of possible values the population mean may take, at 95% confidence level? m Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-4 Hypothesis testing #1 Five numbered balls in the bag: ? ? ? ? ? Draw one sample of 50 balls with replacement. Compute the sample mean and sample standard deviation. Suppose the sample mean is 10 and the sample standard deviation is 0.04. Do you think the balls in this bag has a mean of 2? 2 Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-5 What is a Hypothesis? A Hypothesis is a statement about the value of a population parameter developed for the purpose of testing. Examples of hypotheses made about a population parameter are: The mean monthly income for systems analysts is $3,625. Twenty percent of all customers at Bovine’s Chop House return for another meal within a month. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-6 What is Hypothesis Testing? Hypothesis testing is a procedure, based on sample evidence and probability theory, used to determine whether the hypothesis is a reasonable statement and should not be rejected, or is unreasonable and should be rejected. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Hypothesis Testing Step 1: state null and alternative hypothesis Step 2: select a level of significance Step 3: identify the test statistic Step 4: formulate a decision rule Step 5: Take a sample, arrive at a decision Do not reject null Ka-fu Wong © 2004 Reject null and accept alternative ECON1003: Analysis of Economic Data Lesson8-8 Null and Alternative Hypothesis Null Hypothesis H0: A statement about the value of a population parameter. Alternative Hypothesis H1: A statement that is accepted if the sample data provide evidence that the null hypothesis is false. Level of Significance: The probability of rejecting the null hypothesis when it is actually true. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-9 Objectivity in formulating a hypothesis In court, the defendant is presumed innocent until proven beyond reasonable doubt to be guilty of stated charges. The “null hypothesis”, i.e. the denial of our theory, is presumed true until we prove beyond reasonable doubt that it is false. “Beyond reasonable doubt” means that the probability of claiming that our theory is true when it is not (null hypothesis true) is less than an a priori set significance level (usually 5% or 1%). Is the defendant guilty? Null: the defendant is not guilty. Alternative: the defendant is guilty. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-10 Type I and type II errors Type I Error: conclude the defendant guilty when the defendant did not commit the crime. Level of significance is also the maximum probability of committing a type I error. We want to limit this Type I Error to some small number. Type II Error: Conclude the defendant not guilty when the defendant actually committed the crime. Committed Crime Court Decision (Guilty or not) Ka-fu Wong © 2004 Yes No Guilty Correct decision Type I error Not Guilty Type II error Correct decision ECON1003: Analysis of Economic Data Lesson8-11 Type I and type II errors Type I Error: Rejecting the null hypothesis when it is actually true. Level of significance is also the maximum probability of committing a type I error. We want to limit this Type I Error to some small number. Type II Error: Accepting the null hypothesis when it is actually false. State of nature Decision based Don’t reject on the sample null statistic Reject null Ka-fu Wong © 2004 Null true Null false Correct decision Type II error Type I error Correct decision ECON1003: Analysis of Economic Data Lesson8-12 Critical values Test statistic: A value, determined from sample information, used to determine whether or not to reject the null hypothesis. Critical value: The dividing point between the region where the null hypothesis is rejected and the region where it is not rejected. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-13 One-Tailed Tests A test is one-tailed when the alternate hypothesis, H1 , states a direction, such as: H1: The mean yearly commissions earned by full-time realtors is more than $35,000. (µ>$35,000) H1: The mean speed of trucks traveling on I-95 in Georgia is less than 60 miles per hour. (µ<60) H1: Less than 20 percent of the customers pay cash for their gasoline purchase. ( < .20) Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-14 Sampling Distribution for the Statistic Z for a One Tailed Test, .05 Level of Significance .95 probability Critical Value z=1.65 .05 probability 0 Ka-fu Wong © 2004 1 2 3 4 Rejection region Reject the null if the test statistic falls into this region. ECON1003: Analysis of Economic Data Two-Tailed Tests A test is two-tailed when no direction is specified in the alternate hypothesis H1 , such as: H1: The mean amount spent by customers at the WalMart in Georgetown is not equal to $25. (µ $25). H1: The mean price for a gallon of gasoline is not equal to $1.54. (µ $1.54). Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-16 Sampling Distribution for the Statistic Z for a Two Tailed Test, .05 Level of Significance .95 probability Critical Value z=-1.96 Critical Value z=1.96 .025 probability .025 probability -4 -3 -2 -1 Rejection region #1 Ka-fu Wong © 2004Reject 0 1 2 3 4 Rejection region #2 the null if the test statistic falls into these two regions. ECON1003: Analysis of Economic Data Copyright© 2002 by The McGraw-Hill Companies, Inc. All rights reserved Testing for the Population Mean: Large Sample, Population Standard Deviation Known When testing for the population mean from a large sample and the population standard deviation is known, the test statistic is given by: X m z / n Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-18 EXAMPLE 1 The processors of Fries’ Catsup indicate on the label that the bottle contains 16 ounces of catsup. The standard deviation of the process is 0.5 ounces. A sample of 36 bottles from last hour’s production revealed a mean weight of 16.12 ounces per bottle. At the .05 significance level is the process out of control? That is, can we conclude that the mean amount per bottle is different from 16 ounces? Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-19 EXAMPLE 1 continued Step 1: State the null and the alternative hypotheses: H0: m = 16; H1: m 16 Step 2: Select the level of significance. In this case we selected the .05 significance level. Step 3: Identify the test statistic. Because we know the population standard deviation, the test statistic is z. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-20 EXAMPLE 1 continued Step 4: State the decision rule: Reject H0 if z > 1.96 or z < -1.96 Step 5: Compute the value of the test statistic and arrive at a decision. X m 16.12 16.00 z 1.44 n 0.5 36 Do not reject the null hypothesis. We cannot conclude the mean is different from 16 ounces. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-21 p-Value in Hypothesis Testing A p-Value is the probability, assuming that the null hypothesis is true, of finding a value of the test statistic at least as extreme as the computed value for the test. The “critical probability” for our decision to reject the null. If the p-Value is smaller than the significance level, H0 is rejected. If the p-Value is larger than the significance level, H0 is not rejected. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-22 Computation of the p-Value One-Tailed Test: p-Value = P{z ≥absolute value of the computed test statistic value} Two-Tailed Test: p-Value = 2P{z ≥ absolute value of the computed test statistic value} From EXAMPLE 1, z = 1.44, and because it was a two-tailed test, the p-Value = 2P{z ≥ 1.44} = 2(.5-.4251) = .1498. Because .1498 > .05, do not reject H0. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-23 Testing for the Population Mean: Large Sample, Population Standard Deviation Unknown Here is unknown, so we estimate it with the sample standard deviation s. As long as the sample size n 30, z can be approximated with: X m z s/ n Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-24 EXAMPLE 2 Roder’s Discount Store chain issues its own credit card. Lisa, the credit manager, wants to find out if the mean monthly unpaid balance is more than $400. The level of significance is set at .05. A random check of 172 unpaid balances revealed the sample mean to be $407 and the sample standard deviation to be $38. Should Lisa conclude that the population mean is greater than $400, or is it reasonable to assume that the difference of $7 ($407-$400) is due to chance? Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-25 EXAMPLE 2 continued Step 1: H0: m $400, H1: m > $400 Step 2: The significance level is .05 Step 3: Because the sample is large we can use the z distribution as the test statistic. Step 4: H0 is rejected if z>1.65 Step 5: Perform the calculations and make a decision. X m $407 $400 z 2.42 s n $38 172 H0 is rejected. Lisa can conclude that the mean unpaid balance is greater than $400. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-26 Testing for a Population Mean: Small Sample, Population Standard Deviation Unknown The test statistic is the t distribution. The test statistic for the one sample case is given by: X m t s/ n Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-27 Example 3 The current rate for producing 5 amp fuses at Neary Electric Co. is 250 per hour. A new machine has been purchased and installed that, according to the supplier, will increase the production rate. A sample of 10 randomly selected hours from last month revealed the mean hourly production on the new machine was 256 units, with a sample standard deviation of 6 per hour. At the .05 significance level can Neary conclude that the new machine is faster? Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-28 Example 3 continued Step 1: State the null and the alternate hypothesis. H0: m 250; H1: m > 250 Step 2: Select the level of significance. It is .05. Step 3: Find a test statistic. It is the t distribution because the population standard deviation is not known and the sample size is less than 30. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-29 Example 3 continued Step 4: State the decision rule. There are 10 – 1 = 9 degrees of freedom. The null hypothesis is rejected if t > 1.833. Step 5: Make a decision and interpret the results. X m 256 250 t 3.162 s n 6 10 The null hypothesis is rejected. The mean number produced is more than 250 per hour. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-30 Tests Concerning Proportion A Proportion is the fraction or percentage that indicates the part of the population or sample having a particular trait of interest. The sample proportion is denoted by p and is found by: Number of successes in the sample p Number sampled Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-31 Test Statistic for Testing a Single Population Proportion z p (1 ) n The sample proportion is p and is the population proportion. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-32 EXAMPLE 4 In the past, 15% of the mail order solicitations for a certain charity resulted in a financial contribution. A new solicitation letter that has been drafted is sent to a sample of 200 people and 45 responded with a contribution. At the .05 significance level can it be concluded that the new letter is more effective? Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-33 Example 4 continued Step 1: State the null and the alternate hypothesis. H0: .15 H1: > .15 Step 2: Select the level of significance. It is .05. Step 3: Find a test statistic. The z distribution is the test statistic. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-34 Example 4 continued Step 4: State the decision rule. The null hypothesis is rejected if z is greater than 1.65. Step 5: Make a decision and interpret the results. z p (1 ) n 45 .15 200 2.97 .15(1 .15) 200 The null hypothesis is rejected. More than 15 percent are responding with a pledge. The new letter is more effective. Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-35 Lesson 8: One-Sample Tests of Hypothesis - END - Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson8-36