Statistics for Business (ENV)
Chapter 9: INTRODUCTION TO HYPOTHESIS TESTING

Hypothesis Testing
9.1 Null and Alternative Hypotheses and Errors in Testing
9.2 z Tests about a Population Mean with Known σ
9.3 t Tests about a Population Mean with Unknown σ

Hypothesis testing-1
Researchers usually collect data from a sample and then use the sample data to help answer questions about the population. Hypothesis testing is an inferential statistical process that uses limited information from the sample data to reach a general conclusion about the population.

Hypothesis testing-2
• A hypothesis test is a formalized procedure that follows a standard series of operations.
• In this way, researchers have a standardized method for evaluating the results of their research studies.

The basic experimental situation for using hypothesis testing is as follows. It is assumed that the parameter μ is known for the population before treatment. The purpose of the experiment is to determine whether or not the treatment has an effect. Is the population mean after treatment the same as or different from the mean before treatment? A sample is selected from the treated population to help answer this question.

Procedures of hypothesis-testing
1. First, we state a hypothesis about a population. Usually the hypothesis concerns the value of a population parameter. For example, we might hypothesize that the mean IQ for UIC students is μ = 110.
2. Next, we obtain a random sample from the population. For example, we might select a random sample of n = 100 UIC students.
3. Finally, we compare the sample data with the hypothesis. If the data are consistent with the hypothesis, we will conclude that the hypothesis is reasonable. But if there is a big discrepancy between the data and the hypothesis, we will decide that the hypothesis is wrong.

Null and Alternative Hypotheses
• The null hypothesis, denoted H0, is a statement of the basic proposition being tested.
It generally represents the status quo (a statement of "no effect" or "no difference", or a statement of equality) and is not rejected unless there is convincing sample evidence that it is false.
• The alternative (or scientific) hypothesis, denoted Ha (or H1), is a statement, alternative to the null hypothesis, that will be accepted only if there is convincing sample evidence that it is true.
• These two hypotheses are mutually exclusive and exhaustive.

[Figure: the critical region of the z distribution, determined by the level of significance (the alpha level). With an alpha level of .05, the probability of rejecting the null hypothesis when it is true is no more than 5%.]

[Figure: the locations of the critical region boundaries for three different levels of significance.]

Example: Alcohol appears to be involved in a variety of birth defects, including low birth weight and retarded growth. A researcher would like to investigate the effect of prenatal alcohol on birth weight. A random sample of n = 16 pregnant rats is obtained. The mother rats are given daily doses of alcohol. At birth, one pup is selected from each litter to produce a sample of n = 16 newborn rats. The average weight for the sample is 15 grams. The researcher would like to compare the sample with the general population of rats. It is known that regular newborn rats (not exposed to alcohol) have an average weight of μ = 18 grams. The distribution of weights is normal with σ = 4.

H0: µ = 18

1. State the hypotheses
The null hypothesis states that exposure to alcohol has no effect on birth weight. The alternative hypothesis states that alcohol exposure does affect birth weight.

2. Select the level of significance (alpha level)
We will use an alpha level of .05. That is, we are taking a 5% risk of committing a Type I error: the probability of rejecting the null hypothesis when it is true is no more than 5%.

3. Set the decision criteria by locating the critical region
[Figure: the critical region of the z distribution for an alpha level of .05.]

4. Collect data and compute sample statistics
The sample mean is then converted to a z-score, which is our test statistic:
\( z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{15 - 18}{4/\sqrt{16}} = -3 \)

5. Arrive at a decision
Reject the null hypothesis: z = −3 lies beyond the two-tailed critical boundary of −1.96 for α = .05.

Hypothesis Testing
Step 1: State null and alternate hypotheses
Step 2: Select a level of significance
Step 3: Identify the test statistic
Step 4: Formulate a decision rule
Step 5: Take a sample and arrive at a decision: either do not reject the null, or reject the null and accept the alternate

Step 1: State the null and alternate hypotheses
Null Hypothesis H0: A statement about the value of a population parameter (μ and σ). It is written with an "=" sign, say "μ = 2" or "μ ≥ 2".
Alternative Hypothesis H1: A statement that is accepted if H0 is false. It is written without an "=" sign, say "μ ≠ 2" or "μ < 2".

Step 1: State the null and alternate hypotheses
Three possibilities for hypotheses about means (μ0 is a constant):
H0: μ = μ0, H1: μ ≠ μ0
H0: μ ≤ μ0, H1: μ > μ0
H0: μ ≥ μ0, H1: μ < μ0
The null hypothesis always contains equality.

Step Two: Select a Level of Significance, α
The level of significance α is the maximum allowable probability of making a Type I error, that is, of rejecting a true null hypothesis.
Type I Error: H0 is actually true but you reject it (false positive).
Type II Error: H0 is false but you fail to reject it (false negative).

Step Two: Select a Level of Significance, α
Risk table:
                 Researcher accepts H0    Researcher rejects H0
H0 is true       Correct decision         Type I error (probability ≤ α)
H0 is false      Type II error            Correct decision

Step 3: Select the test statistic
A test statistic is used to determine whether the result of the research study (the difference between the sample mean and the population mean) is more than would be expected by chance alone. We will only consider the z and t statistics for the time being.
Since our hypothesis is about the population mean, the test statistic is
\( z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1) \)

Test Statistic
• The term test statistic simply indicates that the sample mean is converted into a single, specific statistic that is used to test the hypotheses.
• The z-score statistic that is used in the hypothesis test is the first specific example of what is called a test statistic.
• We will introduce several other test statistics that are used in a variety of different research situations later.

Step 4: Formulate the decision rule.
The critical z is determined by the level of significance. Reject H0 if:
• H0: μ ≤ μ0: computed z > critical z
• H0: μ ≥ μ0: computed z < −critical z
• H0: μ = μ0: computed z > critical z or computed z < −critical z

Critical value: the dividing point between the region where H0 is rejected and the region where H0 is not rejected, determined by the level of significance. From the table, with statistic z, a one-tailed test and significance level 0.05, the critical value is 1.65. For H0: μ ≤ μ0, reject if z > 1.65.
[Figure: upper-tailed test. The do-not-reject region (probability .95) lies below the critical value 1.65; the region of rejection (probability .05) lies above it.]

One-Tailed Test of Significance
If H0: μ ≤ μ0 is true, it is very unlikely that the computed z value is so large.
[Figure: upper-tailed test with critical value 1.65, as above.]

For H0: μ ≥ μ0, reject H0 if computed z < −critical z.
If H0: μ ≥ μ0 is true, it is very unlikely that the computed z value (from the sample mean) is so small.

Two-Tailed Tests of Significance
If H0: μ = μ0 is true, it is very unlikely that the computed z value is extremely large or small.
[Figure: two-tailed test. Regions of rejection (probability .025 each) lie below −1.96 and above 1.96; the do-not-reject region (probability .95) lies between the two critical values.]

Step 5: Make a decision: either reject H0 or do not reject H0.

Example One Tailed (Upper Tailed)
• An insurance company is reviewing its current policy rates. When originally setting the rates they believed that the average claim amount was $1,800.
They are concerned that the true mean is actually higher than this, because they could potentially lose a lot of money. They randomly select 40 claims and calculate a sample mean of $1,950. Assuming that the population standard deviation of claims is $500, and setting the level of significance at 0.05, test to see if the insurance company should be concerned.

Step 1: Set the null and alternative hypotheses
H0: μ ≤ 1800, H1: μ > 1800

Step 2: Calculate the test statistic
\( z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{1950 - 1800}{500/\sqrt{40}} \approx 1.897 \)

Step 3: Set Rejection Region
This is an upper-tailed test, so we need to put all of alpha in the right tail. With α = 0.05 the one-tailed critical value is 1.65 (the value 1.96 applies only to a two-tailed test). Thus, R: Z > 1.65

Step 4: Conclude
We can see that z = 1.897 > 1.65, thus our test statistic is in the rejection region. Therefore we reject the null hypothesis in favor of the alternative. The sample gives significant evidence that the mean claim amount is higher than $1,800, so the insurance company has reason to be concerned about its current policy rates.

Example: One Tailed (Lower Tailed)
Trying to encourage people to stop driving to campus, the university claims that on average it takes people 30 minutes to find a parking space on campus. John does not think it takes so long to find a spot. He calculated the mean time to find a parking space on campus for the last five times and found it to be 20 minutes. Assuming that the time it takes to find a parking spot is normally distributed, and that the population standard deviation is 6 minutes, perform a hypothesis test with level of significance alpha = 0.10 to see if his claim is correct.

Step 1: Set the null and alternative hypotheses
H0: μ ≥ 30, H1: μ < 30

Step 2: Calculate the test statistic
\( z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{20 - 30}{6/\sqrt{5}} \approx -3.727 \)

Step 3: Set Rejection Region
This is a lower-tailed test, so we need to put all of alpha in the left tail. Thus, R: Z < −1.28

Step 4: Conclude
We can see that z = −3.727 < −1.28, thus our test statistic is in the rejection region.
Therefore we reject the null hypothesis in favor of the alternative. We conclude that the mean is significantly less than 30, so the data support John's claim that the mean time to find a parking space is less than 30 minutes.

Example: Two Tailed
A sample of 40 sales receipts from a grocery store has mean $137, and the population standard deviation is $30.2. Use these values to test whether or not the mean sales at the grocery store differ from $150, with level of significance alpha = 0.01.

Step 1: Set the null and alternative hypotheses
H0: μ = 150, H1: μ ≠ 150

Step 2: Calculate the test statistic
\( z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{137 - 150}{30.2/\sqrt{40}} \approx -2.722 \)

Step 3: Set Rejection Region
This is a two-tailed test, so we need to put half of alpha in the left tail and the other half in the right tail. Thus, R: Z < −2.58 or Z > 2.58

Step 4: Conclude
We see that Z = −2.722 < −2.58, thus our test statistic is in the rejection region. Therefore we reject the null hypothesis in favor of the alternative. We can conclude that the mean is significantly different from $150; the data give strong evidence that the mean sales at the grocery store are not $150.

Example: Lisa, the credit manager
Lisa, the credit manager, wants to check if the mean monthly unpaid balance is more than $400. The level of significance she set is .05. A random check of 172 unpaid balances revealed the sample mean to be $407. The population standard deviation is known to be $38. Should Lisa conclude that the population mean is greater than $400, or is it reasonable to assume that the difference of $7 ($407 − $400) is due to chance? (at significance level 0.05)

Step 1: H0: µ ≤ $400, H1: µ > $400
Step 2: The significance level is .05.
Step 3: Since σ is known, we can find the test statistic z.
Step 4: H0 is rejected if z > 1.65 (since α = 0.05).
Step 5: Make a decision and interpret the results. The computed z of 2.42 > the critical z of 1.65, and the p-value of .0078 < α of .05. Reject H0.
\( z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{407 - 400}{38/\sqrt{172}} = 2.42 \)
The p-value is .0078 for a one-tailed test. We can conclude that the mean unpaid balance is greater than $400.

Limitation of z-scores in hypothesis testing
• The limitation of z-scores in hypothesis testing is that the population standard deviation σ (or variance) must be known.
• What if you don't know the σ of the population?
• Answer: use the sample variability instead.

Sample variance
s² = (sum of squared deviations)/(n − 1) = (sum of squared deviations)/df = SS/df
Since you must know the sample mean before you can compute the sample variance, this places a restriction on sample variability such that only n − 1 scores in a sample are free to vary. The value n − 1 is called the degrees of freedom (or df) for the sample variance.

z statistic (σ known): \( z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \)
t statistic (σ unknown): \( t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \)
If you select all the possible samples of a particular size (n), the set of all possible t statistics will form a t distribution.
Good for:
(i) large samples (n > 30), whether or not the underlying distribution is normal
(ii) small samples (n < 30), when the underlying distribution is normal

[Figure: distributions of the t statistic for different values of degrees of freedom, compared to a normal distribution.]

[Figure: the t distribution with df = 3. Note that 5% of the distribution is located in each tail (t > 2.353 and t < −2.353).]

Example: Fries' Catsup
The label on Fries' Catsup indicates that the bottle contains 16 ounces of catsup. A sample of 36 bottles from last hour's production revealed a mean weight of 16.12 ounces per bottle and a sample standard deviation of 0.5 ounces. At the 0.05 significance level, test whether the process is out of control. That is, can we conclude that the mean amount per bottle is different from 16 ounces?

Step 1: State the null and the alternative hypotheses. H0: μ = 16, H1: μ ≠ 16
Step 2: Select the significance level. The significance level is .05.
Step 3: Since the sample size is large enough and the population s.d. is unknown, we can use the t statistic as the test statistic.
Step 4: State the decision rule. Reject H0 if the computed statistic is > 1.96 or < −1.96 (since α = 0.05; because the sample is large, the slides take the critical values from the normal distribution).
Step 5: Make a decision and interpret the results.
\( t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{16.12 - 16.00}{0.5/\sqrt{36}} = 1.44 \)
The p-value is .1499 for a two-tailed test. The computed value of 1.44 < the critical value of 1.96, and the p-value of .1499 > α of .05. Do not reject the null hypothesis. We cannot conclude the mean is different from 16 ounces.

Testing for a Population Mean: Unknown Population Standard Deviation, Small Sample
When the sample is small but the underlying distribution is normal, the test statistic follows the t distribution. The critical value of t is determined by its degrees of freedom, which is equal to n − 1.
\( t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \)

Example: 5 amp fuses
The current rate for producing 5 amp fuses at an Electric Co. is 250 per hour. A new machine has been purchased and installed. According to the supplier, the production rates are normally distributed. A sample of 10 randomly selected hours from last month revealed that the mean hourly production was 256 units, with a sample s.d. of 6 per hour. At the 0.05 significance level, test whether the new machine is faster than the old one.

Step 1: State the null and alternate hypotheses. H0: µ ≤ 250, H1: µ > 250
Step 2: Select the level of significance. It is .05.
Step 3: Since the underlying distribution is normal and σ is unknown, use the t distribution.
Step 4: State the decision rule. Degrees of freedom = 10 − 1 = 9. Reject H0 if t > 1.833.
Step 5: Make a decision and interpret the results.
\( t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{256 - 250}{6/\sqrt{10}} = 3.162 \)
The p-value is 0.0058 (obtained from the t distribution; software is needed to find it). The computed t of 3.162 > the critical t of 1.833, and the p-value of .0058 < α of .05. Reject H0. The mean number of fuses produced is more than 250 per hour.
If the p-value is less than alpha, then reject the null hypothesis.
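The fuse calculation above can be retraced in a few lines of Python. This is only a minimal sketch using the standard library: the function name `one_sample_t` is a hypothetical helper introduced for illustration, and the critical value 1.833 is copied from the slides' t table rather than computed.

```python
import math

def one_sample_t(sample_mean, mu0, sample_sd, n):
    """Return (t statistic, degrees of freedom) for a one-sample t test.

    Hypothetical helper for illustration; it implements
    t = (sample_mean - mu0) / (sample_sd / sqrt(n)) with df = n - 1.
    """
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error, n - 1

# Fuse example: mean 256, hypothesized mean 250, sample s.d. 6, n = 10
t_stat, df = one_sample_t(256, 250, 6, 10)

# Critical value taken from the t table quoted in the slides
# (df = 9, one-tailed, alpha = .05); it is not computed here.
critical_t = 1.833

# Upper-tailed test: reject H0 when t exceeds the critical value
reject_h0 = t_stat > critical_t
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

The p-value is left out on purpose: the Python standard library has no t distribution, so it would need a statistics package or a table, as the slides note.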
Example: One-sample hypothesis test for mean
• Amount of time UIC students spend in library, from survey
  – Mean 41.72 minutes
  – Standard deviation 40.179 minutes
  – Number of cases 294
• National survey finds university library users spend a mean of 38 minutes
• Is the population mean for UIC Library users different from the national mean?

Step 1. Hypotheses
• Null hypothesis H0: μ = μ0, that is, μ = 38
• Alternative or research hypothesis Ha: μ ≠ μ0, that is, μ ≠ 38

Step 2. Level of significance
• Probability of error in making the decision to reject the null hypothesis
• For this test choose α = 0.05
[Figure: two-tailed test. Regions of rejection (probability .025 each) lie below −1.96 and above 1.96; the do-not-reject region (probability .95) lies between the two critical values.]

Step 3. Test statistic
\( t = \frac{\bar{y} - \mu_0}{s/\sqrt{n}} = \frac{41.72 - 38}{40.179/\sqrt{294}} = 1.588 \)
n = 294, so use the critical t values from the table row for infinite degrees of freedom (±1.96).

Step 4. Decision
• Cannot reject the null hypothesis, since 1.588 lies between −1.96 and 1.96
• Cannot conclude that the population mean is different from 38 minutes

95% confidence interval in this example:
\( E = 1.96 \cdot \frac{s}{\sqrt{n}} = 1.96 \cdot \frac{40.179}{\sqrt{294}} = 4.59 \)
[41.72 − 4.59, 41.72 + 4.59] or [37.13, 46.31]

Confidence interval and hypothesis test for library example
• Confidence interval for time spent in library is 37.13 < μ < 46.31
• Hypothesized value of 38 minutes falls within the confidence interval
• Therefore we cannot say that the population mean is not equal to 38 minutes, and cannot reject the null hypothesis

Using confidence intervals or hypothesis tests
• For parameters for a single sample…
  – One-sample hypothesis test involves comparison with a pre-specified value…
  – Which is often artificial…
  – So a confidence interval is most appropriate for reporting results
• For parameters for two samples…
  – Difference in parameters is of interest
  – Hypothesis test examines it directly
  – Confidence interval is less intuitive

Confidence interval or Hypothesis test?
• Hypothesis tests are better when the chief issue is to make a yes/no decision about whether a pattern exists in a population.
• Confidence intervals are better when the chief issue is to make a best guess of a population parameter.

When reading a scientific journal, you typically will not be told explicitly that the researcher evaluated the data using a z-score as a test statistic with an alpha level of .05. Nor will you be told that "the null hypothesis is rejected." Instead, you will see a statement such as:

The treatment with medication had a significant effect on people's depression scores, z = 3.85, p < .05.

Let us examine this statement piece by piece. First, what is meant by the term significant? In statistical tests, this word indicates that the result is different from what would be expected due to chance. A significant result means that the null hypothesis has been rejected. That is, the data are significant because the sample mean falls in the critical region and is not what we would have expected to obtain if H0 were true.

Next, what is the meaning of z = 3.85? The z indicates that a z-score was used as the test statistic to evaluate the sample data and that its value is 3.85.

Finally, what is meant by p < .05? This part of the statement is a conventional way of specifying the alpha level that was used for the hypothesis test. More specifically, we are being told that an outcome as extreme as the result of the experiment would occur by chance with a probability (p) that is less than .05 (alpha) if H0 were true.

In circumstances where the statistical decision is to fail to reject H0, the report might state that:

There was no evidence that the medication had an effect on depression scores, z = 1.30, p > .05.

In this case, we are saying that the obtained result, z = 1.30, is not unusual (not in the critical region) and is relatively likely to occur by chance (the probability is greater than .05). Thus, H0 was not rejected.
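To see where statements such as "p < .05" and "p > .05" come from, the two reported z-scores can be converted to p-values. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `p_value_from_z` is hypothetical, and treating both reports as two-tailed tests is an assumption, since the quoted statements do not say which kind of test was used.

```python
from statistics import NormalDist

def p_value_from_z(z, two_tailed=True):
    """Probability of a z-score at least this extreme if H0 is true.

    Hypothetical helper for illustration. The two-tailed default is an
    assumption; pass two_tailed=False for a one-tailed test.
    """
    upper_tail = 1.0 - NormalDist().cdf(abs(z))
    return 2.0 * upper_tail if two_tailed else upper_tail

alpha = 0.05

# "z = 3.85, p < .05": a significant result
p_significant = p_value_from_z(3.85)

# "z = 1.30, p > .05": not significant
p_not_significant = p_value_from_z(1.30)

for z, p in [(3.85, p_significant), (1.30, p_not_significant)]:
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"z = {z:.2f}: p = {p:.4f} -> {decision}")
```

With either a one-tailed or a two-tailed reading, the first z-score falls below alpha = .05 and the second stays above it, so the decisions match the reported wording.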
Using the p-Value in Hypothesis Testing
The p-value not only tells us whether we should reject H0, but also how strong the evidence against H0 is.
If the p-value ≥ α, H0 cannot be rejected.
If the p-value < α, H0 is rejected.
[Figure: sample means that fall in the critical region (shaded areas) have a probability less than alpha. H0 should be rejected.]

More Example: To test the effectiveness of eye-spot patterns in deterring predation, a sample of n = 16 insectivorous birds is selected. The animals are tested in a box that has two separate chambers (see figure). The birds are free to roam from one chamber to another through a doorway in a partition. On the wall of one chamber, two large eye-spot patterns have been painted. The other chamber has plain walls. The birds are tested one at a time by placing them in the doorway in the center of the apparatus. Each animal is left in the box for 60 minutes, and the amount of time spent in the plain chamber is recorded. Suppose that the sample of n = 16 birds spent an average of M = 39 minutes in the plain side, with SS = 540. Can we conclude that eye-spot patterns have an effect on behavior? Note that we have no information about the population variance.

Step 1: State the hypotheses
H0: µ(plain side) = 30 minutes
H1: µ(plain side) ≠ 30 minutes

Step 2: Locate the critical region
The test statistic is a t statistic because the population variance is not known. df = 16 − 1 = 15. For a two-tailed test at the .05 level of significance and with 15 degrees of freedom, the critical region consists of t values greater than +2.131 or less than −2.131.

Step 3: Calculate the test statistic
s² = SS/df = 540/15 = 36
sM = sqrt(s²/n) = sqrt(36/16) = 1.5
t = (39 − 30)/1.5 = 6

Step 4: Make a decision
Reject H0, since t = 6 is greater than +2.131.

[Figure: the critical region in the t distribution for alpha = .05 and df = 15.]
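The arithmetic of the bird example, which starts from the sum of squares SS rather than from a sample standard deviation, can be checked with a short Python sketch. The function name `t_from_sum_of_squares` is a hypothetical helper, and the critical value 2.131 is taken from the text rather than computed.

```python
import math

def t_from_sum_of_squares(sample_mean, mu0, ss, n):
    """One-sample t statistic when the data are summarized by SS.

    Hypothetical helper for illustration:
      s^2 = SS / df,  s_M = sqrt(s^2 / n),  t = (M - mu0) / s_M
    """
    df = n - 1
    sample_variance = ss / df
    standard_error = math.sqrt(sample_variance / n)
    return (sample_mean - mu0) / standard_error, df

# Bird example: M = 39 minutes, H0 mean 30 minutes, SS = 540, n = 16
t_stat, df = t_from_sum_of_squares(39, 30, 540, 16)

# Two-tailed critical value for alpha = .05 and df = 15, as quoted in the text
critical_t = 2.131
reject_h0 = abs(t_stat) > critical_t
print(f"t = {t_stat:.1f}, df = {df}, reject H0: {reject_h0}")
```

Comparing the absolute value of t with the critical value is what makes this a two-tailed test: a large departure in either direction leads to rejection.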
HYPOTHESIS TESTING for: population proportions

Example: Survey data on attitudes toward income inequality
• Imagine that we would like to find out if US adults had some net opinion on the following issue.
• "Do you think it should or should not be the government's responsibility to reduce income differences between the rich and the poor?"

Response         Score    Number
should be        1        591
should not be    0        636
Total                     n = 1227

• 0: Assumptions: we will be doing a large-sample test for population proportions. To perform this test, we must assume that…
  – Sample size is large enough that np(1 − p) > 10
  – The sample is a random sample of some sort
  – The variable is a discrete interval-scale variable, which is automatically true for population proportions.

• 1: Hypothesis: let π denote the population proportion who favor government intervention to alleviate income inequality.
• Our null hypothesis is that the population, on average, neither supports nor opposes government intervention.
  – H0: π = 0.5
• The alternate hypothesis is then
  – HA: π ≠ 0.5

• 2: Test Statistic: For an n of 1227 respondents, we calculate the following statistics:
  – P = n(yes)/n(total) = 591/1227 = .4817
  – σ0 = SQRT(π0(1 − π0)) = .5
  – SE = σ0 / SQRT(n) = .01427
  – z = (P − π0) / SE = (.4817 − .500) / .01427 = −1.282
• The z-statistic is the test statistic of interest in a large-sample test of a population proportion.

• 3: Pick α = 0.05 and determine the critical z
[Figure: two-tailed test. Regions of rejection (probability .025 each) lie below −1.96 and above 1.96; the computed z = −1.282 falls in the do-not-reject region (probability .95).]

• 4: Conclusion: Since −1.282 lies between −1.96 and 1.96, we do not reject the hypothesis that the population proportion is .5
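The proportion test above can be reproduced as a short Python sketch using only the standard library. The function name `one_proportion_z` is a hypothetical helper, and the two-tailed p-value is an addition for illustration that the slides do not report. Because the code works with unrounded intermediate values, its z comes out at about −1.28 and can differ slightly in the third decimal from the slide's figure.

```python
import math
from statistics import NormalDist

def one_proportion_z(successes, n, pi0):
    """Large-sample z test for a single population proportion.

    Hypothetical helper for illustration. The standard error uses the
    null value pi0, as in the slides: SE = sqrt(pi0 * (1 - pi0) / n).
    Returns (z statistic, two-tailed p-value).
    """
    p_hat = successes / n
    standard_error = math.sqrt(pi0 * (1.0 - pi0) / n)
    z = (p_hat - pi0) / standard_error
    p_two_tailed = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Survey example: 591 of 1227 respondents answered "should be"; H0: pi = 0.5
z, p_value = one_proportion_z(591, 1227, 0.5)

critical_z = 1.96  # two-tailed critical value for alpha = .05, from the text
reject_h0 = abs(z) > critical_z
print(f"z = {z:.3f}, p = {p_value:.3f}, reject H0: {reject_h0}")
```

Both routes lead to the same decision: the statistic stays inside the critical values and the p-value stays above alpha, so H0 is not rejected.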