Significance Tests

Sociology 601: Class 6: September 17, 2008
5.2-5.3: Confidence intervals
6.1: Elements of a significance test
6.2: Large-sample significance test for a mean
6.3: Large-sample significance test for a proportion
6.4: Types of error

Where we stand so far
• We can use statistics from a sample to estimate parameters for a population
– specifically, we can make a point estimate of a parameter and an interval estimate / confidence interval of a parameter.
• As we have seen, however, a common-sense interpretation of an interval estimate is elusive.
• Today, we will define an approach to produce interpretations reasonable nonstatisticians can understand.
• “Is (some statement) true?”

6.1: Informal steps in a significance test
• 1.) Decide on a sample statistic that will tell us about a population parameter (e.g., a sample mean).
• 2.) Make a prediction about some attribute of the population: (= a hypothesis: mean age = 25)
• 3.) Measure the result for a sample
• 4.) See how much the result differs from the prediction
• 5.) If the result is pretty close to the prediction, you cannot conclude much, but if the result is far away from the prediction, you have weakened the hypothesis and might reject it.
• (Part 5 is the potentially confusing part)

Formal steps in a significance test
1.) List assumptions
2.) State a hypothesis (or two)
3.) Calculate a test statistic
4.) Look up a p-value
5.) State a formal conclusion
• Note: this is how Agresti and Finlay do it; there are other appropriate treatments of significance tests

1.) List Assumptions about the sample
The sample statistic we choose to calculate will only have a predictable relationship with a population parameter when certain assumptions about the sample and the variable are true. We should state these assumptions explicitly. Examples include…
– type of data
– the form of the population distribution (e.g. normal)
– method of sampling
– sample size

2.) Hypotheses about the parameter
• A significance test considers two hypotheses about the value of the parameter.
– The null hypothesis, designated Ho, is usually a statement that the parameter has a value corresponding to, in some sense, no effect.
– The alternative hypothesis, designated Ha when it is explicitly stated, is usually a statement that the parameter has some other value.
• Note: the hypothesis is not a theory about why the parameter has a certain value.

The special backward logic we use in developing hypotheses
• Property of reality: If A is a polar bear, then A must be white.
• Hypothesis: A is a polar bear
• Observed result #1: A is white.
• What can you conclude about the hypothesis?
• Observed result #2: A is not white.
• What can you conclude about the hypothesis?

A generalized application to statistical inference
Assumption about reality: If a population parameter has a certain value, then the corresponding sample statistic should be in a certain range.
Hypothesis: μ has a certain value
• Sample result #1: Ybar has a result consistent with the predicted value of μ.
• Sample result #2: Ybar has a result inconsistent with the predicted value of μ.

A typical approach to stating hypotheses
• Usually, we construct our significance test around a null hypothesis:
– The mean value of some variable in the population is zero, or
– The difference between two population means is zero, or
– The mean value for some group is the same as the known value for the entire population.
• Then, a meaningful result tends to be one that forces us to reject our null hypothesis.
• We often have our own hypothesis about what is actually going on, but this will be only one of many interpretations of any alternative hypothesis.

3.) Calculate a test statistic
• The test statistics we will look at today are for population mean and proportion.
• We also calculate statistics useful for interval estimation such as the standard deviation.
• Later in the semester we will look at chi-squared statistics, statistics for slopes of regression lines, and so on.

4.) Look up a p-value
• The p-values for population mean and proportion are based on standardized confidence intervals; in other words, on the number of standard errors that separate the observed sample statistic from the hypothesized population parameter.
• The p-value is the probability, if Ho were true, that the sample statistic would fall this far (i.e., that many standard errors) from the population parameter, or farther.
• The smaller the p-value, the more strongly the data contradict Ho.

5.) State a conclusion
• In the conclusion, we judge the evidence against Ho and usually make a formal decision to reject Ho or not.
• We also interpret the results in terms of the original question motivating the test. What do we know that we did not know before? We make reference to some or all of the following numbers.
– p-values or z-scores
– standard errors
– differences between hypothesized and observed means

6.2: Step-by-step example for a significance test
• Many political commentators have remarked that US citizens have been politically conservative in recent decades.
• A recent General Social Survey allows us to collect data to test this assertion.
– respondents are asked to rate their political ideology on an ordered 7-point scale

Survey data on political ideology
Score   Response                 Count
1       Extremely liberal           12
2       Liberal                     66
3       Slightly liberal           109
4       Moderate                   239
5       Slightly conservative      116
6       Conservative                74
7       Extremely conservative      11
Q: level of measurement?

Significance test for political ideology
Assumptions: we will be doing a large-sample test for population means. To perform this test, we must assume that…
– Sample size is at least 30 (some researchers insist on 50 or 100). This implies that the sampling distribution of the sample mean for samples of this size is approximately normal, so that interval inferences based on the normal curve are appropriate.
– The sample is a random sample of some sort.
– The variable is a quantitative variable with interval scale.

Significance test for political ideology
• Hypothesis: let μ denote the population mean ideology, based on this seven-point scale.
• One null hypothesis is that the population has moderate political ideology.
– Ho: μ = 4.0
• The alternate hypothesis is then
– HA: μ ≠ 4.0
• (To discuss: another null hypothesis is that the population does not have a conservative ideology.)
– Ho: μ <= 4.0

Significance test for political ideology
• Test Statistic: For an n of 627 respondents, we calculate the following statistics:
– Ybar = 4.032
– s = 1.258
– s.e. = s / SQRT(n) = .05024
– z = (Ybar - μ0) / s.e. = (4.032 - 4.000) / .05024 = 0.64
• The z-statistic is the statistic of interest in a large-sample test of a population mean.

Significance test for political ideology
• P-value: When we look up the p-value for z = 0.64, we get .2611. This means that if the true population mean is really 4.0, then 26% of samples of size 627 would have z-scores of 0.64 or greater by chance alone.
• Two-tailed test: We have stated the hypothesis as a two-tailed test. The p-value for a two-tailed test is 2*.2611 = .5222. This means that if the true population mean is really 4.0, then 52% of samples of size 627 would have z-scores this far or farther from 0 by chance alone.

Why do we usually use two-tailed p-values?
• Common practice: Other researchers expect to read two-tailed p-values, and computer outputs give two-tailed p-values.
• Conservative results: With a two-tailed test, you face more stringent standards to reject the null hypothesis and report a significant finding.
• Flexibility: What if you did this study and found a statistically significant pattern of liberal political ideology? Would you be duty-bound to ignore it?
• (Researchers still sometimes use 1-tailed p-values.)
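The test statistic and p-value steps for the political ideology example can be retraced with a short script. This is only a sketch using the Python standard library: the counts are copied from the survey table, while the variable names are illustrative and not part of the lecture. Because it works from unrounded values rather than from a z-table lookup at z = 0.64, its results may differ from the slide figures in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Counts for scores 1..7, copied from the political ideology table.
counts = {1: 12, 2: 66, 3: 109, 4: 239, 5: 116, 6: 74, 7: 11}
mu0 = 4.0  # value of the population mean under Ho

n = sum(counts.values())
ybar = sum(score * k for score, k in counts.items()) / n
# Sample standard deviation with the usual n - 1 denominator.
s = sqrt(sum(k * (score - ybar) ** 2 for score, k in counts.items()) / (n - 1))
se = s / sqrt(n)          # standard error of the sample mean
z = (ybar - mu0) / se     # number of standard errors between Ybar and mu0

std_normal = NormalDist()                   # standard normal: mean 0, sd 1
p_one_tail = 1 - std_normal.cdf(abs(z))     # area beyond |z| in one tail
p_two_tail = 2 * p_one_tail                 # both tails, for Ha: mu != mu0

print(f"n = {n}, Ybar = {ybar:.3f}, s = {s:.3f}, s.e. = {se:.5f}")
print(f"z = {z:.2f}, one-tailed p = {p_one_tail:.4f}, two-tailed p = {p_two_tail:.4f}")
```

Doubling the one-tailed area is what makes the test two-tailed: a sample mean equally far below 4.0 would count as equally strong evidence against Ho.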
Significance test for political ideology
• Conclusion: The p-value of .52 for a two-tailed test indicates that it is quite possible, given a true population mean of 4.000, to have a sample mean as far from that value as 4.032. Therefore, we do not reject the hypothesis that the population mean is 4.0.
• Furthermore, a sample mean ideology score of 4.032 indicates that our best guess of the population’s political ideology score is essentially neutral.

Never “Accept Ho”
When we “do not reject Ho” that doesn’t mean that we accept that Ho is true.
– For example, given a sample mean of 4.03 and a null hypothesis that μ = 4.00, the true population mean could be 4.00, but it could also be a value much higher than 4.00 with an “unlucky” (but still random) sample of low scores within that population.
– Given no other information about the population, our best guess of the population mean is 4.032, not 4.00. We simply have not proven with any conviction that it isn’t 4.00.

6.3: Next example: Significance test for a population proportion
• A question in the General Social Survey asked: “Do you think it should or should not be the government’s responsibility to reduce income differences between the rich and the poor?”
• Imagine that we would like to find out if US adults had some net opinion on this issue.

Survey data on attitudes toward income inequality
“Do you think it should or should not be the government’s responsibility to reduce income differences between the rich and the poor?”
Score   Response         Number
1       should be           591
0       should not be       636
Total N = 1227

Survey data on attitudes toward income inequality, Step 1
1: Assumptions: we will be doing a large-sample test for population proportions.
To perform this test, we must assume that…
– Sample size is large enough that np(1-p) > 10
• A&F suggest using the standard: N > 30 when .3 <= p <= .7
– The sample is a random sample of some sort
– The variable is a discrete interval-scale variable, which is automatically true for population proportions.

Survey data on attitudes toward income inequality, Step 2
2: Hypothesis: let π denote the population proportion who favor government intervention to alleviate income inequality.
• Our null hypothesis is that the population, on average, neither supports nor opposes government intervention.
– Ho: π = 0.5
• The alternate hypothesis is then
– HA: π ≠ 0.5

Survey data on attitudes toward income inequality, Step 3
3: Test Statistic: For an N of 1227 respondents, we calculate the following statistics:
– πhat = N(yes) / N(total) = 591/1227 = .4817
– σ0 = SQRT(π0(1 - π0)) = SQRT(.500 * .500) = .5
– s.e. = σ0 / SQRT(n) = .500 / SQRT(1227) = .01427
– z = (πhat - π0) / s.e. = (.4817 - .500) / .01427 = -1.282
• The z-statistic is the test statistic of interest in a large-sample test of a population proportion.

Survey data on attitudes toward income inequality, Step 4
4: P-value: When we look up the p-value for z = -1.28, we get .1003. This means that if the true population proportion is really .5, then 10% of samples of size 1227 would have z-scores of -1.28 or less by chance alone.
• We have stated the hypothesis as a two-tailed test. The p-value for a two-tailed test is 2 * .1003 = 0.2006.
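Steps 1 through 4 for the income inequality question can be checked the same way. The sketch below assumes only the two counts reported in the survey table; the variable names are illustrative. It builds the standard error from the null value of the proportion, as in Step 3, not from the sample proportion.

```python
from math import sqrt
from statistics import NormalDist

yes, total = 591, 1227   # "should be" responses and total N from the table
pi0 = 0.5                # population proportion under Ho

# Step 1 check: is the sample large enough for the normal approximation?
large_enough = total * pi0 * (1 - pi0) > 10

pi_hat = yes / total                   # sample proportion
sigma0 = sqrt(pi0 * (1 - pi0))         # standard deviation under Ho
se0 = sigma0 / sqrt(total)             # standard error under Ho
z = (pi_hat - pi0) / se0

# Two-tailed p-value: area beyond |z| in both tails of the standard normal.
p_two_tail = 2 * NormalDist().cdf(-abs(z))

print(f"large enough: {large_enough}")
print(f"pi_hat = {pi_hat:.4f}, s.e. = {se0:.5f}, z = {z:.2f}, p = {p_two_tail:.2f}")
```

Using the hypothesized proportion in the standard error reflects the logic of the test: the p-value is computed as if Ho were true.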
Survey data on attitudes toward income inequality, Step 5
5: Conclusion: The p-value of .20 for a two-tailed test indicates that it is possible, given a true population proportion of .5, to have a sample proportion as far from .5 as .482. Therefore, we do not reject the hypothesis that the population proportion is .5.
• Furthermore, a sample proportion of .482 indicates that our best guess of the population’s attitude toward government intervention is essentially neutral.

Confidence interval or significance test?
• Significance tests are better when the chief issue is to make a yes/no decision about whether a pattern exists in a population.
• Confidence intervals are better when the chief issue is to make a best guess of a population parameter.

A practical concern: what should you do with categories that are inconclusive?
“A joint USA Today/CNN/Gallup poll in July 1995 indicated that of 832 white adults, 53% thought affirmative action had been good for the country and 37% thought it had not been good; the remaining 10% were undecided.”
• There is not always a universal correct answer in such cases. Behave honestly, and make your decisions transparent.

6.4 Decision rules in hypothesis tests
• In our significance tests so far, we have calculated the p-value in step 4, then decided what to conclude about H0 in step 5.
• Traditionally, social scientists often decided in step 1 what p-value will constitute sufficient evidence to reject H0.
– This is called using a fixed decision rule.
• Why use fixed decision rules?
– They supposedly reduce the chance that we will succumb to temptation on a borderline result, and choose a p-value that will allow us to reject the null.

Formal definition of a decision rule
• A statistical decision rule specifies, for each possible sample outcome, which hypothesis (HO or HA) should be selected.
• Before we calculate a test statistic, we pick the significance level at which we will decide to reject the null hypothesis.
• The predetermined significance level is α (alpha), the probability of rejecting the null hypothesis due to a chance distribution, if the null is actually true.
– we reject HO if our observed p-value is less than α.

Types of errors in hypothesis tests
• α (alpha) is called the probability of a type I error.
– a type I error occurs when we reject the null hypothesis for a population where the null hypothesis is true.
– type I error is like “crying wolf” when there is no wolf
• β (beta) is called the probability of a type II error.
– type II error is the error of not rejecting the null hypothesis, when the null is in fact false.
– type II error is like not noticing a wolf that is really there
– NOTE: β is not equal to 1 - α, although β is often larger than α.

Errors and correct conclusions in a hypothesis test
Possible types of error depend on your sample statistics and on the true state of reality

                       State of reality:
Your conclusion:       Ho true                               Ho not true
do not reject Ho       correct inference, negative result    type II error
reject Ho              type I error                          correct inference, positive result

Consequences of errors in hypothesis tests
• Consequences of type I (alpha) error:
– misleads other researchers
– social costs of erroneous information
– damages your reputation as a careful researcher
• Consequences of type II (beta) error:
– no publication for you (probably)
– no damage to your reputation as a careful researcher
– the truth stays hidden, with possible social consequences
• hopefully, the truth will come out later

Terms used in hypothesis tests with decision rules
• alpha level: the probability of a type I error, conditional on the idea that Ho is really true.
– an alpha level is expressed as a probability
• rejection region: the collection of test-statistic values for which the test rejects Ho.
– a rejection region is often expressed as a range of z-scores
• action limit: the value of a test statistic at which one will shift to rejecting the null hypothesis.
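To make the three terms concrete, the sketch below converts an alpha level into an action limit and a rejection region for a large-sample z test. The function names `action_limit` and `in_rejection_region` are hypothetical helpers introduced here for illustration, not terms from the lecture or from any library; the normal quantile comes from Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist


def action_limit(alpha: float, two_tailed: bool = True) -> float:
    """Return the z value beyond which a test at this alpha level rejects Ho."""
    # A two-tailed test splits alpha evenly between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


def in_rejection_region(z: float, alpha: float, two_tailed: bool = True) -> bool:
    """Return True when the observed z falls in the rejection region."""
    limit = action_limit(alpha, two_tailed)
    # Two-tailed: reject for large |z|. One-tailed (upper): reject for large z.
    return abs(z) > limit if two_tailed else z > limit


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: reject Ho when |z| > {action_limit(alpha):.3f}")

# The income inequality test had z of about -1.28.
print(in_rejection_region(-1.28, alpha=0.05))
```

Deciding by comparing z with the action limit is equivalent to comparing the p-value with alpha; the rejection region simply states the rule before the data are examined.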
Example of a hypothesis test with a decision rule
The General Social Survey often asks:
– “I’m going to show you a seven-point scale on which the political views that people hold are arranged from extremely liberal (point 1) to extremely conservative (point 7). Where would you place yourself on this scale?”
Use a fixed decision rule to test whether the responses in 1994 were neutral or leaned toward any political view. Use alpha = .01.
• N = 2879
• Y-bar = 4.170
• s = 1.39

Hypothesis test for GSS political views
• N = 2879   Y-bar = 4.170   s = 1.39
• assumptions: random sample, interval variable, large sample size, alpha = .01
• hypothesis: Ho is that μ = 4.00
• test statistic: z = (Ybar - μ0) / (s.e.) = (4.17 - 4.00) / (1.39 / SQRT(2879)) = 6.6
• p-value: p < .01 (for z of 6.6)
• Conclusion: the p-value is below alpha (equivalently, z falls in the rejection region), so reject Ho: the population does not have neutral political views.

Thought questions
• In the previous GSS example, what is the probability that we have committed a type I error?
– answer: unknown! .01 is the probability that a random sample would produce a type I error, conditional on H0 being true for the population. We have no idea whether this condition is true.
• another question: if Ho is really true in our case, what is the probability that we have committed a type I error?
• yet another question: Based on our result, what is the probability that we have committed a type II error?

What (prospective) factors affect the probability of error?
• The probability of a type I error depends on…
– appropriateness of assumptions
– alpha level
• The probability of a type II error depends on…
– appropriateness of assumptions
– sample size
– the efficiency of the estimation procedure
– the true value of the population parameter
– alpha level
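The factors in the last list can be explored numerically. The following is a rough sketch, not a procedure from the lecture: it approximates the probability of a type II error for a two-tailed large-sample z test of a mean, assuming the population standard deviation equals the sample value s = 1.39 from the 1994 GSS example. The function name `type_ii_error` and the "true" means of 4.05 and 4.10 are hypothetical, chosen only to show how the probability moves.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu_true: float, mu0: float, sigma: float, n: int, alpha: float) -> float:
    """Approximate P(do not reject Ho) when the population mean is really mu_true."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed action limit
    shift = (mu_true - mu0) / se                 # true mean, in standard errors from mu0
    # z = (Ybar - mu0) / se is roughly normal with mean `shift` and sd 1;
    # Ho is not rejected whenever z lands between -z_crit and +z_crit.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


mu0, sigma = 4.0, 1.39   # null value and assumed population sd (sample s from the GSS example)

for mu_true in (4.05, 4.10):            # hypothetical true means
    for n in (500, 2879):               # 2879 is the GSS sample size; 500 is hypothetical
        for alpha in (0.05, 0.01):
            beta = type_ii_error(mu_true, mu0, sigma, n, alpha)
            print(f"true mean {mu_true}, n = {n}, alpha = {alpha}: beta = {beta:.3f}")
```

Under these assumptions, the probability of a type II error falls as the sample size grows, as the true mean moves farther from the null value, and as alpha is made less strict, which matches the list above.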
