Sociology 5811: Lecture 9: CI / Hypothesis Tests
Copyright © 2005 by Evan Schofer. Do not copy or distribute without permission.

Announcements
• Problem Set #3 due next week
• Problem set posted on course website
• We are a bit ahead of reading assignments in the Knoke book
• Try to keep up; read ahead if necessary

Review: Confidence Intervals
• General formula for a confidence interval: C.I. = Ȳ ± Z_{α/2} (σ_Ȳ)
• Where:
– Ȳ (Y-bar) is the sample mean
– σ_Ȳ (sigma sub-Y-bar) is the standard error of the mean
– Z_{α/2} is the critical Z-value for a given level of confidence
– If you want 90% confidence, look up the Z for 45% (the area between the mean and Z; the remaining α/2 = 5% lies in each tail)
– See Knoke, Figure 3.5 on page 87 for details

Small N Confidence Intervals
• Issue: What if N is not large?
• The sampling distribution may not be normal
• Z-distribution probabilities don't apply…
• The standard CI formula doesn't work
• Solution: Use the "T-distribution"
• A different curve that accurately approximates the shape of the sampling distribution for small N
• Result: We can look up values in a "t-table" to determine probabilities associated with a number of standard deviations from the mean

Confidence Intervals for Small N
• Small N C.I. formula: C.I. = Ȳ ± t_{α/2} (σ̂_Ȳ)
• Yields accurate results, even if N is not large
• Again, the standard error can be estimated from the sample standard deviation: C.I. = Ȳ ± t_{α/2} (s / √N)

T-Distributions
• The T-distribution is a "family" of distributions
• In a T-distribution table, you'll find many T-distributions to choose from
– Basically, the shape of the sampling distribution varies with the size of your sample
• You need a specific t-distribution depending on sample size
• One t-distribution for each "degree of freedom"
– Also called "df" or "DofF"
• Which T-distribution should you use?
• For confidence intervals: use the T-distribution for df = N − 1
• Ex: If N = 15, then look at the T-distribution for df = 14

Looking Up T-Tables
• Choose the desired probability for α/2
• Choose the correct df (N − 1)
• Find the t-value in the correct row and column
• Interpretation is just like a Z-score: here, 2.145 = the number of standard errors for the C.I. (the table value for df = 14, α/2 = .025)

Answering Questions…
• Knowledge of the standard error allows us to begin answering questions about populations
• Example: A national educational standard requires all schools to maintain a test score average of 60
• You observe that a sample (N = 16, s = 6) has a mean of 62
• Question: Are you confident that the school population is above the national standard?
• We know Y-bar for the sample, but what about μ for the whole school?
• Are we confident that μ > 60?

Question: Is μ > 60?
• Strategy 1: Construct a confidence interval around Y-bar
• And see if the bounds fall above 60
• Visually: confident that μ > 60 when the whole interval lies above 60 on the 58-66 number line
• Visually: μ might be 60 or less when the interval overlaps 60

Question: Is μ > 60?
• Strategy 1: Construct a confidence interval around Y-bar
– Let's choose a desired confidence level of .95
– An N of 16 is "small"… we must use the t-distribution, not the Z-distribution
– Look up the t-value for 15 degrees of freedom (N − 1)

Looking Up T-Tables
• Choose the desired probability for α/2
• Choose the correct df: (N − 1) = 15
• Find the t-value in the correct row and column
• Result: t = 2.131

Question: Is μ > 60?
• Strategy 1: Construct a confidence interval around Y-bar
• C.I. = Ȳ ± t_{α/2} (σ̂_Ȳ) = Ȳ ± t_{α/2} (s / √N) = 62 ± 2.131 (6 / √16) = 62 ± 3.20
• CI is 58.80 to 65.20! We aren't confident that μ > 60
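To make the arithmetic on the preceding slide concrete, here is a minimal sketch in Python (not part of the original lecture; it assumes SciPy is available for the t critical value) that reproduces the small-N confidence interval for the school example.

from math import sqrt
from scipy import stats

# School example from the slides: N = 16, sample mean 62, sample sd 6
n, y_bar, s = 16, 62.0, 6.0
confidence = 0.95

se = s / sqrt(n)                                           # estimated standard error = 6/4 = 1.5
t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)   # ~2.131 for df = 15
half_width = t_crit * se                                   # ~3.20

print(f"t critical (df={n - 1}): {t_crit:.3f}")
print(f"95% CI: {y_bar - half_width:.2f} to {y_bar + half_width:.2f}")
# The lower bound falls below 60, so we are not 95% confident that mu > 60.

Changing the confidence level only changes the ppf argument; a lower level gives a narrower interval, which is the point made on the next slide.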
Question: Is μ > 60?
• Note #1: Results would change if we used a different confidence level
• A 95% CI and a 50% CI yield different conclusions (on the 58-66 number line, the wide 95% interval crosses 60 while a narrower interval may not)
• Idea: Wouldn't it be nice to know exactly which CI would describe the distance from Y-bar to μ?
• i.e., to calculate the exact probability of Y-bar falling a certain distance from μ?

Question: Is μ > 60?
• Note #2: We typically draw CIs around Y-bar
– But we can also get the same result focusing on our comparison point (Y = 60)
• Example: If 60 is outside the CI around Y-bar…
• Then Y-bar is outside the CI around 60

Question: Is μ > 60?
• The critical issue is how far the distance is between Y-bar and 60
– Is it "far" compared to the width of the sampling distribution?
• Ex: Y-bar is more than 2 standard errors from 60
• In which case, the school probably exceeds the standard
– Or, is it relatively close?
• Ex: Y-bar is only .5 standard errors from 60
• In which case we aren't confident…
– Note: If we know the sampling distribution is normal (or t-distributed), we can convert SEs to a probability

Question: Is μ > 60?
• Strategy 2: Determine the probability of Y-bar = 62, if μ is really 60 or less
• Procedure:
– 1. Use Y = 60 as a reference point
– 2. Determine how far Y-bar is from 60, measured in standard errors
• Which we can convert to a probability
– 3. Issue: Is it likely to observe a Y-bar as high as 62?
• If this is common to observe, even when μ = 60 (or less), then we can't be confident that μ > 60!
• But if that is a rare event, we can be confident that μ > 60!

Question: Is μ > 60?
• Strategy 2: Look at the sampling distribution
• Confident that μ is not 60 or less: μ is unlikely to really be 60… because Y-bar usually falls near the center of the sampling distribution!
• Visually: μ might easily be < 60; in that case it is common to get Y-bars of 62 or even higher

Question: Is μ > 60?
• Issue: How do we tell where Y-bar falls within the sampling distribution?
• Strategy: Compute a Z-score
• Recall: Z-scores help locate the position of a case within a distribution
• It can tell us how far a Y-bar falls from the center of the sampling distribution
• In units of "standard errors"!
• Probability can be determined from a Z-table
• Note: for small N, we call it a t-score and look it up in a t-table

Question: Is μ > 60?
• Note: We use a slightly modified Z formula
• "Old" formula: Z_i = (Y_i − Ȳ) / s_Y, the number of standard deviations a case falls from the sample mean
• From Y-sub-i to Y-bar
• New formula: Z = (Ȳ − μ) / σ_Ȳ, the number of standard errors a mean estimate falls from the population mean μ
• Distance from Y-bar to μ in the sampling distribution
• In this case we compare to a hypothetical μ = 60

Question: Is μ > 60?
• Let's calculate how far Y-bar falls from μ
– Since N is small, we call it a "t-score" or "t-value"
• t = (Ȳ − μ) / σ̂_Ȳ = (62 − 60) / σ̂_Ȳ = 2 / σ̂_Ȳ
• σ̂_Ȳ = s / √N = 6 / √16 = 6 / 4 = 1.5
• t = 2 / 1.5 = 1.333
• Y-bar is 1.33 standard errors above 60!

Question: Is μ > 60?
• Question: What is the probability of t > 1.33?
• i.e., of Y-bar falling 1.333 or more standard errors above μ? (The area in the upper tail of the sampling distribution reflects this probability)
• Result: p ≈ .10
• Note: The Knoke t-table doesn't contain this range… you have to look it up elsewhere or use SPSS to calculate the probability
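The tail probability above can be checked numerically. Below is a minimal sketch (not from the original slides; SciPy is assumed) that computes the t-score and the one-tailed probability for the school example.

from math import sqrt
from scipy import stats

# School example: N = 16, sample mean 62, sample sd 6, reference point mu0 = 60
n, y_bar, s, mu0 = 16, 62.0, 6.0, 60.0

se = s / sqrt(n)                              # estimated standard error = 1.5
t_score = (y_bar - mu0) / se                  # (62 - 60) / 1.5 = 1.333
p_one_tail = stats.t.sf(t_score, df=n - 1)    # P(T >= 1.333) with df = 15

print(f"t = {t_score:.3f}, one-tailed p = {p_one_tail:.3f}")   # ~0.10
# p is above .05, so we cannot be 95% confident that mu > 60.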
Question: Is μ > 60?
• Result: p ≈ .10
• In other words, if μ = 60, we will observe a Y-bar of 62 or greater about 10% of the time
• Conclusion: It is plausible that μ is 60 or lower
• We are not 95% confident that μ > 60
• The conclusion matches the result from the confidence interval
• We have just tested a claim using inferential statistics!

Hypothesis Testing
• Hypothesis testing: a formal language and method for examining claims using inferential statistics
– Designed for use with probabilistic empirical assessments
• Because of the probabilistic nature of inferential statistics, we cannot draw conclusions with absolute certainty
– We cannot "prove" our claims are "true"
– However, we will occasionally draw an unrepresentative sample, even if it is random

Hypothesis Testing
• The logic of hypothesis testing:
• We cannot "prove" anything
• Instead, we cast doubt on other claims, thus indirectly supporting our own
• Strategy:
• 1. We first state an "opposing" claim
• The opposite of what we want to claim
• 2. If we can cast sufficient doubt on it, we are forced (grudgingly) to accept our own claim

Hypothesis Testing
• Example: Suppose we wish to argue that our school is above the national standard
• First we state the opposite:
• "Our school is not above the national standard"
• Next we state our alternative:
• "Our school is above the national standard"
• If our statistical analysis shows that the first claim is highly improbable, we can "reject" it in favor of the second claim
• …"accepting" the claim that our school is doing well

Hypothesis Testing: Jargon
• Hypotheses: claims we wish to test
• Typically, these are stated in a manner specific enough to test directly with statistical tools
– We typically do not test hypotheses such as "Marx was right" / "Marx was wrong"
– Rather: The mean years of education for Americans is/is not above 18 years

Hypothesis Testing: Jargon
• The hypothesis we hope to find support for is referred to as the alternate hypothesis
• The hypothesis counter to our argument is referred to as the null hypothesis
• Null and alternate hypotheses are denoted as:
• H0: School does not exceed the national standard
• H-zero indicates the null hypothesis
• H1: School does exceed the national standard
• H-one indicates the alternate hypothesis
• Sometimes called "Ha"

Hypothesis Testing: More Jargon
• If evidence suggests that the null hypothesis is highly improbable, we "reject" it
• Instead, we "accept" the alternative hypothesis
• So, typically we:
• Reject H0, accept H1
– Or:
• Fail to reject H0, do not find support for H1
• That was what happened in our example earlier today…

Hypothesis Testing
• In order to conduct a test to evaluate hypotheses, we need two things:
• 1. A statistical test that reflects on the probability of H0 being true rather than H1
• Here, we used a z-score/t-score to determine the probability of H0 being true
• 2. A pre-determined level of probability below which we feel safe in rejecting H0 (α)
• In the example, we wanted to be 95% confident… α = .05
• But the probability was about .10, so we couldn't conclude that the school exceeds the national standard!

Hypothesis Test for the Mean
• Example: laundry detergent
• Suppose we work at the Tide factory
• We know the "cleaning power" of Tide detergent exactly: it is 73 on a continuous scale
• "Cleaning power" of Tide = 73
• You conduct a study of a competitor: you buy 50 bottles of generic detergent and observe a mean cleaning power of 65
• H0: Tide is no better than the competitor (μ ≥ 73)
• H1: Tide is better than the competitor (μ < 73)
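The slides do not report a standard deviation for the generic-detergent sample, so the sketch below (not part of the lecture; SciPy assumed) uses a made-up value, s = 10, purely to show how the one-tailed test for this example would be carried out.

from math import sqrt
from scipy import stats

# Detergent example: N = 50 bottles of generic, sample mean 65, H0: mu >= 73.
# The sample standard deviation is NOT given on the slides; s = 10 is a
# hypothetical placeholder used only to illustrate the mechanics.
n, y_bar, s, mu0, alpha = 50, 65.0, 10.0, 73.0, 0.05

se = s / sqrt(n)
t_score = (y_bar - mu0) / se                 # negative: sample mean is below 73
p_one_tail = stats.t.cdf(t_score, df=n - 1)  # lower-tail probability, since H1: mu < 73

print(f"t = {t_score:.2f}, one-tailed p = {p_one_tail:.2g}")
if p_one_tail < alpha:
    print("Reject H0: the generic's mean cleaning power appears to be below 73.")
else:
    print("Fail to reject H0: the sample is consistent with mu >= 73.")

The lecture walks through the reasoning behind this calculation on the next slides.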
Hypothesis Test: Example
• It looks like Tide is better:
• Cleaning power is 73, versus 65 for a sample of the competition
• Question: Can we reject the null hypothesis and accept the alternate hypothesis?
• Answer: No! It is possible that we just drew an atypical sample of generic detergent. The true population mean for generics may be higher.

Hypothesis Test: Example
• We need to use our statistical knowledge to determine:
• What is the probability of drawing a sample (N = 50) with a mean of 65 from a population with mean 73 (the mean for Tide)?
• If that is a probable event, we can't draw very strong conclusions…
• But if the event is very improbable, it is hard to believe that the population mean of generics is as high as that of Tide…
• We have grounds for rejecting the null hypothesis

Hypothesis Test: Example
• How would we determine the probability (given an observed mean of 65) that the population mean of generic detergent is really 73?
• Answer: We apply the Central Limit Theorem to determine the shape of the sampling distribution
• And then calculate a Z-value or T-value based on it
• Suppose we chose an alpha (α) of .05:
• If we observe a t-value with a probability of only .0023, then we can reject the null hypothesis
• If we observe a t-value with a probability of .361, we cannot reject the null hypothesis

Hypothesis Test: Steps
• 1. State the research hypothesis ("alternate hypothesis"), H1
• 2. State the null hypothesis, H0
• 3. Choose an α-level (alpha-level)
– Typically .05, sometimes .10 or .01
• 4. Look up the value of the test statistic corresponding to the α-level (called the "critical value")
• Example: find the "critical" t-value associated with α = .05

Hypothesis Test: Steps
• 5. Use statistics to calculate a relevant test statistic
– A T-value or Z-value
– Soon we will learn additional ones
• 6. Compare the test statistic to the "critical value"
– If the test statistic is greater, we reject H0
– If it is smaller, we cannot reject H0
– (A code sketch of steps 4-6 appears below, after the list of test types.)

Hypothesis Test: Steps
• Alternate steps:
• 3. Choose an alpha-level
• 4. Get software to conduct the relevant statistical test
– Software will compute the test statistic and provide a probability… the probability of observing a test statistic of a given size
– If this is lower than alpha, reject H0

Hypothesis Test: Errors
• Due to the probabilistic nature of such tests, there will be periodic errors
• Sometimes the null hypothesis will be true, but we will reject it
– Our alpha-level determines the probability of this
• Sometimes we do not reject the null hypothesis, even though it is false

Hypothesis Test: Errors
• When we falsely reject H0, it is called a Type I error
• When we falsely fail to reject H0, it is called a Type II error
• In general, we are most concerned about Type I errors… we try to be conservative

Hypothesis Tests About a Mean
• What sorts of hypothesis tests can one do?
• 1. Test the hypothesis that a population mean is NOT equal to a certain value
– The null hypothesis is that the mean is equal to that value
• 2. Population mean is higher than a value
– Null hypothesis: the mean is equal to or less than that value
• 3. Population mean is lower than a value
– Null hypothesis: the mean is equal to or greater than that value
• Question: What are examples of each?
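As a companion to the steps above, here is a minimal Python sketch (not from the original slides; SciPy assumed) that carries out steps 4-6 for any of the three test types just listed: it looks up the critical t-value for a chosen α and compares the observed t-statistic to it.

from scipy import stats

def t_test_decision(t_obs, df, alpha=0.05, alternative="two-sided"):
    """Steps 4-6: look up the critical t-value and compare it to t_obs.

    alternative: "two-sided" (H1: mu != mu0), "greater" (H1: mu > mu0),
                 or "less" (H1: mu < mu0).
    """
    if alternative == "two-sided":
        t_crit = stats.t.ppf(1 - alpha / 2, df)   # alpha split across both tails
        reject = abs(t_obs) > t_crit
    elif alternative == "greater":
        t_crit = stats.t.ppf(1 - alpha, df)       # all of alpha in the upper tail
        reject = t_obs > t_crit
    else:  # "less"
        t_crit = stats.t.ppf(alpha, df)           # all of alpha in the lower tail
        reject = t_obs < t_crit
    verdict = "reject H0" if reject else "fail to reject H0"
    return f"critical t = {t_crit:.3f}, observed t = {t_obs:.3f}: {verdict}"

# School example from earlier: t = 1.333, df = 15, directional test (H1: mu > 60)
print(t_test_decision(1.333, df=15, alpha=0.05, alternative="greater"))
# Critical t is about 1.753, so we fail to reject H0.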
Hypothesis Tests About Means
• Example: Bohrnstedt & Knoke, section 3.93, pp. 108-110
• N = 1015, Y-bar = 2.91, s = 1.45
• H0: population mean μ = 4
• H1: population mean μ ≠ 4
• Strategy:
• 1. Choose alpha (let's use .001)
• 2. Determine the standard error
• 3. Use the S.E. to determine the range in which sample means (Y-bar) are likely to fall 99.9% of the time, IF the population mean is 4
• 4. If the observed mean is outside that range, reject H0
• (A worked code sketch of this example appears at the end of this section.)

Example: Is μ = 4?
• Let's determine how far Y-bar is from the hypothetical μ = 4
• In units of standard errors:
• t = (Ȳ − μ) / σ̂_Ȳ = (2.91 − 4) / σ̂_Ȳ = −1.09 / σ̂_Ȳ
• σ̂_Ȳ = s_Y / √N = 1.45 / √1015 = .046
• t = −1.09 / .046 ≈ −24.0
• Y-bar is 24 standard errors below 4.0!

Hypothesis Tests About a Mean
• A Z-table (if N is large) or a T-table will tell us the probability of Y-bar falling Z (or T) standard deviations from μ
• In this example, the desired α = .001
• Which corresponds to t = 3.3 (taken from a t-table)
– That is, .001 (i.e., .1%) of samples (of size 1015) fall beyond 3.29 standard errors of the population mean
– 99.9% fall within 3.29 S.E.'s

Hypothesis Tests About a Mean
• There are two ways to finish the "test"
• 1. Compare the "critical t" to the "observed t"
– Critical t is 3.3; observed t = −24
• We reject H0: a t of ±24 is HUGE, very improbable
• It is highly unlikely that μ = 4
• 2. Actually calculate the probability of observing a t-value as large as 24 (in absolute value), and compare it to the pre-determined α
• If the observed probability is below α, reject H0
– In this case, the probability of |t| = 24 is .0000000000000…
• Very improbable. Reject H0!

Two-Tail Tests
• Visually: most Y-bars should fall near μ
• 99.9% CI: −3.3 < t < 3.3, or 3.85 to 4.15
• [Figure: sampling distribution of the mean centered at 4, with cutoffs at 3.85 (Z = −3.3) and 4.15 (Z = +3.3); the observed mean of 2.91 (t ≈ −24) falls far into the rejection region, beyond the edge of the graph]

Hypothesis Tests About a Mean
• Note: This test was set up as a "two-tailed test"
• Meaning that we reject H0 if the observed Y-bar falls in either tail of the sampling distribution
• Ex: A very high Y-bar or a very low Y-bar means reject H0
– Not all tests are done that way… sometimes you only reject H0 if Y-bar falls in one particular tail

Hypothesis Testing
• Definition: Two-tailed test: a hypothesis test in which the α-area of interest falls in both tails of a Z or T distribution
• Example: H0: μ = 4; H1: μ ≠ 4
• Definition: One-tailed test: a hypothesis test in which the α-area of interest falls in just one tail of a Z or T distribution
• Example: H0: μ ≥ 4; H1: μ < 4
• This is called a "directional" hypothesis test

Hypothesis Tests About Means
• A one-tailed test: H1: μ < 4
• The entire α-area is on the left, as opposed to half (α/2) on each side. Also, the critical t-value changes.
• [Figure: distribution centered at 4 with the entire α-area shaded in the left tail]

Hypothesis Tests About Means
• The critical t-value changes because the alpha area (e.g., 5%) is all concentrated in one side of the distribution, rather than split half and half
• One tail vs. two tails: α = .05 in a single tail, versus α/2 = .025 in each of two tails

Hypothesis Tests About Means
• Use one-tailed tests when you have a directional hypothesis
– e.g., μ > 5
• Otherwise, use two-tailed tests
• Note: In many instances, you are more likely to reject the null hypothesis when using a one-tailed test
– Concentrating the alpha area in one tail reduces the critical t-value needed to reject H0

Tests for Differences in Means
• A more useful and interesting application of these same ideas…
• Hypothesis tests about the means of two different groups
– Up until now, we've focused on a single mean for a homogeneous group
– It is more interesting to begin to compare groups
– Are they the same? Different?
• We'll do that next class!
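To close, here is a minimal Python sketch (not part of the original slides; SciPy assumed) that reproduces the Bohrnstedt & Knoke example above: it computes the observed t, the two-tailed critical value at α = .001, and, for comparison, the one-tailed critical value, illustrating why the cutoff shrinks when all of α sits in one tail.

from math import sqrt
from scipy import stats

# Knoke example: N = 1015, sample mean 2.91, s = 1.45, H0: mu = 4, H1: mu != 4
n, y_bar, s, mu0, alpha = 1015, 2.91, 1.45, 4.0, 0.001

se = s / sqrt(n)                 # ~0.046
t_obs = (y_bar - mu0) / se       # ~-24

t_crit_two = stats.t.ppf(1 - alpha / 2, df=n - 1)   # ~3.3 (alpha split over two tails)
t_crit_one = stats.t.ppf(1 - alpha, df=n - 1)       # ~3.1 (all of alpha in one tail)

print(f"observed t = {t_obs:.1f}")
print(f"two-tailed critical t = {t_crit_two:.2f}, one-tailed critical t = {t_crit_one:.2f}")
if abs(t_obs) > t_crit_two:
    print("Reject H0: it is very unlikely that the population mean is 4.")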