Lecture notes 7b: Inference for a difference in means

Outline:
• Hypothesis test for two means using independent samples (example 1)
• CI for two means using independent samples
• “Paired” differences
• Hypothesis test for a paired difference (example 2)
• CI for a paired difference

Inference for a difference between population means
• We have looked at how hypothesis tests and confidence intervals can be used to draw inferences about a population mean.
• In practice, it is more common to want to investigate how two means differ from one another, or whether they differ at all.
• We will look at how to compare the means of two separate groups (“independent samples”), as well as how to compare the means of two sets of observations taken on the same group (“paired data”).

Hypothesis tests for two means (independent samples)
• We will first look at inferential procedures for comparing means when we have “independent samples”, i.e. two different groups of subjects.
• We’ll start with a hypothesis test, and then look at constructing a corresponding confidence interval.
• The following hypothesis test is usually referred to as a “two sample t-test”.
• This is the formula for the test statistic which compares two sample means to one another:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
• Here, the subscripts refer to populations 1 and 2; $\bar{x}$, $s$ and $n$ are the sample mean, sample standard deviation and sample size.
• In the numerator of this formula, $\bar{x}_1 - \bar{x}_2$ is the point estimate for the difference between population means 1 and 2.
• $(\mu_1 - \mu_2)_0$ is the hypothesized difference between the means. This is almost always zero, because the null hypothesis is almost always that the means are equal. If we reject this null, it will be in favor of the alternative hypothesis that the means are different.
• When $(\mu_1 - \mu_2)_0$ is zero, we can simplify the test statistic:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
• The denominator of this equation is the standard error of $\bar{x}_1 - \bar{x}_2$. It combines the standard errors of both $\bar{x}_1$ and $\bar{x}_2$.
• And so this statistic follows the same general formula as the statistic for a one sample t-test:
$$t = \frac{\text{point estimate} - \text{hypothesized value}}{\text{standard error}}$$

Example 1
• A researcher who is studying the relationship between concentration and balance conducts an experiment in which nine elderly subjects and eight young subjects each stand barefoot on a “force platform”, which measures how much a person sways (in millimeters) in the forward/backward and side-to-side directions.
• Subjects are asked to maintain a stable upright position and to react as quickly as possible to a randomly timed noise by pressing a hand-held button.
• The researcher would like to know if there is a difference between the elderly and the young with regard to how well they maintain balance in this scenario.

Here is the data (sway in millimeters):
Elderly subjects:  18  11  16  24  18  14  41  21  17
Young subjects:    10  16  22  12  14  12  18  37

Step 1: the hypotheses
State the null and alternative hypotheses:

Step 2: α and the critical value
• This time we didn’t specify a level of significance. By convention, when no level of significance is specified, we default to α = 0.05.
• As with the one sample test for a mean, the test statistic used for a two sample test follows a t-distribution. So, we will use a t-critical value.
• As it turns out, the formula for finding degrees of freedom for the sampling distribution of this test statistic is pretty involved:
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$
where $s_i$ and $n_i$ are the sample standard deviation and sample size of group $i$.
• In practice, software can be used to find this degrees of freedom. In our class, you will be given the degrees of freedom for two sample procedures. In this case, df = 11.
Find the critical value and sketch the sampling distribution of the test statistic under $H_0$:

Step 3: The test statistic and p-value
Compute the test statistic and corresponding p-value:

Step 4: The statistical decision
State and interpret the statistical decision:

Example 1 confidence interval
• We can also construct a confidence interval for the true difference in mean millimeters of sway between the elderly and the young.
• Recall the general form of a confidence interval:
CI = point estimate ± margin of error
or
CI = point estimate ± (critical value) × (standard error)
• Using this general form as a guide, we see that
$$CI = (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
where $t^*$ is the t-critical value for the chosen confidence level and the degrees of freedom given for the two sample procedure.
Construct a 95% CI for the true difference in mean sway between the young and elderly groups:
Do the results of the hypothesis test and confidence interval agree? Why or why not?

A brief aside: pooling variance
• The two sample t-test we just conducted involved finding estimates for the standard deviations of our two groups (young and elderly) separately.
• There is a slight variation on this method which involves “pooling” the standard deviations. Usually this is referred to as “pooling the variances”; mathematically this is the same thing, since the standard deviation is the square root of the variance.
• This method assumes that the population standard deviations of the two groups are equal, and so we can estimate one standard deviation that applies to both groups. We will not be using this technique in our class, but in practice it is often used.

Statistical Tests Involving Paired Data
• Sometimes, when testing for a difference in means, we are able to measure the same subjects twice and test for differences between the two measurements.
• Examples include “before” and “after” type studies (e.g. compare blood pressure before being put on a drug to blood pressure after being on the drug), or studies where each subject can be measured under two different treatments, or a treatment and a control (e.g. conduct a vision test with your right eye, then with your left eye, and compare the results).
• These “paired” studies have an advantage over studies using independent samples in that there is less natural variability to account for.
• Example: suppose we want to see if beer and wine consumption have different effects on short term memory.
We could take a sample of participants, randomly assign them to either the beer or wine group, and then administer a memory test after they’ve consumed some specified quantity of alcohol.
• If we do this, any differences we observe between the groups may be attributable to the beer or wine, but they may also be attributable to many other variables that affect memory (age, genetics, physical health, etc.) and that differ from person to person.
• Using paired observations, i.e. multiple observations on the same subjects, in effect “controls” for these other variables.
• If each participant undergoes a memory test once after consuming beer and once after consuming wine, then we know that any difference between these paired observations will not be attributable to these other variables (age, genetics, physical health, etc.).
• Because of this, the amount of random variability that we normally expect to see in our data will be reduced, which might make it easier to test our research hypotheses.
• Using paired observations can introduce other possible problems, and so this type of test is not always the most appropriate.
• For instance, human subjects may perform differently on a test the 2nd time due to having gone through it once.
• Also, you may not be able to test against placebo in a paired study.
• If conducting a paired study is feasible and appropriate, then it will generally be preferable to a two-sample study because it results in a smaller standard error.
• The formula for a paired t-test is essentially identical to the formula for a t-test for a single mean. The only difference is that the data we use are not raw observations of a variable; rather, they are the differences between the paired observations.
• We note that our data takes the form of differences by using the subscript “d” in our notation.
Apart from this, the formula is identical to that of a one sample test for a mean:
$$t = \frac{\bar{x}_d - \mu_{d,0}}{s_d / \sqrt{n}}$$
where $\bar{x}_d$ and $s_d$ are the mean and standard deviation of the $n$ paired differences, and $\mu_{d,0}$ is the hypothesized mean difference.

Example 2
A physical fitness program is designed to increase a person’s upper body strength. To determine the effectiveness of this program, an SRS (simple random sample) of 31 members of a health club was selected and each member was asked to do as many push-ups as possible in 1 minute. After 1 month on the program, the participants were once again asked to do as many push-ups as possible in 1 minute. These values were recorded and the difference (After - Before) was computed.

Test to see if the program is effective in increasing upper body strength as defined by the “number of push-ups in one minute” metric, using α = 0.05.

Here are summary statistics for the variable “difference”:

Variable      N    Mean   Median   StDev   SE Mean
difference    31   9.17   8.00     8.06    1.45

Note here that “SE Mean” (the standard error of the mean) is found by dividing the standard deviation by the square root of the sample size.

Sample data (Partial Listing)

Subject        1    2    3    4    5    6    7    8   …
Before        28   34   28   60   20   25   32   19   …
After         32   32   42   64   41   33   49   32   …
Difference     4   -2   14    4   21    8   17   13   …

This shows how the paired differences are calculated. For the purposes of this test, our data will be the “Difference” row. Note that this isn’t all of the data.

Step 1: The Hypotheses
Step 2: α and the critical value
Step 3: test statistic and p-value
Step 4: the statistical decision

Example 2 confidence interval
We can also construct a CI for the true mean difference in upper body strength, before and after the program:
Finally, we can interpret this confidence interval, and note how it relates to the hypothesis test we conducted:

Conclusion
• Hopefully the basic process of performing a hypothesis test and constructing a confidence interval has become familiar.
• The statistical techniques that we study throughout the remainder of the class will always involve a hypothesis test, confidence interval, or both.
These are the “bread and butter” of statistical inference.
• In the next set of notes, we will consider the assumptions that underlie these procedures, as well as some areas of controversy in statistical inference.
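
Worked sketch for Example 2. The four steps of the paired test, and the matching confidence interval, can be reproduced from the summary statistics of the “difference” variable alone. The sketch below is one possible way to do that, not the course's official solution. It assumes the hypotheses are H0: mean difference = 0 against Ha: mean difference > 0, which is a reading of “effective in increasing” that the notes do not spell out. The critical values 1.697 and 2.042 are standard t-table entries for 30 degrees of freedom. The helper `t_upper_tail` is a hypothetical name: it approximates the tail area of the t-distribution numerically so that only the Python standard library is needed.

```python
import math


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for a t-distribution with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule
    (steps must be even) and uses the symmetry of the distribution.
    """
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 - area if t >= 0 else 0.5 + area


# Summary statistics for "difference" (After - Before), copied from Example 2.
n, d_bar, s_d = 31, 9.17, 8.06
alpha = 0.05

# Step 1 (assumed, not stated in the notes): H0: mu_d = 0, Ha: mu_d > 0.
mu_d0 = 0.0

# Step 2: degrees of freedom and the one-sided critical value.
df = n - 1
T_CRIT_ONE_SIDED = 1.697  # t-table value, upper 5% point, df = 30

# Step 3: standard error, test statistic and one-sided p-value.
se = s_d / math.sqrt(n)
t_stat = (d_bar - mu_d0) / se
p_value = t_upper_tail(t_stat, df)

# Step 4: reject H0 when the statistic exceeds the critical value.
reject_h0 = t_stat > T_CRIT_ONE_SIDED

# 95% confidence interval: point estimate +/- (critical value) * (standard error).
T_CRIT_TWO_SIDED = 2.042  # t-table value, upper 2.5% point, df = 30
ci_low = d_bar - T_CRIT_TWO_SIDED * se
ci_high = d_bar + T_CRIT_TWO_SIDED * se

print(f"SE = {se:.2f}, t = {t_stat:.2f}, df = {df}, p = {p_value:.2g}")
print(f"Reject H0 at alpha = {alpha}: {reject_h0}")
print(f"95% CI for the mean difference: ({ci_low:.2f}, {ci_high:.2f})")
```

With these inputs the standard error comes out at about 1.45, matching the “SE Mean” column, and the test statistic at about 6.33. That is far above 1.697, so H0 would be rejected under the assumed one-sided alternative. The 95% interval runs from roughly 6.2 to 12.1 push-ups and excludes zero, which agrees with the test.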