IQL Chapter 10 – t Tests, Two-Way Tables, and ANOVA
Statistical Reasoning for Everyday Life, Bennett, Briggs, Triola, 3rd Edition

10.1 t Distribution for Inferences about a Mean

LEARNING GOAL
Understand when it is appropriate to use the Student t distribution rather than the normal distribution for constructing confidence intervals or conducting hypothesis tests for population means, and know how to make proper use of the t distribution.

t DISTRIBUTION FOR INFERENCES ABOUT A MEAN
Much of the work in preceding sections assumed the sampling distribution is normal, but a review of articles in professional journals shows that professional statisticians rarely use the normal distribution for confidence intervals and hypothesis tests in real applications. A major reason is that the normal distribution requires that we know the population standard deviation σ. Because we generally do not know σ, we must estimate it with the sample standard deviation s. Statisticians therefore prefer an approach that does not require knowing σ.

Such is the case with the Student t distribution, or t distribution for short, which can be used when we do not know the population standard deviation and either the sample size is greater than 30 or the population has a normal distribution.

With the normal distribution, we estimate the margin of error, E, for a 95% confidence interval to be:

$$E \approx \frac{2s}{\sqrt{n}}$$

We use this formula for the standard score of the sample mean:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Inferences about a Population Mean: Choosing between t and Normal Distributions

t distribution:
  Population standard deviation is not known and the population is normally distributed.
  or
  Population standard deviation is not known and the sample size is greater than 30.

Normal distribution:
  Population standard deviation is known and the population is normally distributed.
  or
  Population standard deviation is known and the sample size is greater than 30.
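The choice between the two distributions, together with the two normal-distribution formulas quoted above, can be sketched in a few lines of Python. This is only an illustration: the function names (`choose_distribution`, `standard_score`, `rough_margin_of_error`) and the sample numbers are made up for the example and do not come from the textbook.

```python
import math


def choose_distribution(sigma_known, n, population_normal):
    """Return "normal", "t", or None, following the conditions listed above."""
    # Both distributions require a normal population or a sample size above 30.
    if not (population_normal or n > 30):
        return None
    # Known sigma -> normal distribution; unknown sigma -> t distribution.
    return "normal" if sigma_known else "t"


def standard_score(sample_mean, mu, sigma, n):
    """z = (sample mean - mu) / (sigma / sqrt(n)); needs a known sigma."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))


def rough_margin_of_error(s, n):
    """E ~ 2s / sqrt(n), the approximate margin of error quoted above."""
    return 2 * s / math.sqrt(n)


# Hypothetical numbers, chosen only to exercise the functions.
print(choose_distribution(sigma_known=False, n=50, population_normal=False))
print(standard_score(sample_mean=52.0, mu=50.0, sigma=10.0, n=25))
print(rough_margin_of_error(s=10.0, n=25))
```

Returning None when neither condition holds is a design choice of this sketch: the conditions above do not cover a small sample from a non-normal population.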
CONFIDENCE INTERVALS USING THE t DISTRIBUTION
To specify a confidence interval, we must first calculate the margin of error, E. With a t distribution, the formula is

$$E = t \cdot \frac{s}{\sqrt{n}}$$

Confidence Interval for a Population Mean (μ) with the t Distribution
If conditions require use of the t distribution (σ not known and n > 30 or population normally distributed), the confidence interval for the true value of the population mean (μ) extends from the sample mean minus the margin of error to the sample mean plus the margin of error. That is, the confidence interval for the population mean is

$$\bar{x} - E < \mu < \bar{x} + E$$

where the margin of error is

$$E = t \cdot \frac{s}{\sqrt{n}}$$

and we find t from Table 10.1.

Hypothesis Tests Using the t Distribution
When the t distribution is used for a hypothesis test of a claim about a population mean (H0: μ = claimed value), the t value plays the role that the standard score z played when we studied these hypothesis tests in Section 9.2. With the t distribution, instead of calculating the standard score z, we use the following formula to calculate t:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where n is the sample size, x̄ is the sample mean, s is the sample standard deviation, and μ is the population mean claimed by the null hypothesis.

10.2 Hypothesis Testing with Two-Way Tables

LEARNING GOAL
Interpret and carry out hypothesis tests for independence of variables with data organized in two-way tables.

IDENTIFYING THE HYPOTHESES WITH TWO VARIABLES
Suppose that administrators at a college are concerned that there may be bias in the way degrees are awarded to men and women in different departments. They therefore collect data on the number of degrees awarded to men and women in different departments. These data concern two variables: major and gender. To test whether there is bias in the awarding of degrees, the administrators ask the following question: Do the data suggest a relationship between the two variables?
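As a recap of the Section 10.1 formulas before continuing, here is a minimal Python sketch of the t-based margin of error, confidence interval, and test statistic. The critical value t is passed in by hand, as it would be after a table lookup. The function names and the sample numbers are hypothetical; the value 2.064 is the standard two-tailed 95% critical value for 24 degrees of freedom (n = 25).

```python
import math


def t_margin_of_error(t_crit, s, n):
    """E = t * s / sqrt(n), with t_crit looked up in a t table."""
    return t_crit * s / math.sqrt(n)


def t_confidence_interval(sample_mean, t_crit, s, n):
    """Return (lower, upper) = (sample_mean - E, sample_mean + E)."""
    e = t_margin_of_error(t_crit, s, n)
    return sample_mean - e, sample_mean + e


def t_statistic(sample_mean, mu0, s, n):
    """t = (sample_mean - mu0) / (s / sqrt(n)); mu0 is the mean claimed by H0."""
    return (sample_mean - mu0) / (s / math.sqrt(n))


# Hypothetical sample: n = 25, sample mean 52.0, sample standard deviation 5.0.
low, high = t_confidence_interval(52.0, t_crit=2.064, s=5.0, n=25)
print(round(low, 3), round(high, 3))
print(t_statistic(52.0, mu0=50.0, s=5.0, n=25))
```

With these made-up numbers the computed t falls just below the critical value, so a two-tailed test at the 0.05 level would not reject a claimed mean of 50. This agrees with 50 lying inside the confidence interval.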
Null and Alternative Hypotheses with Two Variables
The null hypothesis, H0, states that the variables are independent (there is no relationship between them).
The alternative hypothesis, Ha, states that there is a relationship between the two variables.

DISPLAYING THE DATA IN TWO-WAY TABLES
With the hypotheses identified, the next step in the hypothesis test is to examine the data set to see if it supports rejecting or not rejecting the null hypothesis. We can display the data efficiently with a two-way table (also called a contingency table), so named because it displays two variables.

Two-Way Tables
A two-way table shows the relationship between two variables by listing one variable in the rows and the other variable in the columns. The entries in the table’s cells are called frequencies (or counts).

CARRYING OUT THE HYPOTHESIS TEST
The basic idea of the hypothesis test is the same as always: to decide whether the data provide enough evidence to reject the null hypothesis. For the case of a test with a two-way table, the specific steps are as follows:

As always, we start by assuming that the null hypothesis is true, meaning there is no relationship between the two variables. In that case, we would expect the frequencies (the numbers in the individual cells) in the two-way table to be those that would occur by pure chance. Our first step, then, is to find a way to calculate the frequencies we would expect by chance.

We next compare the frequencies expected by chance to the observed frequencies from the sample, which are the frequencies displayed in the table. We do this by calculating something called the chi-square statistic (pronounced “ky-square”) for the sample data, which here plays a role similar to the role of the standard score z in the hypothesis tests we carried out in Chapter 9 or the role of the t test statistic in Section 10.1.
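The two steps just described can be sketched in Python. Two assumptions are made here that the text above does not spell out: the expected frequency of a cell is taken to be (row total × column total) / grand total, the usual rule under independence, and the chi-square statistic is taken to be the sum of (O − E)²/E over all cells, where O is an observed and E an expected frequency. The function names and the table of counts are hypothetical.

```python
def expected_frequencies(observed):
    """Expected counts under independence for a two-way table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Each expected count is (row total * column total) / grand total.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2x2 table of counts (rows: gender, columns: major).
table = [[30, 10],
         [10, 30]]
print(expected_frequencies(table))
print(chi_square_statistic(table))
```

A table whose observed counts already match the expected counts gives a statistic of zero, while larger departures from independence give larger values.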
Recall that for the hypothesis tests in Chapter 9, we made the decision about whether to reject or not reject the null hypothesis by comparing the computed value of the standard score for the sample data to critical values given in tables; similarly, in Section 10.1 we compared computed values of the t test statistic to values found in a table. Here, we do the same thing, except rather than using critical values for the standard score or t, we use critical values for the chi-square statistic.

Definition
The expected frequencies in a two-way table are the frequencies we would expect by chance if there were no relationship between the row and column variables.

Computing the Chi-Square Statistic

Finding the Frequencies Expected by Chance

Finding the Chi-Square Statistic
Step 1. For each cell in the two-way table, identify O as the observed frequency and E as the expected frequency if the null hypothesis is true (no relationship between the variables).
Step 2. Compute the value $(O - E)^2 / E$ for each cell.
Step 3. Sum the values from step 2 to get the chi-square statistic:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

The larger the value of χ², the greater the average difference between the observed and expected frequencies in the cells.

10.3 Analysis of Variance (One-Way ANOVA)

LEARNING GOAL
Interpret and carry out hypothesis tests using the method of one-way analysis of variance.

HYPOTHESIS TESTING FOR VARIANCE
We follow the same general principles laid out for hypothesis testing in Section 9.1. To begin with, we identify the null hypothesis: the mean Flesch scores for all three books are equal. The alternative hypothesis, then, is that the three population means are not all equal. The hypothesis test must tell us whether to reject or not reject the null hypothesis. Rejecting the null hypothesis would allow us to conclude that the books really do have different mean Flesch scores, as we expect.
Not rejecting the null hypothesis would tell us that the data do not provide sufficient evidence for concluding that the mean Flesch scores are different. We write the null hypothesis as

$$H_0: \mu_{\text{Clancy}} = \mu_{\text{Rowling}} = \mu_{\text{Tolstoy}}$$

We need a hypothesis test that will allow us to determine whether three different populations have the same mean. The method we use is called analysis of variance, commonly abbreviated ANOVA. The name comes from the formal statistic known as the variance of a set of sample values; as we noted briefly in Section 4.3, variance is defined as the square of the sample standard deviation, or s².

Definition
Analysis of variance (ANOVA) is a method of testing the equality of three or more population means by analyzing sample variances.

One-Way ANOVA for Testing H0: μ1 = μ2 = μ3 = . . .
Step 1. Enter sample data into a statistical software package, and use the software to determine the test statistic (F = variance between samples / variance within samples) and the P-value of the test statistic.
Step 2. Make a decision to reject or not reject the null hypothesis based on the P-value of the test statistic:
• If the P-value is less than or equal to the significance level, reject the null hypothesis of equal means and conclude that at least one of the means is different from the others.
• If the P-value is greater than the significance level, do not reject the null hypothesis of equal means.

This method is valid as long as the following requirements are met: The populations have distributions that are approximately normal with the same variance, and the samples from each population are simple random samples that are independent of each other.
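To show what the software computes in Step 1, the following Python sketch works out the F statistic by hand and applies the Step 2 decision rule. It assumes the standard mean-square definitions of "variance between samples" and "variance within samples", which the text leaves to the software. It does not compute the P-value, which requires the F distribution and is assumed to come from a statistical package. The function names and the data are hypothetical.

```python
def one_way_f_statistic(*samples):
    """F = variance between samples / variance within samples."""
    k = len(samples)                          # number of samples (groups)
    n_total = sum(len(s) for s in samples)    # total number of values
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / n_total

    # Between-samples variance: spread of sample means around the grand mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2
                     for s, m in zip(samples, means))
    ms_between = ss_between / (k - 1)

    # Within-samples variance: pooled spread of values around their own mean.
    ss_within = sum((x - m) ** 2
                    for s, m in zip(samples, means) for x in s)
    ms_within = ss_within / (n_total - k)

    return ms_between / ms_within


def decide(p_value, significance_level=0.05):
    """Step 2: reject H0 when the P-value is at or below the significance level."""
    if p_value <= significance_level:
        return "reject H0"
    return "do not reject H0"


# Hypothetical scores for three groups (not the Flesch data from the text).
f_value = one_way_f_statistic([1, 2, 3], [2, 3, 4], [3, 4, 5])
print(f_value)
print(decide(p_value=0.03))
```

A large F means the sample means differ by more than the spread within each sample would suggest, which is why large F values go with small P-values.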