Student Resource Sheet
Choosing a Significance Test

Choosing Your Test

Choosing an appropriate type of significance test is a very important consideration in analyzing data. If an inappropriate test is used, the analysis will be not only meaningless but also misleading. To determine which type of statistical test is most appropriate for your analysis, consider the following three questions:

1. What type of data are you analyzing?
2. How many variables are being measured?
3. How many groups are involved?

Type of Data

Are your data categorical or quantitative? Sometimes the answer is as easy as asking whether the data consist of words (categorical) or numbers (quantitative), but the distinction isn't always so clear. For example, a high school student's grade level could be thought of as categorical (freshman, sophomore, junior, senior), or it could be considered quantitative (9, 10, 11, 12). The distinction depends on how you, the researcher, will use the data. If you are interested in the proportion of the sample falling into each category, treat the data as categorical. If, on the other hand, you are interested in calculating the average grade level of the students in your sample, or you want to know how grade level is related to GPA, treat the data as quantitative.

Table 1: Determining the type of data

Categorical
  What you will analyze: how many subjects fall into each category, or what percent fall into each category.
  Form: usually not numeric, but can be numeric.
  Examples: political preference, gender, favorite movie genre.

Quantitative
  What you will analyze: averages, or the correlation between two variables.
  Form: always numeric.
  Examples: SAT scores, height, cell phone minutes used per month.

Number of Variables

How many variables are being measured? You may have collected data on dozens of variables, but when conducting a significance test, you will not necessarily use all of them in a single test.
Focus on the specific question the significance test is answering. For example, if you conducted a survey in which you asked each subject their gender, ethnicity, religious affiliation, political preference, age, height, grade level, GPA, SAT scores, favorite movie genre, and number of cell phone minutes used last month, you have collected data on eleven variables. Before conducting a significance test, you need to focus on a specific question, and then determine how many variables are needed to answer that question. As is often the case in statistics, the answer is not always clear-cut.

Table 2: Determining the number of variables

Question: How is gender related to political views?
  Number of variables: two categorical variables (gender, political preference).

Question: Is there a correlation between GPA and SAT scores?
  Number of variables: two quantitative variables (GPA, SAT score).

Question: Do males tend to receive higher SAT scores than females?
  Number of variables: two variables, one categorical (gender) and one quantitative (SAT score); or one quantitative variable (SAT score) across two groups (male and female).

Number of Groups

Another important consideration when deciding which type of significance test is most appropriate is the number of groups involved. Is the analysis being done on a single group of subjects, or are two or more groups being compared? Comparative studies, which show how one group differs from another, are often more informative than single-sample studies. If you intend to compare multiple groups, you may choose to sample each sub-population separately, or you may choose to take one large sample from the whole population and then break it into sub-groups. For example, if you want to conduct a study comparing teenage pregnancy rates among different socio-economic groups, you could sample students from each socio-economic group separately, or you could take one large sample and then separate those subjects into socio-economic groups.

Is the Test Valid?
Once you have determined which type of test is appropriate for your analysis, you must determine whether the test will yield valid results for your particular set of data. Each of the following significance tests involves calculations that are based on certain assumptions and conditions. If the necessary assumptions and conditions are not met, the results of the test will not be valid, and the test should not be used. Conditions that depend on sample size, however, can often be satisfied by collecting a larger sample.

1. The One-Proportion z-Test
Note: The term "success" refers to the category of interest. For example, if you are conducting a one-proportion z-test to determine whether more than half of the population likes horror films, then liking horror films is considered a "success" and not liking horror films is considered a "failure."

2. The Two-Proportion z-Test
Note: The term "success" refers to the category of interest. For example, if you are conducting a two-proportion z-test to determine whether a greater proportion of males than females like horror films, then liking horror films is considered a "success" and not liking horror films is considered a "failure."

3. The Chi-Square Goodness-of-Fit Test
Note: To calculate the expected counts, multiply the sample size by each expected proportion. For example, if you are conducting a chi-square goodness-of-fit test to determine whether the distribution of AP scores at your high school differs from the national distribution of AP scores, multiply the sample size by the proportion of the nation scoring a 5, then multiply the sample size by the proportion scoring a 4, and so on. These products are the expected counts.

4. The Chi-Square Test for Homogeneity
Note: To calculate the expected counts, first organize the data in a two-way table. For each cell, multiply the row total by the column total, and then divide by the table total. This gives the expected count for that cell.

5. The Chi-Square Test for Independence
Note: To calculate the expected counts, first organize the data in a two-way table. For each cell, multiply the row total by the column total, and then divide by the table total. This gives the expected count for that cell.

6. The One-Sample t-Test
Note: If the sample is sufficiently large (30 or more), the data need not be approximately normal. If you have a moderate sample size (15 to 29), construct a modified box plot and verify that there are no strong outliers; if there are none, you may proceed with the test. If the sample size is small (fewer than 15), you must not only verify that there are no outliers, but also construct a histogram and verify that the shape of the distribution is approximately symmetric. Alternatively, you could construct a normal probability plot and verify that the plot is linear.

7. The Paired t-Test
Note: Apply these checks to the differences within each pair. If the number of pairs is sufficiently large (30 or more), the differences need not be approximately normal. If you have a moderate number of pairs (15 to 29), construct a modified box plot of the differences and verify that there are no strong outliers; if there are none, you may proceed with the test. If the number of pairs is small (fewer than 15), you must not only verify that there are no outliers, but also construct a histogram and verify that the shape of the distribution is approximately symmetric. Alternatively, you could construct a normal probability plot and verify that the plot is linear.

8. The Two-Sample t-Test
Note: Apply these checks to each sample separately. If a sample is sufficiently large (30 or more), its data need not be approximately normal. If a sample is of moderate size (15 to 29), construct a modified box plot and verify that there are no strong outliers; if there are none, you may proceed with the test. If a sample is small (fewer than 15), you must not only verify that there are no outliers, but also construct a histogram and verify that the shape of its distribution is approximately symmetric.
Alternatively, you could construct a normal probability plot and verify that the plot is linear.

9. The One-Way ANOVA (Analysis of Variance) F-Test
Note: To determine whether the groups have similar variances, construct a box plot for each group and compare the interquartile ranges. They need not be exactly the same, but they should be similar. To determine whether the data for each group are approximately normal, examine the box plot for each group to verify that there are no outliers, and examine the histogram for each group to verify that the shape is approximately symmetric. Alternatively, for each group, subtract the group mean from each observation (producing residuals) and check normality with a normal probability plot of the residuals.

10. The Linear Regression t-Test
Note: To determine whether the true relationship between the variables is linear, check that the scatter plot of the data is roughly linear. To determine whether the variance is equal for all values of x, graph the least-squares regression line along with the scatter plot and verify that the vertical spread of the points about the line is roughly the same across all values of x. To determine whether the responses vary normally about the line, verify that the residuals are approximately normal (you can do this by graphing a histogram or a normal probability plot of the residuals).

11. Multiple Regression
Note: To determine whether the true relationship between the variables is linear, check that the scatter plot of y against each x variable is roughly linear. To determine whether the variance is equal for all values of the x variables, plot the residuals against the predicted values and verify that the vertical spread of the points is roughly the same throughout. To determine whether the responses vary normally about the fitted model, verify that the residuals are approximately normal (you can do this by graphing a histogram or a normal probability plot of the residuals).
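The expected-count rule in the goodness-of-fit note (test 3) can be sketched in a few lines of Python: each expected count is the sample size times the expected proportion for that category. The function name `expected_counts_gof`, the sample size of 200, and the national proportions below are hypothetical placeholders, not real AP data.

```python
def expected_counts_gof(sample_size, expected_proportions):
    """Expected count for each category: sample size times expected proportion."""
    total = sum(expected_proportions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    return {category: sample_size * p for category, p in expected_proportions.items()}


# Hypothetical national distribution of AP scores (placeholder numbers, not real data).
national = {5: 0.20, 4: 0.20, 3: 0.25, 2: 0.20, 1: 0.15}

expected = expected_counts_gof(200, national)
for score, count in expected.items():
    print(f"score {score}: expected count {count:.1f}")
```

Entering proportions (0.20) rather than percents (20) avoids a separate division by 100.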
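The expected-count rule shared by the chi-square tests for homogeneity and independence (tests 4 and 5), row total times column total divided by the table total, can be sketched the same way. The function name `expected_counts_two_way` and the observed counts are hypothetical.

```python
def expected_counts_two_way(table):
    """Expected count for each cell of a two-way table:
    (row total * column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical observed counts: rows are genders, columns are political preferences.
observed = [
    [30, 20, 10],
    [20, 30, 10],
]

for row in expected_counts_two_way(observed):
    print(row)  # [25.0, 25.0, 10.0] for both rows
```

Here every row total is 60, the column totals are 50, 50, and 20, and the table total is 120, so the first cell's expected count is 60 * 50 / 120 = 25.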
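The sample-size rules in the notes for tests 6 through 8 can be summarized as a small decision function, paired here with the 1.5 * IQR rule that modified box plots commonly use to flag outliers. Both function names are hypothetical. Python's `statistics.quantiles` uses its default "exclusive" quartile method, which may differ slightly from a graphing calculator's, so borderline points can be classified differently.

```python
import statistics


def normality_checks(n):
    """Graphical checks the notes for tests 6-8 call for, given a sample size n.

    For a paired t-test, n is the number of pairs (check the differences);
    for a two-sample t-test, call this once for each sample.
    """
    if n >= 30:
        return []  # large sample: the data need not be approximately normal
    if n >= 15:
        return ["modified box plot: no strong outliers"]
    return [
        "modified box plot: no outliers",
        "histogram roughly symmetric, or normal probability plot roughly linear",
    ]


def iqr_outliers(data):
    """Points more than 1.5 * IQR below Q1 or above Q3 (modified box plot convention)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default "exclusive" quartile method
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


sample = [12, 14, 15, 15, 16, 17, 18, 19, 20, 45]  # hypothetical data
print(normality_checks(len(sample)))
print(iqr_outliers(sample))  # [45]
```

With only 10 observations and a clear outlier, this sample would fail the small-sample check, so a t-test would not be appropriate for it as is.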
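For the ANOVA note (test 9), the two numerical steps, comparing interquartile ranges and computing residuals by subtracting each group's mean, can be sketched with Python's `statistics` module (`statistics.fmean` needs Python 3.8 or later). The group labels and values are hypothetical, and the normal probability plot itself would still be drawn with a calculator or plotting software.

```python
import statistics


def group_iqrs(groups):
    """Interquartile range of each group, for comparing spreads."""
    iqrs = {}
    for name, values in groups.items():
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqrs[name] = q3 - q1
    return iqrs


def anova_residuals(groups):
    """Subtract each group's mean from its observations; the pooled
    residuals can then be checked with a normal probability plot."""
    residuals = []
    for values in groups.values():
        mean = statistics.fmean(values)
        residuals.extend(x - mean for x in values)
    return residuals


# Hypothetical data for three groups.
groups = {
    "A": [4, 5, 6, 5, 7],
    "B": [6, 7, 8, 7, 9],
    "C": [5, 6, 7, 6, 8],
}
print(group_iqrs(groups))
print([round(r, 2) for r in anova_residuals(groups)])
```

Residuals within each group always sum to zero, so a plot of them shows only the shape of the spread around each group's center.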
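For the regression notes (tests 10 and 11), a residual is the vertical distance between an observed y and the least-squares line. A minimal sketch for one explanatory variable, using the standard least-squares formulas b = Sxy / Sxx and a = y_bar - b * x_bar; the function name and data are hypothetical.

```python
import statistics


def least_squares_residuals(x, y):
    """Fit y-hat = a + b*x by least squares; return (a, b, residuals)."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # observed minus predicted
    return a, b, residuals


x = [1, 2, 3, 4, 5]               # hypothetical explanatory values
y = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical responses
a, b, res = least_squares_residuals(x, y)
print(f"y-hat = {a:.2f} + {b:.2f}x")
for xi, r in zip(x, res):
    print(xi, round(r, 2))
```

Plotting these residuals against x (or against the predicted values, for multiple regression) should show a roughly constant vertical spread, and a histogram or normal probability plot of them addresses the normality check.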
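The three questions at the start of this sheet can be combined into a rough decision sketch over the eleven tests listed above. The function `choose_test`, its parameters, and the mapping are assumptions that encode common introductory-statistics conventions; as the sheet notes, the answer is not always clear-cut, so treat the result as a starting point rather than a rule. A question with one categorical and one quantitative variable (the third row of Table 2) is entered here as one quantitative variable across the groups defined by the categorical variable.

```python
def choose_test(data_type, num_variables, num_groups=1, num_categories=2, paired=False):
    """Rough guide from the three questions to one of the eleven tests.

    data_type: "categorical" or "quantitative"
    num_variables: how many variables the question involves
    num_groups: how many groups are being compared
    num_categories: number of categories of a single categorical variable
    paired: True if each subject is measured twice (analyze the differences)
    """
    if data_type == "categorical":
        if num_variables == 1:
            if num_groups == 1:
                # Two categories ("success"/"failure") versus several categories.
                if num_categories == 2:
                    return "one-proportion z-test"
                return "chi-square goodness-of-fit test"
            if num_groups == 2 and num_categories == 2:
                return "two-proportion z-test"
            return "chi-square test for homogeneity"
        if num_variables == 2 and num_groups == 1:
            return "chi-square test for independence"
    elif data_type == "quantitative":
        if num_variables == 1:
            if paired:
                return "paired t-test"
            if num_groups == 1:
                return "one-sample t-test"
            if num_groups == 2:
                return "two-sample t-test"
            return "one-way ANOVA F-test"
        if num_variables == 2:
            return "linear regression t-test"
        if num_variables >= 3:
            return "multiple regression"
    raise ValueError("no matching test in this sketch; revisit the question")


# The three questions from Table 2.
print(choose_test("categorical", 2))                 # gender vs. political views
print(choose_test("quantitative", 2))                # GPA vs. SAT score
print(choose_test("quantitative", 1, num_groups=2))  # SAT score, males vs. females
```

A comparison of two groups on a two-category variable can also be run as a chi-square test for homogeneity; this sketch returns the two-proportion z-test in that case.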