Chapter 13
Univariate Statistics

Outline
13-1 Introduction
13-2 The Role of Statistics
13-2a Descriptive Statistics
13-2b Inferential Statistics
13-3 Limitations of Statistics in Research
13-4 The Frequency Distribution
13-4a An Overview
13-4b General Comments about Table Entries
13-4c Frequency Distribution with More Than One Frequency Distribution
13-4d Frequency Table with Metric Data
13-5 Graphic Presentations
13-5a The Bar Graph
13-5b The Histogram
13-5c The Pie Chart
13-6 Measures of Central Tendency
13-6a The Mode
13-6b The Median
13-6c The Mean
13-6d Comparing the Mode, Median, and Mean
13-7 Measures of Dispersion
13-7a The Variation Ratio (v)
13-7b The Range
13-7c The Mean Deviation
13-7d The Variance and the Standard Deviation
13-8 Shape of the Distribution and Metric Distributions
13-8a Skewed Distributions
13-8b The Normal Curve
13-8c Standard Scores (the Z Score)
Chapter Summary
Chapter Quiz
Suggested Readings
Endnotes

Key Terms
bar graph, central tendency, descriptive statistics, dispersion, frequency distribution, histogram, inferential statistics, mean, mean deviation, measures of central tendency, median, mode, negatively skewed, normal distribution curve, pie graph, positively skewed, range, skewed distribution, standard deviation, standard normal distribution, standard score, univariate analysis, variance, variation ratio, Z score

13-1 Introduction

You have now completed several steps in the behavioral research process, such as the literature review, the research plan, and data collection and processing. Now you are ready to analyze your data. This procedure, which includes the calculation of different statistics, can be the most exciting part of the entire research process. You begin to convert raw data and indefinable patterns into explanation and understanding. As you begin to receive signs that your data substantiates your initial expectations, you begin “. . .
to sense the excitement of discovery; a thoroughly invigorating and stimulating intellectual experience shared by all scientists” (Cole 1996, 141). Thankfully, a computer’s statistical program, such as SPSS, will calculate the statistics for you. The calculation, however, is secondary. The more important task is to interpret the statistics so you can see what your data is trying to tell you. Thus, the next three chapters will give you the tools to interpret statistics so you can revel in the excitement of discovery.

An understanding of this chapter will enable you to
1. Explain the role of descriptive and inferential statistics.
2. Explain a frequency distribution and describe its characteristics.
3. Understand different ways to present your data.
4. Interpret measures of central tendency.
5. Interpret the measures of dispersion.
6. Describe the types of frequency distributions.
7. Explain the normal curve.

13-2 The Role of Statistics

univariate analysis: The analysis of a single variable. Researchers often use frequency tables, bar graphs, or pie charts to complete such an analysis.

The role of statistics in political research is a subject of intense debate. Normative theorists see statistics as cold and calculating. They also see the proponents of statistics as more concerned with what is than with what should be. Behavioralists, on the other hand, see statistics as another way to analyze and explain political phenomena. Despite the debate, the role of statistics in the social sciences is important. Statistics enable us to see patterns in the data and to describe and interpret observations in ways that help us test theories and hypotheses. In short, statistics are an invaluable tool for the political scientist who seeks to resolve important political questions.

The empirical analysis of political questions often involves a mass of quantitative data that must be organized before any analysis and interpretation can take place.
Additionally, before examining the relationship between variables, you must describe the typical case of a variable and determine how typical it really is (Kay 1991). Statisticians call this process univariate analysis. By contrast, when we analyze one variable in relation to another variable, we are conducting bivariate analysis.

13-2a Descriptive Statistics

descriptive statistics: The mathematical summary of measurements for a set of data.

There are two types of statistics that political scientists use: descriptive statistics and inferential statistics. Descriptive statistics enable political scientists to organize and summarize data. They provide us with the necessary tools to describe quantitative data. Among these summarizing measures are percentages, proportions, means, and standard deviations. Descriptive statistics are especially useful when the researcher finds it necessary to analyze interrelationships between more than two variables.

13-2b Inferential Statistics

Inferential statistics deal with sample data. They enable the researcher to infer properties of a population based on data collected from only a random probability sample of individuals. Inferential statistics have value because they offset problems associated with data collection. For example, the time-cost factor associated with collecting data on the entire population may be prohibitive. That is, the population may be immense and difficult to define. In such instances, inferential statistics can prove to be invaluable to the social scientist.

Descriptive and inferential statistics are used in the data analysis process. Data analysis involves noting whether hypothesized patterns exist in the observations. We might hypothesize, for example, that urban legislators are more liberal and supportive of welfare programs than those legislators representing rural constituencies.
To test this hypothesis the researcher may ask urban and rural legislators about their views on welfare programs and payments. The researcher then compares the groups and uses descriptive and inferential statistics to find out whether differences between the groups support expectations.

In sum, a descriptive statistic is a mathematical summary of measurements for one variable. Inferential statistics, on the other hand, use sample data to make statements about the population. Descriptive and inferential statistics provide explanations for complex political phenomena that deal with relationships between variables. Thus, they are an important tool in the political scientist’s repertoire.

13-3 Limitations of Statistics in Research

Statistics cannot resolve every question you have about politics. Therefore, we need to discuss some of the limitations of statistical research. First, statistics do not provide the means for the researcher to prove anything he or she wants to prove. On the contrary, there are explicit procedural guidelines, rules, and decision-making criteria to follow in the statistical analysis of data. As such, statistics cannot make up for the lack of clear, consistent, logical thinking in the development of a body of theory.

Second, statistics provide little help in understanding political phenomena that we cannot empirically measure. Some contend, for example, that we cannot measure the critical concept of political power (Bacharach and Baratz 1962). Even when measurement is possible, statistics do not always tell us whether we are measuring what we want to measure. There are, for example, several ways to measure the rate of unemployment. One possibility is to contact the local unemployment office and find out how many individuals have applied for unemployment benefits. However, what about the few who believe it is beneath them to apply for what they perceive as welfare? And what about those who have dropped out of the job market?
A final principal limitation of statistics is that the techniques only allow us to describe and infer trends among groups. They do not provide definite predictions about individual cases. Thus, while statistical techniques may provide guidelines, they do not allow us to reach certain conclusions about individuals (Cole 1996). Knowing that 64 percent of the respondents in a survey favored gun control, for example, does not allow you to say that your neighbor favors gun control.

In sum, there are important limits on the value of statistical analysis. There are some political problems you cannot explore statistically. For those questions subject to quantitative analysis, however, statistics may only be a “poor man’s” substitute for controlled laboratory, or true experimental, research. Statistics in these cases are only valuable when researchers carefully define the problem, develop ways to measure important concepts, and use a sound research design to collect data. Then, and only then, are statistics helpful in understanding the research question.

inferential statistics: Statistics that enable the researcher to make decisions (inferences) about characteristics of a population based on observations from a random probability sample taken from the population.

13-4 The Frequency Distribution

Constructing a frequency distribution will probably be the first step you will take when organizing and presenting information. Properly constructed frequency distributions help summarize a large amount of information while enhancing the interpretation of data. In this chapter you will learn all you need to know about arraying and summarizing single variables.

13-4a An Overview

frequency distribution: A tabulation of raw data according to numerical values and discrete classes. A frequency distribution of party identification, for example, shows the number of individuals belonging to a particular political party.
As a student you have frequently read research papers, articles, and reports that included descriptive statistics. Government textbooks, for example, present displays of voting results, public welfare expenditures, and characteristics of congressional members. Additionally, media headlines read “President’s Popularity Rises by Three Percent” or “The Dow Stock Market Drops by 150 Points.” The media also inundate you with these statistics in the form of public opinion polls. Whatever the source, most of your exposure has usually been with tabular statistics, or frequency distributions.

One step in analyzing and reporting information involves the presentation of frequency distributions of the variables of concern. A frequency distribution is nothing more than a tabulation of raw data according to numerical values and discrete classes. A frequency distribution of party identification, for example, shows the number of individuals belonging to a particular political party. A frequency table is the tabular presentation of a frequency distribution. It should meet certain criteria to be considered presentation quality.¹ Let’s examine Table 13-1, which is a distribution of respondents’ political ideology from the 1998 National Opinion Research Center General Social Survey. The presentation of a frequency distribution should include the following:

• Table labels: If there is more than one table included in a report, the tables need a label to distinguish them. Examples include “Table 1,” “Table 2,” or “Table 13-1.” The latter example identifies the first table of Chapter 13.
• Descriptive title: Researchers must make it clear to the reader what information they are presenting. The title must be as specific as possible. As such, it should include the type of information (Respondents’ Political Ideology), the time (1998), and any other pertinent information (General Social Survey [GSS]).
Table 13-1 Respondents’ Ideology, General Social Survey (GSS) 1998

Category        Frequency       %    Cumulative %
Liberal               772    28.7            28.7
Moderate              986    36.6            65.3
Conservative          933    34.7           100.0
Totals:              2691   100.0

Question: Is the respondent a liberal, moderate, or conservative?
Source: Data from James A. Davis and Tom W. Smith. National Opinion Research Center (NORC) General Social Survey (GSS) for 1998.

• Clear labels: Tables require clearly labeled columns that enable the reader to see the column and row summaries of the table’s data. Our example includes four columns. Column one contains the variable value (row) labels “Liberal” through “Conservative.” The second column lists the frequencies, or number of cases, for each category of the variable. The third column lists the value percentages. For example, there are 772 respondents who said they were liberal. The percentage column converts the frequency into a percentage of the 2,691 cases, or 28.7 percent. The last column, Cumulative %, is only needed if there are more than two categories associated with the variable. It is simply a “running total” of the category percentages.
• Appropriate classes: Normally each group should have some entries. Additionally, classes should not be so large that they obscure the range and variation in the data. For example, classes that divide cases into fewer than twenty, or twenty or more, may place 85 percent of the cases in a single class. This obscures differences in the data. Conversely, a unit-by-unit breakdown such as less than one, or one to two, would be too fine a classification and leave some classes with few cases. When determining the number of classes, you need to consider the needs of your audience and the nature of your data.
• A totals row: A properly constructed frequency table must include a totals row showing the total number of cases included in the table and the percentage total that will normally add up to 100 percent.
We say “normally add up to 100 percent” because there may be a small difference (99.9 or 100.1) due to the rounding of individual category values.
• Source and question: It is a good idea to specify the source of the data you presented in the table. The source may be the Congressional Record or, as in our example, survey data collected by a national research center. When working with survey data, you should also include the question that describes the variable (Corbett 2001, 135). The source of data and the question, if applicable, should be presented at the bottom of the table. In our example, we used the following from the 1998 National Opinion Research Center General Social Survey: “Is respondent liberal, moderate, or conservative?”

13-4b General Comments about Table Entries

When you present percentage values in your tables, be consistent with the decimal places. In other words, don’t use one decimal digit (.1) for some entries and two digits (.14) for others. In fact, you should limit yourself to only one decimal digit. If you do use a decimal digit with percentages, make sure you also use one with whole percentages (62.0, not 62). In addition, don’t put percentage signs after percentages or use horizontal or vertical lines in the table. Percentage signs and lines only clutter the appearance of the table.

13-4c Frequency Distributions with More Than One Frequency Distribution

On occasion, you may want to present more than one frequency distribution in a single table. A major advantage of such a table is that it makes it convenient to compare frequency distributions for different variables. For example, you may want to compare distributions for attitudes toward spending on varied policy areas or societal problems. Or, as in Table 13-2, you may want to compare responses toward different questions you could use to enhance the validity of a single concept.
Table 13-2 Distributions of Attitude toward Divisive Forms of Speech

                         Not Allow           Allow
Type of speech            #       %         #       %
Atheism                 520    26.4      1451    73.6
Communism               619    31.7      1331    68.3
Sexual orientation      363    18.7      1578    81.3
Military rule           680    34.8      1272    65.2
Racism                  714    38.0      1164    62.0

Question: Consider a person who is against/for ________________. If such a person wanted to make a speech in your (city/town/community), should they be allowed to speak or not?
Source: Adapted from James A. Davis and Tom W. Smith. National Opinion Research Center (NORC) General Social Survey (GSS) for 1998.

Note that we labeled the table “13-2” to show that it is the second table included in Chapter 13. Also note that the title specifies the table’s content. At the bottom of the table, we also included the source of data and a question used to operationalize the concept of attitude toward divisive forms of speech. In the table, we presented frequency and percentage distributions for five types of speech that could prove to be divisive. The table also presents the response categories (Not Allow and Allow).

Looking at the table, we can easily compare results for the five types of speech. We can readily see, for example, that a greater percentage of the respondents would allow a person who is a homosexual to make a speech in their community (81.3 percent). There is far less tolerance, on the other hand, toward allowing a racist to deliver a speech in their community (62 percent). There are other ways you can present more than one frequency distribution in a single table. Whatever way you decide to use, however, make sure you follow the rules we presented.

13-4d Frequency Table with Metric Data

There are also times when you may want to present the frequency results of metric variables. Table 13-3 is an example of such a table. Notice that the table includes only the highest five states and the lowest five states.
Note also that the table includes the mean of the distribution (12.9 percent) and the standard deviation of the distribution (4.0 percent). The mean is the average level of poverty for the states. The standard deviation is a measure that expresses the degree of variation within a variable on the basis of the average difference from the mean (Corbett 2001, 294). The smaller the standard deviation, the closer the individual case values will cluster about the mean. We cover these measures in more detail in Section 13-6c and Section 13-7d.

Table 13-3 State Poverty Level

Highest Five States
Rank   State            Percent Below the Poverty Line
1      New Mexico       25.5
2      Mississippi      20.6
3      Louisiana        20.5
4      Arizona          20.5
5      West Virginia    18.5

Lowest Five States
Rank   State            Percent Below the Poverty Line
46     Alaska            8.2
47     Nevada            8.1
48     Utah              7.7
49     Indiana           7.5
50     New Hampshire     6.4

Mean: 12.9. Standard deviation: 4.0.
Source: Data from Percentage of the population below the poverty line (1996). Statistical Abstract of the United States, 1998.

13-5 Graphic Presentations

An extension of the frequency distribution occurs when you present distributions in graphic form. Graphs are a convenient way to present data, and they help one to understand the data without reading a table. We limit our discussion to three basic types of graphs: the bar graph, the histogram, and the pie graph.

13-5a The Bar Graph

When dealing with nominal or ordinal data, Cole recommends that you use a bar graph to present data (Cole 1996, 145). Bars are drawn for each class of the variable so that the height represents the number of cases for each class. Bar graphs are useful when trying to compare categories. Figure 13-1 presents the data considered in Table 13-1 in bar graph format. The visual advantage of data presented in a bar graph format is obvious. The reader can immediately see that there are not as many liberal respondents in the GSS 1998 Survey as there were moderates and conservatives.
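The columns of Table 13-1 and the bars of Figure 13-1 come from the same three counts. The following is a minimal Python sketch, not the procedure of any particular statistical package, that recomputes the percentage and cumulative percentage columns and prints a crude text bar for each category. The function name frequency_table and the scale of one mark per 50 respondents are illustrative assumptions.

```python
# Frequencies copied from Table 13-1 (GSS 1998).
counts = {"Liberal": 772, "Moderate": 986, "Conservative": 933}


def frequency_table(counts):
    """Return (category, frequency, percent, cumulative percent) rows."""
    total = sum(counts.values())
    rows = []
    running = 0  # running count used for the cumulative column
    for category, freq in counts.items():
        running += freq
        percent = round(100 * freq / total, 1)
        cumulative = round(100 * running / total, 1)
        rows.append((category, freq, percent, cumulative))
    return rows


rows = frequency_table(counts)
for category, freq, percent, cumulative in rows:
    bar = "#" * round(freq / 50)  # assumed scale: one mark per 50 respondents
    print(f"{category:<13}{freq:>6}{percent:>7.1f}{cumulative:>7.1f}  {bar}")
```

The cumulative column is computed from the running count rather than by adding rounded percentages, which avoids the small rounding drift mentioned in the discussion of the totals row.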
Figure 13-1 Respondents’ Political Ideology, General Social Survey (GSS) 1998. [Bar graph. Vertical axis: Frequency, 0 to 1200. Horizontal axis: Respondents’ perceived political ideology (Liberal, Moderate, Conservative).]
Source: Data from James A. Davis and Tom W. Smith. National Opinion Research Center (NORC) General Social Survey (GSS) for 1998.

bar graph: A type of graphic display of a frequency or a percentage distribution of data. One uses bar graphs with discrete data.

13-5b The Histogram

histogram: The type of bar graph that is used to depict continuous metric-level measures.

The histogram differs from a bar graph in that you do not separate the bars in a histogram. The bars are adjoining to show that the variable consists of continuous data. Also, intervals, rather than discrete categories, are depicted along the horizontal axis. While bar graphs are used with nonmetric data, researchers use histograms with metric-level data. Figure 13-2 is a histogram that depicts the extent of urbanization in nations of the world.

Figure 13-2 Histogram of Percent of the Population Living in Urban Centers throughout the World. [Histogram. Vertical axis: number of nations, 0 to 14. Horizontal axis: People living in cities (%), with interval midpoints from 5.0 to 100.0 in steps of 5.0.]
Source: Data from The World Almanac and Book of Facts, 1995.

Let’s take time to examine the graph. The bars represent the categories for the urbanization variable depicted in the histogram. The numbers across the horizontal axis represent the intervals for each category. The first classification consists of nations having an urbanization rate from 2.5 percent to 7.5 percent. The second classification consists of nations having an urbanization rate from 7.6 percent to 12.5 percent, and so on. The heights of the bars are proportional to the number of nations in each class. The higher the bar, the more nations there are within a particular category.
The numbers alongside the vertical axis represent the number of nations (cases) included in each category. Continuing our example, there are two nations with an urbanization rate from 2.5 percent up to 7.5 percent, and there are four nations having an urbanization percentage from 7.6 percent to 12.5 percent.

13-5c The Pie Graph

pie graph: A type of graphic display of a frequency distribution. Each “slice” of pie represents a category of the variable. The larger the slice of pie in the graph, the more cases for the particular category.

A pie graph displays a frequency distribution as a circle (or pie shape) with each category shown as a different-colored slice. The larger the slice, the more cases there are within a particular category. Political scientists use this type of graphic presentation with nominal or ordinal data. Because of numerous categories, pie charts are inappropriate to use with metric data. Can you imagine how many slices of pie you would have with a continuous metric variable such as the one presented in Figure 13-2? Figure 13-3 presents the data considered in Table 13-1 in pie graph format.

13-6 Measures of Central Tendency

measures of central tendency: Numbers that represent the principal value of a distribution of data. We commonly refer to these measures as averages. Measures of central tendency include the mode, the median, and the mean.

central tendency: The most frequently observed, common, or central value in the distribution of values of a variable.

While frequency distributions and graphs help to describe and explain variables, political scientists often want to present their findings more conveniently. Reports dealing with several variables would soon become tedious if you relied solely on the depiction of charts and frequency distributions. Therefore, researchers often summarize data with measures of central tendency. A measure of central tendency is a number that represents the principal value of a distribution of data.
We commonly refer to these measures as averages. An average you are probably familiar with is your grade point average, or GPA. Your GPA describes and summarizes your academic performance in college classes. Measures of central tendency include the mode, the median, and the mean.

Figure 13-3 Respondents’ Perceived Political Ideology, General Social Survey (GSS) 1998. [Pie graph with slices for Liberal, Moderate, and Conservative.]

Category        Frequency       %    Cumulative %
Liberal               772    28.7            28.7
Moderate              986    36.6            65.3
Conservative          933    34.7           100.0
Totals:              2691   100.0

Source: Data from James A. Davis and Tom W. Smith. National Opinion Research Center (NORC) General Social Survey (GSS) for 1998.

13-6a The Mode

mode: The category of a variable with the greatest frequency of observations.

The mode is a convenient measure to use with nominal data. The mode is the most frequently occurring value in any distribution of data. If a distribution has only one mode, we say the distribution is unimodal. If there are two values that appear most frequently, the distribution is bimodal. Figure 13-4 shows the political party affiliation of members of the 107th session of the U.S. House of Representatives. A close look at the figure shows that the Republican political party was the most common party affiliation of members of the 107th House of Representatives (51.0 percent). While not equal, the distribution also approximates a bimodal distribution.

Figure 13-4 Political Party Affiliation of Members of the 107th U.S. Congress. [Graph of party membership by category.]

Category        Frequency   Percentage
Democratic            211         48.5
Republican            222         51.0
Independents            2           .5
Totals:               435        100.0

Source: Data from http://thomas.loc.gov

13-6b The Median

median: The category or value above and below which one-half of the observations lie. (The median is the middle category or value.)

The median is the middle item of a set of numbers after ranking the items according to their size (1, 2, 3, . . ., n). For a ranked distribution the median is the score of the middle case if there are an odd number of cases. If there is an even number of cases, the median is the value halfway between the two middle cases. In other words, you will have to calculate the average of the two middle cases. As an example, assume that a political science student used a scale to determine the ideological views of twenty-five respondents. The distribution of scores ranging from 1 (liberal) to 7 (conservative) might appear as shown in Table 13-4.

Table 13-4 Hypothetical Distribution of Scores of Ideology Scale of Angelo State University Students, 2001

Student  Score     Student  Score     Student  Score
   1       7         10       5         19       3
   2       7         11       5         20       3
   3       7         12       5         21       2
   4       6         13       4         22       2
   5       6         14       4         23       2
   6       6         15       4         24       1
   7       6         16       4         25       1
   8       5         17       4
   9       5         18       4

N = 25. Median = 4.
Source: Hypothetical

Determining the median in Table 13-4 is a simple process if you take the following steps:
1. Rank the numbers (7, 7, 7, 6, 6, 6, 6, . . ., 1, 1).
2. Determine the number of items in the set (N) = 25.
3. Add 1 to the number of items: 25 + 1 = 26.
4. Divide the result by 2 to determine the middle item: 26/2 = 13.
5. The median is 4, or the response of the thirteenth respondent.

This value is no greater than half the distribution (those first twelve students whose scores range from 5 through 7). Additionally, it is no smaller than half the distribution (students 14 through 25 whose scores range from 1 through 4). If student 25 were not included in this sample, there would not be a single middle case. For a data set having an even number of items, the same steps are taken to calculate the median.
1. Rank the numbers.
2. Determine the number of items in the set (N) = 24.
3. Add 1 to the number of items: 24 + 1 = 25.
4. Divide the result by 2 to determine the middle item: 25/2 = 12.5.
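The two sets of steps above can be sketched in Python as follows. This is a minimal illustration rather than a library routine: the function name median and the list scores (built from Table 13-4) are assumptions made for the example. When the middle position is a whole number, the sketch returns that item; when it falls between two items, as with position 12.5, it averages the two neighboring items.

```python
def median(values):
    """Median by the rank-and-count steps: middle position = (N + 1) / 2."""
    ranked = sorted(values, reverse=True)  # step 1: rank the numbers
    n = len(ranked)                        # step 2: count the items
    position = (n + 1) / 2                 # steps 3-4: 1-based middle position
    if position.is_integer():
        # Odd N: a single middle case exists.
        return ranked[int(position) - 1]
    # Even N: average the two cases on either side of the middle position.
    lower = ranked[int(position) - 1]
    upper = ranked[int(position)]
    return (lower + upper) / 2


# Scores of the 25 students in Table 13-4, in student order.
scores = [7] * 3 + [6] * 4 + [5] * 5 + [4] * 6 + [3] * 2 + [2] * 3 + [1] * 2

print(median(scores))       # all 25 students
print(median(scores[:-1]))  # student 25 left out, so N = 24
```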
In principle, the median is the 12.5th item. To determine the value, you calculate the average of the values of the twelfth and the thirteenth items: (5 + 4)/2 = 4.5. The result represents the median for the sample.

Before we leave our discussion about the median, we need to discuss several of its characteristics. First, the median case is always in the middle, and extreme values do not affect the median value. Thus, its interpretive value remains constant. When we discuss the arithmetic mean in Sections 13-6c and 13-6d, we will see how extreme values can detract from the interpretive value of the statistic. Second, although we use every item to determine the median, we do not use their actual values in the calculations. At most, we only use the values of the two middle items to calculate the median when we have an even number of cases in our data set. Third, if items do not cluster near the median, the median may not be a good measure of the group’s central tendency. Last, medians usually do not take on values that are not realistic. The median number of children per American family, for example, is two. The median number of children per American family will never have a value of 1.7, for example.

13-6c The Mean

The mean is used with metric data. It is the average of a set of numbers. We calculate the mean by summing the observations in a data set and dividing by the number of cases. To illustrate how we can use the mean in political science, let’s analyze the following problem. For the current budget year, a local “Meals for the Elderly” board has limited the agency to serve hot meals to a monthly average of 340 recipients. For the first nine months of 2003 the serving figures were 320, 360, 350, 350, 370, 330, 360, 370, and 340. Based on the first nine months, is the agency meeting its monthly average target? You need to calculate the mean for the first nine months to answer the question.
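As a minimal sketch, the same calculation can be written in Python. The variable names servings, target, mean_servings, and meets_target are illustrative assumptions; the figures are the nine monthly totals given in the problem.

```python
# Monthly serving figures for the first nine months of 2003.
servings = [320, 360, 350, 350, 370, 330, 360, 370, 340]
target = 340  # the board's limit on the monthly average

# The mean is the sum of the observations divided by the number of cases.
mean_servings = sum(servings) / len(servings)
meets_target = mean_servings <= target

print(mean_servings)
print(meets_target)
```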
Mean for the first nine months: (320 + 360 + 350 + 350 + 370 + 330 + 360 + 370 + 340)/9 months = 3150/9 = 350.

Since the mean of 350 exceeds the board’s goal of 340 recipients, the agency is not meeting the board’s target so far. Suppose you are the agency administrator; how many recipients must your agency average over the next three months to meet the target? To answer this question, you need to
1. Determine the number of clients the agency would have to feed for the year if the agency adhered to the board’s edict (340 * 12 = 4080).
2. Determine the number of clients the agency has fed to date (350 * 9 = 3150).
3. Determine the number of clients the agency can feed over the last three months of the year without exceeding the goal (4080 – 3150 = 930).
4. Determine the number of clients the agency can feed for each month for the remainder of the year (930 / 3 = 310).

The mean has several important characteristics. First, we use every item in a group to calculate the mean. Second, unlike the mode, every group of data has one and only one mean. Third, the mean may take on a value that is not realistic. For example, the average American family has exactly 1.7 children. Fourth, an extreme value may have a disproportionate influence on the mean and thus could affect how well the mean represents the data.

13-6d Comparing the Mode, Median, and Mean

The three measures of central tendency that we discussed represent univariate distributions. Each, however, has its own characteristics that prescribe and limit its use. The mode is the most common value in any distribution of data. The median is nothing more than the middle item of a set of numbers when one ranks the items in order of size. Last, the mean is the average of a set of values. How does one know, however, when to use the mode, the median, or the mean? Alas, there is not an easy answer to this question.
Most statisticians agree, however, that the application of any measure of central tendency depends on the measurement level of the variable being analyzed. Table 13-5 shows that the mode can represent nominal variables such as the distribution of gender or party affiliation. We use the median, on the other hand, with ordinal variables such as classes of attitudes and categories of income. The political researcher uses the mean with metric variables such as age and years of formal education.

mean: The sum of the values of a variable divided by the number of values.

Table 13-5 The Hierarchy of Measurement

                          Measure of Central Tendency
Level of Measurement      Mode     Median     Mean
Metric                     x         x         x
Ordinal                    x         x
Nominal                    x

Table 13-5 also shows that it is permissible to use the measures appropriate for lower levels of measurement with higher-level data. It is not appropriate, however, to use higher-level measures with lower-level data. The mode, for example, can represent income, but the mean cannot represent the distribution of gender.

We can use Table 13-6 to illustrate the appropriateness of using the different measures of central tendency. The data represented in the table is a hypothetical distribution of Government grades made by Jerry Perry, a student of political science, during one agonizing semester. Note that the grades have been arrayed in rank order. The instructor used the ratio level of measurement for the grades. Thus, according to our discussion, the appropriate measure of central tendency to use is the arithmetic mean. When we calculate the mean in the example, our answer is 60 (540/9). Additionally, the median score, the value that occupies the middle position in an array of values, is 55. Both averages are relatively low. Thus, neither Jerry nor his instructor is happy. In addition, Jerry knows his father will be unhappy about his course grades. He does not want to tell his father about the low mean and median values.
So he decides that if his father asks about his average in the course, he will give his father the modal value, the most commonly occurring value in an array of data. In this example the mode is 85, which appeals to Jerry. After he tells his father that he had several grades of 85, his father will be happy and Jerry will not incur his father's wrath. In sum, our example illustrates how the different measures of central tendency can be misleading if used in the wrong context.

Table 13-6 Comparison of the Measures of Central Tendency

Grades: 30, 35, 45, 50, 55 (middle item), 70, 85, 85, 85
Total of all grades = 540. Number of grades (n) = 9. Mode = 85. Median position = (n + 1)/2 = 5th item; median = 55. Mean = 540/9 = 60.

13-7 Measures of Dispersion

Measures of central tendency are helpful in identifying important characteristics among distributions of data. They accurately reflect the actual values of distributed data when the data closely group about the measures. Conversely, measures of central tendency are less likely to reflect the actual values of all members of a distribution when the data has extreme values. For example, the mean for Jerry Perry's Government grades was 60. However, the high score was 85 and the low score was 30. Thus, there is much dispersion between the mean and the extreme scores. Therefore we need some measure of the deviation from the average value to tell us how well the measure of central tendency summarizes the data.

Political scientists use measures of dispersion, also known as measures of variability (Corbett 2001, 134), to gain a clearer understanding of a distribution of data. Measures of dispersion are ways to communicate other differences in a set of data. They tell how much the data clusters about the various measures of central tendency. Suppose that researchers measured constitutional knowledge for a national sample of registered voters. The mean that we calculate for the test is 70 out of a possible 100.
While we want to know the average score, we also want to know how much variation there is in the scores. In other words, how well does the mean describe the distribution of scores? Did the majority of the respondents get a score of 70? Was there a bimodal distribution with one group of respondents attaining low scores and another group attaining high scores that averaged out to 70? Or was there a normal distribution of scores? We use measures of dispersion to answer these questions.

dispersion: The distribution of data values around the most common, middle, or average value.

13-7a The Variation Ratio (v)

The variation ratio is useful when analyzing nominal data. It is simple to calculate and easy to understand. Specifically, the variation ratio tells the political scientist the degree to which the mode satisfactorily represents a particular frequency distribution. The formula for v is

$$ v = 1 - \frac{\text{Number of cases in the modal category}}{\text{Total number of cases}} $$

By analyzing the formula one can see that if all cases in a distribution fell into the modal class, the value of v would be 0. Thus, the lower the v score, the more representative the mode is of all cases in the distribution. As an illustration, let's examine the distribution of political ideology and party identification as shown in Table 13-7. The variation ratio for Republicans (.46) suggests that the mode is a better representation of ideology for Republicans than for the other groups. The variation ratio also shows that the mode is a less satisfactory summary of ideology for those respondents who say they are Independent (.70). Put another way, the Independent respondents varied more in their ideological orientations. Thus, one should be careful of reporting the mode as representative of the ideological orientation of all Independents in this example.
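The variation ratios reported for Table 13-7 can be reproduced with a minimal sketch. The function name `variation_ratio` and the dictionary layout are assumptions made for illustration; the counts are the hypothetical ones from the table.

```python
def variation_ratio(category_counts):
    """Return v = 1 - (cases in the modal category / total number of cases)."""
    total = sum(category_counts.values())
    if total == 0:
        raise ValueError("at least one case is required")
    modal_count = max(category_counts.values())
    return 1 - modal_count / total


# Hypothetical counts from Table 13-7, keyed by party identification.
ideology_by_party = {
    "Democrat": {"Liberal": 360, "Libertarian": 220, "Conservative": 180, "Populist": 240},
    "Independent": {"Liberal": 160, "Libertarian": 260, "Conservative": 280, "Populist": 300},
    "Republican": {"Liberal": 80, "Libertarian": 180, "Conservative": 540, "Populist": 200},
}

# Round to two decimals, as the table does.
ratios = {
    party: round(variation_ratio(counts), 2)
    for party, counts in ideology_by_party.items()
}

for party, v in ratios.items():
    print(party, v)
```

A lower value means the modal category accounts for a larger share of the cases, so the Republican column comes out lowest and the Independent column highest.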
Table 13-7 Distribution of Political Ideology and Party Identification, 2001

Political Ideology    Democrat        Independent     Republican
Liberal               360             160             80
Libertarian           220             260             180
Conservative          180             280             540
Populist              240             300             200
Totals                1000            1000            1000
v computation         1 – 360/1000    1 – 300/1000    1 – 540/1000
v                     .64             .70             .46
Source: Hypothetical

variation ratio: The variation ratio tells the political scientist the degree to which the mode satisfactorily represents a particular frequency distribution.

Table 13-8 Hypothetical Distribution of Scores of Ideology Scale of Angelo State University Students, 2001

Student  Score    Student  Score    Student  Score
1        6        10       5        19       3
2        6        11       5        20       3
3        6        12       5        21       2
4        6        13       4        22       2
5        6        14       4        23       2
6        6        15       4        24       2
7        6        16       4        25       2
8        5        17       4
9        5        18       4
N = 25. Median = 4. Range = 4 (6 – 2).
Source: Hypothetical

13-7b The Range

range: The distance between the highest and lowest values or the extent of categories into which observations fall.

mean deviation: A measure of dispersion of data points for metric-level data. It is the mean of differences between each value in a distribution and the mean of the distribution.

The range is a useful measure when the researcher is working with ordered or ranked data (Cole 1996, 162). Thus, it is useful with ordinal data and when considering the degree to which the median satisfactorily represents a particular frequency distribution. The range is simply the difference between the largest value and the smallest value in a distribution. The smaller the range, the more representative the median is of all values in the distribution. In Table 13-4, our example of the median, we presented a hypothetical distribution of ideological orientation scores for twenty-five Angelo State University students. The median in our example is 4 and the range is 6 (7 – 1). The responses of another sample of twenty-five students using the same seven-point scale appear in Table 13-8.
In this example, the median is still 4, but the range is 4 (6 – 2). A comparison of the measures presented in the two tables shows that there is greater homogeneity in the responses of the second group of students.

The range is easy to calculate and has utility as a measure of dispersion. While we could use the range with metric data, it is not wise to do so because extreme values in a distribution could influence the range, thus giving a misleading impression of variation. Consider, for example, that there is one individual with a doctoral degree in the community you sample. If you randomly choose twenty-five persons to sample, there is an excellent chance that he or she will not be included. But, for the sake of this example, suppose you do include the individual in your sample. The range in education levels will then be extremely large and very misleading as a measure of dispersion. In addition, if we use the range as a measurement of dispersion with metric data, we do not know anything about the variability of scores between the two extreme values except that the scores do lie somewhere within the range. Consequently, you should avoid using the range with metric data.

13-7c The Mean Deviation

The mean deviation is useful when analyzing metric data. Simply put, the mean deviation is the average absolute difference between each value in the distribution and the mean. The mean deviation makes use of every observation in the distribution. One computes this measure by taking the difference between each observation in the distribution and the mean. Summing these deviations is the next step. (Note: When summing, ignore negative signs. Otherwise, the sum would always be zero.) Last, divide the sum by the number of observations. Arithmetically, the mean deviation is expressed as

$$ \text{Mean deviation} = \frac{\sum |X_i - \bar{X}|}{n} $$

where
Σ = the sum of.
$X_i$ = each individual observation.
$\bar{X}$ = the mean of all of the observations.
n = number of observations.
| | = absolute difference (ignore signs).

Table 13-9 illustrates the calculation of the mean deviation for the percentage of the total vote George W. Bush and Dick Cheney received in the southeastern states. The results show that, on the average, the percentage of the total vote received by the Bush/Cheney ticket in each southeastern state in the 2000 election differs from the mean vote for all southeastern states by 2.7 percentage points.

Table 13-9 Percent of Vote for Bush/Cheney in Southeastern States in the 2000 Presidential Election (Mean Deviation)

State             % Vote    |Xi – X̄|
Alabama           56.5      2.5
Arkansas          51.3      2.7
Florida           48.8      5.2
Georgia           55.0      1.0
Kentucky          56.4      2.4
Louisiana         52.6      1.4
Mississippi       57.6      3.6
South Carolina    56.9      2.9
Tennessee         51.2      2.8
Total                       24.5
Mean = 54.0%. Mean deviation = 24.5/9 = 2.7.
Source: Adapted from National Archives and Records Administration.

13-7d The Variance and the Standard Deviation

While the mean deviation has a more direct intuitive interpretation than other measures of deviation that we can use with metric data, the measure has fewer useful statistical properties than those measures (Blaylock 1979, 78). As such, political researchers do not use this measure very often. We have discussed it largely as a way to enhance your understanding of measures of dispersion and as a prelude to our discussion of other metric measures of dispersion. One such measure is the variance. Like the mean deviation, the variance is built from the differences between each observation and the mean. When you calculate the variance, however, you square those differences. Next, you sum the squares and divide the result by the number of cases.

variance: Another measure of dispersion of data points about the mean for metric-level data. It is a measure of how spread out a distribution is.

The formula for the calculation of the variance looks very much like the formula for the calculation of the mean deviation:

$$ \text{Variance } (s^2) = \frac{\sum (X_i - \bar{X})^2}{n} $$

where
Σ = the sum of.
$X_i$ = each individual observation.
$\bar{X}$ = the mean of all of the observations.
n = number of observations.

standard deviation: The most common measure of dispersion of data points about the mean of metric-level data. It is the square root of the variance.

standard score: An individual observation that belongs to a distribution with a mean of 0 and a standard deviation of 1. See Z score.

The standard deviation is probably the most common measure of dispersion for metric data. Like the variance, the basis for the standard deviation is the squared differences between every item in a data set and the mean of that set. In fact, you simply take the square root of the variance to calculate the standard deviation. Similar to the other measures of dispersion we discussed, the smaller the standard deviation in a set of data, the more closely the data cluster about the measure of central tendency. We won't trouble you with the reasoning here, but the standard deviation is a stable measure of dispersion from sample to sample. Political scientists use standard deviations with the normal curve to determine where scores or observations cluster about the mean and to determine a standard score.

While we can use either the variance or the standard deviation to indicate the amount of variation within a metric-level variable, we usually use the standard deviation. To see the utility of the two measures, let's examine Table 13-10. The table is similar to Table 13-9 in that it depicts the percent of vote for Bush/Cheney in southeastern states in the 2000 presidential election. It differs, however, in that it illustrates the calculation of the variance and the standard deviation for the same data set. For this example, the variance is 8.7 and the standard deviation is 2.95. The lower the variance or standard deviation, the more accurately the mean represents the scores of all cases in a distribution of metric-level data.
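The calculations summarized in Tables 13-9 and 13-10 can be reproduced with a short sketch. The function names are illustrative, and the code makes two assumptions to match the text: it measures deviations from the rounded mean of 54.0 used in the tables, and it divides by n as in the formulas above (many statistical packages divide by n – 1 when working with samples).

```python
from math import sqrt

# Percent of vote for Bush/Cheney in the nine southeastern states (Table 13-9).
percent_vote = {
    "Alabama": 56.5, "Arkansas": 51.3, "Florida": 48.8,
    "Georgia": 55.0, "Kentucky": 56.4, "Louisiana": 52.6,
    "Mississippi": 57.6, "South Carolina": 56.9, "Tennessee": 51.2,
}


def mean_deviation(values, center):
    """Average absolute difference between each value and the center."""
    return sum(abs(x - center) for x in values) / len(values)


def variance(values, center):
    """Average squared difference between each value and the center (divides by n)."""
    return sum((x - center) ** 2 for x in values) / len(values)


values = list(percent_vote.values())
table_mean = 54.0  # assumption: the rounded mean reported in the tables

md = mean_deviation(values, table_mean)
var = variance(values, table_mean)
sd = sqrt(var)  # the standard deviation is the square root of the variance

print(round(md, 1), round(var, 1), round(sd, 2))
```

Because the standard deviation is expressed in the same units as the original variable (percentage points here), it is usually easier to interpret than the variance, which is in squared units.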
Table 13-10 Percent of Vote for Bush/Cheney in Southeastern States in the 2000 Presidential Election (Variance and Standard Deviation)

State             % Vote    |Xi – X̄|    (Xi – X̄)²
Alabama           56.5      2.5          6.3
Arkansas          51.3      2.7          7.3
Florida           48.8      5.2          27.0
Georgia           55.0      1.0          1.0
Kentucky          56.4      2.4          5.8
Louisiana         52.6      1.4          2.0
Mississippi       57.6      3.6          13.0
South Carolina    56.9      2.9          8.4
Tennessee         51.2      2.8          7.8
Total                       24.5         78.6
Mean = 54.0%. Variance (s²) = 78.6/9 = 8.7. Standard deviation (s) = √8.7 = 2.95.
Source: Adapted from National Archives and Records Administration.

Table 13-11 Per Capita Income for Selected States

State         Mean Income    Standard Deviation
Florida       22,916         500
Texas         20,654         3000
Colorado      23,449         600
New York      26,782         550
New Mexico    18,055         1850
Illinois      24,763         625
Source: Hypothetical

The standard deviation also helps us to understand the distribution of the values for a particular metric-level variable and can be helpful when we are comparing two or more groups of cases (Corbett 2001, 140). To illustrate our point, let's examine Table 13-11. The table shows the hypothetical per capita income (PCI) for samples of citizens of several states. It also shows the standard deviation for the PCI in the states. Let's analyze the results. The table shows that income is distributed in very different ways across the states. In Florida, Colorado, New York, and Illinois, people's incomes are fairly close together. In other words, there is not a great deal of income inequality. That is why the standard deviations are relatively low. In New Mexico, however, there is greater income inequality. And in Texas, as evidenced by the relatively high standard deviation, there is a great deal of income inequality when compared to the other states.

In summary, the standard deviation and the variance show us how much variation there is within a metric variable. When there is very little difference from one case to another on a variable, these statistics will be low.
Conversely, when there is a great deal of diversity among the cases for a variable, these statistics will be high. As we discuss later in Section 13-8b of this chapter, when the distribution of values of a variable approaches a normal distribution, the standard deviation tells us even more.

13-8 Shape of the Distribution and Metric Distributions

Up to this point we have discussed measures of central tendency and measures of dispersion as ways to examine data distributions. In the past, political scientists also analyzed the shape of distributions by constructing a frequency polygon. To do this, they would connect the midpoints of the top of each bar of a histogram with a solid line. The shape of the polygon reflected how the scores were distributed. Those distributions having most of their case scores above the mean had a different shape from those having a large proportion of scores below the mean.

frequency polygon: A graph resulting from the connection of the midpoints of the top of each bar of a histogram with a solid line.

13-8a Skewed Distributions

Three possible shapes can result when drawing frequency polygons of metric data distributions. Figure 13-5 shows the first two shapes. Each shape represents a skewed distribution. This means that in both instances there are more extreme scores in one direction or the other. In the first instance, there are more extreme low scores than extreme high scores. This is a negative, or left, skewed distribution. You can see that the mean is pulled in the direction of the lower scores. If you are analyzing the percentage of Anglo residents across the states of the United States, the distribution will be negatively skewed because Anglos are in the minority only in Hawaii. The second shape shows the impact of the many extreme high scores in the distribution. This shape represents a positive, or right, skewed distribution because the mean is pulled in the direction of the higher scores. If you are examining the land area of the states, the distribution will be positively skewed because of Texas and Alaska.

skewed distribution: A data distribution in which more observations fall to one side of the mean than the other. Thus, the mean is "pulled" toward the extreme low (negative skew) or extreme high (positive skew).

negatively skewed: A distribution of values in which more observations lie to the left of the middle value.

positively skewed: A distribution of values in which more observations lie to the right of the middle value.

normal distribution curve: A frequency curve showing a symmetrical, bell-shaped distribution in which the mean, mode, and median coincide and in which a fixed proportion of observations lies between the mean and any distance from the mean measured in terms of the standard deviation.

[Figure 13-5 Skewed Distributions and the Normal Curve. Panel (a): 1992: Percent White (SA, 1996); mean = 83.9%, median = 87.4%. Panel (b): 1990: Land Area in Square Miles; median = 54,125 miles, mean = 70,724 miles. Source: Statistical Abstract of the United States, 1996.]

While political researchers, in the past, drew frequency polygons to get a sense of the shape of a distribution, today many statistical packages allow us to compare metric distributions with the normal curve. Again, consider Figure 13-5. The figure depicts two distributions with a normal distribution curve superimposed on the polygons. If the distributions were normal distributions, the bars would touch the curves. As you can readily see, there are several bars that do not reach the curves and there are several bars that extend beyond the curves. Thus, each distribution is a skewed distribution.
Also note that the median value is greater than the value of the mean in Graph (a), while the mean is greater than the median value in Graph (b). Therefore, Graph (a) depicts a negative (left) skew and Graph (b) illustrates a positive (right) skew.

13-8b The Normal Curve

A symmetrical distribution is the third shape you can obtain when constructing a frequency polygon. Figure 13-6 is a depiction of the normal distribution curve. The normal curve, a special type of symmetrical distribution, is very valuable in statistics because it has several important properties. First, the curve is symmetrical and bell-shaped. Second, the measures of central tendency coincide at the center of the distribution. In other words, the values of the measures are equal. Third, the curve is based on an infinite number of observations. The last property of the normal curve that we discuss, however, is probably its most distinctive characteristic. In any normal distribution, a fixed proportion of the observations lie between the mean and fixed units of standard deviations.

[Figure 13-6 The Normal Distribution. Horizontal axis runs from -4 to 4 standard deviations.]

To help you understand why this property is so important, let's examine Figure 13-7, which shows the percentages. The mean of the distribution divides the curve exactly in half. Note that a little more than 34 percent of all cases fall between the mean and one standard deviation above the mean. Additionally, a little more than 34 percent of the cases fall between the mean and one standard deviation below the mean. Thus, slightly more than 68 percent of all cases in a normal distribution lie within one standard deviation (plus or minus) of the mean. Similarly, more than 13.5 percent of all cases fall between one and two standard deviations above the mean, and the same proportion fall between one and two standard deviations below the mean.
Therefore, more than 95 percent of all cases in a normal distribution lie within plus or minus two standard deviations of the mean. Continuing the analysis, you can see that almost all of the cases (99.74 percent, to be exact) will fall within plus or minus three standard deviations of the mean.

Consequently, the standard deviation used with the normal curve can be a very important tool in the political scientist's repertoire. It is important because the researcher can determine the proportion of observations included within fixed distances of the mean. For example, assume that the public's rating of a particular welfare program, rated on a scale of 0 to 100, has a normal distribution. Additionally, the distribution has a mean of 50 and a standard deviation of 10. Based on this information we can conclude that more than 68 percent of the public assigns the program a rating between 40 and 60 (±1 standard deviation from the mean). Additionally, more than 95 percent assigned the program a rating between 30 and 70 (±2 standard deviations from the mean). Last, almost everyone in the survey assigned the program a rating between 20 and 80 (±3 standard deviations from the mean).

[Figure 13-7 Areas under the Normal Curve. Areas between successive standard deviations: .0215 (-3 to -2), .1359 (-2 to -1), .3413 (-1 to 0), .3413 (0 to 1), .1359 (1 to 2), .0215 (2 to 3). If the mean is 50 and the standard deviation is 10, the scores at -3s, -2s, -s, the mean, s, 2s, and 3s are 20, 30, 40, 50, 60, 70, and 80. Source: Adapted from http://davidmlane.com/hyperstat/normal_distribution.html.]

13-8c Standard Scores (the Z Score)

When determining the proportion of observations within a desired interval, the political scientist should express observations in units of standard deviation. For example, you can use Figure 13-7 to determine the percentage of cases rating the welfare program from the mean of 50 to 75. To do so you have to determine how many standard deviations the rating of 75 lies from the mean of 50.
Political scientists calculate the Z score to accomplish this task. The formula for Z is

$$ Z = \frac{X - \bar{X}}{s} $$

where
Z = the standard score.
X = the value of any observation.
$\bar{X}$ = the mean.
s = the standard deviation.

Z score: The number of standard deviations that a score deviates from the mean in a standardized normal distribution. See standard score.

standard normal distribution: A normal distribution having a mean of 0 and a standard deviation and variance of 1.

The Z score tells us the number of standard deviations that the score lies above or below the mean. If we apply the formula to the example discussed, the Z score is

$$ Z = \frac{75 - 50}{10} = 2.50 $$

In our example, we find that the score of 75 is 2.5 standard deviation units above the mean. Intuitively, this should make sense. Recall that we showed that a score of 70 is 2 standard deviation units above the mean, and a score of 80 is 3 units of standard deviation above the mean. Thus, a score of 75 had to fall between 2 and 3 standard deviation units. To carry our analysis further, Figure 13-7 shows that 47.72 percent of the cases lie between the mean and 2 standard deviation units above the mean. The figure also shows that 49.87 percent of the cases lie between the mean and 3 units of standard deviation above the mean. Thus, if a 75 rating is 2.5 standard deviation units above the mean, somewhere between 47.72 and 49.87 percent of the public assigned the program a score between 50 and 75. We will need to use the standard normal distribution table to determine the exact percentage.

Table 13-12 depicts selected sections of the standard normal distribution table. In other words, it is a partial Z table to be used with the examples in this book. Take the following steps to use the table:

1. Scan the far-left column to find the first two digits of the Z value. In our case, 2.5.
2. Under the numerical column headings find the third digit of the Z value. In our case, .00.
3. Extend both the column and the row until they intersect.
The value that you find at the point of intersection is the proportion of cases that lie between the mean and 2.5 standard deviations above the mean. In our case, .4938. This means that 49.38 percent of the public assigned the welfare program a rating between 50 and 75.

Table 13-12 Selected Sections of the Standard Normal Distribution

Z      .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0    .0000   .0040   .0080   .0120   .0160   .0199   .0239   .0279   .0319   .0359
0.5    .1915   .1950   .1985   .2019   .2054   .2088   .2123   .2157   .2190   .2224
1.0    .3413   .3438   .3461   .3485   .3508   .3531   .3554   .3577   .3599   .3621
1.5    .4332   .4345   .4357   .4370   .4382   .4394   .4406   .4418   .4429   .4441
2.0    .4772   .4778   .4783   .4788   .4793   .4798   .4803   .4808   .4812   .4817
2.5    .4938   .4940   .4941   .4943   .4945   .4946   .4948   .4949   .4951   .4952
3.0    .4987   .4987   .4987   .4988   .4988   .4989   .4989   .4989   .4990   .4990

Notes:
1. This is a partial Z table to be used with the examples in this book.
2. An entry in the table is the proportion under the entire curve that is between Z = 0 and a positive value of Z. Areas for negative values of Z are obtained by symmetry.
3. To obtain the percentage of cases from the mean, multiply the cell entry by 100 (.4938 * 100 = 49.38%).

To conclude our discussion about the normal curve and Z scores, let's look at another illustration. Suppose you want to determine the percentage of the public assigning the program a rating from 0 to 75. Before you begin to plug figures into the formula just presented, note that there is a quicker way to determine the answer. Simply add .50 to the proportion associated with the Z value we just calculated (.4938). We do this because the normal curve assumes that 50 percent of the cases will lie on either side of the mean. Thus, we conclude that 99.38 percent of the public would rate the program from 0 to 75.

Chapter Summary

In this chapter we examined some important tools to use in the preliminary stage of data analysis.
For example, sophisticated computer programs summarize data as frequency distributions. These distributions depict the total number of cases (N), the number of cases by class, percentages, and, perhaps, cumulative percentages. These techniques help the political scientist to assess the weight of a single class in relation to other classes of a distribution or distributions.

Additionally, political scientists use measures of central tendency to describe the distribution's main characteristics. These measures help the researcher answer questions such as "What is the typical party identification of respondents?" or "What is the average level of income of the group?" Measures of central tendency, however, can be misleading if not accompanied by measures that describe the amount of dispersion in the distribution. While measures of central tendency reflect a group's typical characteristic, measures of dispersion depict the extent of variance from the typical value, or average. The dispersion measures show how many members of the group deviate from the typical and the extent of their deviation. A small deviation shows that most responses cluster around the measure of central tendency, suggesting a homogeneous group. Large deviations, on the other hand, suggest that the measure of central tendency is a poor representation of the distribution.

Another important step in examining a distribution is the identification of its general form. For example, the shape of frequency polygons may show that extreme scores in the distribution affect the measure of central tendency. Or the form may be symmetrical or even normal, because there are no extreme scores affecting the shape of the distribution. If this is the case, there is a fixed proportion of observations lying between the mean and fixed units of standard deviation. The measures discussed throughout this chapter help the political scientist understand data distributions.
Analyzing these descriptive statistics, however, is only the first step in data analysis. Once the data are summarized, researchers often want to discover relationships between variables. We turn to this issue in the next chapter.

Chapter Quiz

1. Consider these scores: 0, 3, 1, 5, 1. The mean is
   a. 2.
   b. 2.5.
   c. 3.
   d. None of choices a through c is correct.
2. In a symmetric, unimodal distribution,
   a. the median equals the mean.
   b. the mode equals the median.
   c. the mean equals the mode.
   d. Each of choices a through c is correct.
3. The mean of the GSS variable AGE1STBRN is higher than the median. The variable measures the respondent's age when their first child was born. So we know that the distribution of respondents' ages when their first child was born is
   a. negatively skewed.
   b. normal.
   c. bimodal.
   d. positively skewed.
4. The standard deviation measures deviation around the
   a. mode.
   b. median.
   c. mean.
   d. variance.
5. The number of standard deviations a score lies from the mean in a standard normal distribution is
   a. the case's Z score.
   b. the standard error.
   c. the confidence interval.
   d. None of choices a through c is correct.
6. The ________ is the only measure of central tendency that can properly be used with nominal data.
   a. mode
   b. median
   c. mean
   d. standard deviation
7. Political scientists use measures of ________ to describe the distribution's average characteristics.
   a. dispersion
   b. association
   c. central tendency
   d. statistical significance
8. Measures of ________ depict the extent of variance from the typical value, or average value.
   a. dispersion
   b. association
   c. central tendency
   d. statistical significance
9. The basis for the standard deviation and ________ is the squared differences between every item in a data set and the mean of that set.
   a. mode
   b. median
   c. mean
   d. variance
10. The ________ is the average difference between the mean and all other values in the distribution.
   a. mode
   b. standard deviation
   c. variance
   d. mean deviation

Suggested Readings

Bernstein, Robert A. and James A. Dyer. An Introduction to Political Science Methods, 3rd ed. Englewood Cliffs, NJ: Prentice-Hall, 1992.
Blaylock, Hubert M., Jr. Social Statistics, 2nd ed. New York: McGraw-Hill, 1979.
Cole, Richard L. Introduction to Political Science and Policy Research. New York: St. Martin's Press, 1996.
Corbett, Michael. Research Methods in Political Science: An Introduction Using MicroCase, 4th ed. Belmont, CA: Wadsworth, 2001.
Davis, Richard and Diana Owen. New Media and American Politics. New York: Oxford University Press, 1998.
Fox, William. Social Statistics, 3rd ed. Bellevue, WA: MicroCase, 1998.
Frankfort-Nachmias, Chava and David Nachmias. Research Methods in the Social Sciences, 6th ed. New York: Worth Publishers, 2000.
Johnson, Janet Buttolph, Richard A. Joslyn, and H. T. Reynolds. Political Science Research Methods, 4th ed. Washington, D.C.: Congressional Quarterly Press, 2001.
Kay, Susan Ann. Introduction to the Analysis of Political Data. Englewood Cliffs, NJ: Prentice-Hall, 1991.
Leedy, Paul D. and Jeanne Ellis Ormrod. Practical Research: Planning and Design, 7th ed. Upper Saddle River, NJ: Merrill Prentice Hall, 2001.

Endnote

1. See Corbett, Michael. Research Methods in Political Science: An Introduction Using MicroCase, 4th ed. Belmont, CA: Wadsworth Publishers, 2001, for a succinct presentation of how to present frequency tables.