Download Jones ch13.qxd - Angelo State University

Chapter 13
Univariate Statistics
Outline
13-1 Introduction
13-2 The Role of Statistics
13-2a Descriptive Statistics
13-2b Inferential Statistics
13-3 Limitations of Statistics in Research
13-4 The Frequency Distribution
13-4a An Overview
13-4b General Comments about Table Entries
13-4c Frequency Distributions with More Than One Frequency Distribution
13-4d Frequency Table with Metric Data
13-5 Graphic Presentations
13-5a The Bar Graph
13-5b The Histogram
13-5c The Pie Graph
13-6 Measures of Central Tendency
13-6a The Mode
13-6b The Median
13-6c The Mean
13-6d Comparing the Mode, Median, and Mean
13-7 Measures of Dispersion
13-7a The Variation Ratio (v)
13-7b The Range
13-7c The Mean Deviation
13-7d The Variance and the Standard Deviation
13-8 Shape of the Distribution and Metric Distributions
13-8a Skewed Distributions
13-8b The Normal Curve
13-8c Standard Scores (the Z Score)
Chapter Summary
Chapter Quiz
Suggested Readings
Endnotes
Key Terms
bar graph
central tendency
descriptive statistics
dispersion
frequency distribution
histogram
inferential statistics
mean
mean deviation
measures of central tendency
median
mode
negatively skewed
normal distribution curve
pie graph
positively skewed
range
skewed distribution
standard deviation
standard normal distribution
standard score
univariate analysis
variance
variation ratio
Z score
13-1 Introduction
You have now completed several steps in the behavioral research process, such as
the literature review, the research plan, and data collection and processing. Now
you are ready to analyze your data. This procedure, which includes the calculation
of different statistics, can be the most exciting part of the entire research process.
You begin to convert raw data and indefinable patterns into explanation and
understanding. As you begin to receive signs that your data substantiates your initial expectations, you begin “. . . to sense the excitement of discovery; a thoroughly
invigorating and stimulating intellectual experience shared by all scientists” (Cole
1996, 141).
Thankfully, a computer’s statistical program, such as SPSSW, will calculate the
statistics for you. The calculation, however, is secondary. The more important task
is to interpret the statistics so you can see what your data is trying to tell you. Thus,
the next three chapters will give you the tools to interpret statistics so you can revel
in the excitement of discovery.
An understanding of this chapter will enable you to
1. Explain the role of descriptive and inferential statistics.
2. Explain a frequency distribution and describe its characteristics.
3. Understand different ways to present your data.
4. Interpret measures of central tendency.
5. Interpret the measures of dispersion.
6. Describe the types of frequency distributions.
7. Explain the normal curve.
13-2 The Role of Statistics
univariate analysis: The analysis
of a single variable. Researchers
often use frequency tables, bar
graphs, or pie charts to complete
such an analysis.
The role of statistics in political research is a subject of intense debate. Normative
theorists see statistics as cold and calculating. They also see the proponents of statistics as more concerned with what is than with what should be. Behavioralists, on
the other hand, see statistics as another way to analyze and explain political phenomena. Despite the debate, the role of statistics in the social sciences is important. Statistics enable us to see patterns in the data and to describe and interpret
observations in ways that help us test theories and hypotheses. In short, statistics
are an invaluable tool for the political scientist who seeks to resolve important
political questions.
The empirical analysis of political questions often involves a mass of quantitative data requiring organization before making any analysis and interpretation.
Additionally, before examining the relationship between variables, you must
describe the typical case of a variable and determine how typical it really is (Kay
1991). Statisticians call this process univariate analysis. Conversely, when we
analyze one variable in relation to another variable, we are conducting bivariate
analysis.
13-2a Descriptive Statistics
descriptive statistics: The
mathematical summary of
measurements for a set of data.
There are two types of statistics that political scientists use: descriptive statistics
and inferential statistics. Descriptive statistics enable political scientists to organize and summarize data. They provide us with the necessary tools to describe
quantitative data. Among these summarizing measures are percentages, proportions, means, and standard deviations. Descriptive statistics are especially useful
when the researcher finds it necessary to analyze interrelationships between more
than two variables.
13-2b Inferential Statistics
Inferential statistics deal with sample data. They enable the researcher to infer
properties of a population based on data collected from only a random probability sample of individuals. Inferential statistics have value because they offset problems associated with data collection. For example, the time-cost factor associated
with collecting data on the entire population may be prohibitive. That is, the population may be immense and difficult to define. In such instances, inferential statistics can prove to be invaluable to the social scientist.
Descriptive and inferential statistics are used in the data analysis process. Data
analysis involves noting whether hypothesized patterns exist in the observations.
We might hypothesize, for example, that urban legislators are more liberal and
supportive of welfare programs than those legislators representing rural constituencies. To test this hypothesis the researcher may ask urban and rural legislators about their views on welfare programs and payments. The researcher then
compares the groups and uses descriptive and inferential statistics to find out
whether differences between the groups support expectations.
In sum, a descriptive statistic is a mathematical summary of measurements for
one variable. Inferential statistics, on the other hand, use sample data to make
statements about the population. Descriptive and inferential statistics provide
explanations for complex political phenomena that deal with relationships
between variables. Thus, they are an important tool in the political scientist’s
repertoire.
13-3 Limitations of Statistics in Research
Statistics cannot resolve every question you have about politics. Therefore, we
need to discuss some of the limitations of statistical research.
First, statistics do not provide the means for the researcher to prove anything
he or she wants to prove. On the contrary, there are explicit procedural guidelines,
rules, and decision-making criteria to follow in the statistical analysis of data. As
such, statistics cannot make up for the lack of clear, consistent, logical thinking in
the development of a body of theory.
Second, statistics provide little help in understanding political phenomena
that we cannot empirically measure. Some contend, for example, that we cannot
measure the critical concept of political power (Bacharach and Baratz 1962). Even
when measurement is possible, statistics do not always tell us whether we are
measuring what we want to measure. There are, for example, several ways to measure the rate of unemployment. One possibility is to contact the local unemployment
office and find out how many individuals have applied for unemployment benefits. However, what about the few who believe it is beneath them to apply for what
they perceive as welfare? And what about those who have dropped out of the job
market?
A final principal limitation of statistics is that the techniques only allow us to
describe and infer trends among groups. They do not provide definite predictions
about individual cases. Thus, while statistical techniques may provide guidelines,
they do not allow us to reach certain conclusions about individuals (Cole 1996).
Knowing that 64 percent of the respondents in a survey favored gun control, for
example, does not allow you to say that your neighbor favors gun control.
In sum, there are important limits on the value of statistical analysis. There are
some political problems you cannot explore statistically. For those questions subject to quantitative analysis, however, statistics may only be a “poor man’s” substitute for controlled laboratory, or true experimental research. Statistics in these
cases are only valuable when researchers carefully define the problem, develop ways
to measure important concepts, and use a sound research design to collect data.
Then, and only then, are statistics helpful in understanding the research question.
inferential statistics: Statistics
that enable the researcher to make
decisions (inferences) about
characteristics of a population
based on observations from a
random probability sample taken
from the population.
13-4 The Frequency Distribution
Constructing a frequency distribution will probably be the first step you will take
when organizing and presenting information. Properly constructed frequency distributions help summarize a large amount of information while enhancing the
interpretation of data. In this chapter you will learn all you need to know about
arraying and summarizing single variables.
13-4a An Overview
frequency distribution:
A tabulation of raw data according
to numerical values and discrete
classes. A frequency distribution
of party identification, for
example, shows the number
of individuals belonging to
a particular political party.
As a student you have frequently read research papers, articles, and reports that
included descriptive statistics. Government textbooks, for example, present displays of voting results, public welfare expenditures, and characteristics of congressional members. Additionally, media headlines read “President’s Popularity Rises
by Three Percent” or “The Dow Stock Market Drops by 150 Points.” The media
also inundates you with these statistics in the form of public opinion polls. Whatever the source, most of your exposure has usually been with tabular statistics, or
frequency distributions.
One step in analyzing and reporting information involves the presentation of
frequency distributions of the variables of concern. A frequency distribution is
nothing more than a tabulation of raw data according to numerical values and discrete classes. A frequency distribution of party identification, for example, shows
the number of individuals belonging to a particular political party.
A frequency table is the tabular presentation of a frequency distribution. It
should meet certain criteria to be considered presentation quality.1 Let’s examine
Table 13-1, which is a distribution of respondents’ political ideology from the 1998
National Opinion Research Center General Social Survey.
The presentation of a frequency distribution should include the following:
• Table labels: If there is more than one table included in a report, the tables
need a label to distinguish them. Examples include “Table 1,” “Table 2,” or
“Table 13-1.” The latter example identifies the first table of Chapter 13.
• Descriptive title: Researchers must make it clear to the reader what
information they are presenting. The title must be as specific as possible. As
such, it should include the type of information (Respondents’ Political
Ideology), the time (1998), and any other pertinent information (General
Social Survey [GSS]).
Table 13-1 Respondents’ Ideology, General Social Survey (GSS) 1998

Category        Frequency       %    Cumulative %
Liberal               772    28.7            28.7
Moderate              986    36.6            65.3
Conservative          933    34.7           100.0
Totals:              2691   100.0

Question: Is the respondent a liberal, moderate, or conservative?
Source: Data from James A. Davis and Tom W. Smith. National Opinion Research Center (NORC) General Social Survey (GSS)
for 1998.
• Clear labels: Tables require clearly labeled columns that enable the reader to
see the column and row summaries of the table’s data. Our example includes
four columns. Column one contains the variable value (row) labels “Liberal”
through “Conservative.” The second column lists the frequencies, or number
of cases, for each category of the variable. The third column lists the value
percentages. For example, there are 772 respondents who said they were
liberal. The percentage column converts the frequency into a percentage of
the 2,691 cases, or 28.7 percent. The last column, Cumulative %, is only
needed if there are more than two categories associated with the variable. It is
simply a “running total” of the category percentages.
• Appropriate classes: Normally each group should have some entries.
Additionally, classes should not be so large that they obscure the range and
variation in the data. For example, classes that divide cases into fewer than
twenty, or twenty or more, may place 85 percent of the cases in a single class.
This obscures differences in the data. Conversely, a unit-by-unit breakdown
such as less than one, or one to two, would be too fine a classification and
leave some classes with few cases. When determining the number of classes,
you need to consider the needs of your audience and the nature of your data.
• A totals row: A properly constructed frequency table must include a totals
row showing the total number of cases included in the table and the
percentage total that will normally add up to 100 percent. We say “normally
add up to 100 percent” because there may be a small difference (99.9 or
100.1) due to the rounding of individual category values.
• Source and question: It is a good idea to specify the source of the data you
presented in the table. The source may be the Congressional Record or, as in
our example, survey data collected by a national research center. When
working with survey data, you should also include the question that describes
the variable (Corbett 2001, 135). The source of data and the question, if
applicable, should be presented at the bottom of the table. In our example,
we used the following from the 1998 National Opinion Research Center
General Social Survey: “Is respondent liberal, moderate, or conservative?”
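The percentage and cumulative percentage columns described above can be reproduced with a few lines of code. The following is a minimal sketch in Python, using the frequencies from Table 13-1; the function name `frequency_table` is only illustrative and does not come from any statistical package.

```python
# Frequencies taken from Table 13-1 (GSS 1998 political ideology).
counts = {"Liberal": 772, "Moderate": 986, "Conservative": 933}

def frequency_table(counts):
    """Return (rows, total); each row is (category, frequency, percent, cumulative percent)."""
    total = sum(counts.values())
    rows = []
    running = 0  # running total of frequencies, used for the cumulative column
    for category, freq in counts.items():
        running += freq
        percent = round(100 * freq / total, 1)
        cumulative = round(100 * running / total, 1)
        rows.append((category, freq, percent, cumulative))
    return rows, total

rows, total = frequency_table(counts)
for category, freq, percent, cumulative in rows:
    print(f"{category:<14}{freq:>6}{percent:>8.1f}{cumulative:>8.1f}")
print(f"{'Totals:':<14}{total:>6}{100.0:>8.1f}")
```

Computing the cumulative column from the running frequency, rather than by adding the rounded percentages, keeps the last entry at 100.0 even when the individual percentages carry rounding error.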
13-4b General Comments about Table Entries
When you present percentage values in your tables, be consistent with the decimal
places. In other words, don’t use one decimal digit (.1) for some entries and two
digits for others (.14). In fact, you should limit yourself to only one decimal digit.
If you do use a decimal digit with percentages, make sure you use a decimal digit
with whole percentages (62.0, not 62). In addition, don’t put percentage signs after
percentages or use horizontal or vertical lines in the table. The use of percentage
signs and lines only clutters the appearance of the table.
13-4c Frequency Distributions with More Than One Frequency Distribution
On occasion, you may want to present more than one frequency distribution in a
single table. A major advantage of such a table is that it makes it convenient to
compare frequency distributions for different variables. For example, you may
want to compare distributions for attitudes toward spending on varied policy
areas or societal problems. Or, as in Table 13-2, you may want to compare
responses toward different questions you could use to enhance the validity of a
single concept.
Table 13-2 Distributions of Attitude toward Divisive Forms of Speech

                        Not Allow          Allow
Type of speech            #      %        #      %
Atheism                 520   26.4     1451   73.6
Communism               619   31.7     1331   68.3
Sexual orientation      363   18.7     1578   81.3
Military rule           680   34.8     1272   65.2
Racism                  714   38.0     1164   62.0

Question: Consider a person who is against/for ________________. If such a person wanted to make
a speech in your (city/town/community), should they be allowed to speak or not?
Source: Adapted from James A. Davis and Tom W. Smith. National Opinion Research Center (NORC) General Social Survey
(GSS) for 1998.
Note that we labeled the table “13-2” to show that it is the second table
included in Chapter 13. Also note that the title specifies the table’s content. At the
bottom of the table, we also included the source of data and a question used to
operationalize the concept of attitude toward divisive forms of speech. In the
table, we presented frequency and percentage distributions for five types of speech
that could prove to be divisive. The table also presents the response categories (Not
Allow and Allow).
Looking at the table, we can easily compare results for the five types of speech.
We can readily see, for example, that a greater percentage of the respondents
would allow a person who is a homosexual to make a speech in their community
(81.3 percent). There is far less tolerance, on the other hand, toward allowing a
racist to deliver a speech in their community (62 percent).
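To make the comparison concrete, the sketch below recomputes the “Allow” percentage for each type of speech from the counts in Table 13-2 and ranks the types from most to least tolerated. It is only an illustration in Python; the names `responses`, `percent_allow`, and `ranked` are our own.

```python
# (not_allow, allow) counts for each type of speech, from Table 13-2.
responses = {
    "Atheism": (520, 1451),
    "Communism": (619, 1331),
    "Sexual orientation": (363, 1578),
    "Military rule": (680, 1272),
    "Racism": (714, 1164),
}

def percent_allow(not_allow, allow):
    """Percentage of respondents who would allow the speech."""
    return round(100 * allow / (not_allow + allow), 1)

tolerance = {topic: percent_allow(*pair) for topic, pair in responses.items()}

# Rank the types of speech from most tolerated to least tolerated.
ranked = sorted(tolerance, key=tolerance.get, reverse=True)
for topic in ranked:
    print(f"{topic:<20}{tolerance[topic]:>6.1f}")
```

Note that each row has its own base (the number of respondents who answered that item), so the percentages are computed row by row rather than against a single total.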
There are other ways you can present more than one frequency distribution in
a single table. Whatever way you decide to use, however, make sure you follow the
rules we presented.
13-4d Frequency Table with Metric Data
There are also times when you may want to present the frequency results of metric variables. Table 13-3 is an example of such a table.
Notice that the table includes only the highest five states and the lowest five
states. Note also that the table includes the mean of the distribution (12.9 percent)
and the standard deviation of the distribution (4.0 percent). The mean is the average level of poverty for the states. The standard deviation is a measure that
expresses the degree of variation within a variable on the basis of the average difference from the mean (Corbett 2001, 294). The smaller the standard deviation,
the closer the individual case values will cluster about the mean. We cover these
measures in more detail in Section 13-6c and Section 13-7d.
13-5 Graphic Presentations
An extension of the frequency distribution occurs when you present distributions
in graphic form. Graphs are a convenient way to present data, and they help one
to understand the data without reading a table. We limit our discussion to three
basic types of graphs: the bar graph, the histogram, and the pie graph.
Table 13-3 State Poverty Level

Highest Five States
Rank   State            Percent Below the Poverty Line
1      New Mexico       25.5
2      Mississippi      20.6
3      Louisiana        20.5
4      Arizona          20.5
5      West Virginia    18.5

Lowest Five States
Rank   State            Percent Below the Poverty Line
46     Alaska            8.2
47     Nevada            8.1
48     Utah              7.7
49     Indiana           7.5
50     New Hampshire     6.4

Mean: 12.9.
Standard deviation: 4.0.
Source: Data from Percentage of the population below the poverty line (1996). Statistical Abstract of the United States, 1998.
13-5a The Bar Graph
When dealing with nominal or ordinal data, Cole recommends that you use a bar
graph to present data (Cole 1996, 145). Bars are drawn for each class of the variable so that the height represents the number of cases for each class. Bar graphs are
useful when trying to compare categories. Figure 13-1 presents the data considered in Table 13-1 in bar graph format. The visual advantage of data presented in
a bar graph format is obvious. The reader can immediately see that there are not
as many liberal respondents in the GSS 1998 Survey as there were moderates and
conservatives.
bar graph: A type of graphic
display of a frequency or a
percentage distribution of data.
One uses bar graphs with discrete
data.
Figure 13-1 Respondents’ Political Ideology, General Social Survey (GSS) 1998
[Bar graph. Vertical axis: Frequency (0 to 1200). Horizontal axis: Respondent’s perceived political ideology (Liberal, Moderate, Conservative).]
Source: Data from James A. Davis and Tom W. Smith. National Opinion Research Center (NORC) General Social Survey (GSS) for 1998.
13-5b The Histogram
histogram: The type of bar
graph that is used to depict
continuous metric-level measures.
The histogram differs from a bar graph in that you do not separate the bars in a
histogram. The bars are adjoining to show that the variable consists of continuous
data. Also, intervals, rather than discrete categories, are depicted along the horizontal axis. While bar graphs are used with nonmetric data, researchers use histograms with metric-level data. Figure 13-2 is a histogram that depicts the extent
of urbanization in nations of the world. Let’s take time to examine the graph.
Figure 13-2 Histogram of Percent of the Population Living in Urban Centers throughout the World
[Histogram. Vertical axis: 0 to 14. Horizontal axis: People living in cities (%), with bars labeled from 5.0 to 100.0 in steps of 5.0.]
Source: Data from The World Almanac and Book of Facts, 1995.
The bars represent the categories for the urbanization variable depicted in the
histogram. The numbers across the horizontal axis represent the intervals for each
category. The first classification will consist of nations having an urbanization rate
from 2.5 percent to 7.5 percent. The second classification will consist of nations
having an urbanization rate from 7.6 percent to 12.5 percent, and so on.
The heights of the bars are proportioned to the number of nations for each
class. The higher the bar, the more nations there are within a particular category.
The numbers alongside the vertical axis represent the number of nations (cases)
included in each category. Continuing our example, there are two nations with an
urbanization rate from 2.5 percent up to 7.5 percent, and there are four nations
having an urbanization percentage from 7.6 percent to 12.5 percent.
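The way cases are assigned to the classes of a histogram can be sketched in code. The example below is a minimal Python illustration, not the procedure used to draw Figure 13-2. It assumes classes five points wide that are labeled by their midpoints (5.0, 10.0, and so on) and that include their upper limit, so that 7.5 falls in the first class and 7.6 in the second. The urbanization rates are hypothetical values chosen for illustration.

```python
import math
from collections import Counter

def class_midpoint(value, width=5.0):
    """Return the midpoint label of the class containing value.

    Assumption: classes are (midpoint - width/2, midpoint + width/2],
    so the first class above zero runs from just over 2.5 through 7.5.
    """
    index = math.ceil((value - width / 2) / width)
    return index * width

# Hypothetical urbanization rates (percent), not the data behind Figure 13-2.
rates = [3.1, 7.5, 7.6, 9.0, 11.2, 12.5, 14.0]

# The height of each bar is the number of cases in its class.
bar_heights = Counter(class_midpoint(rate) for rate in rates)
for midpoint in sorted(bar_heights):
    print(f"{midpoint:>5.1f}  {'#' * bar_heights[midpoint]}")
```

With these made-up values the first class holds two cases and the second holds four, which mirrors the reading of the first two bars described above.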
13-5c The Pie Graph
pie graph: A type of graphic
display of a frequency
distribution. Each “slice” of pie
represents a category of the
variable. The larger the slice
of pie in the graph, the more
cases for the particular category.
A pie graph displays a frequency distribution as a circle (or pie shape) with each
category shown as a different-colored slice. The larger the slice, the more cases
there are within a particular category. Political scientists use this type of graphic
presentation with nominal or ordinal data. Because of numerous categories, pie
charts are inappropriate to use with metric data. Can you imagine how many slices
of pie you would have with a continuous metric variable such as the one presented
in Figure 13-2? Figure 13-3 presents the data considered in Table 13-1 in pie graph
format.
13-6 Measures of Central Tendency
measures of central tendency:
Numbers that represent the
principal value of a distribution of
data. We commonly refer to these
measures as averages. Measures
of central tendency include the
mode, the median, and the mean.
central tendency: The most
frequently observed, common,
or central value in the distribution
of values of a variable.
While frequency distributions and graphs help to describe and explain variables,
political scientists often want to present their findings more conveniently. Reports
dealing with several variables would soon become tedious if you relied solely on
the depiction of charts and frequency distributions. Therefore, researchers often
summarize data with measures of central tendency.
A measure of central tendency is a number that represents the principal value
of a distribution of data. We commonly refer to these measures as averages. An
average you are probably familiar with is your grade point average, or GPA. Your
GPA describes and summarizes your academic performance in college classes.
Measures of central tendency include the mode, the median, and the mean.
Figure 13-3 Respondents’ Perceived Political Ideology, General Social Survey (GSS) 1998
[Pie graph with slices for Liberal, Moderate, and Conservative, accompanied by the following table.]

Category        Frequency       %    Cumulative %
Liberal               772    28.7            28.7
Moderate              986    36.6            65.3
Conservative          933    34.7           100.0
Totals:              2691   100.0

Source: Data from James A. Davis and Tom W. Smith. National Opinion Research Center (NORC) General Social Survey (GSS) for 1998.
13-6a The Mode
The mode is a convenient measure to use with nominal data. The mode is the
most frequently occurring value in any distribution of data. If a distribution has
only one mode, we say the distribution is unimodal. If there are two values that
appear most frequently, the distribution is bimodal.
Figure 13-4 shows the political party affiliation of members of the 107th session of the U.S. House of Representatives.
A close look at the figure shows that the Republican political party was the
most common party affiliation of members of the 107th House of Representatives
(51.0 percent). While not equal, the distribution also approximates a bimodal
distribution.
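Finding the mode amounts to locating the category with the largest frequency. The short Python sketch below uses the seat counts reported with Figure 13-4; the helper name `modes` is our own, and it returns every category tied for the top frequency so that a bimodal distribution would yield two entries.

```python
# Seat counts reported with Figure 13-4.
seats = {"Democratic": 211, "Republican": 222, "Independents": 2}

def modes(counts):
    """Return every category tied for the highest frequency."""
    top = max(counts.values())
    return [category for category, freq in counts.items() if freq == top]

modal = modes(seats)  # one entry means unimodal, two entries mean bimodal
share = round(100 * seats[modal[0]] / sum(seats.values()), 1)
print(modal, share)
```

Because the two largest categories differ by only eleven seats, a strict tie test like this one reports a single mode even though the distribution comes close to being bimodal.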
mode: The category of a variable
with the greatest frequency of
observations.
Figure 13-4 Political Party Affiliation of Members of the 107th U.S. Congress
[Graph. Axis label: Party membership (Democrat, Republican). Accompanied by the following table.]

Category        Frequency   Percentage
Democratic            211         48.5
Republican            222         51.0
Independents            2          0.5
Totals:               435        100.0

Source: Data from http://thomas.loc.gov
13-6b The Median
median: The category or value
above and below which one-half
of the observations lie. (The
median is the middle category or
value.)
The median is the middle item of a set of numbers after ranking the items according to their size (1, 2, 3, . . ., n). For a ranked distribution the median is the score
of the middle case if there are an odd number of cases. If there is an even number
of cases, the median is the value halfway between the two middle cases. In other
words, you will have to calculate the average of the two middle cases.

Table 13-4 Hypothetical Distribution of Scores of Ideology Scale of Angelo State University Students, 2001

Student   Score     Student   Score     Student   Score
1         7         10        5         19        3
2         7         11        5         20        3
3         7         12        5         21        2
4         6         13        4         22        2
5         6         14        4         23        2
6         6         15        4         24        1
7         6         16        4         25        1
8         5         17        4
9         5         18        4

N = 25.
Median = 4.
Source: Hypothetical
As an example, assume that a political science student used a scale to determine the ideological views of twenty-five respondents. The distribution of scores
ranging from 1 (liberal) to 7 (conservative) might appear as shown in Table 13-4.
Determining the median in Table 13-4 is a simple process if you take the following steps:
1. Rank the numbers (7, 7, 7, 6, 6, 6, 6, . . .1, 1).
2. Determine the number of items in the set (N) = 25.
3. Add 1 to the number of items: 25 + 1 = 26.
4. Divide the result by 2 to determine the middle item: 26/2 = 13.
5. The median is 4, or the response of the thirteenth respondent.
This value is no greater than half the distribution (those first twelve students
whose scores range from 5 through 7). Additionally, it is no smaller than half the
distribution (students 14 through 25 whose scores range from 1 through 4).
If student 25 was not included in this sample, there would not be a single middle case. For a data set having an even number of items, the same steps are taken
to calculate the median.
1. Rank the numbers.
2. Determine the number of items in the set (N) = 24.
3. Add 1 to the number of items: 24 + 1 = 25.
4. Divide the result by 2 to determine the middle item: 25/2 = 12.5.
In principle, the median is the 12.5th item. To determine the value you calculate
the average of the values of the twelfth and the thirteenth items ((5 + 4)/2 = 4.5).
The result represents the median for the sample.
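The ranking steps for both the odd and the even case can be sketched as a short function. This is a minimal Python illustration using the twenty-five scores from Table 13-4; the function name `median_by_position` is our own, and the final lines cross-check the result against `statistics.median` from the Python standard library.

```python
import statistics

# The twenty-five ideology scores from Table 13-4, already ranked.
scores = [7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 5, 5, 4,
          4, 4, 4, 4, 4, 3, 3, 2, 2, 2, 1, 1]

def median_by_position(values):
    """Find the median by ranking the items and locating position (N + 1) / 2."""
    ranked = sorted(values, reverse=True)  # step 1: rank the numbers
    n = len(ranked)                        # step 2: count the items
    position = (n + 1) / 2                 # steps 3 and 4: (N + 1) / 2
    if n % 2 == 1:
        # Odd number of cases: a single middle item exists.
        return ranked[int(position) - 1]
    # Even number of cases: average the two items on either side of the position.
    lower = ranked[int(position) - 1]
    upper = ranked[int(position)]
    return (lower + upper) / 2

odd_case = median_by_position(scores)        # all 25 students
even_case = median_by_position(scores[:-1])  # student 25 left out
print(odd_case, even_case)

# Cross-check against the standard library.
assert odd_case == statistics.median(scores)
assert even_case == statistics.median(scores[:-1])
```

Only the one or two middle values enter the calculation, which is why an extreme score at either end of the ranking leaves the median unchanged.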
Before we leave our discussion about the median, we need to discuss several of
its characteristics. First, the median case is always in the middle, and extreme values do not affect the median value. Thus, its interpretive value remains constant.
When we discuss the arithmetic mean in Sections 13-6c and 13-6d, we will see
how extreme values can detract from the interpretive value of the statistic. Second,
although we use every item to determine the median, we do not use their actual
values in the calculations. At most, we only use the values of the two middle items
to calculate the median when we have an even number of cases in our data set.
Third, if items do not cluster near the median, the median may not be a good
measure of the group’s central tendency. Last, medians usually do not take on values that are not realistic. The median number of children per American family,
for example, is two. It will never have a value of 1.7.
13-6c The Mean
The mean is used with metric data. It is the average of a set of numbers. We calculate the mean by summing the observations in a data set and dividing by the
number of cases.
To illustrate how we can use the mean in political science, let’s analyze the following problem. For the current budget year, a local “Meals for the Elderly” board
has limited the agency to serve hot meals to a monthly average of 340 recipients.
For the first nine months of 2003 the serving figures were 320, 360, 350, 350, 370,
330, 360, 370, and 340. Based on the first nine months, is the agency meeting its
monthly average target? You need to calculate the mean for the first nine months
to answer the question.
Mean for the first nine months: (320 + 360 + 350 + 350 + 370 + 330 + 360 + 370 +
340) / 9 months = 3150/9 = 350.
Since the mean of 350 exceeds the agency board goal of 340 recipients, it does
not appear that the agency is meeting the board’s target. Suppose you are the agency administrator; how many recipients must your agency average over the next three months
to meet the target? To answer this question, you need to
1. Determine the number of clients the agency would have to feed for the year if
the agency adhered to the board’s edict (340 * 12 = 4080).
2. Determine the number of clients the agency is feeding to date (350 * 9 =
3150).
3. Determine the number of clients the agency can feed over the last three
months of the year without exceeding the goal (4080 – 3150 = 930).
4. Determine the number of clients the agency can feed for each month for the
remainder of the year (930 / 3 = 310).
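The arithmetic in this example can be checked with a few lines of code. The following Python sketch uses the serving figures and the board’s target from the text; the variable names are our own. The last step divides the remaining allowance by the remaining months.

```python
# Monthly serving figures for the first nine months, from the example.
served = [320, 360, 350, 350, 370, 330, 360, 370, 340]
target_monthly_mean = 340
months_in_year = 12

mean_to_date = sum(served) / len(served)                  # 3150 / 9
annual_allowance = target_monthly_mean * months_in_year   # 340 * 12
remaining_total = annual_allowance - sum(served)          # 4080 - 3150
months_left = months_in_year - len(served)                # 12 - 9
required_monthly_mean = remaining_total / months_left     # 930 / 3

print(mean_to_date, annual_allowance, remaining_total, required_monthly_mean)
```

The result shows how much a few high months pull on a mean: to bring the yearly average back to 340, the agency would have to average 310 recipients over the final three months, fewer than in any of the first nine.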
The mean has several important characteristics. First, we use every item in a
group to calculate the mean. Second, unlike the mode, every group of data has one
and only one mean. Third, the mean may take on a value that is not realistic. For
example, the average American family has exactly 1.7 children. Fourth, an extreme
value may have a disproportionate influence on the mean and thus could affect
how well the mean represents the data.
13-6d Comparing the Mode, Median, and Mean
The three measures of central tendency that we discussed represent univariate distributions. Each, however, has its own characteristics that prescribe and limit its
use. The mode is the most common value in any distribution of data. The median
is nothing more than the middle item of a set of numbers when one ranks the
items in order of size. Last, the mean is the average of a set of values.
How does one know, however, when to use the mode, the median, or the
mean? Alas, there is not an easy answer to this question. Most statisticians agree,
however, that the application of any measure of central tendency depends on
the measurement level of the variable being analyzed. Table 13-5 shows that the
mode can represent nominal variables such as the distribution of gender or party
affiliation. We use the median, on the other hand, with ordinal variables such as
classes of attitudes and categories of income. The political researcher uses the
mean with metric variables such as age and years of formal education.
mean: The sum of the values of
a variable divided by the number
of values.

Table 13-5 The Hierarchy of Measurement

                          Measure of Central Tendency
Level of Measurement      Mode      Median      Mean
Metric                    x         x           x
Ordinal                   x         x
Nominal                   x
Table 13-5 also shows that it is permissible to use the measures appropriate for
lower levels of measurement with higher-level data. It is not appropriate, however,
to use higher-level measures with lower-level data. The mode, for example, can
represent income, but the mean cannot represent the distribution of gender.
We can use Table 13-6 to illustrate the appropriateness for using measures of
central tendency. The data represented in the table is a hypothetical distribution of
Government grades made by Jerry Perry, a student of political science, during one
agonizing semester. Note that the grades have been arrayed in descending order.
The instructor used the ratio level of measurement for the grades. Thus,
according to our discussion, the appropriate measure of central tendency to use is
the arithmetic mean. When we calculate the mean in the example, our answer is
60 (540/9). Additionally, the median score, the value that occupies the middle
position in an array of values, is 55. Both averages are relatively low. Thus, neither
Jerry nor his instructor is happy.
In addition, Jerry knows his father will be unhappy about his course grades.
He does not want to tell his father about the low mean and median values. So he
decides that if his father asks about his average in the course, he will give his father
the modal value, the most commonly occurring value in an array of data. In this
example the mode is 85, which appeals to Jerry. After he tells his father that he had
several grades of 85, his father will be happy and Jerry will not incur his father’s
wrath. In sum, our example illustrates how the different measures of central tendency can be misleading if used in the wrong context.
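As a quick check on Jerry's three "averages," the following sketch uses Python's standard statistics module with the nine grades from Table 13-6. It is offered only as an illustration; the variable names are assumptions of ours.

```python
import statistics

# Jerry Perry's nine Government grades from Table 13-6.
grades = [85, 85, 85, 70, 55, 50, 45, 35, 30]

modal_grade = statistics.mode(grades)      # most frequently occurring grade
median_grade = statistics.median(grades)   # middle (5th) grade of the ranked nine
mean_grade = statistics.mean(grades)       # sum of grades divided by their number

print(modal_grade, median_grade, mean_grade)
```

The same nine grades produce three quite different summaries, which is exactly why the choice of measure matters.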
13-7 Measures of Dispersion
Measures of central tendency are helpful in identifying important characteristics
among distributions of data. They accurately reflect the actual values of distributed data when the data closely group about the measures. Conversely, measures
of central tendency are less likely to reflect the actual values of all members of a
distribution when the data has extreme values. For example, the mean for Jerry
Perry’s Government grades was 60. However, the high score was 85 and the low
score was 30. Thus, there is much dispersion between the mean and the extreme
scores. Therefore we need some measure of the deviation from the average value
to tell us how well the measure of central tendency summarizes the data.
Table 13-6 Comparison of the Measures of Central Tendency
Grade
85
85
85
70
55  ← median
50
45
35
30
Total of all grades = 540.
Number of grades (n) = 9.
Mode = 85.
Median = (n + 1)/2 = 5th item, 55.
Mean = 540/9 = 60.
Political scientists use measures of dispersion, also known as measures of
variability (Corbett 2001, 134), to gain a clearer understanding of a distribution of
data. Measures of dispersion are ways to communicate other differences in a set of
data. They tell how much the data clusters about the various measures of central
tendency.
Suppose that researchers measured constitutional knowledge for a national
sample of registered voters. The mean that we calculate for the test is 70 out of a
possible 100. While we want to know the average score, we also want to know how
much variation there is in the scores. In other words, how reflective is the mean in
describing the distributions of scores? Did the majority of the respondents get a
score of 70? Was there a bimodal distribution with one group of respondents attaining low scores and another group attaining high scores that averaged out to 70? Or
was there a normal distribution of scores? We use measures of dispersion to
answer these questions.
dispersion: The distribution
of data values around the most
common, middle, or average
value.
13-7a The Variation Ratio (v)
The variation ratio is useful when analyzing nominal data. It is simple to calculate and easy to understand. Specifically, the variation ratio tells the political scientist the degree to which the mode satisfactorily represents a particular frequency
distribution. The formula for v is
$$v = 1 - \frac{\text{Number of cases in the modal category}}{\text{Total number of cases}}$$
By analyzing the formula one can see that if all cases in a distribution fell into
the modal class, the value of v would be 0. Thus, the lower the v score, the more
representative the mode of all cases in the distribution.
As an illustration, let’s examine the distribution of political ideology and party
identification as shown in Table 13-7. The variation ratio for Republicans (.46)
suggests that the mode is a better representation of ideology for Republicans than
for the other groups. The variation ratio also shows that the mode is a less satisfactory summary of ideology for those respondents who say they are Independent
(.70). Put another way, the Independent respondents varied more in their ideological orientations. Thus, one should be careful about reporting the mode as representative of the ideological orientation of all Independents in this example.
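The calculation can be sketched in Python using the hypothetical counts from Table 13-7. The function name variation_ratio and the dictionary layout are our own assumptions, chosen only for illustration.

```python
def variation_ratio(counts):
    """Return v = 1 - (cases in the modal category / total cases)."""
    total = sum(counts.values())
    modal_count = max(counts.values())
    return 1 - modal_count / total

# Hypothetical counts from Table 13-7 (each party totals 1,000 respondents).
ideology_by_party = {
    "Democrat": {"Liberal": 360, "Libertarian": 220, "Conservative": 180, "Populist": 240},
    "Independent": {"Liberal": 160, "Libertarian": 260, "Conservative": 280, "Populist": 300},
    "Republican": {"Liberal": 80, "Libertarian": 180, "Conservative": 540, "Populist": 200},
}

for party, counts in ideology_by_party.items():
    print(party, round(variation_ratio(counts), 2))
```

A value near 0 means nearly every case sits in the modal category; the larger the value, the less the mode tells us about the group.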
Table 13-7 Distribution of Political Ideology and Party Identification, 2001
Political Ideology    Democrat        Independent     Republican
Liberal               360             160             80
Libertarian           220             260             180
Conservative          180             280             540
Populist              240             300             200
Totals                1000            1000            1000
v computation         1 – 360/1000    1 – 300/1000    1 – 540/1000
v                     .64             .70             .46
Source: Hypothetical
variation ratio: The variation
ratio tells the political scientist
the degree to which the mode
satisfactorily represents a
particular frequency distribution.
Table 13-8 Hypothetical Distribution of Scores of Ideology Scale of Angelo State University Students, 2001
Student    Score    Student    Score    Student    Score
1          6        10         5        19         3
2          6        11         5        20         3
3          6        12         5        21         2
4          6        13         4        22         2
5          6        14         4        23         2
6          6        15         4        24         2
7          6        16         4        25         2
8          5        17         4
9          5        18         4
N = 25.
Median = 4.
Range = 4 (6 – 2).
Source: Hypothetical
13-7b The Range
range: The distance between
the highest and lowest values
or the extent of categories into
which observations fall.
mean deviation: A measure
of dispersion of data points for
metric-level data. It is the mean of
differences between each value in
a distribution and the mean of the
distribution.
The range is a useful measure when the researcher is working with ordered or
ranked data (Cole 1996, 162). Thus, it is useful with ordinal data and when considering the degree to which the median satisfactorily represents a particular frequency distribution. The range is simply the difference between the largest value
and the smallest value in a distribution. The smaller the range, the more accurate,
or representative, is the median score of all values in the distribution.
In Table 13-4, our example of the median, we presented a hypothetical distribution of ideological orientation scores for twenty-five Angelo State University
students. The median in our example is 4 and the range is 6 (7 – 1). The responses
of another sample of twenty-five students using the same seven-point scale appear
in Table 13-8. In this example, the median is still 4, but the range is 4 (6 – 2). A
comparison of the measures presented in the two tables shows that there is greater
homogeneity in the responses of the second group of students.
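For readers who want to verify the figures for the second group, here is a small Python sketch built from the twenty-five scores in Table 13-8. The list construction and variable names are assumptions made for illustration.

```python
import statistics

# Scores from Table 13-8: seven 6s, five 5s, six 4s, two 3s, five 2s.
scores = [6] * 7 + [5] * 5 + [4] * 6 + [3] * 2 + [2] * 5

median_score = statistics.median(scores)   # 13th of the 25 ranked scores
score_range = max(scores) - min(scores)    # largest value minus smallest value

print(len(scores), median_score, score_range)
```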
The range is easy to calculate and has utility as a measure of dispersion. While
we could use the range with metric data, it is not wise to do so because extreme
values in a distribution could influence the range, thus giving a misleading
impression of variation. Consider, for example, that there is one individual with a
doctoral degree in the community you sample. If you randomly choose twenty-five persons to sample, there is an excellent chance that he or she will not
be included. But, for the sake of this example, suppose you do include the individual in your sample. The range in education levels will then be extremely large
and very misleading as a measure of dispersion. In addition, if we use the range as
a measurement of dispersion with metric data, we do not know anything about
the variability of scores between the two extreme values except that the scores do
lie somewhere within the range. Consequently, you should avoid using the range
with metric data.
13-7c The Mean Deviation
The mean deviation is useful when analyzing metric data. Simply put, the mean
deviation is the average difference between the mean and all other values in the
distribution. The mean deviation makes use of every observation in the distribution. One computes this measure by taking the difference between each observation in the distribution and the mean. Summing these deviations is the next step.
(Note: When summing, ignore negative signs. Otherwise, the sum would always be
zero.) Last, divide the sum by the number of observations. Arithmetically, the
mean deviation is expressed as
$$\text{Mean deviation} = \frac{\sum |X_i - \bar{X}|}{n}$$
where
$\sum$ = the sum of.
$X_i$ = each individual observation.
$\bar{X}$ = the mean of all of the observations.
$n$ = number of observations.
$|\ |$ = absolute difference (ignore signs).
Table 13-9 illustrates the calculation of the mean deviation for the percentage
of the total vote George W. Bush and Dick Cheney received in the southeastern
states. The results show that, on the average, the percentage of the total vote
received by the Bush/Cheney ticket in each southeastern state in the 2000 election
differs from the mean vote for all southeastern states by 2.7 percentage points.
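The computation behind Table 13-9 can be sketched as follows. The function name mean_deviation is our own, and the sketch uses the unrounded mean (about 54.03) rather than the 54.0 shown in the table, so intermediate figures may differ slightly from the printed ones.

```python
def mean_deviation(values):
    """Average absolute difference between each value and the mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

# Percent of vote for Bush/Cheney in 2000, from Table 13-9.
vote_share = {
    "Alabama": 56.5, "Arkansas": 51.3, "Florida": 48.8,
    "Georgia": 55.0, "Kentucky": 56.4, "Louisiana": 52.6,
    "Mississippi": 57.6, "South Carolina": 56.9, "Tennessee": 51.2,
}

values = list(vote_share.values())
print(round(sum(values) / len(values), 1))   # mean vote share
print(round(mean_deviation(values), 1))      # mean deviation
```

Taking the absolute value inside the sum is what keeps the deviations from canceling to zero.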
13-7d The Variance and the Standard Deviation
While the mean deviation has a more direct intuitive interpretation than other
measures of deviation that we can use with metric data, the measure has fewer
useful statistical properties than those measures (Blaylock 1979, 78). As such,
political researchers do not use this measure very often. We have discussed it
largely as a way to enhance your understanding about measures of dispersion and
as a prelude to our discussion of other metric measures of dispersion.
Table 13-9 Percent of Vote for Bush/Cheney in Southeastern States in the 2000 Presidential Election (Mean Deviation)
Mean = 54.0%
State             % Vote    |Xi – X̄|
Alabama           56.5      2.5
Arkansas          51.3      2.7
Florida           48.8      5.2
Georgia           55.0      1.0
Kentucky          56.4      2.4
Louisiana         52.6      1.4
Mississippi       57.6      3.6
South Carolina    56.9      2.9
Tennessee         51.2      2.8
Total                       24.5
Mean Deviation = 24.5/9 = 2.7.
Source: Adapted from National Archives and Records Administration.
variance: Another measure of
dispersion of data points about
the mean for metric-level data.
It is a measure of how spread out
a distribution is.
One such measure is the variance. Like the mean deviation, the variance is based
on the differences between each observation and the mean. When you calculate the
variance, however, you square those differences rather than ignoring their signs.
Next, you sum the squares and divide the result by the number of cases. The
formula for the calculation of the variance looks very much like the formula for
the calculation of the mean deviation:
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n}$$
where
$\sum$ = the sum of.
$X_i$ = each individual observation.
$\bar{X}$ = the mean of all of the observations.
$n$ = number of observations.
standard deviation: The most
common measure of dispersion
of data points about the mean of
metric-level data. It is the square
root of the variance.
standard score: An individual
observation that belongs to a
distribution with a mean of 0
and a standard deviation of 1.
See Z score.
The standard deviation is probably the most common measure of dispersion
for metric data. Like the variance, the basis for standard deviations is the squared
differences between every item in a data set and the mean of that set. In fact, you
simply take the square root of the variance to calculate the standard deviation.
Similar to the other measures of dispersion we discussed, the smaller the standard
deviation in a set of data, the more closely the data cluster about the measure of
central tendency.
We won’t trouble you with the reasoning here, but the standard deviation is a
stable measure of dispersion from sample to sample. Political scientists use standard deviations with the normal curve to determine where scores or observations
cluster about the mean and to determine a standard score. While we can use
either the variance or the standard deviation to indicate the amount of variation
within a metric-level variable, we usually use the standard deviation.
To see the utility of the two measures, let’s examine Table 13-10. The table is
similar to Table 13-9 in that it depicts the percent of vote for Bush/Cheney in
southeastern states in the 2000 presidential election. It differs, however, in that it
illustrates the calculation of the variance and the standard deviation for the same
data set. For this example, the variance is 8.7 and the standard deviation is 2.95.
The lower the variance/standard deviation, the more accurately does the mean
represent all the scores of all cases in a distribution of metric-level data.
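The two measures can be checked with Python's standard statistics module. Because the formula in this chapter divides by n, the sketch uses the population functions pvariance and pstdev; the module's variance and stdev functions divide by n – 1 and would give slightly larger values. The variable names are assumptions made for illustration.

```python
import statistics

# Percent of vote for Bush/Cheney in 2000, in the order listed in Table 13-10.
votes = [56.5, 51.3, 48.8, 55.0, 56.4, 52.6, 57.6, 56.9, 51.2]

variance = statistics.pvariance(votes)   # sum of squared deviations divided by n
std_dev = statistics.pstdev(votes)       # square root of that variance

print(round(variance, 1), round(std_dev, 2))
```

Small differences from hand calculation can arise because the table rounds the mean and each squared deviation before summing.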
Table 13-10 Percent of Vote for Bush/Cheney in Southeastern States in the 2000 Presidential Election (Variance and Standard Deviation)
Mean = 54.0%
State             % Vote    |Xi – X̄|    (Xi – X̄)²
Alabama           56.5      2.5         6.3
Arkansas          51.3      2.7         7.3
Florida           48.8      5.2         27.0
Georgia           55.0      1.0         1.0
Kentucky          56.4      2.4         5.8
Louisiana         52.6      1.4         2.0
Mississippi       57.6      3.6         13.0
South Carolina    56.9      2.9         8.4
Tennessee         51.2      2.8         7.8
Total                       24.5        78.6
Variance (s²) = 78.6/9 = 8.7.
Standard deviation (s) = √8.7 = 2.95.
Source: Adapted from National Archives and Records Administration.
Table 13-11 Per Capita Income for Selected States
State         Mean Income    Standard Deviation
Florida       22,916         500
Texas         20,654         3000
Colorado      23,449         600
New York      26,782         550
New Mexico    18,055         1850
Illinois      24,763         625
Source: Hypothetical
The standard deviation also helps us to understand the distribution of the values for a particular metric-level variable and can be helpful when we are comparing two or more groups of cases (Corbett 2001, 140). To illustrate our point, let’s
examine Table 13-11.
The table shows the hypothetical per capita income (PCI) for samples of citizens of several states. It also shows the standard deviation for the PCI in the states.
Let’s analyze the results. The table shows that income is distributed in very different ways across the states. In Florida, Colorado, New York, and
Illinois, people’s incomes are fairly close together. In other words, there is not a
great deal of income inequality. That is why the standard deviations are relatively
low. In New Mexico, however, there is greater income inequality. And in Texas, as
evidenced by the relatively high standard deviation, there is a great deal of income
inequality when compared to the other states.
In summary, the standard deviation and the variance show us how much variation there is within a metric variable. For a variable, when there is very little difference from one case to another, these statistics will be low. Conversely, when
there is a great deal of diversity among the cases for a variable, these statistics will
be high. As we discuss later in Section 13-8b of this chapter, when the distribution
of values of a variable approaches a normal distribution, the standard deviation
tells us even more.
13-8 Shape of the Distribution and Metric Distributions
Up to this point we have discussed measures of central tendency and measures of
dispersion as ways to examine data distributions. In the past, political scientists
also analyzed the shape of distributions by constructing a frequency polygon. To
do this, they would connect the midpoints of the top of each bar of a histogram
with a solid line. The shape of the resulting polygon depended on how the scores were spread about the mean.
Those distributions having most of their case scores above the mean had a different shape from those having a large proportion of scores below the mean.
frequency polygon: A graph
resulting from the connection
of the midpoints of the top of
each bar of a histogram with
a solid line.
13-8a Skewed Distributions
Three possible shapes can result when drawing frequency polygons of metric data
distributions. Figure 13-5 shows the first two shapes. Each shape represents a
skewed distribution. This means that in both instances there are more extreme
scores in one direction or the other. In the first instance, there are more extreme
low scores than extreme high scores. This is a negative, or left, skewed distribution.
You can see that the mean is pulled in the direction of the lower scores. If you are
analyzing the distribution of Anglo residents for the United States, the distribution will be negatively skewed because Anglos are in the minority only in
Hawaii. The second shape shows the impact of the many extreme high scores in
the distribution. This shape represents a positive, or right, skewed distribution
because the mean is pulled in the direction of the higher scores. If you are examining the land area for the United States, the distribution will be positively skewed
because of Texas and Alaska.
skewed distribution: A data
distribution in which more
observations fall to one side
of the mean than the other. Thus,
the mean is “pulled” toward the
extreme low (negative skew) or
extreme high (positive skew).
negatively skewed:
A distribution of values in which
more observations lie to the left
of the middle value.
positively skewed:
A distribution of values in which
more observations lie to the right
of the middle value.
normal distribution curve:
A frequency curve showing a
symmetrical, bell-shaped
distribution in which the mean,
mode, and median coincide and
in which a fixed proportion of
observations lies between the
mean and any distance from the
mean measured in terms of the
standard deviation.
Figure 13-5 Skewed Distributions and the Normal Curve
(a) 1992: Percent White (SA, 1996). Mean = 83.9%. Median = 87.4%.
(b) 1990: Land Area in Square Miles. Median = 54,125 square miles. Mean = 70,724 square miles.
Source: Statistical Abstract of the United States, 1996.
While political researchers, in the past, drew frequency polygons to get a sense
of the shape of a distribution, today many statistical packages allow us to compare
metric distributions with the normal curve. Again, consider Figure 13-5. The figure depicts two distributions with a normal distribution curve superimposed on
the polygons. If the distributions were normal distributions, the bars would touch
the curves. As you can readily see, there are several bars that do not reach the
curves and there are several bars that extend beyond the curves. Thus, each distribution is a skewed distribution. Also note that the median value is greater than the
value of the mean in Graph (a), while the mean is greater than the median value
in Graph (b). Therefore, Graph (a) depicts a negative (left) skew and Graph (b)
illustrates a positive (right) skew.
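The comparison of mean and median used above can be written as a simple rule of thumb. The sketch below is only an illustration of that rule as the text applies it; the function name skew_direction is hypothetical, and the rule is a quick diagnostic rather than a formal test of skewness.

```python
def skew_direction(mean, median):
    """Classify skew by comparing the mean with the median."""
    if mean < median:
        return "negative (left) skew"    # mean pulled toward low scores
    if mean > median:
        return "positive (right) skew"   # mean pulled toward high scores
    return "no skew indicated"

# Values reported under Graphs (a) and (b) of Figure 13-5.
print(skew_direction(83.9, 87.4))      # percent white
print(skew_direction(70724, 54125))    # land area in square miles
```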
13-8b The Normal Curve
A symmetrical distribution is the third shape you can obtain when constructing a frequency polygon.
Figure 13-6 is a depiction of the normal distribution curve. The normal curve, a
special type of symmetrical distribution, is very valuable in statistics because it has
several important properties. First, the curve is symmetrical and bell-shaped. Second, the measures of central tendency coincide at the center of the distribution. In
other words, the values of the measures are equal. Third, the curve is based on an
infinite number of observations. The last property of the normal curve that we
discuss, however, is probably its most distinctive characteristic. In any normal distribution, a fixed proportion of the observations lie between the mean and fixed
units of standard deviations. To help you understand why this property is so
important, let’s examine Figure 13-7.
The percentages can be seen in Figure 13-7. The mean of the distribution
divides the curve exactly in half. Note that a little more than 34 percent of all cases
fall between the mean and one standard deviation above the mean. Additionally, a
little more than 34 percent of the cases fall between the mean and one standard
deviation below the mean. Thus, slightly more than 68 percent of all cases in a
normal distribution lie within one standard deviation (plus or minus) of the
mean. Similarly, more than 13.5 percent of all cases fall between one standard
deviation and two standard deviations above the mean, and the same proportion
falls between one standard deviation and two standard deviations below the mean.
Therefore, more than 95 percent of all cases in a normal distribution lie within
plus or minus two standard deviations of the mean. Continuing the analysis, you
can see that almost all of the cases (about 99.74 percent) will fall within plus or
minus three standard deviations of the mean.
Figure 13-6 The Normal Distribution (horizontal axis in standard deviation units, from -4 to 4)
Consequently, the standard deviation used with the normal curve can be a
very important tool in the political scientist’s repertoire. It is important because
the researcher can determine the proportion of observations included within fixed
distances of the mean. For example, assume that the public’s rating of a particular
welfare program rated on a scale of 0 to 100 has a normal distribution. Additionally, the distribution has a mean of 50 and a standard deviation of 10. Based on
this information we can conclude that more than 68 percent of the public assigned
the program a rating between 40 and 60 (±1 standard deviation from the mean).
Additionally, more than 95 percent assigned the program a rating between 30 and
70 (±2 standard deviations from the mean). Last, almost everyone in the survey
assigned the program a rating between 20 and 80 (±3 standard deviations from
the mean).
Figure 13-7 Areas under the Normal Curve
Interval (standard deviations)    Area
-3 to -2                          .0215
-2 to -1                          .1359
-1 to 0                           .3413
0 to 1                            .3413
1 to 2                            .1359
2 to 3                            .0215
If the mean is 50 and the standard deviation is 10, the scores are as follows:
20 (-3s), 30 (-2s), 40 (-s), 50 (Mean), 60 (s), 70 (2s), 80 (3s).
Source: Adapted from http://davidmlane.com/hyperstat/normal_distribution.html.
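These proportions can be reproduced with the NormalDist class in Python's standard statistics module (Python 3.8 or later). The sketch assumes, as the example does, a normal distribution with a mean of 50 and a standard deviation of 10; the helper name share_within is our own.

```python
from statistics import NormalDist

# Assumed distribution of program ratings from the example.
ratings = NormalDist(mu=50, sigma=10)

def share_within(k):
    """Proportion of cases within k standard deviations of the mean."""
    low = ratings.mean - k * ratings.stdev
    high = ratings.mean + k * ratings.stdev
    return ratings.cdf(high) - ratings.cdf(low)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The three-standard-deviation figure computed this way is marginally lower than the one obtained by adding the rounded areas in Figure 13-7, a difference due only to rounding.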
13-8c Standard Scores (the Z Score)
When determining the proportion of observations within a desired interval, the political scientist should express observations in units of standard deviation. For example, you can use Figure 13-7 to determine the percentage of cases rating the welfare
program from the mean of 50 to 75. To do so you have to determine how many
standard deviations the rating of 75 lies from the mean of 50. Political scientists
calculate the Z score to accomplish this task. The formula for Z is
$$Z = \frac{X - \bar{X}}{s}$$
where
$Z$ = the standard score.
$X$ = the value of any observation.
$\bar{X}$ = the mean.
$s$ = the standard deviation.
Z score: The number of
standard deviations that a score
deviates from the mean in a
standardized normal distribution.
See standard score.
standard normal distribution:
A normal distribution having a
mean of 0 and a standard
deviation and variance of 1.
The Z score tells us the number of standard deviations that the score lies above
or below the mean. If we apply the formula to the example discussed, the Z score is
$$Z = \frac{75 - 50}{10} = 2.50$$
In our example, we find that the score of 75 is 2.5 standard deviation units
above the mean. Intuitively, this should make sense. Recall that we showed that a
score of 70 is 2 standard deviation units above the mean, and a score of 80 is 3
units of standard deviation above the mean. Thus, a score of 75 had to fall between
2 and 3 standard deviation units.
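The Z score formula translates directly into code. The short sketch below uses a function name of our own choosing (z_score) and the welfare program example's mean of 50 and standard deviation of 10.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Ratings of 70, 75, and 80 with a mean of 50 and a standard deviation of 10.
for rating in (70, 75, 80):
    print(rating, z_score(rating, 50, 10))
```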
To carry our analysis further, Figure 13-7 shows that 47.72 percent of the cases
lie between the mean and 2 standard deviation units above the mean. The figure
also shows that 49.87 percent of the cases lie between the mean and 3 units of standard deviation above the mean. Thus, if a 75 rating is 2.5 standard deviation units
above the mean, somewhere between 47.72 and 49.87 percent of the public
assigned the program a score between 50 and 75. We will need to use the standard
normal distribution table to determine the exact percentage.
Table 13-12 depicts selected sections of the standard normal distribution
table. In other words, it is a partial Z table to be used with the examples in this
book. Take the following steps to use the table:
1. Scan the far-left column to find the first two digits of the Z value. In our case,
2.5.
2. Under the numerical column headings find the third digit of the Z value. In
our case, .00.
3. Extend both the column and the row until they intersect. The value that you
find at the point of intersection is the proportion of cases that lie between the
mean and 2.5 standard deviations above the mean. In our case, .4938. This
means that 49.38 percent of the public assigned the welfare program a rating
between 50 and 75.
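If a printed Z table is not at hand, the same lookup can be approximated in Python. The sketch below assumes the standard normal distribution (mean 0, standard deviation 1) and uses NormalDist from the standard statistics module; the helper name area_mean_to_z is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_mean_to_z(z):
    """Proportion of cases between the mean and a given Z value."""
    # The cumulative proportion includes the .50 below the mean, so subtract it.
    return standard_normal.cdf(abs(z)) - 0.5

print(round(area_mean_to_z(2.5), 4))
```

Using the absolute value of Z mirrors the table's note that areas for negative Z values are obtained by symmetry.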
Table 13-12 Selected Sections of the Standard Normal Distribution
Z      .00      .01      .02      .03      .04      .05      .06      .07      .08      .09
0.0    .0000    .0040    .0080    .0120    .0160    .0199    .0239    .0279    .0319    .0359
0.5    .1915    .1950    .1985    .2019    .2054    .2088    .2123    .2157    .2190    .2224
1.0    .3413    .3438    .3461    .3485    .3508    .3531    .3554    .3577    .3599    .3621
1.5    .4332    .4345    .4357    .4370    .4382    .4394    .4406    .4418    .4429    .4441
2.0    .4772    .4778    .4783    .4788    .4793    .4798    .4803    .4808    .4812    .4817
2.5    .4938    .4940    .4941    .4943    .4945    .4946    .4948    .4949    .4951    .4952
3.0    .4987    .4987    .4987    .4988    .4988    .4989    .4989    .4989    .4990    .4990
Notes:
1. This is a partial Z table to be used with the examples in this book.
2. An entry in the table is the proportion under the entire curve that is between Z = 0 and a positive value of Z. Areas for
negative values of Z are obtained by symmetry.
3. To obtain the percentage of cases from the mean, multiply the cell entry by 100 (.4938 * 100 = 49.38%).
To conclude our discussion about the normal curve and Z scores, let’s look at
another illustration. Suppose you want to determine the percentage of the public
assigning the program a rating from 0 to 75. Before you begin to plug figures into
the formula just presented, there is a quicker way to determine the answer. Simply
add .50 to the proportion associated with the Z value we just calculated (.4938). We
do this because the normal curve assumes that 50 percent of the cases will lie on
either side of the mean. Thus, we conclude that 99.38 percent of the public would
rate the program from 0 to 75.
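The shortcut can be compared with a direct calculation. The sketch below again assumes ratings that are normally distributed with a mean of 50 and a standard deviation of 10; the variable names are our own. Strictly speaking, a normal curve extends below 0, but on this scale the area below 0 is so small that the two results agree to four decimal places.

```python
from statistics import NormalDist

ratings = NormalDist(mu=50, sigma=10)

# Shortcut from the text: .50 below the mean plus the area from 50 to 75.
shortcut = 0.50 + (ratings.cdf(75) - ratings.cdf(50))

# Direct calculation of the area between ratings of 0 and 75.
direct = ratings.cdf(75) - ratings.cdf(0)

print(round(shortcut, 4), round(direct, 4))
```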
Chapter Summary
In this chapter we examined some important tools to use in
the preliminary stage of data analysis. For example, sophisticated computer programs summarize data as frequency
distributions. These distributions depict the number of cases
(N) and number of cases by class, percentages, and, perhaps,
cumulative percentages. These techniques help the political
scientist to assess the weight of a single class in relation to
other classes of a distribution or distributions.
Additionally, political scientists use measures of central
tendency to describe the distribution’s main characteristics.
These measures help the researcher answer questions such
as “What is the typical party identification of respondents?”
or “What is the average level of income of the group?”
Measures of central tendency, however, can be misleading if not accompanied by measures that describe the
amount of dispersion in the distribution. While measures of
central tendency reflect a group’s typical characteristic,
measures of dispersion depict the extent of variance from
the typical value, or average. The dispersion measures show
how many members of the group deviate from the typical
and the extent of their deviation. A small deviation shows
that most responses cluster around the measure of central
tendency, suggesting a homogeneous group. Large deviations, on the other hand, suggest that the measure of central
tendency is a poor representation of the distribution.
Another important step in examining a distribution is
the identification of its general form. For example, the shape
of frequency polygons may show that extreme scores in the
distribution may affect the measure of central tendency. Or
the form may be symmetrical or even normal, because there
are no extreme scores affecting the shape of the distribution.
If this is the case, there are a fixed proportion of observations lying between the mean and fixed units of standard
deviation.
The measures discussed throughout this chapter help
the political scientist understand data distributions. Analyzing these descriptive statistics, however, is only the first step
in data analysis. Once summarized, researchers often want
to discover relationships between variables. We turn to this
issue in the next chapter.
Chapter Quiz
1. Consider these scores: 0, 3, 1, 5, 1. The mean is
a. 2.
b. 2.5.
c. 3.
d. None of choices a through c is correct.
2. In a symmetric, unimodal distribution,
a. the median equals the mean.
b. the mode equals the median.
c. the mean equals the mode.
d. Each of choices a through c is correct.
3. The mean of the GSS variable AGE1STBRN is higher
than the median. The variable measures the
respondent’s age when his or her first child was born.
So we know that the distribution of respondents’
ages when their first child was born is
a. negatively skewed.
b. normal.
c. bimodal.
d. positively skewed.
4. The standard deviation measures deviation around
the
a. mode.
b. median.
c. mean.
d. variance.
5. The number of standard deviations a score lies from
the mean in a standard normal distribution is
a. the case’s Z score.
b. the standard error.
c. the confidence interval.
d. None of choices a through c is correct.
6. The ________________________ is the only measure
of central tendency that can properly be used with
nominal data.
a. mode
b. median
c. mean
d. standard deviation
7. Political scientists use measures of _______________
to describe the distribution’s average characteristics.
a. dispersion
b. association
c. central tendency
d. statistical significance
8. Measures of __________________________depict
the extent of variance from the typical value, or
average value.
a. dispersion
b. association
c. central tendency
d. statistical significance
9. The basis for the standard deviation and
_____________ is the squared differences between
every item in a data set and the mean of that set.
a. mode
b. median
c. mean
d. variance
10. The _________________________________ is the
average difference between the mean and all other
values in the distribution.
a. mode
b. standard deviation
c. variance
d. mean deviation
Suggested Readings
Bernstein, Robert A. and James A. Dyer. An Introduction to
Political Science Methods, 3rd ed. Englewood Cliffs, NJ:
Prentice-Hall, 1992.
Blaylock, Hubert M., Jr. Social Statistics, 2nd ed. New York:
McGraw-Hill, 1979.
Cole, Richard L. Introduction to Political Science and Policy
Research. New York: St. Martin’s Press, 1996.
Corbett, Michael. Research Methods in Political Science: An
Introduction Using MicroCase, 4th ed. Belmont, CA:
Wadsworth, 2001.
Davis, Richard and Diana Owen. New Media and American
Politics. New York: Oxford University Press, 1998.
Fox, William. Social Statistics, 3rd ed. Bellevue, WA: MicroCase, 1998.
Frankfort-Nachmias, Chava and David Nachmias. Research
Methods in the Social Sciences, 6th ed. New York: Worth
Publishers, 2000.
Johnson, Janet Buttolph, Richard A. Joslyn, and H. T.
Reynolds. Political Science Research Methods, 4th ed.
Washington, D.C.: Congressional Quarterly Press, 2001.
Kay, Susan Ann. Introduction to the Analysis of Political Data.
Englewood Cliffs, NJ: Prentice-Hall, 1991.
Leedy, Paul D. and Jeanne Ellis Ormrod. Practical Research:
Planning and Design, 7th ed. Upper Saddle River, NJ:
Merrill Prentice Hall, 2001.
Endnote
1. See Corbett, Michael. Research Methods in Political Science: An Introduction Using MicroCase, 4th ed. Belmont,
CA: Wadsworth Publishers, 2001, for a succinct presentation of how to present frequency tables.