PPA 415 – Research Methods in Public Administration

PPA 501 – Analytical Methods in Administration
Lecture 5a – Counting and Charting Responses
Percentages and Proportions

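Percentages and proportions supply a frame of reference for reporting research results by standardizing the raw data: percentages to a base of 100 and proportions to a base of 1.00.

$$\text{Proportion } (p) = \frac{f}{N}$$

$$\text{Percentage } (\%) = \frac{f}{N} \times 100$$

where $f$ = the frequency (number of cases) in a category and $N$ = the total number of cases.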
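Here is a minimal sketch of the two formulas above. The function names `proportion` and `percentage` are illustrative; they do not come from any library. The usage lines take the "Agree" count from the IAEM-NEMA example that follows: 27 of 111 respondents, or 27 of the 99 valid answers.

```python
def proportion(f: int, n: int) -> float:
    """Proportion p = f / N: the share of all N cases falling in one category."""
    if n <= 0:
        raise ValueError("N must be a positive number of cases")
    return f / n


def percentage(f: int, n: int) -> float:
    """Percentage = (f / N) * 100: the same share standardized to base 100."""
    return proportion(f, n) * 100


# "Agree" row of the IAEM-NEMA example.
print(round(proportion(27, 111), 3))  # proportion of all respondents
print(round(percentage(27, 111), 1))  # compare with the "Percent" column
print(round(percentage(27, 99), 1))   # compare with the "Valid Percent" column
```

The choice of base matters. Dividing by all 111 respondents and dividing by the 99 who answered give different figures for the same 27 people.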
Percentages and Proportions

Example from IAEM-NEMA Survey, 2006.

"Problems with the government response to Hurricane Katrina arose largely because of inadequate leadership and management of the crisis by FEMA."

| Response | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Strongly disagree | 7 | 6.3 | 7.1 | 7.1 |
| Disagree | 24 | 21.6 | 24.2 | 31.3 |
| Neutral | 26 | 23.4 | 26.3 | 57.6 |
| Agree | 27 | 24.3 | 27.3 | 84.8 |
| Strongly agree | 15 | 13.5 | 15.2 | 100.0 |
| Total (valid) | 99 | 89.2 | 100.0 | |
| Missing (system) | 12 | 10.8 | | |
| Total | 111 | 100.0 | | |
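The Percent, Valid Percent and Cumulative Percent columns in the table above can be reproduced with a short sketch. It makes two assumptions:

- The categories are supplied in their ordinal order.
- Missing cases enter only the Percent base, as in SPSS-style output.

`frequency_table` is a hypothetical helper. Cumulative percentages come from running counts rather than from adding rounded percentages, which keeps rounding error from accumulating.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int, float, float, float]


def frequency_table(valid: Dict[str, int], missing: int = 0) -> List[Row]:
    """Build (category, frequency, percent, valid percent, cumulative percent) rows.

    Percent is based on all cases (valid + missing); valid percent and
    cumulative percent are based on valid cases only.
    """
    n_valid = sum(valid.values())
    n_total = n_valid + missing
    rows: List[Row] = []
    running = 0
    for category, f in valid.items():  # dicts keep insertion (ordinal) order
        running += f
        rows.append((
            category,
            f,
            round(100 * f / n_total, 1),
            round(100 * f / n_valid, 1),
            round(100 * running / n_valid, 1),  # from running counts, not rounded sums
        ))
    return rows


katrina = {
    "Strongly disagree": 7,
    "Disagree": 24,
    "Neutral": 26,
    "Agree": 27,
    "Strongly agree": 15,
}
for row in frequency_table(katrina, missing=12):
    print(row)
```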
Percentages and Proportions

Guidelines:

- When working with a small number of cases, report the actual frequencies.
- Always report the number of observations along with proportions and percentages.
- Proportions and percentages can be used for any level of measurement.
Percentage Change

$$\text{Percentage change} = \frac{f_2 - f_1}{f_1} \times 100$$

where $f_1$ = the first score, frequency, or value (time 1) and $f_2$ = the second score, frequency, or value (time 2).
Percentage Change Example

$$\%\text{ change } 1958\text{--}1964 = \frac{57.84\% - 50.18\%}{50.18\%} \times 100 = 15.27\%$$

$$\%\text{ change } 1964\text{--}1966 = \frac{61.26\% - 57.84\%}{57.84\%} \times 100 = 5.91\%$$

$$\%\text{ change } 1966\text{--}1968 = \frac{46.12\% - 61.26\%}{61.26\%} \times 100 = -24.71\%$$

$$\%\text{ change } 1968\text{--}1970 = \frac{36.82\% - 46.12\%}{46.12\%} \times 100 = -20.16\%$$

$$\%\text{ change } 1970\text{--}1972 = \frac{34.20\% - 36.82\%}{36.82\%} \times 100 = -7.12\%$$
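The calculations in the example above can be reproduced with a small sketch. `percentage_change` is an illustrative name. The series holds the six values from the example, keyed by year, and nothing further is assumed about what they measure.

```python
def percentage_change(f1: float, f2: float) -> float:
    """((f2 - f1) / f1) * 100: change from time 1 to time 2, relative to time 1."""
    if f1 == 0:
        raise ValueError("percentage change is undefined when the time-1 value is 0")
    return (f2 - f1) / f1 * 100


# Values from the example, keyed by year (each value is itself a percentage).
series = {1958: 50.18, 1964: 57.84, 1966: 61.26, 1968: 46.12, 1970: 36.82, 1972: 34.20}

years = sorted(series)
for start, end in zip(years, years[1:]):
    change = percentage_change(series[start], series[end])
    print(f"{start}-{end}: {change:+.2f}%")  # sign shows the direction of change
```

Each change is measured against its own starting value. For that reason, percentage changes over consecutive periods cannot simply be added to get the change over the whole span.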


Ratios and Rates

We determine ratios by dividing the frequency of one category by the frequency of another.
Ratios and Rates


The ratio of people who agree that the FEMA
response was inadequate to those who disagree
is (27+15)/(24+7) =42/31 = 1.35 to 1. That is,
for every 10 people who disagree, there are
13.5 who agree.
A rate is the number of actual occurrences of some phenomenon divided by the number of possible occurrences, expressed per some unit of population (for example, per 1,000).
Ratios and Rates


Example: In the IAEM-NEMA Survey (Local), I asked how many emergency managers would rank wildfires as the most likely source of catastrophic disaster in their jurisdiction. Eight of the 111 respondents did so. Expressed as a rate per 1,000 emergency managers, this is (8/111) × 1,000 = 72.1. In other words, about 72 of every 1,000 emergency managers believe fires are the most likely cause of catastrophic disaster in their jurisdiction.
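Both calculations on these slides reduce to one division plus optional scaling. In this sketch `ratio` and `rate` are hypothetical helper names. The default `per=1000` simply mirrors the per-1,000 example and is not a fixed convention.

```python
def ratio(f_a: int, f_b: int) -> float:
    """Ratio: frequency of one category divided by the frequency of another."""
    return f_a / f_b


def rate(actual: int, possible: int, per: int = 1000) -> float:
    """Rate: actual occurrences / possible occurrences, expressed per `per` units."""
    return actual / possible * per


agree = 27 + 15     # Agree + Strongly agree
disagree = 24 + 7   # Disagree + Strongly disagree
print(f"{ratio(agree, disagree):.2f} to 1")

# 8 of 111 emergency managers ranked wildfires first.
print(f"{rate(8, 111):.1f} per 1,000 emergency managers")
```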
Frequency Distributions




- Tables that summarize the distribution of a variable by reporting the number of cases contained in each category of the variable.
- Helpful and commonly used ways of organizing and working with data.
- Almost always the first step in any statistical analysis.
- The problem is that raw data rarely reveal any consistent pattern. Data must be grouped to identify patterns.
Frequency Distributions


- The categories of the frequency distribution must be exhaustive and mutually exclusive (each case must be counted in one and only one category).
- Frequency distributions must have a descriptive title, clearly labeled categories, percentages, cumulative percentages, and a report of the total number of cases.
Frequency Distributions - Nominal

Table 1. Type of organization worked for: ADM 612 (Leadership) students

| Type of Organization | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Public organization | 42 | 41.2 | 41.2 | 41.2 |
| Private organization | 49 | 48.0 | 48.0 | 89.2 |
| Nonprofit organization | 11 | 10.8 | 10.8 | 100.0 |
| Total | 102 | 100.0 | 100.0 | |
Frequency Distributions - Ordinal

Table 2. Percentage of ADM 612 students agreeing that they or their supervisors were articulate.

Articulate: Communicates effectively with others.

| Response | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Disagree | 7 | 6.9 | 6.9 | 6.9 |
| Neutral | 10 | 9.8 | 9.8 | 16.7 |
| Agree | 57 | 55.9 | 55.9 | 72.5 |
| Strongly agree | 28 | 27.5 | 27.5 | 100.0 |
| Total | 102 | 100.0 | 100.0 | |
Frequency Distributions – Grouped Interval

Table 3. Years of emergency management experience – IAEM survey respondents.

| Years of Emergency Management Experience | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| 0-5 | 25 | 22.5 | 24.0 | 24.0 |
| 5-10 | 27 | 24.3 | 26.0 | 50.0 |
| 10-15 | 13 | 11.7 | 12.5 | 62.5 |
| 15-20 | 16 | 14.4 | 15.4 | 77.9 |
| 20-25 | 9 | 8.1 | 8.7 | 86.5 |
| 25-30 | 6 | 5.4 | 5.8 | 92.3 |
| 30-35 | 4 | 3.6 | 3.8 | 96.2 |
| Over 35 | 4 | 3.6 | 3.8 | 100.0 |
| Total (valid) | 104 | 93.7 | 100.0 | |
| Missing (system) | 7 | 6.3 | | |
| Total | 111 | 100.0 | | |
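The class labels in Table 3 share their endpoints (0-5, 5-10, and so on). As printed, a respondent with exactly 5 years could appear to fit two classes, which would break the mutually exclusive rule stated earlier. The survey's actual coding rule is not given.

This sketch assumes one common convention: each class is half-open, [lower, upper). Under that rule exactly 5 years falls only in "5-10", and the last class is read as "35 or more". The constants, helper names and raw values are all hypothetical; the values are not the survey responses.

```python
from bisect import bisect_right
from collections import Counter

# Class lower bounds and labels mirroring Table 3 (assumed half-open classes).
LOWER_BOUNDS = [0, 5, 10, 15, 20, 25, 30, 35]
LABELS = ["0-5", "5-10", "10-15", "15-20", "20-25", "25-30", "30-35", "Over 35"]


def group_years(years: float) -> str:
    """Place a value in exactly one class: [0, 5), [5, 10), ..., [35, infinity)."""
    if years < 0:
        raise ValueError("years of experience cannot be negative")
    # bisect_right counts how many lower bounds are <= years.
    return LABELS[bisect_right(LOWER_BOUNDS, years) - 1]


def grouped_distribution(values):
    """Frequency of each class, in class order, including empty classes."""
    counts = Counter(group_years(v) for v in values)
    return [(label, counts[label]) for label in LABELS]


# Hypothetical raw answers (NOT the survey data), including boundary values.
sample = [0, 3, 5, 5, 9.5, 10, 12, 21, 34, 35, 40]
table = grouped_distribution(sample)
for label, f in table:
    print(f"{label:>8}: {f}")
print("N =", sum(f for _, f in table))
```

Because every value maps to exactly one label, the class frequencies always add up to N. That total is a quick check that the categories are exhaustive and mutually exclusive.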
Charts and Graphs



- Researchers use charts and graphs to present their data in ways that are visually more dramatic than frequency distributions.
- Pie charts and bar charts are appropriate for discrete data at any level of measurement.
- Histograms and line charts (frequency polygons) are used for interval and ratio variables.
Pie Chart - Nominal
Pie Chart - Ordinal
Bar Chart - Nominal
Bar Chart - Ordinal
Histogram
Line Chart
PPA 501 – Analytical Methods in Administration
Lecture 5b – Measures of Central Tendency
Introduction

The benefit of frequency distributions,
graphs, and charts is their ability to
summarize the overall shape of a
distribution.
Introduction


To completely summarize a distribution,
however, you need two additional pieces
of information: some idea of the typical or
average case in the distribution and some
idea about how much variety or
heterogeneity there is in the distribution.
The typical case involves measures of
central tendency.
Introduction

The three most common measures of central tendency are the mode, the median, and the mean.

- The mode is the most common score.
- The median is the middle score.
- The mean is the arithmetic average (the typical score).
- If the distribution has a single peak and is perfectly symmetrical, all three are the same.
Mode




- The value that occurs most frequently.
- Best used with nominal-level variables, although it can be used for higher levels of measurement.
- Limitations: some distributions have no mode or too many modes.
- For ordinal and interval-ratio data, the mode may not be central to the distribution.
Median


Always represents the exact center of a
distribution of scores.
The median is the score of the case where
half of the cases are higher and half of the
cases are lower. If the median family
income is $30,000, half of the families
make less than $30,000 and half make
more.
Median


Before finding the median, the scores
must be arranged in order from lowest to
highest or highest to lowest.
When the number of cases is odd, the
central case is the median [(N+1)/2 case].
Median


When the number of cases is even, the
median is the arithmetic average of the
two central cases [the mean of case N/2
and case (N/2+1)].
The median can be calculated for ordinal
and interval-ratio data.
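The odd/even rule above translates directly into code. This is a minimal sketch with a hypothetical `median` helper; Python's built-in `statistics.median` follows the same rule. The second call uses the ten JCHA 1999 safety scores from the worked dispersion example later in this lecture.

```python
def median(scores):
    """Median by the textbook rule: order the scores, then take case (N+1)/2
    when N is odd, or the mean of cases N/2 and N/2 + 1 when N is even."""
    ordered = sorted(scores)  # scores must be arranged in order first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of no scores is undefined")
    mid = n // 2              # 0-based index
    if n % 2 == 1:
        return ordered[mid]                        # case (N+1)/2, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2   # cases N/2 and N/2 + 1


print(median([7, 1, 4, 9, 3]))                     # odd N
safety = [10, 9, 5, 5, 10, 7, 10, 10, 10, 5]       # JCHA 1999 safety scores
print(median(safety))                              # even N
```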
Percentiles



- The median is one member of a larger group of positional measures called percentiles.
- The median is the 50th percentile (50% of the scores are lower).
- At the 25th percentile, 25% of the scores are lower (and 75% are higher).
Percentiles



- Deciles divide the distribution into ten equal segments. The score at the first decile has 10% of the scores below it, the score at the second decile has 20% below it, and so on.
- Quartiles divide the distribution into quarters.
- The second quartile, the fifth decile, and the median are all the same value.
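Python's standard library (3.8 and later) computes quartiles and deciles with `statistics.quantiles`. Percentile definitions differ across textbooks and software, so this sketch assumes the "exclusive" method. That method places the p-th cut point at position p × (N + 1) in the ordered scores and interpolates between neighbors; other methods can give slightly different cut points. The data are the ten JCHA 1999 safety scores used later in this lecture.

```python
import statistics

safety = [10, 9, 5, 5, 10, 7, 10, 10, 10, 5]  # JCHA 1999 safety scores

quartiles = statistics.quantiles(safety, n=4, method="exclusive")   # Q1, Q2, Q3
deciles = statistics.quantiles(safety, n=10, method="exclusive")    # D1 ... D9

print("quartiles:", quartiles)
print("deciles:", deciles)
# Second quartile, fifth decile, and median are the same value.
print(quartiles[1], deciles[4], statistics.median(safety))
```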
Mean


The calculation of the mean is straightforward: add the scores and divide by the number of scores.

Mathematical formula:

$$\bar{X} = \frac{\sum X_i}{N}$$

where $\bar{X}$ = the mean, $\sum X_i$ = the summation of the scores, and $N$ = the number of scores.
Characteristics of the Mean

- The mean is the point around which all of the scores ($X_i$) cancel out:

$$\sum (X_i - \bar{X}) = 0$$

- The sum of the squared differences from the mean is smaller than the sum of the squared differences from any other point:

$$\sum (X_i - \bar{X})^2 = \text{minimum}$$
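The mean formula and both properties can be checked numerically on the JCHA 1999 safety scores used later in this lecture. The helper names are hypothetical. The grid search is only an illustration: it compares the mean against nearby points spaced 0.1 apart, and it is not a proof of the minimum property.

```python
def mean(scores):
    """X-bar = (sum of the scores) / N."""
    return sum(scores) / len(scores)


def sum_dev(scores, point):
    """Sum of (Xi - point)."""
    return sum(x - point for x in scores)


def sum_sq_dev(scores, point):
    """Sum of (Xi - point) squared."""
    return sum((x - point) ** 2 for x in scores)


safety = [10, 9, 5, 5, 10, 7, 10, 10, 10, 5]  # JCHA 1999 safety scores
m = mean(safety)
print(m)                       # 81 / 10
print(sum_dev(safety, m))      # zero, up to floating-point rounding

# Try other candidate points 0.1 apart; none beats the mean.
candidates = [m + k / 10 for k in range(-30, 31)]
best = min(candidates, key=lambda c: sum_sq_dev(safety, c))
print(best == m, sum_sq_dev(safety, m))
```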
Characteristics of the Mean

- Every score in the distribution affects it.
  - Advantage: the mean uses all the available information.
  - Disadvantage: a few extreme cases can make the mean misleading.
- Relative to the median, the mean is always pulled in the direction of extreme scores.
  - Positive skew: mean higher than the median.
    - Median income 1998: $46,737
    - Mean income 1998: $59,589
    - Jerry Seinfeld income 1998: $267,000,000 (equivalent to the median income of 5,713 families)
  - Negative skew: mean lower than the median.
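The pull of extreme scores can be seen with a toy example. The incomes below are hypothetical and chosen only for illustration; they are not the 1998 figures. The last line checks the slide's arithmetic: how many median-income families Seinfeld's income equals.

```python
import statistics

# Hypothetical household incomes (illustration only; not census data).
incomes = [28_000, 35_000, 41_000, 46_000, 52_000, 60_000, 75_000]
print("before:", statistics.mean(incomes), statistics.median(incomes))

# One extreme earner pulls the mean far upward; the median barely moves.
with_outlier = incomes + [5_000_000]
print("after: ", statistics.mean(with_outlier), statistics.median(with_outlier))

# Arithmetic behind the slide: Seinfeld's 1998 income in "median families".
print(round(267_000_000 / 46_737))
```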
Rules for the Selection of Measures of Central Tendency

Use the mode when:
- Variables are measured at the nominal level.
- You want a quick and easy measure for ordinal or interval measures.
- You want to report the most common score.

Use the median when:
- Variables are measured at the ordinal level.
- Variables measured at the interval-ratio level have highly skewed distributions.
- You want to report the central score.
Rules for the Selection of Measures of Central Tendency

Use the mean when:
- Variables are measured at the interval-ratio level (except for highly skewed distributions).
- You want to report the most typical score. The mean is the fulcrum that exactly balances all scores.
- You anticipate additional statistical analyses.
Example: Mode
Example: Median
Table 5. Median Disaster Intensity, 1953–2005

| Action Year | Median Disaster Intensity | Action Year | Median Disaster Intensity | Action Year | Median Disaster Intensity |
|---|---|---|---|---|---|
| 1953 | 2 (Moderate) | 1971 | 1 (Minor) | 1989 | 1 (Minor) |
| 1954 | 1 (Minor) | 1972 | 1 (Minor) | 1990 | 1 (Minor) |
| 1955 | 2.5 (Moderate to Major) | 1973 | 1 (Minor) | 1991 | 1 (Minor) |
| 1956 | 1 (Minor) | 1974 | 1 (Minor) | 1992 | 1 (Minor) |
| 1957 | 2 (Moderate) | 1975 | 1 (Minor) | 1993 | 2 (Moderate) |
| 1958 | 2 (Moderate) | 1976 | 1 (Minor) | 1994 | 1 (Minor) |
| 1959 | 1 (Minor) | 1977 | 1 (Minor) | 1995 | 1 (Minor) |
| 1960 | 1 (Minor) | 1978 | 1 (Minor) | 1996 | 1 (Minor) |
| 1961 | 2 (Moderate) | 1979 | 1 (Minor) | 1997 | 1 (Minor) |
| 1962 | 2 (Moderate) | 1980 | 1 (Minor) | 1998 | 1 (Minor) |
| 1963 | 2 (Moderate) | 1981 | 1 (Minor) | 1999 | 1 (Minor) |
| 1964 | 2 (Moderate) | 1982 | 1 (Minor) | 2000 | 1 (Minor) |
| 1965 | 2 (Moderate) | 1983 | 1 (Minor) | 2001 | 1 (Minor) |
| 1966 | 1 (Minor) | 1984 | 1 (Minor) | 2002 | 1 (Minor) |
| 1967 | 2 (Moderate) | 1985 | 1 (Minor) | 2003 | 1 (Minor) |
| 1968 | 1 (Minor) | 1986 | 1 (Minor) | 2004 | 1 (Minor) |
| 1969 | 1 (Minor) | 1987 | 1 (Minor) | 2005 | 1 (Minor) |
| 1970 | 1 (Minor) | 1988 | 1 (Minor) | | |

Total: 1 (Minor)
Example: Mean
PPA 501 – Analytical Methods in Administration
Lecture 5c – Measures of Dispersion
Introduction



- By themselves, measures of central tendency cannot summarize data completely.
- For a full description of a distribution of scores, measures of central tendency must be paired with measures of dispersion.
- Measures of dispersion assess the variability of the data. Distributions can differ in variability even when they have the same measures of central tendency.
Introduction – Example, JCHA 1999

[Figure: two histograms titled "How safe is your community?", one for Trafford and one for Red Hollow. X-axis: "How safe do you feel in your community?" (0 to 10). Panel statistics: Mean = 6.8, Std. Dev = 2.67, N = 14; Mean = 6.8, Std. Dev = 3.96, N = 7.]
Introduction

Measures of dispersion discussed:

- The range and interquartile range.
- Standard deviation and variance.
Range and Interquartile Range

- Range: the distance between the highest and lowest scores.
  - Uses only two scores.
  - Can be misleading if there are extreme values.
- Interquartile range (IQR): examines only the middle 50% of the distribution. Formally, it is the value at the 75th percentile minus the value at the 25th percentile.
Range and Interquartile Range

- Problems: based on only two scores; ignores the remaining cases in the distribution.

$$\text{Range} = \text{Highest score} - \text{Lowest score}$$

$$\text{IQR} = Q_3\,(P_{75}) - Q_1\,(P_{25})$$
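Here is a minimal sketch of both formulas. It assumes quartiles come from `statistics.quantiles` with the "exclusive" method. Other percentile conventions, such as "inclusive", can give somewhat different Q1 and Q3 and therefore a different IQR. The helper names are hypothetical, and the data are the JCHA 1999 safety scores used later in this lecture.

```python
import statistics


def score_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)


def iqr(scores):
    """IQR = Q3 (75th percentile) - Q1 (25th percentile), exclusive method."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="exclusive")
    return q3 - q1


safety = [10, 9, 5, 5, 10, 7, 10, 10, 10, 5]  # JCHA 1999 safety scores
print("range:", score_range(safety))
print("IQR:", iqr(safety))
```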
Range and Interquartile Range: FEMA Disaster Payouts, 1953 to 2005
The Standard Deviation


- The basic limitation of both the range and the IQR is their failure to use all the scores in the distribution.
- A good measure of dispersion should:
  - Use all the scores in the distribution.
  - Describe the average or typical deviation of the scores.
  - Increase in value as the distribution of scores becomes more heterogeneous.
The Standard Deviation



- One way to do this is to start with the distances between every score and some central value, such as the mean.
- The distances between the scores and the mean ($X_i - \bar{X}$) are called deviation scores.
- The greater the variability, the larger the deviation scores.
The Standard Deviation


- One course of action is to sum the deviations and divide by the number of cases, but the sum of the deviations is always equal to zero.
- The next solution is to make all deviations positive:
  - Absolute values: the average deviation.
  - Squared deviations: the variance and standard deviation.
Average and Population Standard Deviation

Average deviation:

$$AD = \frac{\sum |X_i - \bar{X}|}{N}$$

Variance (population):

$$\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$$

Standard deviation (population):

$$\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$$
Sample Variance and Standard Deviation

Sample variance:

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

Sample standard deviation:

$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$$
Computational Variance and Standard Deviation – Sample

Computational variance (sample):

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$

Computational sample standard deviation:

$$s = \sqrt{s^2}$$
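Each formula above can be written as a short function. The names are hypothetical. The two variance formulas agree algebraically, but in floating-point arithmetic the computational form can lose precision when the scores are large relative to their spread. For that reason the deviation form is usually the safer default in code. The JCHA 1999 safety scores are used as input.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def average_deviation(xs):
    """AD = sum |Xi - X-bar| / N."""
    m = _mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def population_variance(xs):
    """sigma^2 = sum (Xi - X-bar)^2 / N."""
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def sample_variance(xs):
    """s^2 = sum (Xi - X-bar)^2 / (n - 1)."""
    if len(xs) < 2:
        raise ValueError("a sample variance needs at least two scores")
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def computational_sample_variance(xs):
    """s^2 = (sum x^2 - (sum x)^2 / n) / (n - 1); no deviation scores needed."""
    n = len(xs)
    if n < 2:
        raise ValueError("a sample variance needs at least two scores")
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


safety = [10, 9, 5, 5, 10, 7, 10, 10, 10, 5]  # JCHA 1999 safety scores
print("AD:", average_deviation(safety))
print("population:", population_variance(safety), math.sqrt(population_variance(safety)))
print("sample:", sample_variance(safety), math.sqrt(sample_variance(safety)))
print("computational:", computational_sample_variance(safety))
```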
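Examples – JCHA 1999

| Case | Safety (Xi) | Xi − X̄ | Absolute deviation | (Xi − X̄)² | Xi² |
|---|---|---|---|---|---|
| 1 | 10 | 1.9 | 1.9 | 3.61 | 100 |
| 2 | 9 | 0.9 | 0.9 | 0.81 | 81 |
| 3 | 5 | -3.1 | 3.1 | 9.61 | 25 |
| 4 | 5 | -3.1 | 3.1 | 9.61 | 25 |
| 5 | 10 | 1.9 | 1.9 | 3.61 | 100 |
| 6 | 7 | -1.1 | 1.1 | 1.21 | 49 |
| 7 | 10 | 1.9 | 1.9 | 3.61 | 100 |
| 8 | 10 | 1.9 | 1.9 | 3.61 | 100 |
| 9 | 10 | 1.9 | 1.9 | 3.61 | 100 |
| 10 | 5 | -3.1 | 3.1 | 9.61 | 25 |
| Sum | 81 | 0.0 | 20.8 | 48.90 | 705 |

N = 10; X̄ = 81/10 = 8.1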
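Examples – Average and Standard Deviation

$$AD = \frac{\sum |X_i - \bar{X}|}{n} = \frac{20.8}{10} = 2.08$$

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{48.9}{9} = 5.43$$

$$s = \sqrt{s^2} = \sqrt{5.43} = 2.33$$

Using the computational formula:

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{705 - \frac{81^2}{10}}{9} = \frac{705 - 656.1}{9} = \frac{48.9}{9} = 5.43$$

$$s = \sqrt{s^2} = \sqrt{5.43} = 2.33$$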
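The hand calculations above can be checked against Python's `statistics` module. There, `variance` and `stdev` divide by n − 1 (sample), while `pvariance` and `pstdev` divide by N (population). This is a quick check, not a replacement for working the formulas by hand.

```python
import statistics

safety = [10, 9, 5, 5, 10, 7, 10, 10, 10, 5]  # JCHA 1999 safety scores

print("sample:", round(statistics.variance(safety), 2), round(statistics.stdev(safety), 2))
print("population:", round(statistics.pvariance(safety), 2), round(statistics.pstdev(safety), 2))
```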