Survey


PPA 501 – Analytical Methods in Administration
Lecture 5a – Counting and Charting Responses

Percentages and Proportions
Percentages and proportions supply a frame of reference for reporting research results by standardizing the raw data: percentages to base 100 and proportions to base 1.00.

Proportion: $p = \frac{f}{N}$
Percentage: $\% = \frac{f}{N} \times 100$

where f is the frequency in a category and N is the total number of cases.

Percentages and Proportions: Example from the IAEM-NEMA Survey, 2006
Item: "Problems with the government response to Hurricane Katrina arose largely because of inadequate leadership and management of the crisis by FEMA."

                          Frequency  Percent  Valid Percent  Cumulative Percent
Valid    Strongly disagree        7      6.3            7.1                 7.1
         Disagree                24     21.6           24.2                31.3
         Neutral                 26     23.4           26.3                57.6
         Agree                   27     24.3           27.3                84.8
         Strongly agree          15     13.5           15.2               100.0
         Total                   99     89.2          100.0
Missing  System                  12     10.8
Total                           111    100.0

Percentages and Proportions: Guidelines
- When working with a small number of cases, report the actual frequencies.
- Always report the number of observations along with proportions and percentages.
- Proportions and percentages can be used for any level of measurement.

Percentage Change
$\text{Percentage change} = \frac{f_2 - f_1}{f_1} \times 100$
where f1 is the first score, frequency, or value (at time 1) and f2 is the second score, frequency, or value (at time 2).

Percentage Change Example
% change 1958-1964 = (57.84% - 50.18%) / 50.18% × 100 = 15.27%
% change 1964-1966 = (61.26% - 57.84%) / 57.84% × 100 = 5.91%
% change 1966-1968 = (46.12% - 61.26%) / 61.26% × 100 = -24.71%
% change 1968-1970 = (36.82% - 46.12%) / 46.12% × 100 = -20.16%
% change 1970-1972 = (34.20% - 36.82%) / 36.82% × 100 = -7.12%

Ratios and Rates
We determine ratios by dividing the frequency of one category by the frequency of another. Example, using the same IAEM-NEMA item: "Problems with the government response to Hurricane Katrina arose largely because of inadequate leadership and management of the crisis by FEMA."
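The calculations defined so far (proportion, percentage, percentage change, and the ratio just defined), plus the rate per 1,000 used in the wildfire example that follows, can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original lecture: the function names are hypothetical, the counts are the Hurricane Katrina frequencies from the table above, and the time series is the set of values from the Percentage Change Example.

```python
# Minimal sketch (hypothetical helper names) of the Lecture 5a calculations.

# Hurricane Katrina item, IAEM-NEMA Survey 2006 (valid responses only).
KATRINA = {
    "Strongly disagree": 7,
    "Disagree": 24,
    "Neutral": 26,
    "Agree": 27,
    "Strongly agree": 15,
}
MISSING = 12  # "System" missing responses


def proportion(f, n):
    """p = f / N (base 1.00)."""
    return f / n


def percentage(f, n):
    """% = (f / N) * 100 (base 100)."""
    return f / n * 100


def percentage_change(f1, f2):
    """((f2 - f1) / f1) * 100, where f1 is the value at time 1."""
    return (f2 - f1) / f1 * 100


def ratio(f_a, f_b):
    """Frequency of one category divided by the frequency of another."""
    return f_a / f_b


def rate(occurrences, possible, per=1000):
    """Actual occurrences / possible occurrences, expressed per `per` units."""
    return occurrences / possible * per


valid_n = sum(KATRINA.values())  # 99 valid responses
total_n = valid_n + MISSING       # 111 respondents

# Percent divides by all 111 cases; valid percent drops the 12 missing cases.
cumulative = 0.0
for category, f in KATRINA.items():
    cumulative += percentage(f, valid_n)
    print(f"{category:<18}{f:>4}{percentage(f, total_n):8.1f}"
          f"{percentage(f, valid_n):8.1f}{cumulative:8.1f}")

# Percentage change between consecutive values in the example series.
series = [(1958, 50.18), (1964, 57.84), (1966, 61.26),
          (1968, 46.12), (1970, 36.82), (1972, 34.20)]
for (y1, v1), (y2, v2) in zip(series, series[1:]):
    print(f"{y1}-{y2}: {percentage_change(v1, v2):.2f}%")

agree = KATRINA["Agree"] + KATRINA["Strongly agree"]          # 42
disagree = KATRINA["Disagree"] + KATRINA["Strongly disagree"]  # 31
print(f"ratio agree:disagree = {ratio(agree, disagree):.2f} to 1")
print(f"wildfire rate = {rate(8, 111):.1f} per 1,000")
```

Keeping the percent and valid percent columns as two calls with different denominators makes the distinction in the table explicit: the same frequency is divided by 111 for "Percent" and by 99 for "Valid Percent".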
The ratio of people who agree that the FEMA response was inadequate to those who disagree is (27 + 15)/(24 + 7) = 42/31 = 1.35 to 1. That is, for every 10 people who disagree, there are 13.5 who agree.

Rates are the number of actual occurrences of some phenomenon divided by the number of possible occurrences, expressed per some unit of population (for example, per 1,000).

Ratios and Rates: Example
In the IAEM-NEMA Survey (Local), I asked how many emergency managers would rank wildfires as the most likely source of catastrophic disaster in their jurisdiction. Eight of the 111 respondents did so. Expressed as a rate per 1,000 emergency managers, this is (8/111) × 1,000 = 72.1; that is, 72.1 of every 1,000 emergency managers believe wildfires to be the most likely cause of catastrophic disaster in their jurisdiction.

Frequency Distributions
Frequency distributions are tables that summarize the distribution of a variable by reporting the number of cases contained in each category of the variable.
- They are helpful and commonly used ways of organizing and working with data.
- They are almost always the first step in any statistical analysis.
- Raw data rarely reveal any consistent pattern; the data must be grouped before patterns can be identified.

Frequency Distributions
- The categories of a frequency distribution must be exhaustive and mutually exclusive: each case must be counted in one and only one category.
- Frequency distributions must have a descriptive title, clearly labeled categories, percentages, cumulative percentages, and a report of the total number of cases.

Frequency Distributions: Nominal
Table 1.
Type of organization worked for: ADM 612 (Leadership) students

Type of Organization      Frequency  Percent  Valid Percent  Cumulative Percent
Public organization              42     41.2           41.2                41.2
Private organization             49     48.0           48.0                89.2
Nonprofit organization           11     10.8           10.8               100.0
Total                           102    100.0          100.0

Frequency Distributions: Ordinal
Table 2. Percentage of ADM 612 students agreeing that they or their supervisors were articulate
Articulate: communicates effectively with others.

Response                  Frequency  Percent  Valid Percent  Cumulative Percent
Disagree                          7      6.9            6.9                 6.9
Neutral                          10      9.8            9.8                16.7
Agree                            57     55.9           55.9                72.5
Strongly agree                   28     27.5           27.5               100.0
Total                           102    100.0          100.0

Frequency Distributions: Grouped Interval
Table 3. Years of emergency management experience, IAEM survey respondents

Years of Experience       Frequency  Percent  Valid Percent  Cumulative Percent
Valid    0-5                     25     22.5           24.0                24.0
         5-10                    27     24.3           26.0                50.0
         10-15                   13     11.7           12.5                62.5
         15-20                   16     14.4           15.4                77.9
         20-25                    9      8.1            8.7                86.5
         25-30                    6      5.4            5.8                92.3
         30-35                    4      3.6            3.8                96.2
         Over 35                  4      3.6            3.8               100.0
         Total                  104     93.7          100.0
Missing  System                   7      6.3
Total                           111    100.0

Charts and Graphs
Researchers use charts and graphs to present their data in ways that are visually more dramatic than frequency distributions.
- Pie charts and bar charts are appropriate for discrete data at any level of measurement.
- Histograms and line charts (frequency polygons) are used for interval and ratio variables.

[Example charts: pie chart (nominal), pie chart (ordinal), bar chart (nominal), bar chart (ordinal), histogram, line chart.]

Lecture 5b – Measures of Central Tendency

Introduction
The benefit of frequency distributions, graphs, and charts is their ability to summarize the overall shape of a distribution.
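Table 3's class labels (0-5, 5-10, and so on) share their endpoints, so a respondent with exactly 5 years of experience could seemingly fall into two categories, which would conflict with the exhaustive and mutually exclusive rule for frequency distributions. The sketch below assumes the classes are upper-inclusive (0-5 means 0 to 5 years, 5-10 means more than 5 up to 10, and Over 35 means more than 35); that boundary rule is an assumption, not something the survey states. It also recomputes the cumulative valid percentages from the Table 3 frequencies. The function names are hypothetical.

```python
# Minimal sketch: exhaustive, mutually exclusive classes for Table 3.
# ASSUMPTION: classes are upper-inclusive, i.e. 0-5 is 0 <= years <= 5,
# 5-10 is 5 < years <= 10, ..., and "Over 35" is years > 35.

CLASSES = ["0-5", "5-10", "10-15", "15-20", "20-25", "25-30", "30-35", "Over 35"]
UPPER_LIMITS = [5, 10, 15, 20, 25, 30, 35]  # upper limit of each bounded class


def classify(years):
    """Return the single class that a value of `years` belongs to."""
    if years < 0:
        raise ValueError("years of experience cannot be negative")
    # zip stops after the seven bounded classes; anything left is "Over 35".
    for label, upper in zip(CLASSES, UPPER_LIMITS):
        if years <= upper:
            return label
    return CLASSES[-1]


# Valid frequencies as reported in Table 3 (the 7 missing cases excluded).
TABLE3_COUNTS = [25, 27, 13, 16, 9, 6, 4, 4]


def cumulative_percents(counts):
    """Cumulative valid percent for each class, in table order."""
    n = sum(counts)
    running = 0
    result = []
    for f in counts:
        running += f
        result.append(running / n * 100)
    return result


for label, cum in zip(CLASSES, cumulative_percents(TABLE3_COUNTS)):
    print(f"{label:<8}{cum:7.1f}")
```

Any other boundary convention (for example, lower-inclusive classes) would work equally well, as long as every possible value lands in exactly one class.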
Introduction
To summarize a distribution completely, however, you need two additional pieces of information: some idea of the typical or average case in the distribution, and some idea of how much variety or heterogeneity there is in the distribution. Measures of central tendency describe the typical case.

Introduction
The three most common measures of central tendency are the mode, the median, and the mean.
- The mode is the most common score.
- The median is the middle score.
- The mean is the typical score.
If the distribution has a single peak and is perfectly symmetrical, all three are the same.

Mode
The value that occurs most frequently.
- Best used with nominal-level variables, although it can be used for higher levels of measurement.
- Limitations: some distributions have no mode or too many modes. For ordinal and interval-ratio data, the mode may not be central to the distribution.

Median
The median always represents the exact center of a distribution of scores: it is the score of the case where half of the cases are higher and half are lower. If the median family income is $30,000, half of the families make less than $30,000 and half make more.

Median
Before finding the median, arrange the scores in order from lowest to highest (or highest to lowest).
- When the number of cases is odd, the median is the central case, case (N + 1)/2.

Median
- When the number of cases is even, the median is the arithmetic average of the two central cases, case N/2 and case N/2 + 1.
- The median can be calculated for ordinal and interval-ratio data.

Percentiles
The median is one of a larger group of positional measures called percentiles. The median is the 50th percentile (50% of the scores are lower). At the 25th percentile, 25% of the scores are lower (and 75% are higher).

Percentiles
Deciles divide the distribution into ten equal segments.
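The mode definition and the ordering rule for the median can be sketched as follows. This is a minimal illustration with hypothetical function names; returning an empty list when every distinct value is equally frequent is one assumed way to represent "no mode". The scores in the example are the ten JCHA 1999 safety ratings from the worked example in Lecture 5c.

```python
# Minimal sketch of the mode and median rules (hypothetical helper names).
from collections import Counter


def modes(scores):
    """Return every value tied for the highest frequency.

    Assumption: when there are at least two distinct values and all occur
    equally often, the distribution is treated as having no mode ([])."""
    counts = Counter(scores)
    top = max(counts.values())
    if len(counts) > 1 and all(f == top for f in counts.values()):
        return []
    return sorted(value for value, f in counts.items() if f == top)


def median(scores):
    """Order the scores, then take case (N + 1) / 2 when N is odd, or the
    mean of cases N / 2 and N / 2 + 1 when N is even (cases are 1-based)."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of no scores is undefined")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # 1-based case -> 0-based index
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# JCHA 1999 safety scores (N = 10, so the median averages cases 5 and 6).
SAFETY = [10, 9, 5, 5, 10, 7, 10, 10, 10, 5]
print(modes(SAFETY))   # [10]
print(median(SAFETY))  # 9.5
```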
The score at the first decile has 10% of the scores below it, the score at the second decile has 20% of the scores below it, and so on. Quartiles divide the distribution into quarters. The second quartile, the fifth decile, and the median are all the same value.

Mean
The calculation of the mean is straightforward: add the scores and divide by the number of scores.
$\bar{X} = \frac{\sum X_i}{N}$
where $\bar{X}$ is the mean, $\sum X_i$ is the sum of the scores, and N is the number of scores.

Characteristics of the Mean
- The mean is the point around which all of the scores ($X_i$) cancel out: $\sum (X_i - \bar{X}) = 0$
- The sum of the squared differences from the mean is smaller than the sum of the squared differences from any other point: $\sum (X_i - \bar{X})^2 = \text{minimum}$

Characteristics of the Mean
Every score in the distribution affects the mean.
- Advantage: the mean uses all the available information.
- Disadvantage: a few extreme cases can make the mean misleading.
Relative to the median, the mean is always pulled in the direction of extreme scores.
- Positive skew: the mean is higher than the median. Median income, 1998: $46,737. Mean income, 1998: $59,589. Jerry Seinfeld's income, 1998: $267,000,000 (equivalent to the median income of 5,713 families).
- Negative skew: the mean is lower than the median.

Rules for the Selection of Measures of Central Tendency
Use the mode when:
- Variables are measured at the nominal level.
- You want a quick and easy measure for ordinal or interval-ratio variables.
- You want to report the most common score.
Use the median when:
- Variables are measured at the ordinal level.
- Variables measured at the interval-ratio level have highly skewed distributions.
- You want to report the central score.
Use the mean when:
- Variables are measured at the interval-ratio level (except for highly skewed distributions).
- You want to report the most typical score. The mean is the fulcrum that exactly balances all scores.
- You anticipate additional statistical analyses.

Example: Mode

Example: Median
Table 5.
Median Disaster Intensity, 1953-2005

Year  Median Disaster Intensity   Year  Median Disaster Intensity   Year  Median Disaster Intensity
1953  2 (Moderate)                1971  1 (Minor)                   1989  1 (Minor)
1954  1 (Minor)                   1972  1 (Minor)                   1990  1 (Minor)
1955  2.5 (Moderate to Major)     1973  1 (Minor)                   1991  1 (Minor)
1956  1 (Minor)                   1974  1 (Minor)                   1992  1 (Minor)
1957  2 (Moderate)                1975  1 (Minor)                   1993  2 (Moderate)
1958  2 (Moderate)                1976  1 (Minor)                   1994  1 (Minor)
1959  1 (Minor)                   1977  1 (Minor)                   1995  1 (Minor)
1960  1 (Minor)                   1978  1 (Minor)                   1996  1 (Minor)
1961  2 (Moderate)                1979  1 (Minor)                   1997  1 (Minor)
1962  2 (Moderate)                1980  1 (Minor)                   1998  1 (Minor)
1963  2 (Moderate)                1981  1 (Minor)                   1999  1 (Minor)
1964  2 (Moderate)                1982  1 (Minor)                   2000  1 (Minor)
1965  2 (Moderate)                1983  1 (Minor)                   2001  1 (Minor)
1966  1 (Minor)                   1984  1 (Minor)                   2002  1 (Minor)
1967  2 (Moderate)                1985  1 (Minor)                   2003  1 (Minor)
1968  1 (Minor)                   1986  1 (Minor)                   2004  1 (Minor)
1969  1 (Minor)                   1987  1 (Minor)                   2005  1 (Minor)
1970  1 (Minor)                   1988  1 (Minor)                   Total 1 (Minor)

Example: Mean

Lecture 5c – Measures of Dispersion

Introduction
By themselves, measures of central tendency cannot summarize data completely. For a full description of a distribution of scores, measures of central tendency must be paired with measures of dispersion, which assess the variability of the data. Distributions can differ in variability even when they have the same measures of central tendency.

Introduction: Example, JCHA 1999
[Figure: "How safe is your community?" Histograms of responses to "How safe do you feel in your community?" (scale 0 to 10) for Trafford (Mean = 6.8, Std. Dev = 2.67, N = 14) and Red Hollow (Mean = 6.8, Std. Dev = 3.96, N = 7).]

Introduction
Measures of dispersion discussed:
- The range and interquartile range.
- Standard deviation and variance.
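The measures of dispersion below all start from the mean, so it is worth checking the two characteristics of the mean listed in Lecture 5b numerically. This is a minimal sketch with hypothetical function names, using the ten JCHA 1999 safety scores tabulated in the worked example at the end of this lecture. It compares the sum of squared deviations around the mean with the sums around a few other candidate points.

```python
# Minimal sketch (hypothetical helper names): checking the two
# characteristics of the mean on the JCHA 1999 safety scores.
SAFETY = [10, 9, 5, 5, 10, 7, 10, 10, 10, 5]


def mean(scores):
    """X-bar = (sum of X_i) / N."""
    return sum(scores) / len(scores)


def sum_of_deviations(scores, center):
    """Sum of (X_i - center)."""
    return sum(x - center for x in scores)


def sum_of_squared_deviations(scores, center):
    """Sum of (X_i - center) squared."""
    return sum((x - center) ** 2 for x in scores)


x_bar = mean(SAFETY)
print(x_bar)  # 8.1
# Deviations from the mean cancel out (zero up to floating-point rounding).
print(abs(sum_of_deviations(SAFETY, x_bar)) < 1e-9)  # True
# Squared deviations are smallest around the mean.
for center in (7.0, 8.0, x_bar, 9.0):
    print(center, round(sum_of_squared_deviations(SAFETY, center), 2))
```

The first property is why simply averaging the deviation scores cannot measure dispersion; the second is why squared deviations, and hence the variance and standard deviation, are centered on the mean.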
Range and Interquartile Range
- Range: the distance between the highest and lowest scores. It uses only two scores and can be misleading if there are extreme values.
- Interquartile range (IQR): examines only the middle 50% of the distribution. Formally, it is the value at the 75th percentile minus the value at the 25th percentile.

Range and Interquartile Range
Problems: each is based on only two scores and ignores the remaining cases in the distribution.
$\text{Range} = \text{Highest score} - \text{Lowest score}$
$\text{IQR} = Q_3\,(P_{75}) - Q_1\,(P_{25})$

Range and Interquartile Range: FEMA Disaster Payouts, 1953 to 2005

The Standard Deviation
The basic limitation of both the range and the IQR is their failure to use all the scores in the distribution. A good measure of dispersion should:
- Use all the scores in the distribution.
- Describe the average or typical deviation of the scores.
- Increase in value as the distribution of scores becomes more heterogeneous.

The Standard Deviation
One way to do this is to start with the distances between every score and some central value such as the mean. The distances between the scores and the mean, $(X_i - \bar{X})$, are called deviation scores. The greater the variability, the larger the deviation scores.

The Standard Deviation
One course of action is to sum the deviations and divide by the number of cases, but the sum of the deviations is always equal to zero. The next solution is to make all deviations positive:
- Absolute values: average deviation.
- Squared deviations: standard deviation.
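The range and IQR formulas above can be sketched with Python's standard library. This minimal example assumes `statistics.quantiles` (Python 3.8 or later) with `method="inclusive"` as the percentile convention; other textbooks and packages use other conventions and can give slightly different quartiles. The data are the ten JCHA 1999 safety scores from the worked example at the end of this lecture, and the function names are hypothetical.

```python
# Minimal sketch: range and interquartile range.
# ASSUMPTION: quartiles come from statistics.quantiles(..., method="inclusive"),
# which interpolates linearly between ordered scores.
import statistics

SAFETY = [10, 9, 5, 5, 10, 7, 10, 10, 10, 5]  # JCHA 1999 safety scores


def score_range(scores):
    """Range = highest score - lowest score (uses only two scores)."""
    return max(scores) - min(scores)


def iqr(scores):
    """IQR = Q3 (P75) - Q1 (P25): the spread of the middle 50%."""
    q1, _q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return q3 - q1


print(score_range(SAFETY))  # 5
print(iqr(SAFETY))          # 4.5
```

Both results depend on just two values each (the extremes, or the two quartiles), which is exactly the limitation that motivates the standard deviation.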
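Average Deviation and Population Standard Deviation
Average deviation: $AD = \frac{\sum |X_i - \bar{X}|}{N}$
Variance (population): $\sigma^2 = \frac{\sum (X_i - \bar{X})^2}{N}$
Standard deviation (population): $\sigma = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$

Sample Variance and Standard Deviation
Sample variance: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
Sample standard deviation: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$

Computational Variance and Standard Deviation: Sample
Computational variance (sample): $s^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n}}{n - 1}$
Computational sample standard deviation: $s = \sqrt{s^2}$

Examples: JCHA 1999
(Deviation = $X_i - \bar{X}$)

Safety (X_i)  Deviation  |Deviation|  Deviation²  X_i²
          10        1.9          1.9        3.61   100
           9        0.9          0.9        0.81    81
           5       -3.1          3.1        9.61    25
           5       -3.1          3.1        9.61    25
          10        1.9          1.9        3.61   100
           7       -1.1          1.1        1.21    49
          10        1.9          1.9        3.61   100
          10        1.9          1.9        3.61   100
          10        1.9          1.9        3.61   100
           5       -3.1          3.1        9.61    25
Sum       81        0.0         20.8       48.90   705

N = 10, $\bar{X} = 81/10 = 8.1$

Examples: Average and Standard Deviation
$AD = \frac{\sum |X_i - \bar{X}|}{n} = \frac{20.8}{10} = 2.08$
$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{48.9}{9} = 5.43$
$s = \sqrt{s^2} = \sqrt{5.43} = 2.33$

Using the computational formula:
$s^2 = \frac{705 - \frac{81^2}{10}}{9} = \frac{705 - 656.1}{9} = \frac{48.9}{9} = 5.43$
$s = \sqrt{s^2} = \sqrt{5.43} = 2.33$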
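The worked example above can be reproduced in code as a check on the arithmetic. This is a minimal sketch with hypothetical function names. The standard library's `statistics.variance` and `statistics.stdev` (which divide by n - 1) and `statistics.pvariance` and `statistics.pstdev` (which divide by N) are used only as a cross-check.

```python
# Minimal sketch (hypothetical helper names): average deviation, variance,
# and standard deviation for the JCHA 1999 safety scores.
import math
import statistics

SAFETY = [10, 9, 5, 5, 10, 7, 10, 10, 10, 5]


def average_deviation(xs):
    """AD = sum(|X_i - X-bar|) / N."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def variance(xs, sample=True):
    """Definitional formula: sum of squared deviations divided by
    n - 1 (sample) or N (population)."""
    m = sum(xs) / len(xs)
    ss = sum((x - m) ** 2 for x in xs)
    return ss / (len(xs) - 1 if sample else len(xs))


def computational_variance(xs):
    """Sample variance via (sum X^2 - (sum X)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


ad = average_deviation(SAFETY)
s2 = variance(SAFETY)
s = math.sqrt(s2)
print(round(ad, 2), round(s2, 2), round(s, 2))  # 2.08 5.43 2.33
print(round(computational_variance(SAFETY), 2))  # 5.43
# Cross-check against the standard library (sample, then population).
print(round(statistics.variance(SAFETY), 2), round(statistics.stdev(SAFETY), 2))
print(round(statistics.pvariance(SAFETY), 2), round(statistics.pstdev(SAFETY), 2))
```

The definitional and computational formulas give the same sample variance; the computational form only avoids calculating each deviation score, which mattered more for hand calculation than it does in code.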