Term Project

Prepared for HSA 523 Health Data Analysis
Professor: Dr. Robert Jantzen
Prepared by: Sample Student

Introduction:

Individuals must be accountable for their actions. Even so, numerous factors have often been thought to contribute to the incidence of crime. These include economic factors such as poverty, the social environment and dysfunctional family structure. It has often been thought that societal recognition and management of these factors could reduce the incidence of serious crime. This analysis will demonstrate the varying impact of these explanatory variables, collectively and independently, on the dependent variable of serious crime rate.

Literature Review:

A brief literature search identified numerous articles which explore the given dependent variable and explanatory variables. The Policy Analysis paper uses a weighted two-stage least-squares regression to estimate the relationship between the crime rate and the independent variables of economic conditions. Economic conditions were measured by the logarithm of average annual income, the male unemployment rate, the overall employment rate and the poverty rate. Data for this research were provided by the US Department of Justice through two measures: the Uniform Crime Report (UCR), covering 98% of the population, and the National Crime Victimization Survey (NCVS), which surveyed 66 thousand households with a 96% participation rate. These measures summarized the index violent crimes of murder, rape, robbery and aggravated assault and the index property crimes of burglary, larceny and auto theft.

This study indicates a relationship between economic factors and crime. An increase in average annual income appears to reduce violent crimes as well as property crimes. On a negative note, an increase in the poverty rate of 1% appears to increase property crime by 3%.
Interestingly, a 1% increase in the overall employment rate appeared to result in an increase in the property crime rate of about 3%. It was suggested that this increase results from more homes being left empty during the day, leaving them vulnerable to burglary. The study recommends additional research to explain this surprising finding. Further findings suggest that the economic condition of specific groups, such as men, is also important: an increase in the male unemployment rate results in an increase in the violent crime rate.

Data and Method:

To obtain data for this analysis, the CITY.SAV database was used to gather information on the dependent and explanatory variables. These economic and social data were collected in 1989-1991 from 1083 large cities in the United States and analyzed by regression analysis in SPSS. Though some could argue that these explanatory variables overlap because they are interrelated, we will set this argument aside for the purpose of completing this exercise.

To better visualize the data, frequency tables and histograms were completed for all variables. Further statistics were obtained to measure the center of each data set, the variation of the numbers around the mean (where applicable) and how the numbers differ from each other.

SERIOUS CRIMES PER 100,000 PERSONS

SERIOUS CRIMES       NUMBER   RELATIVE VALUES   CUMULATIVE VALUES
0 - 3800             245      22.6%             22.6%
>3800 - 7600         487      45%               67.6%
>7600 - 11,400       260      24%               91.6%
>11,400 - 15,200     76       7%                98.6%
>15,200 - 19,000     14       1.3%              99.9%
>19,000 - 22,800     0        0                 99.9%
>22,800 - 26,600     0        0                 99.9%
>26,600 - 30,400     0        0                 99.9%
>30,400 - 34,200     0        0                 99.9%
>34,200 - 38,000     1        .1%               100%
Total                1083     100%              100%

Statistics: 1991 serious crimes per 100000 persons

N Valid            1083
N Missing          0
Mean               6233.9954
Median             6145.0000
Std. Deviation     3802.10547
Variance           1.446E7
Range              37903.00
Minimum            .00
Maximum            37903.00
Percentiles 25     4052.0000
Percentiles 50     6145.0000
Percentiles 75     8436.0000

Each data set was analyzed to determine the distribution of its numbers. That is, the information was evaluated to establish how the numbers differed from the mean and from each other, and to describe their overall shape and symmetry. Using SPSS, the mean, median, standard deviation, variance, range and percentiles were computed for each data set (see attached). Because some of the data sets were skewed, not all of these statistics could be used for them. Additional analysis produced the interquartile range and the 5 number summary. This information is as follows:

Serious Crimes per 100,000 persons: The Interquartile Range, which captures the middle 50% of numbers, is 4384. The 5 number summary is 0 (min), 4052 (1st quartile), 6145 (median), 8436 (3rd quartile), 37,903 (max). This summary again demonstrates the distribution of numbers. Using the Pearson Measure of Skewness (PMS), the numbers appear approximately symmetrical, as the PMS = .02. Based on this, we are able to use the mean, standard deviation and variance provided by SPSS (see chart) to describe the distribution of numbers. Furthermore, the Empirical rule can be employed, which states that approximately 68% of the numbers are within +/- 1 standard deviation of the mean, 95% are within +/- 2 standard deviations of the mean and 99.7% are within +/- 3 standard deviations of the mean.

[Histogram: 1991 serious crimes per 100000 persons (Std. Dev = 1476.43, Mean = 5371.6, N = 51.00)]

UNEMPLOYMENT RATE

UNEMPLOYMENT RATE    NUMBER   RELATIVE VALUES   CUMULATIVE VALUES
0 - 3                87       8%                8%
>3 - 6               434      40%               48.1%
>6 - 9               366      34%               81.9%
>9 - 12              145      13%               95.3%
>12 - 15             40       4%                99%
>15 - 18             11       1%                100%
Total                1083     100%              100%

Statistics: 1991 unemployment rate

N Valid            1083
N Missing          0
Mean               6.5331
Median             6.1000
Std. Deviation     2.99854
Variance           8.991
Range              17.90
Minimum            .00
Maximum            17.90
Percentiles 25     4.6000
Percentiles 50     6.1000
Percentiles 75     8.2000

Unemployment Rate: The Interquartile Range for the above data set is 3.6. The 5 number summary is 0 (min), 4.6 (1st quartile), 6.1 (median), 8.2 (3rd quartile), 17.90 (max). Using the Pearson Measure of Skewness (PMS), the numbers are skewed, as the PMS = .13. As a result, we are not able to use the mean, standard deviation and variance provided by SPSS in the chart. Instead we use the median (6.1) to describe the middle number that divides all of the numbers into two equal sized groups, the range (17.90) to describe the difference between the maximum and minimum numbers and the IQR (above) to capture the middle 50% of numbers. As these numbers are not symmetrical, Chebyshev's rule was applied, which states that at least 75% of the data is within +/- 2 standard deviations of the mean and at least 89% of the data falls within +/- 3 standard deviations of the mean.

[Histogram: 1991 unemployment rate (Std. Dev = 1.53, Mean = 6.39, N = 51.00)]

PERCENTAGE OF FAMILIES BELOW POVERTY LEVEL

% POVERTY            NUMBER   RELATIVE VALUES   CUMULATIVE VALUES
0 - 7                431      39.8%             39.8%
>7 - 14              388      35.8%             75.6%
>14 - 21             201      18.6%             94.2%
>21 - 28             50       4.6%              98.8%
>28 - 35             9        .8%               99.6%
>35 - 42             4        .4%               100%
Total                1083     100%              100%

Statistics: 1989 percentage of families below poverty level

N Valid            1083
N Missing          0
Mean               9.8849
Median             8.9000
Std. Deviation     6.51442
Variance           42.438
Range              39.90
Minimum            .50
Maximum            40.40
Percentiles 25     4.5000
Percentiles 50     8.9000
Percentiles 75     13.9000

Families below Poverty Level: The Interquartile Range for the data set is 9.4. The 5 number summary is .50, 4.5, 8.9, 13.9 and 40.40 respectively. The PMS is equal to .14, which again indicates an asymmetry in the numbers. We are again unable to use the mean, standard deviation and variance. The median (8.9), range (39.90) and IQR (above) are therefore used to describe the distribution of data. Chebyshev's rule can again be applied to establish how the numbers differ from the mean.

[Histogram: 1989 percentage of families below poverty level (Std. Dev = 3.55, Mean = 10.0, N = 51.00)]

MEDIAN FAMILY INCOME

MEDIAN INCOME          NUMBER   RELATIVE VALUES   CUMULATIVE VALUES
0 - $8200              0        0                 0
>$8200 - $16,400       2        .2%               .2%
>$16,400 - $24,600     66       6%                6.3%
>$24,600 - $32,800     372      34%               40.6%
>$32,800 - $41,000     293      27%               67.7%
>$41,000 - $49,200     170      16%               83.4%
>$49,200 - $57,400     116      11%               94.1%
>$57,400 - $65,600     40       4%                97.8%
>$65,600 - $73,800     12       1%                98.9%
>$73,800 - $82,000     6        .6%               99.4%
>$82,000 - $90,200     5        .5%               99.9%
>$90,200 - $98,400     1        .1%               100%
Total                  1083     100%              100%

Statistics: 1989 median family income $

N Valid            1083
N Missing          0
Mean               37784.3361
Median             34990.0000
Std. Deviation     11476.80043
Variance           1.317E8
Range              81817.00
Minimum            13785.00
Maximum            95602.00
Percentiles 25     29509.0000
Percentiles 50     34990.0000
Percentiles 75     43723.0000

Median Family Income: The Interquartile Range for the data set is 14,214. This number describes how the middle 50% of numbers differ. The 5 number summary is 13785, 29509, 34990, 43723, 95602. The PMS is .24, which indicates that the numbers are skewed. We must therefore use the median (34,990) to describe the middle number that divides all of the numbers into two equal sized groups, the range (81,817) to describe the difference between the maximum and minimum numbers and the IQR (above). Chebyshev's rule can be applied to establish that at least 75% and 89% of the numbers fall within 2 and 3 standard deviations of the mean, respectively.

[Histogram: 1989 median family income $ (Std. Dev = 5999.55, Mean = 34370.0, N = 51.00)]

Regression analysis was used to quantify the relationship between the dependent variable of serious crime rate per 100,000 persons and the explanatory variables of unemployment rate, percentage of families below poverty level and median family income. SPSS was used to compute the results, which are as follows:

Variables Entered/Removed(b)

Model   Variables Entered                                    Variables Removed   Method
1       1989 median family income $,                         .                   Enter
        1991 unemployment rate,
        1989 percentage of families below poverty level(a)

a. All requested variables entered.
b. Dependent Variable: 1991 serious crimes per 100000 persons

Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .472(a)   .223       .220                3357.08120

a. Predictors: (Constant), 1989 median family income $, 1991 unemployment rate, 1989 percentage of families below poverty level

ANOVA(b)

Model 1       Sum of Squares   df     Mean Square   F         Sig.
Regression    3.481E9          3      1.160E9       102.960   .000(a)
Residual      1.216E10         1079   1.127E7
Total         1.564E10         1082

a. Predictors: (Constant), 1989 median family income $, 1991 unemployment rate, 1989 percentage of families below poverty level
b. Dependent Variable: 1991 serious crimes per 100000 persons

Coefficients(a)
(B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; bounds are the 95% confidence interval for B)

Model 1                                           B          Std. Error   Beta    t        Sig.   Lower Bound   Upper Bound
(Constant)                                        6654.989   809.032              8.226    .000   5067.535      8242.443
1991 unemployment rate                            -72.774    40.436       -.057   -1.800   .072   -152.116      6.569
1989 percentage of families below poverty level   211.250    26.636       .362    7.931    .000   158.986       263.514
1989 median family income $                       -.054      .015         -.162   -3.692   .000   -.082         -.025

a. Dependent Variable: 1991 serious crimes per 100000 persons

Based on the above results from the city.sav data, the following can be said about the relationship between the explanatory variables and the dependent variable.

First, the proportion of the variation in the dependent variable that is accounted for by the explainers is measured by the R squared value. The R squared value (.223) and the adjusted R squared value (.220) are essentially the same, so the R squared value is used. The predictive power of this model indicates that about 22% of the differences in the dependent variable are associated with the unemployment rate, the percentage of families below the poverty level and the median family income. This does not suggest a strong relationship between the explanatory variables and the dependent variable, as 78% of the differences in serious crime rates are not explained by these explanatory variables.

The constant (6654.989) shows what the dependent variable would be if all the explainers were zero. The unstandardized coefficients show how much the dependent variable would change if an explainer changed by 1 unit while the other explainers are held constant. Serious crimes per 100,000 persons would decrease by about 73 (roughly 1%) if the unemployment rate rose by 1 unit (one percentage point), would increase by about 211 (3%) if the percentage of families below the poverty level rose by 1 unit and would change by -.054 (about 0%) with a 1 unit (one dollar) increase in median family income.

The 95% confidence intervals were computed for the regression coefficients. These intervals give, with 95% confidence, the range within which the population regression coefficient for each explainer lies.
The interval for the unemployment rate (-152.116 to 6.569) includes zero, so there is a chance that the unemployment rate has little or no impact on the dependent variable. Additional evidence would be required to support our theory that a relationship exists. For the percentage of families below poverty level, we are 95% confident that the change in the number of serious crimes per 100,000 persons lies between 158.986 and 263.514 for a one percentage point increase in families below poverty level. A one-dollar increase in median family income is associated with a change in the serious crime rate of between -.082 and -.025.

The standardized coefficients show the standard deviation change in the dependent variable that results from a one standard deviation change in an explanatory variable. They allow us to compare the impact of the individual explanatory variables on the dependent variable and to establish which variable has the greatest and least impact. Based on our data, a 1 standard deviation change in the unemployment rate would produce a -.057 standard deviation change in the incidence of serious crime, a 1 standard deviation change in the percentage of families below the poverty level would produce a .362 standard deviation change and a 1 standard deviation change in median family income would produce a -.162 standard deviation change. Based on this, the percentage of families below the poverty level appears to have the largest impact on the dependent variable, with the unemployment rate having the least impact on the incidence of crime.
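
As a rough cross-check of the figures discussed above, the sketch below recomputes the 95% confidence intervals and the standardized coefficients from the SPSS output. It assumes that each interval is B plus or minus a critical t value times the standard error, with the critical value approximated as 1.962 for 1079 residual degrees of freedom, and that each standardized coefficient equals B multiplied by the ratio of the explainer's standard deviation to that of the dependent variable. The function and variable names are illustrative and are not part of the original SPSS analysis; small differences from the tables reflect rounding of the published coefficients.

```python
# Values copied from the SPSS tables in this report.
SD_CRIME = 3802.10547  # std. deviation of the dependent variable

# name: (unstandardized B, std. error of B, std. deviation of the explainer)
PREDICTORS = {
    "unemployment": (-72.774, 40.436, 2.99854),
    "poverty": (211.250, 26.636, 6.51442),
    "income": (-0.054, 0.015, 11476.80043),
}

# Assumption: two-sided 95% critical t value for 1079 degrees of freedom,
# hard-coded as an approximation rather than looked up from a library.
T_CRIT = 1.962


def confidence_interval(b, se, t_crit=T_CRIT):
    """Return the (lower, upper) bounds of B +/- t_crit * SE."""
    margin = t_crit * se
    return b - margin, b + margin


def standardized_beta(b, sd_x, sd_y=SD_CRIME):
    """Convert an unstandardized coefficient to a standardized one."""
    return b * sd_x / sd_y


def summarize():
    """Return {name: (lower, upper, beta)} for every explainer."""
    out = {}
    for name, (b, se, sd_x) in PREDICTORS.items():
        lower, upper = confidence_interval(b, se)
        out[name] = (lower, upper, standardized_beta(b, sd_x))
    return out


if __name__ == "__main__":
    for name, (lower, upper, beta) in summarize().items():
        includes_zero = lower <= 0 <= upper
        print(f"{name:13s} CI=({lower:.3f}, {upper:.3f}) "
              f"beta={beta:.3f} includes_zero={includes_zero}")
```

Under these assumptions, only the unemployment interval spans zero, and the poverty variable has the largest standardized coefficient in absolute value, which matches the reading given in the text.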

Summary:

Based on these data and this research, it appears that all three explainers have varying impact on the serious crime rate, while collectively they account for only 22% of the variation in the dependent variable. Individually, the percentage of families below poverty level appears to have the largest impact on the serious crime rate per 100,000 persons, with median family income having a smaller impact and the unemployment rate having the least impact.

Future research could focus on poverty rates, as this variable appears to have the greatest impact on the incidence of crime. Increasing poverty rates in a community may also affect law enforcement practices and prompt the establishment of social programs to combat the growing number of individuals and families falling below the poverty level.

References:

Policy Analysis: Crime, Police, and Root Causes. (1994). Retrieved October 29, 2008 from http://www.cato.org/pubs/pas/pa-218.html