Term Project
Prepared for HSA 523
Health Data Analysis
Professor: Dr. Robert Jantzen
Prepared by:
Sample Student
Introduction:
Individuals must be accountable for their actions. That said, numerous
factors are often thought to contribute to the incidence of crime.
These include economic factors such as poverty, as well as social
environment and dysfunctional family structure. It is often argued that
societal recognition and management of these factors could reduce the incidence
of serious crime. The results of this analysis will demonstrate the
varying impact of these explanatory variables, collectively and individually, on
the dependent variable, the serious crime rate.
Literature Review:
A brief literature search identified numerous articles that explore the
dependent variable and explanatory variables considered here. The Policy
Analysis paper uses a weighted two-stage least-squares regression to estimate
the relationship between the crime rate and independent variables describing
economic conditions. Economic conditions were measured by the logarithm of
average annual income, the male unemployment rate, the overall employment rate
and the poverty rate. Data for that research were provided by the US Department
of Justice through two measures: the Uniform Crime Report (UCR), covering 98% of
the population, and the National Crime Victimization Survey (NCVS), which
surveyed 66 thousand households with a 96% participation rate. These measures
summarized the index crimes, which included murder, rape, robbery and
aggravated assault, and the index property crimes of burglary, larceny and auto
theft.
This study indicates a relationship between economic factors and crime.
An increase in the average annual income appears to suggest a reduction in
violent crimes as well as property crimes. On a negative note, an increase in
poverty rate by 1% appears to increase property crime by 3%. Interestingly, a 1%
increase in the overall employment rate appeared to result in an increase in the
property crime rate of about 3%. It was suggested that this increase is the result
of more homes being vacated, without anyone home during the day, resulting in a
vulnerability to burglary. The study suggests additional research to explain this
surprising finding. Further findings suggest that the economic condition of
specific groups, such as men, is also important. That is, an increase in the male
unemployment rate results in an increase in the violent crime rate.
Data and Method:
To obtain data for this analysis, the CITY.SAV database was used
to gather information on the dependent and explanatory variables. These
economic and social data were collected in 1989-1991 from 1083 large cities in the
United States and analyzed by regression in SPSS. Although one
could argue that the explanatory variables overlap because they are
interrelated, this argument is set aside for the purpose of this
exercise. To better visualize the data, frequency tables and histograms
were produced for all variables. Further statistics were obtained to measure the
center of each variable, its variation around the mean (where applicable) and
the overall spread of the values.
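The relative and cumulative values in a frequency table follow directly from the class counts. The short Python sketch below is an illustration only (the analysis itself was done in SPSS); it uses the class counts for serious crimes per 100,000 persons and a class width of 3,800, and the function name `frequency_table` is a hypothetical helper, not an SPSS feature.

```python
# Class counts for serious crimes per 100,000 persons (ten classes of width 3,800).
counts = [245, 487, 260, 76, 14, 0, 0, 0, 0, 1]
CLASS_WIDTH = 3800


def frequency_table(counts):
    """Return (count, relative %, cumulative %) for each class."""
    total = sum(counts)
    rows = []
    running = 0
    for count in counts:
        running += count
        rows.append((count, 100 * count / total, 100 * running / total))
    return rows


table = frequency_table(counts)

for i, (count, relative, cumulative) in enumerate(table):
    lower, upper = i * CLASS_WIDTH, (i + 1) * CLASS_WIDTH
    print(f"{lower:>6} to {upper:>6}: {count:>4}  {relative:5.1f}%  {cumulative:5.1f}%")
```

Rounded to one decimal place, the first class works out to 22.6% of the 1083 cities and the first two classes together to 67.6%.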
SERIOUS CRIMES PER 100,000 PERSONS

SERIOUS CRIMES        NUMBER   RELATIVE VALUES   CUMULATIVE VALUES
0 - 3,800                245        22.6%              22.6%
>3,800 - 7,600           487        45%                67.6%
>7,600 - 11,400          260        24%                91.6%
>11,400 - 15,200          76         7%                98.6%
>15,200 - 19,000          14         1.3%              99.9%
>19,000 - 22,800           0         0                 99.9%
>22,800 - 26,600           0         0                 99.9%
>26,600 - 30,400           0         0                 99.9%
>30,400 - 34,200           0         0                 99.9%
>34,200 - 38,000           1          .1%             100%
________________________________________________________
Total                   1083       100%               100%

Statistics
1991 serious crimes per 100000 persons

N Valid              1083
N Missing            0
Mean                 6233.9954
Median               6145.0000
Std. Deviation       3802.10547
Variance             1.446E7
Range                37903.00
Minimum              .00
Maximum              37903.00
Percentiles  25      4052.0000
             50      6145.0000
             75      8436.0000
Each data set was analyzed to determine the distribution of its values. That
is, the information was evaluated to establish how the values differed from the
mean and from each other, and to describe the overall shape and symmetry of the
distribution. Using SPSS, the Mean, Median, Standard Deviation, Variance, Range
and Percentiles were computed for each data set (see attached). Because some of
the data sets were skewed, the mean-based measures could not be used to describe
them. Additional analysis provided the Interquartile Range and 5 number
summaries. This information is as follows:
Serious Crimes per 100,000 persons:
The Interquartile Range, which captures the middle 50% of values, is 4384. The
5 number summary is 0 (min), 4052 (1st quartile), 6145 (median), 8436 (3rd
quartile), 37,903 (max). This summary again demonstrates the distribution of
values. Using the Pearson Measure of Skewness (PMS), the values appear
approximately symmetrical, as the PMS = .02. Based on this, we are able to
use the mean, standard deviation and variance provided by SPSS (see chart) to
describe the distribution. Furthermore, the Empirical rule can be applied, which
states that approximately 68% of the values are within +/- 1 standard deviation
of the mean, approximately 95% are within +/- 2 standard deviations of the mean
and approximately 99.7% are within +/- 3 standard deviations of the mean.
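As a rough check, these summary measures can be recomputed from the SPSS statistics. The Python sketch below is an illustration and was not part of the original analysis. The function names are hypothetical, and the skewness is written as (mean - median) / standard deviation, which is an assumption about the form of the Pearson measure used here; a common textbook variant multiplies this ratio by 3.

```python
# Summary statistics for serious crimes per 100,000 persons, as reported by SPSS.
mean, median, std_dev = 6233.9954, 6145.0, 3802.10547
minimum, q1, q3, maximum = 0.0, 4052.0, 8436.0, 37903.0


def interquartile_range(q1, q3):
    """Width of the interval holding the middle 50% of values."""
    return q3 - q1


def pearson_skewness(mean, median, std_dev):
    """Assumed form of the Pearson measure: (mean - median) / std. deviation."""
    return (mean - median) / std_dev


def empirical_rule_interval(mean, std_dev, k):
    """Interval of +/- k standard deviations around the mean."""
    return mean - k * std_dev, mean + k * std_dev


iqr = interquartile_range(q1, q3)
five_number_summary = (minimum, q1, median, q3, maximum)
skew = pearson_skewness(mean, median, std_dev)

print("IQR:", iqr)
print("5 number summary:", five_number_summary)
print("Skewness:", round(skew, 3))
for k in (1, 2, 3):
    low, high = empirical_rule_interval(mean, std_dev, k)
    print(f"+/- {k} std. dev.: {low:.1f} to {high:.1f}")
```

Under this assumed form the skewness comes out close to the .02 quoted above, and the +/- 1 standard deviation interval runs from roughly 2,432 to 10,036 crimes per 100,000 persons.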
[Histogram: 1991 serious crimes per 100000 persons. Mean = 5371.6, Std. Dev = 1476.43, N = 51.00]
UNEMPLOYMENT RATE

UNEMPLOYMENT RATE     NUMBER   RELATIVE VALUES   CUMULATIVE VALUES
0 - 3                     87         8%                 8%
>3 - 6                   434        40%                48.1%
>6 - 9                   366        34%                81.9%
>9 - 12                  145        13%                95.3%
>12 - 15                  40         4%                99%
>15 - 18                  11         1%               100%
________________________________________________________
Total                   1083       100%               100%

Statistics
1991 unemployment rate

N Valid              1083
N Missing            0
Mean                 6.5331
Median               6.1000
Std. Deviation       2.99854
Variance             8.991
Range                17.90
Minimum              .00
Maximum              17.90
Percentiles  25      4.6000
             50      6.1000
             75      8.2000
Unemployment Rate:
The Interquartile Range for the above data set is 3.6. The 5 number summary is 0
(min), 4.6 (1st quartile), 6.1 (median), 8.2 (3rd quartile), 17.90 (max). Using
the Pearson Measure of Skewness (PMS), the values are skewed, as the PMS = .13.
As a result, we do not rely on the mean, standard deviation and variance
provided by SPSS in the chart. Instead we use the median (6.1), the middle value
that divides the values into two equal sized groups, the range (17.90), the
difference between the maximum and minimum values, and the IQR (above), which
captures the middle 50% of values. As these values are not symmetrical,
Chebyshev’s rule was applied, which states that at least 75% of
the data is within +/- 2 standard deviations of the mean and at least 89% of the
data falls within +/- 3 standard deviations of the mean.
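The percentages in Chebyshev’s rule come from the bound 1 - 1/k^2, which holds for any distribution when k is greater than 1. The Python sketch below is an illustration rather than part of the SPSS output. It evaluates the bound for k = 2 and k = 3 and shows the matching intervals around the reported unemployment mean; the function name `chebyshev_lower_bound` is hypothetical.

```python
# Reported SPSS statistics for the 1991 unemployment rate.
mean, std_dev = 6.5331, 2.99854


def chebyshev_lower_bound(k):
    """Minimum share of values within +/- k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is only informative for k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3):
    share = chebyshev_lower_bound(k)
    low, high = mean - k * std_dev, mean + k * std_dev
    print(f"At least {share:.0%} of values lie between {low:.2f} and {high:.2f}")
```

Unlike the Empirical rule, this bound does not require a symmetrical, bell-shaped distribution, which is why it suits the skewed variables.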
[Histogram: 1991 unemployment rate. Mean = 6.39, Std. Dev = 1.53, N = 51.00]
PERCENTAGE OF FAMILIES BELOW POVERTY LEVEL

% POVERTY             NUMBER   RELATIVE VALUES   CUMULATIVE VALUES
0 - 7                    431        39.8%              39.8%
>7 - 14                  388        35.8%              75.6%
>14 - 21                 201        18.6%              94.2%
>21 - 28                  50         4.6%              98.8%
>28 - 35                   9          .8%              99.6%
>35 - 42                   4          .4%             100%
_________________________________________________________
Total                   1083       100%               100%

Statistics
1989 percentage of families below poverty level

N Valid              1083
N Missing            0
Mean                 9.8849
Median               8.9000
Std. Deviation       6.51442
Variance             42.438
Range                39.90
Minimum              .50
Maximum              40.40
Percentiles  25      4.5000
             50      8.9000
             75      13.9000
Families below Poverty Level:
The Interquartile Range for the data set is 9.4. The 5 number summary is .50
(min), 4.5 (1st quartile), 8.9 (median), 13.9 (3rd quartile) and 40.40 (max).
The PMS is equal to .14, which again indicates an asymmetry in the values. We
therefore do not rely on the mean, standard deviation and variance. The median
(8.9), range (39.90) and IQR (above) are used instead to describe the
distribution of the data. Chebyshev’s rule can again be
applied to establish how the values differ from the mean.
[Histogram: 1989 percentage of families below poverty level. Mean = 10.0, Std. Dev = 3.55, N = 51.00]
MEDIAN FAMILY INCOME

MEDIAN INCOME           NUMBER   RELATIVE VALUES   CUMULATIVE VALUES
0 - $8,200                   0         0                  0
>$8,200 - $16,400            2          .2%                .2%
>$16,400 - $24,600          66         6%                 6.3%
>$24,600 - $32,800         372        34%                40.6%
>$32,800 - $41,000         293        27%                67.7%
>$41,000 - $49,200         170        16%                83.4%
>$49,200 - $57,400         116        11%                94.1%
>$57,400 - $65,600          40         4%                97.8%
>$65,600 - $73,800          12         1%                98.9%
>$73,800 - $82,000           6          .6%               99.4%
>$82,000 - $90,200           5          .5%               99.9%
>$90,200 - $98,400           1          .1%              100%
________________________________________________________
Total                     1083       100%               100%

Statistics
1989 median family income $

N Valid              1083
N Missing            0
Mean                 37784.3361
Median               34990.0000
Std. Deviation       11476.80043
Variance             1.317E8
Range                81817.00
Minimum              13785.00
Maximum              95602.00
Percentiles  25      29509.0000
             50      34990.0000
             75      43723.0000
Median Family Income:
The Interquartile Range for the data set is 14,214. This number describes the
spread of the middle 50% of values. The 5 number summary is 13785 (min), 29509
(1st quartile), 34990 (median), 43723 (3rd quartile), 95602 (max). The PMS is
.24, which indicates that the values are skewed. We must then use the median
(34,990), the middle value that divides the values into two equal sized groups,
the range (81,817), the difference between the maximum and minimum values, and
the IQR (above). Chebyshev’s rule can be applied to establish the intervals
around the mean that contain at least 75% and at least 89% of the values.
[Histogram: 1989 median family income $. Mean = 34370.0, Std. Dev = 5999.55, N = 51.00]
Regression analysis was used to quantify the relationship between
the dependent variable, the serious crime rate per 100,000 persons, and the
explanatory variables of unemployment rate, percentage of families below
poverty level and median family income. SPSS was used to compute the regression,
with the results as follows:
Variables Entered/Removed (b)

Model   Variables Entered                         Variables Removed   Method
1       1989 median family income $,              .                   Enter
        1991 unemployment rate,
        1989 percentage of families below
        poverty level (a)

a. All requested variables entered.
b. Dependent Variable: 1991 serious crimes per 100000 persons
Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .472 (a)   .223       .220                3357.08120

a. Predictors: (Constant), 1989 median family income $, 1991
unemployment rate, 1989 percentage of families below poverty level
ANOVA (b)

Model 1      Sum of Squares   df     Mean Square   F         Sig.
Regression   3.481E9          3      1.160E9       102.960   .000 (a)
Residual     1.216E10         1079   1.127E7
Total        1.564E10         1082

a. Predictors: (Constant), 1989 median family income $, 1991 unemployment rate, 1989 percentage
of families below poverty level
b. Dependent Variable: 1991 serious crimes per 100000 persons
Coefficients (a)

                                  Unstandardized          Standardized                     95% Confidence
                                  Coefficients            Coefficients                     Interval for B
Model 1                           B          Std. Error   Beta           t        Sig.    Lower Bound   Upper Bound
(Constant)                        6654.989   809.032                     8.226    .000    5067.535      8242.443
1991 unemployment rate            -72.774    40.436       -.057          -1.800   .072    -152.116      6.569
1989 percentage of families
below poverty level               211.250    26.636       .362           7.931    .000    158.986       263.514
1989 median family income $       -.054      .015         -.162          -3.692   .000    -.082         -.025

a. Dependent Variable: 1991 serious crimes per 100000 persons
Based on the above results from the city.sav data, the following can be
said about the relationship between the explanatory variables and the
dependent variable. First, the proportion of the variation in the dependent
variable that is accounted for by the explainers is measured by the R squared
value. The R squared value (.223) and the Adjusted R squared value (.220) are
essentially the same, so the R squared value is used. The predictive power of
this model indicates how much (22%) of the differences in the dependent
variable are associated with the unemployment rate, the % of families below the
poverty level and the median family income. This does not suggest a strong
relationship between the explanatory variables and the dependent variable, as
78% of the differences in serious crime rates are not explained by these
explanatory variables.
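The R squared, Adjusted R squared and F values can be reproduced from the sums of squares in the ANOVA table. The Python sketch below is offered as an illustration using the standard formulas, not as the SPSS procedure itself; the variable names are hypothetical, and small differences from the SPSS output are expected because the sums of squares are reported to only four significant digits.

```python
# Values taken from the ANOVA table.
ss_regression = 3.481e9   # regression sum of squares
ss_residual = 1.216e10    # residual sum of squares
ss_total = 1.564e10       # total sum of squares
n = 1083                  # number of cities
k = 3                     # number of explanatory variables

# Share of the variation in the crime rate accounted for by the model.
r_squared = ss_regression / ss_total

# Adjusted R squared penalizes the number of explanatory variables.
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# F statistic: regression mean square divided by residual mean square.
f_statistic = (ss_regression / k) / (ss_residual / (n - k - 1))

print(f"R squared:          {r_squared:.3f}")
print(f"Adjusted R squared: {adjusted_r_squared:.3f}")
print(f"F:                  {f_statistic:.2f}")
```

Because the sample is large relative to the number of explainers, the adjustment changes R squared only in the third decimal place.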
The Constant (approximately 6,655) shows what the dependent variable would be
if all the explainers were at zero. The unstandardized coefficients show how
much the dependent variable would change if an explainer changes by 1 unit
while the other explainers are held constant. The dependent variable of serious
crimes per 100,000 would decrease by about 73 (roughly 1% of the mean crime
rate) if the unemployment rate increased by 1 percentage point, it would
increase by about 211 (roughly 3%) if the percentage of families below the
poverty level increased by 1 percentage point, and it would change by -.054
(effectively 0%) with a 1 dollar increase in median family income, which is
equivalent to about 54 fewer crimes per $1,000 of income.
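Written out, the fitted equation is: predicted crime rate = 6654.989 - 72.774 (unemployment rate) + 211.250 (% of families below poverty level) - .054 (median family income). The Python sketch below evaluates this equation as an illustration; the function name `predict_crime_rate` is hypothetical, and the inputs used are simply the sample means reported in the statistics tables.

```python
# Unstandardized coefficients from the Coefficients table.
CONSTANT = 6654.989
B_UNEMPLOYMENT = -72.774   # per percentage point of unemployment
B_POVERTY = 211.250        # per percentage point of families below poverty level
B_INCOME = -0.054          # per dollar of median family income


def predict_crime_rate(unemployment, poverty, income):
    """Predicted serious crimes per 100,000 persons from the fitted equation."""
    return (CONSTANT
            + B_UNEMPLOYMENT * unemployment
            + B_POVERTY * poverty
            + B_INCOME * income)


# Evaluate at the sample means reported by SPSS.
at_means = predict_crime_rate(6.5331, 9.8849, 37784.3361)
print(f"Predicted rate at the sample means: {at_means:.0f}")

# Effect of one more percentage point of poverty, other explainers unchanged.
effect = predict_crime_rate(6.5331, 10.8849, 37784.3361) - at_means
print(f"Change for +1 point of poverty: {effect:.2f}")
```

The prediction at the sample means lands close to the mean crime rate of about 6,234, with the small gap due to rounding in the reported income coefficient.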
The 95% confidence intervals were computed for the regression
coefficients. These intervals suggest, with 95% confidence, the range within
which each population regression coefficient lies. For the unemployment rate,
the interval (-152.116 to 6.569) includes zero, so there is a possibility that
unemployment has little or no effect on the dependent variable. This is
consistent with its significance level of .072, and additional evidence would be
required to support our theory that a relationship exists. For the percentage
of families below poverty level, we are 95% confident that the change in the
number of serious crimes per 100,000 is between 159 and 264 for a 1 percentage
point increase in families below poverty level. For median family income, a 1
dollar increase is associated with a change of between -.082 and -.025 serious
crimes per 100,000. As this interval excludes zero, the effect of income is
small per dollar but statistically distinguishable from zero.
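Each interval is approximately the coefficient plus or minus a critical t value times its standard error. The Python sketch below illustrates this with the reported coefficients. The critical value of 1.962 is an assumption (the approximate two-sided 95% value for 1079 degrees of freedom), and the helper names are hypothetical, so results can differ from the SPSS bounds in the last digits.

```python
T_CRIT = 1.962  # assumed two-sided 95% critical t value for 1079 degrees of freedom

# (B, Std. Error) pairs from the Coefficients table.
coefficients = {
    "unemployment rate": (-72.774, 40.436),
    "% families below poverty level": (211.250, 26.636),
    "median family income": (-0.054, 0.015),
}


def confidence_interval(b, std_error, t_crit=T_CRIT):
    """Return (lower, upper) bounds of the interval b +/- t_crit * std_error."""
    margin = t_crit * std_error
    return b - margin, b + margin


def includes_zero(interval):
    """True when a zero effect cannot be ruled out at this confidence level."""
    low, high = interval
    return low <= 0 <= high


intervals = {name: confidence_interval(b, se)
             for name, (b, se) in coefficients.items()}

for name, (low, high) in intervals.items():
    note = "includes zero" if includes_zero((low, high)) else "excludes zero"
    print(f"{name}: {low:.3f} to {high:.3f} ({note})")
```

Only the unemployment interval includes zero, which matches its t value of -1.800 falling short of the critical value in absolute terms.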
The standardized coefficients show the change in the dependent variable, in
standard deviations, that results from a one standard deviation change in an
explanatory variable. Because they are expressed in the same units, they allow
us to compare the individual explanatory variables and to establish which has
the greatest and which the least impact on the dependent variable.
Based on our data, a 1 standard deviation change in the unemployment rate
would produce a -.057 standard deviation change in the incidence of serious
crime, a 1 standard deviation change in the % of families below the poverty
level would produce a .362 standard deviation change, and a 1 standard
deviation change in median family income would produce a -.162 standard
deviation change in the serious crime rate. Based on this data, we are able to
determine that the percentage of families below the poverty level has
the largest impact on the dependent variable, with the unemployment rate having
the least impact on the incidence of crime.
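A standardized coefficient can be recovered from the unstandardized one by multiplying it by the standard deviation of the explainer and dividing by the standard deviation of the dependent variable. The Python sketch below applies this standard relationship to the reported statistics as an illustration; the names are hypothetical, and the income value differs slightly from the SPSS figure because the coefficient is reported as a rounded -.054.

```python
SD_CRIME = 3802.10547  # std. deviation of serious crimes per 100,000 persons

# (unstandardized B, std. deviation of the explainer) from the SPSS tables.
explainers = {
    "unemployment rate": (-72.774, 2.99854),
    "% families below poverty level": (211.250, 6.51442),
    "median family income": (-0.054, 11476.80043),
}


def standardized_beta(b, sd_x, sd_y=SD_CRIME):
    """Convert an unstandardized coefficient to standard deviation units."""
    return b * sd_x / sd_y


betas = {name: standardized_beta(b, sd_x)
         for name, (b, sd_x) in explainers.items()}

# Rank the explainers by the size of their effect, ignoring sign.
ranking = sorted(betas, key=lambda name: abs(betas[name]), reverse=True)

for name in ranking:
    print(f"{name}: {betas[name]:.3f}")
```

Ranked by absolute size, the order is poverty, then median family income, then unemployment.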
Summary
Based on this data and research, it appears that the three explainers
have varying impacts on the serious crime rate, while collectively they account
for only 22% of the variation in the dependent variable. Individually, the
percentage of families below poverty level appears to have the largest impact
on the serious crime rate per 100,000 persons, with median family income having
a smaller impact and the unemployment rate having the least impact, one that
cannot be distinguished from zero at the 95% confidence level.
Future research could focus on poverty rates, as these appear to have
the greatest impact on the incidence of crime. Increasing poverty rates in a
community may also influence law enforcement practices, as well as the
establishment of social programs to combat the growing rates of individuals and
families falling below the poverty level.
References:
Policy Analysis: Crime, Police, and Root Causes. (1994). Retrieved October 29,
2008 from http://www.cato.org/pubs/pas/pa-218.html