Incapacitation, Recidivism and Predicting Behavior
Easha Anand
Intro. to Data Mining
April 24, 2007

Background
- Crime Control Act of 1984 and the U.S. Sentencing Commission (USSC)
- The prevailing idea in the U.S. is deterrence rather than punishment
- Trend toward formulae: the U.S. Parole Commission (USPC) in D.C. uses 14 variables
- U.S. prison population topped 2 million; parole/probation topped 7 million

Strategies for Incapacitation
- Charge-based: historically the norm; the basis of most USSC guidelines
- Selective: used for USPC and D.C. Code offenders; based on the individual’s characteristics
- New research focuses on the “criminal career” and on predicting patterns within it (participation, frequency, seriousness, length, patterning)

Rationale
- The trend is toward objective decision-making processes, to improve accuracy.
- More and more variables are codified as we become able to track offenders.
- The sophistication of the statistical methods used to combine predictors seems to be relevant to outcomes.

The Dataset
- 6,000 men incarcerated in the 1960s, chosen at random
- Collected life-history information, official institutional records, an inmate questionnaire, and psychological tests
- 26 years later, followed up with the Bureau of Criminal Statistics
- Offenses characterized along six dimensions: nuisance, physical harm, property damage, drugs, fraud, and crimes against social order
- Used 4,897 records

Dataset (cont’d)
[Charts: “Original Offense” and “Final Dataset”; legend labels include burglary, armed robbery, forgery, homicide, narcotics, other violent offenses, other, usable, unusable, died, purged]

Problems With Data
- Dichotomous dependent variable for behavior?
- Purging = potential bias; records were purged
  - after age 70, OR
  - after 10 arrest-free years
- No record of out-of-state crimes

Philosophical Problems
- Metric for success
  - False positives: 30,000 arrests could have been prevented!
  - False negatives: 1,413 people jailed unnecessarily…
- Reduced crime could have to do with repentance, increased policing, age, etc.,
  and not with incapacitation at all.

Data Pre-processing
- Only used records that had both 1962 and 1988 data
- Priors: # of previous convictions, weighted by severity of crime
- PriorsP: # of previous periods of incarceration, weighted by length
- Inst_(M,P,V,F,etc.): # of arrests, weighted by severity of crime, in each of the six categories

# of Arrests to Desistance (R^2 = .159)
Predictor   Reg. coeff.   Std. reg. coeff.       T
Priors            1.115               .270   11.02
Age               -.104              -.144   -6.39
Drugs            -2.155              -.154   -7.94
Serious           -.015              -.058   -2.92
Free              -.899              -.062   -3.18
PriorsP           -.413              -.085   -2.37
Type              -.706               -.05   -2.31
Alias              .343               .046    2.31

# of Arrests to Desistance (Violent Crimes Only, n = 1,998; R^2 = .061; p < .05)
Predictor   Reg. coeff.   Std. reg. coeff.       T
Priors            -.022              -.174   -7.85
Age                .134               .184    7.45
InstP              .253               .076    3.35
PriorsP           -.066              -.077   -2.91

What Next?
- Multiple linear regression
  - Try different quantities as the class: nuisance only, arrest rate, crime-free time
  - Try different predictors: we have 119 variables
- BUT
  - No reason to believe the predictors are linearly independent
  - No reason to rule out non-linear correlation

What Next?
- Better technique: decision trees
  - “White box” model that mimics human decision-making
- Use some kind of feature-selection algorithm?
- Maybe ensemble learning, once feature selection is in place?

Acknowledgements
- Trevor Gardner, UC Berkeley
- Don Gottfredson, Rutgers University
- Bureau of Criminal Statistics
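The “Metric for success” bullets on the Philosophical Problems slide count two kinds of error. Which one is called a false positive depends on which prediction is treated as the positive class; the slide's labels correspond to treating a prediction of desistance (and hence release) as positive. The sketch below shows one way such counts could be tallied when a prediction rule is applied retrospectively to follow-up records. The function `tally_errors` and its `(predicted_to_reoffend, later_arrests)` input layout are hypothetical, not the study's.

```python
def tally_errors(cases):
    """Count the two error types behind the "Metric for success" bullets.

    cases: iterable of (predicted_to_reoffend, later_arrests) pairs, a hypothetical
    layout: a bool from some prediction rule and the number of arrests observed
    during follow-up.
    Returns (preventable_arrests, unnecessary_incarcerations).
    """
    preventable_arrests = 0
    unnecessary_incarcerations = 0
    for predicted_to_reoffend, later_arrests in cases:
        if not predicted_to_reoffend and later_arrests > 0:
            # Predicted to desist (so released), yet arrested again:
            # these arrests might have been prevented by incapacitation.
            preventable_arrests += later_arrests
        elif predicted_to_reoffend and later_arrests == 0:
            # Predicted to reoffend (so held), yet no later arrests were observed.
            unnecessary_incarcerations += 1
    return preventable_arrests, unnecessary_incarcerations


print(tally_errors([(False, 3), (True, 0), (True, 2), (False, 0), (False, 1)]))  # (4, 1)
```

Note that the two counts are in different units (arrests versus people), as on the slide.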
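The Data Pre-processing slide defines Priors, PriorsP and the Inst_ features as weighted counts but does not give the weights. A minimal sketch, assuming a hypothetical severity scale for the six offense dimensions and assuming incarceration periods are weighted by their length in years; all function names and weights here are illustrative placeholders.

```python
from collections import defaultdict

# HYPOTHETICAL severity weights for the six offense dimensions on the Dataset slide.
# The slides do not give the scale actually used; these numbers are placeholders.
SEVERITY = {
    "nuisance": 1.0,
    "social_order": 1.0,
    "drugs": 2.0,
    "fraud": 2.0,
    "property_damage": 2.0,
    "physical_harm": 4.0,
}


def priors(conviction_categories):
    """Priors: previous convictions, each weighted by the severity of the crime."""
    return sum(SEVERITY[c] for c in conviction_categories)


def priors_p(incarceration_months):
    """PriorsP: previous periods of incarceration, each weighted by its length.

    Assumption: length is measured in years, so a 6-month term counts 0.5.
    """
    return sum(months / 12.0 for months in incarceration_months)


def inst_features(arrest_categories):
    """Inst_*-style features: severity-weighted arrest counts per offense dimension."""
    totals = defaultdict(float)
    for c in arrest_categories:
        totals[c] += SEVERITY[c]
    # Emit every dimension, with 0.0 for dimensions that had no arrests.
    return {c: totals[c] for c in SEVERITY}


# Usage with a made-up record
print(priors(["fraud", "physical_harm", "nuisance"]))  # 7.0
print(priors_p([18, 6]))                               # 2.0
```

Keeping the weights in one table makes it easy to swap in whatever severity scale the real coding scheme used.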
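The regression tables report, for each predictor, a raw coefficient, a standardized coefficient and a t statistic. As a sketch of how these quantities relate (not the software actually used in the analysis), the stdlib-only function below fits ordinary least squares with an intercept through the normal equations, then derives each standardized coefficient as b_j * sd(x_j) / sd(y) and each t statistic as b_j divided by its standard error. The names `ols`, `_invert` and `_sd` are hypothetical. A singular X'X, which arises when predictors are exactly linearly dependent (one of the concerns on the What Next? slide), raises an error.

```python
import math


def _invert(m):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(m)
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))  # partial pivoting
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: predictors are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [rv - f * cv for rv, cv in zip(a[r], a[col])]
    return [row[n:] for row in a]


def _sd(values):
    """Sample standard deviation (n - 1 denominator)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def ols(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.

    X: list of rows (one per offender), y: outcome values.
    Returns (coefs, standardized_coefs, t_stats, r_squared), intercept excluded.
    """
    n, k = len(X), len(X[0])
    if n <= k + 1:
        raise ValueError("need more observations than parameters")
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = k + 1
    xtx = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    xty = [sum(A[i][r] * y[i] for i in range(n)) for r in range(p)]
    inv = _invert(xtx)
    b = [sum(inv[r][c] * xty[c] for c in range(p)) for r in range(p)]

    fitted = [sum(bj * aj for bj, aj in zip(b, row)) for row in A]
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_mean = sum(y) / n
    sst = sum((yi - y_mean) ** 2 for yi in y)
    sigma2 = ssr / (n - p)  # residual variance estimate

    coefs = b[1:]
    se = [math.sqrt(sigma2 * inv[j][j]) for j in range(1, p)]
    t_stats = [bj / s for bj, s in zip(coefs, se)]
    # Standardized coefficient: b_j * sd(x_j) / sd(y)
    sd_y = _sd(y)
    std_coefs = [bj * _sd([row[j] for row in X]) / sd_y for j, bj in enumerate(coefs)]
    return coefs, std_coefs, t_stats, 1 - ssr / sst
```

For example, `ols([[1], [2], [3], [4], [5]], [2, 4, 5, 4, 5])` gives a slope of 0.6 and R^2 = 0.6, and with a single predictor the standardized coefficient equals Pearson's r (about 0.775), which is a quick sanity check. For real data a statistics library would normally be used instead.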
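The What Next? slides ask about a feature-selection algorithm for the 119 available variables and note that the predictors need not be linearly independent. One simple option, shown as an illustration rather than the method the project adopted, is a greedy correlation filter: rank candidates by |r| with the outcome and skip any candidate too strongly correlated with one already chosen. The function names and the 0.9 redundancy cutoff are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def select_features(columns, y, k, max_redundancy=0.9):
    """Greedy correlation filter (an illustrative choice, not the project's method).

    columns: dict mapping variable name -> list of values (one per offender).
    Picks up to k variables in order of |r| with y, skipping any variable whose
    |r| with an already-selected variable exceeds max_redundancy (an assumed cutoff).
    """
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], y)), reverse=True)
    chosen = []
    for name in ranked:
        if len(chosen) == k:
            break
        if all(abs(pearson(columns[name], columns[c])) <= max_redundancy for c in chosen):
            chosen.append(name)
    return chosen
```

Because the ranking uses Pearson's r, this filter only detects linear association; it would not pick up the non-linear relationships the slides raise as a concern.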
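Decision trees are proposed as a “white box” alternative to linear regression. Below is a minimal CART-style sketch (Gini impurity, binary threshold splits, depth limit) in plain Python. The function names, the depth limit of 3, and the toy labels "desist"/"reoffend" are illustrative assumptions, and the data are made up. The `rules` function prints the fitted tree as nested if/else text, which is what makes such a model readable.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature_index, threshold) with the lowest weighted Gini impurity,
    or None if no split improves on the parent node."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[j] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[j] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, threshold), score
    return best


def build_tree(rows, labels, max_depth=3):
    """Grow a small tree: internal nodes are (feature, threshold, left, right)
    tuples, leaves are the majority label."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    j, t = split
    lo = [i for i, row in enumerate(rows) if row[j] <= t]
    hi = [i for i, row in enumerate(rows) if row[j] > t]
    return (j, t,
            build_tree([rows[i] for i in lo], [labels[i] for i in lo], max_depth - 1),
            build_tree([rows[i] for i in hi], [labels[i] for i in hi], max_depth - 1))


def predict(tree, row):
    """Follow thresholds from the root down to a leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


def rules(tree, names, depth=0):
    """Render the tree as nested if/else lines: the "white box" view."""
    pad = "  " * depth
    if not isinstance(tree, tuple):
        return [pad + f"predict {tree}"]
    j, t, left, right = tree
    return ([pad + f"if {names[j]} <= {t}:"] + rules(left, names, depth + 1)
            + [pad + "else:"] + rules(right, names, depth + 1))


# Made-up toy data: each row is [Priors, Age]; labels are illustrative only.
rows = [[0, 40], [1, 35], [5, 22], [6, 25], [2, 45], [7, 19]]
labels = ["desist", "desist", "reoffend", "reoffend", "desist", "reoffend"]
tree = build_tree(rows, labels)
print("\n".join(rules(tree, ["Priors", "Age"])))
```

On this toy data the script prints a single rule, `if Priors <= 2:` predicting "desist", else "reoffend". Bagging or boosting many such trees is one way the ensemble-learning idea on the slide could later be pursued.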