Community-based Security Informatics Research: The COPLINK Experience
Acknowledgement: NSF, CIA/ITIC, DHS, NIJ/DOJ, NLM/NIH, COPS, TPD, PPD, KCC
Hsinchun Chen, Ph.D.
Director, COPLINK Center of Excellence, Artificial Intelligence Lab, Hoffman E-Commerce Lab, University of Arizona

Outline
• COPLINK Background and Research Framework
• COPLINK Connect and Detect: Community-based Research
• COPLINK STV, Agent, and Deception Detection Research
• COPLINK Visual Criminal Network Analysis Research
• From COPLINK to BorderSafe and Terrorism Research

Outline: COPLINK Background and Research Framework

Introduction
• The concern about national security has increased significantly since the terrorist attack on September 11, 2001
• Intelligence agencies such as the CIA and FBI are actively collecting and analyzing information to investigate terrorists’ activities
• Local law enforcement agencies have also become more alert to criminal activities in their own jurisdictions that may be relevant to national security

COPLINK Progression
• 1990-present: NSF CISE funding (IIS, Digital Government, Digital Library, NSDL, ITR, IDM, CSS), NLM/NIH (medical informatics), DARPA
• 1997: NIJ COPLINK funding; Web-enabled data warehousing for law enforcement
• 2000: NIJ AGILE interoperability funding; information sharing
• 2001: NSF Digital Government funding; data/text mining, agents, and knowledge management; COPLINK Center
• 2002: NSF/CIA KDD funding; intelligence community
• 2003: DHS BorderSafe funding; NSF/CIA disease informatics (bioterrorism) funding; NSF ITR funding, terrorism portal
Goal: A model and testbed for law enforcement and national security research

Crime Types
Type | Local Law Enforcement Level | National Security Level
Traffic Violations | Driving under the influence (DUI), fatal or personal injury, property damage, traffic accident, road rage | -
Sex Crime | Sexual offenses, sexual assault, child molesting | Organized prostitution
Theft | Robbery, burglary, larceny, motor vehicle theft, stolen property | Theft of national secrets or weapons
Fraud | Forgery and counterfeiting, frauds, embezzlement, identity deception | Transnational money laundering, identity fraud, transnational financial fraud
Arson | Arson on buildings, apartments | -
Gang / drug offenses | Narcotic drug offenses (sales or possession) | Transnational drug trafficking
Violent Crime | Criminal homicide, armed robbery, aggravated assault, other assaults | Terrorism (bioterrorism, bombing, hijacking, etc.)
Cyber Crime | Internet frauds (e.g., credit card fraud, advance fee fraud, fraudulent Internet banking sites), illegal trading, network intrusion/hacking, virus spreading, netspionage, cyberpiracy, cyber-pornography, cyber-terrorism, theft of confidential information, hate crime | -

The COPLINK Research Framework
Building the Science of Intelligence and Security Informatics

Outline: COPLINK Testbed: Data Characteristics, Information Sharing and Interoperability

Tucson PD Data Sources
• TPD Record Management System: Stores a wide range of information, from incident reports to warrants to pawn tickets, and from person descriptions to vehicles to weapons and property items. Incident data goes back as early as 1983. Database: Litton PRC RMS31 on Oracle 7.3, Compaq OpenVMS
• TPD Mug Shot Database: Stores about 90,000 mug shots taken by the ID Department. Database: ImageWare on SQL Server 7.0, Windows NT 4.0 Server
• TPD Gang Database: Stores comprehensive information about 3,200 gang members: their activities, aliases, physical descriptions, vehicles, etc. Database: In House Access 97, Windows NT 4.0 Server

Tucson PD RMS Documents
• Incident Reports: Report number, crime type, precinct, MOs, date and time.
• Pawn Tickets: Ticket number, date and time.
• Warrants: Warrant number, docket number, type and issue date.
• Field Interviews: FI number, type, precinct, date and time.
Chart: Number of Documents. Values shown: 2,500,000;
150,000; 65,000; 45,000. Chart categories: Reports, Pawn Tickets, Warrants, Field Interviews.

Tucson PD RMS Data Objects
• Person: True names, aliases, descriptions, addresses, IDs, marks and phone numbers.
• Organization: Name, address and phones.
• Vehicle: VIN, license plate, make, model, style, year and colors.
• Property: Serial number, type, make, model, size and colors.
• Weapon: Serial number, type, manufacturer, caliber and colors.
Chart: Number of Data Objects. Categories: Person, Property, Vehicle, Organization, Weapon. Values shown: 1,300,000; 420,000; 400,000; 85,000; 39,000.

COPLINK Database: Tucson PD
Chart: Coplink 2.5 Database Size, TPD Node. Categories: Data, Indices. Values shown: 16 GB; 7 GB.

COPLINK Documentation: Sample COPLINK ERD (Entity Relationship Diagram)
• OBJECTS: OBJECTPK (PK), OBJECTTYPE, OBJECTDESC
• PERSONS: PERSONPK (PK, FK5), REALNAME, DOB, RACE, GENDER, MINDOB, MAXDOB, MINHEIGHT, MAXHEIGHT, MINWEIGHT, MAXWEIGHT, EYECOLOR, HAIRCOLOR, GANGFLAG, CAUTIONFLAG, WANTEDFLAG, PAWNERFLAG, FBIID, SID, LOCALID, FNGRPRTID, DNAID, PHOTOFILENAME, PHOTOIMAGE (foreign-key markers shown in the diagram: FK1, FK2, FK3, FK4)
• L_EYECOLTYPES: COLORTYPE (PK), COLORCODE, COLORDESC, COLORRANK
• L_HAIRCOLTYPES: COLORTYPE (PK), COLORCODE, COLORDESC, COLORRANK
• L_RACETYPES: RACETYPE (PK), RACECODE, RACEDESC, RACERANK
• L_GENDERTYPES: GENDERTYPE (PK), GENDERCODE, GENDERDESC, GENDERRANK

COPLINK Documentation: COPLINK Data Dictionary: 217 Tables, 1000 attributes
Table no. 181, table name PERSONS:
Order | Column Name | Datatype | Size | Dec | Null | PK | FK Table | FK Column
1 | PERSONPK | NUMERIC | 18 | 0 | N | 1 | OBJECTS | OBJECTPK
2 | REALNAME | VARCHAR | 320 | | N | | |
3 | DOB | VARCHAR | 8 | | N | | |
4 | RACE | NUMERIC | 2 | 0 | N | | L_RACETYPES | RACETYPE
5 | GENDER | NUMERIC | 1 | 0 | N | | L_GENDERTYPES | GENDERTYPE
6 | MINDOB | VARCHAR | 8 | | N | | |
7 | MAXDOB | VARCHAR | 8 | | N | | |
8 | MINHEIGHT | NUMERIC | 4 | 0 | N | | |
9 | MAXHEIGHT | NUMERIC | 4 | 0 | N | | |
10 | MINWEIGHT | NUMERIC | 5 | 1 | N | | |
11 | MAXWEIGHT | NUMERIC | 5 | 1 | N | | |
12 | EYECOLOR | NUMERIC | 2 | 0 | N | | L_EYECOLTYPES | COLORTYPE
13 | HAIRCOLOR | NUMERIC | 2 | 0 | N | | L_HAIRCOLTYPES | COLORTYPE
14 | GANGFLAG | NUMERIC | 1 | 0 | N | | |
15 | CAUTIONFLAG | NUMERIC | 1 | 0 | N | | |
16 | WANTEDFLAG | NUMERIC | 1 | 0 | N | | |
17 | PAWNERFLAG | NUMERIC | 1 | 0 | N | | |
18 | FBIID | VARCHAR | 100 | | N | | |
19 | SID | VARCHAR | 100 | | N | | |
20 | LOCALID | VARCHAR | 100 | | N | | |
21 | FNGRPRTID | VARCHAR | 100 | | N | | |
22 | DNAID | VARCHAR | 100 | | N | | |
23 | PHOTOFILENAME | VARCHAR | 255 | | N | | |
24 | PHOTOIMAGE | IMAGE | | | Y | | |

COPLINK Data Formats
• Delimited ASCII text files
• SQL Server 2000 backup file
• SQL Server 2000 detached database
• Oracle 8i/9i dump file
• Oracle 8i/9i transportable tablespace
• DB2 UDB 7 backup file
• TPD data available: 10/1/2002, PPD data: 2/1/2003

Information Management Challenges: Tucson PD Data Across all Crime Types
• Incident Reports: Report number, crime type, precinct, MOs, date and time.
• Pawn Tickets: Ticket number, date and time.
• Warrants: Warrant number, docket number, type and issue date.
• Field Interviews: FI number, type, precinct, date and time.
Chart: Number of Documents. Categories: Reports, Pawn Tickets, Warrants, Field Interviews. Values shown: 2,500,000; 150,000; 65,000; 45,000.

Information Management Challenges: Sample COPLINK Table
COPLINK Data Dictionary: 217 Tables, 1000 attributes. Sample: table no. 181, PERSONS (24 columns, PERSONPK through PHOTOIMAGE).

Outline: COPLINK Connect and Detect: Community-based Research
User-centered Design, Information Sharing, Information Retrieval, HCI, and Association Rule Mining

COPLINK Connect: Information Sharing
Consolidating and sharing information promotes problem solving and collaboration
Sources shown: Records Management Systems (RMS), Gang Database, Mugshots Database

COPLINK Connect Functionality
• Generic, common XML-based representation of criminal elements
• Data migration (batch and incremental) and mapping for all major databases and legacy systems
• Database independent: ODBC-compliant data warehouse
• Multi-layered Web-based architecture: database server, Web server, browser
• Powerful and flexible search tools for various reports, e.g., incidents, warrants, pawns, etc.
• Graphical browser-based interface for ease of use, training and maintenance
H. Chen, J. Schroeder, R. V. Hauck, L. Ridgeway, H. Atabakhsh, H. Gupta, C. Boarman, K. Rasmussen, and A. W. Clements, “COPLINK Connect: Information and Knowledge Management for Law Enforcement,” Decision Support Systems, Special Issue on Digital Government, 2003.

COPLINK Detect: Crime Analysis
Consolidated information enables targeted problem solving via powerful investigative criminal association analysis

COPLINK Detect Functionality
• Simple association rule mining applied to relationships among criminal elements
• Generic, common XML-based representation for criminal relationships
• Incremental data migration and association analysis on databases
• Support for powerful, multi-attribute queries using partial crime information
• Graphical browser-based interface for simple crime relationship analysis and case retrieval
H. Chen, D. Zeng, H. Atabakhsh, W. Wyzga, J.
Schroeder, “COPLINK: Managing Law Enforcement Data and Knowledge,” Communications of the ACM, 2003.

COPLINK Detect 2.0/2.5

COPLINK Connect/Detect Deployment
• Tucson, Phoenix (Arizona)
• Huntsville (Texas)
• Montgomery County (Maryland)
• Polk County/Des Moines (Iowa)
• Ann Arbor (Michigan)
• Boston (Massachusetts)
• Redmond (Washington)
• Henderson County (North Carolina)
• Shawnee County (Kansas)
• San Diego (CA)
• Pima County, Arizona DHS (Arizona)
• State of Alaska, Los Angeles (CA)
Serving 20+ states, 300+ agencies, protecting 30M+ citizens

Outline: COPLINK STV, Deception Detection and Agent Research
Visualization, HCI, Agent, Data Mining

COPLINK Spatial-Temporal Visualization: Timeline Tool
• Visualizes the chronologically ordered set of events associated with user-selected database entities
• Events placed along horizontal axis
• Entities placed along vertical axis
– Entities can be grouped together
– Each row contains all events associated with the entities in a group
• Time-based Zooming
– User can zoom into a specific time interval for more detail, while hiding uninteresting portions of the timeline

COPLINK Spatial-Temporal Visualization: GeoMapping Tool
• Plots location of incident events within a selected time interval
• Zooming/panning capabilities
• User-selectable GIS layers
• Overview map
– Provides context to the currently selected region
• Plot events over time
– Plot events as they occur, using different color shadings to indicate when each occurred relative to other events
– Plot events as they occur and remove them after they are over, using directed arrows to highlight movement from one event to the next in time

COPLINK Spatial-Temporal Visualization: Periodic Pattern Tool
• Reveals periodic patterns of incident occurrence
• Incident events will be plotted continuously on a circular graph
– Time period represented along circle (day, week, month, etc.)
– Height from center indicates number of incidents that occurred at that specific time
• Customizable granularity (e.g., year, month, day, etc.)
• 3-sigma statistical significance line
– Indicates unusually large or small number of occurrences at a specific time

COPLINK Data Mining Research

Deception Detection, a data mining approach
• “An agent must spell a suspect’s name exactly right, or the FBI computer will not recognize it. That can be particularly frustrating in cases such as the Sept. 11 probe, in which suspects have used multiple names and sometimes created identities by switching a few letters in their names.” (FBI)
• FBI’s problem with 9/11 suspect names, e.g., “Majed M.GH Moqed,” “Majed Moqed,” and “Majed Mashaan Moqed,” and DOB, e.g., “01-01-1976” and “03-03-1976.”
• A deception taxonomy was created based on criminal deceptions in law enforcement databases
• Patterns existed in criminal deceptions, e.g., SSN variations, name variations, etc.
• Phonetic and syntactic string comparators are adopted
• Promising initial testing result: 94% accuracy in deception detection
G. Wang, H. Chen, H. Atabakhsh, “Automatically Detecting Deceptive Criminal Identities,” Communications of the ACM, forthcoming, 2002.
A Taxonomy for Deceptions in Criminal Identity
Criminal Identity Deception
• DOB Deception
– Birth year deception
– Birth month deception
– Birth date deception
• ID Deception
• Residency Deception
– Street type deception
– Street direction deception
– Street name deception
– Street number deception
• Name Deception
– Name exchange
– Partly missing names
– Similar pronunciation
– Abbreviation and add-on
– Changing middle initial
– Completely deceptive name
– Partly deceptive name

A Taxonomy of Deceptions in Criminal Identity: Name Deception
• Name Deception:
– Either false first name or false last name (62.5%)
– Only the middle initial is changed (62.5%)
– Similar pronunciation but different spelling (42%)
– A completely false name (29.2%)
– Using abbreviated names or adding extra letters (29.2%)
– Leaving out the first name or last name (29.2%)
– Exchanging last name and first name (8%)

A Taxonomy of Deceptions in Criminal Identity: DOB, SSN, Residency
• DOB and ID (SSN) deception:
– In most cases, criminals only make minor changes in DOB and SSN, e.g., 19700207 → 19700208
• Residency deception:
– 42% of criminals in the collection deceived on address information. In most cases, only one portion of the address is changed slightly, e.g., street number.

String Comparators
• Phonetic Russell SoundEx code: Newcombe [1959], encodes a name in a format with a prefix letter followed by a three-digit number
– e.g., PEARCE and PIERCE are both coded as “P620”. However, phonetic matching is particularly poor at finding matches [Zobel and Dart 1996]
• Spelling string comparator [Jaro 1976; Winkler 1990]
– Compares spelling variations between two strings instead of phonetic codes
– Limitation: common characters in both strings must be within half the length of the shorter string

Other Approximate String Matching Tool
• Agrep [Wu, Manber 1992]: A general string matching algorithm that can handle character variations of insertion, deletion, and substitution.
• The pattern is represented as a bit array.
The computation only involves simple bit operations (RightShift) and logic operations (AND, OR) on bit arrays:
\[ R^{d}_{j+1} = \mathrm{Rshift}[R^{d}_{j}] \ \mathrm{AND}\ S_{c} \ \mathrm{OR}\ \mathrm{Rshift}[R^{d-1}_{j} \ \mathrm{OR}\ R^{d-1}_{j+1}] \ \mathrm{OR}\ R^{d-1}_{j} \]
• Agrep has been integrated into Unix and has been in wide use since June 1991

Algorithm Design
• Compare corresponding fields of each pair of records (disagreement): $S_{name}$, $S_{DOB}$, $S_{addr}$, and $S_{ID}$
• To capture different types of name deceptions:
\[ S_{name}(name_1, name_2) = \min \left\{ \begin{array}{l} \mathrm{agrep}(last_1\ first_1,\ last_2\ first_2) \\ \mathrm{agrep}(last_1\ first_1,\ first_2\ last_2) \\ \mathrm{SoundEx}(last_1\ first_1,\ last_2\ first_2) \\ \mathrm{SoundEx}(last_1\ first_1,\ first_2\ last_2) \end{array} \right\} \]
• Calculate the Normalized Euclidean Distance for the overall dissimilarity between two records, i.e.,
\[ \mathrm{Disagreement} = \sqrt{\frac{S_{name}^{2} + S_{DOB}^{2} + S_{addr}^{2} + S_{ID}^{2}}{4}} \]

Experimental Results (Training: 80 cases)
Disagreement Value | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | ….. (up to R80)
R1 | * | 0.53 | 0.71 | 0.67 | 0.54 | 0.65 | 0.63 | 0.62 | …
R2 | 0.53 | * | 0.66 | 0.71 | 0.64 | 0.73 | 0.58 | 0.67 | …
R3 | 0.71 | 0.66 | * | 0.62 | 0.7 | 0.64 | 0.7 | 0.68 | …
R4 | 0.67 | 0.71 | 0.62 | * | 0.67 | 0.68 | 0.67 | 0.66 | …
R5 | 0.54 | 0.62 | 0.72 | 0.65 | * | 0.73 | 0.67 | 0.58 | …
R6 | 0.65 | 0.73 | 0.66 | 0.68 | 0.73 | * | 0.7 | 0.64 | …
R7 | 0.63 | 0.6 | 0.68 | 0.67 | 0.67 | 0.7 | * | 0.69 | …
R8 | 0.62 | 0.67 | 0.68 | 0.66 | 0.58 | 0.64 | 0.69 | * | …
….. (up to R80) | … | … | … | … | … | … | … | … | *
Table: Distance matrix. Each value shows the degree of disagreement between a pair of records in the training data set.
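The disagreement computation from the Algorithm Design slide can be sketched in Python as follows. This is a minimal illustration and not the COPLINK implementation. Several parts are assumptions: a normalized Levenshtein edit distance stands in for agrep, the SoundEx comparator is taken to be the same edit distance applied to the SoundEx codes of each name token, and all function names are hypothetical.

```python
import math


def soundex(name: str) -> str:
    """Russell SoundEx: a prefix letter followed by a three-digit number."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result, prev = letters[0], codes.get(letters[0], "")
    for ch in letters[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "HW":  # H and W do not separate letters with equal codes
            prev = code
    return (result + "000")[:4]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def string_disagreement(a: str, b: str) -> float:
    """Assumed stand-in for agrep: edit distance scaled to the range [0, 1]."""
    a, b = a.upper(), b.upper()
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def soundex_disagreement(a: str, b: str) -> float:
    """Assumed phonetic comparator: compare the SoundEx codes of each token."""
    code_a = " ".join(soundex(token) for token in a.split())
    code_b = " ".join(soundex(token) for token in b.split())
    return string_disagreement(code_a, code_b)


def name_disagreement(first1: str, last1: str, first2: str, last2: str) -> float:
    """S_name: minimum over straight and swapped name orders, both comparators."""
    own = f"{last1} {first1}"
    straight, swapped = f"{last2} {first2}", f"{first2} {last2}"
    return min(string_disagreement(own, straight),
               string_disagreement(own, swapped),
               soundex_disagreement(own, straight),
               soundex_disagreement(own, swapped))


def record_disagreement(s_name: float, s_dob: float, s_addr: float, s_id: float) -> float:
    """Normalized Euclidean distance over the four field disagreements."""
    return math.sqrt((s_name ** 2 + s_dob ** 2 + s_addr ** 2 + s_id ** 2) / 4)


# Example: first and last name exchanged, DOB changed by one digit.
s_name = name_disagreement("Majed", "Moqed", "Moqed", "Majed")
s_dob = string_disagreement("19700207", "19700208")
print(soundex("PEARCE"), soundex("PIERCE"))
print(record_disagreement(s_name, s_dob, 0.0, 0.0))
```

Taking the minimum over the straight and swapped name orders means an exchanged first and last name yields zero name disagreement, which is how the formula captures the "name exchange" deception type.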
Experimental Results (Training: 80 cases)
Threshold | Accuracy | False Negative | False Positive
0.4 | 76.60% | 23.40% | 0.00%
0.45 | 92.20% | 7.80% | 0.00%
0.46 | 93.50% | 6.50% | 2.60%
0.47 | 96.10% | 3.90% | 2.60%
0.48 | 97.40% | 2.60% | 2.60%
0.49 | 97.40% | 2.60% | 6.50%
0.5 | 97.40% | 2.60% | 11.70%
Table: Determining best threshold value (0.48)
Chart: Training Result. Rate (Accuracy, False Negative, False Positive) plotted against Threshold.

Experimental Results (Testing: 40 cases)
Threshold | Accuracy | False Negative | False Positive
0.48 | 94.0% | 0.0% | 0.0%
Table: Accuracy of deception detection when the best threshold value (0.48) is applied to the testing data set (40 records)

COPLINK Agent Research

COPLINK Agent: alert and collaboration in a wireless architecture
• Enhance police information timeliness, collaboration, mobility, and safety via a web-based wireless alerting system (under testing at TPD)
• Real-time alert of time-critical information from multiple databases, e.g., CAD (computer-aided dispatching) database, MVD
• Identify and inform officers/detectives who are working on similar cases
• Push time-critical information via wireless and personalized communications, i.e., web alert, email, cell phone, and pager

COPLINK Agent: Wireless Alert and Collaboration
• Allows Patrol Officers to enhance their community expertise
• Further promotes Officer safety through curbside knowledge
• Secure wireless access and alert: laptop, PDA, pager, cell phone
• Alert: 24-7 monitoring of time-critical information from different databases
• Collaboration: Automatically informing detectives working on similar cases

COPLINK Agent: Vehicle Search Form
Screenshot labels: Multi-DB Search, Notification setting, Alert Method

COPLINK Agent: Web and E-mail Collaboration Alerts
Screenshot labels: Web Alert, Email Alert

COPLINK Agent: Cell Phone and Pager Alert
Screenshot labels: Cell phone alert, Pager alert with case number

Agent User Study and Result Summary
• Study Design:
– Case study method based on structured interviews, archival
records analysis, and usability survey.
– Use QUIS (Questionnaire for User Interaction Satisfaction) survey instrument developed by the HCI Lab at the U. of Maryland.
– 10 participants: crime analysts and detectives in several TPD units.
• Positive feedback on system Effectiveness and Efficiency:
– Monitoring: “… the information I have received back was instrumental in making at least 2 felony cases that will be prosecuted on the federal level.”
– Collaboration from CAD Alert: “… allowing us to respond to incidents we know are important that the field units perhaps don’t realize in a timely manner.”
– Multi-database Search: “The Tucson City Court Search was helpful because I located one of my suspects on her court date.”
• High User Satisfaction from QUIS survey items:
– Averaged 5.5 for 49 items on a 7-point Likert scale (7: most useful).
– Strengths: Offers good investigative power; easy-to-read layout; potential for collaborative information sharing; CAD integration; high intention to use.
– Weaknesses: Lack of help messages; difficult for inexperienced users; obscure user preference settings.

Press coverage
• Arizona Daily Star, Jan 7, 2001
• New York Times, Nov 2, 2002
• Newsweek, March 3, 2003

Interacting with the LE Community
• User-centered design (2 officers assigned to project); frequent, focused, staged user studies (a user study team); quick prototyping and user feedback (quarterly)
• TPD user briefings: 30+ user groups and management demos/briefings (2 chiefs, 7 assistant chiefs)
• Arizona/regional partner briefings: 30+ regional partners demos/meetings; Phoenix, Pima, etc.
• Annual COPLINK Center research workshop, under NSF Digital Government Program
• National/regional NIJ/DOJ and LE meetings: 20+ LE IT meetings; International Association of Chiefs of Police (IACP) meetings
• Regional deployment and success: Arizona, TX, Iowa, Michigan, Boston, Alaska, CA, etc.

COPLINK Lessons Learned
• Know their pain and build something they can use. What street cops need.
• Build trust and know the culture. Security, policy, training, user acceptance (build a Living Lab)
• Early and consistent user involvement. 2 TPD officers, 7 asst. chiefs, 2 chiefs
• Create early and small successes. Detect/Connect, group to division and department
• Spread the success and solicit partners. Tucson, AZ, CA, TX, MA, Montgomery, MA, Alaska, etc.
• Understand funding agencies’ expectations. NIJ (tools), NSF (research)
• Development and research prioritization. Research (Ph.D.) after development (MS/BS); little cutting-edge research in the first two years
• Establish deployment partners. KCC, diff(operational system, research prototype) = $2M
• Work with university technology transfer office. Office of (preventing) technology transfer?

Outline: COPLINK Visual Criminal Network Analysis (CNA) Research: BorderSafe, and Dark Web Terrorism Research

Research Approach
• Testbed and community grounded algorithm, toolkit and system research and development
• Advanced visual criminal network analysis and knowledge mapping research and technologies

BorderSafe: Research Objectives
• Participate in DHS BorderSafe IFE Experiment, in partnership with CNRI, ARJIS (SD), SDSC, TPD, and AZ DHS
• Develop (1) border-crosser and border-crossing vehicle analysis techniques, by (2) leveraging local law enforcement and local DHS data, and (3) using COPLINK crime analysis abilities
• Advance visual criminal network analysis (CNA) and knowledge mapping research and technologies (e.g., terrorism, terrorist, terrorized)

Current Capability: Criminal Network Analysis (LE and Intelligence Community)
• First generation: manual approach
– Anacapa Chart (Harper & Harris, 1975)
• Second generation: graphics-based approach
– Analyst’s Notebook, Netmap, Watson
– COPLINK hyperbolic tree view, network view
• Third generation: structural analysis approach

Anacapa Chart (1st generation)
Figure labels: Association Matrix, Link chart
• Manually extract criminal associations from data files
• Construct an association
matrix and draw a link chart based on the association matrix

Analyst’s Notebook, Netmap, Watson (2nd generation)
• Analyst’s Notebook. Network nodes are automatically arranged for easy interpretation. Source: i2, Inc.
• Netmap. Different colors are used to represent different entity types. Source: Netmap Analytics, LLC.
• Watson. Relations among a group of people (the central sphere) based on telephone records. Source: Xanalysis, Ltd.

A 9/11 Terrorist Network: centrality, cliques, typology…

BorderSafe Visual Criminal Network Analysis (CNA) Design
Diagram labels: Criminal-justice Data, Network Creation, Concept Space, Networked Data, Structural Analysis, Network Partition, Hierarchical Clustering, Centrality Measures, Blockmodeling, Network Visualization, MDS
J. Xu and H. Chen, “Criminal Network Analysis and Visualization: A Data Mining Perspective,” Communications of the ACM, 2004, forthcoming.

Visual CNA: Network Display
• Nodes represent individual criminals labeled by their names
• Links represent relationships between criminals
• Adjust the slider to perform clustering and blockmodeling

Visual CNA: Subgroup Display
The reduced star structure found using blockmodeling
• Circles represent groups.
• The size of a circle is proportional to the number of group members.
• Each group is labeled by its leader’s name.
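The "Centrality Measures" component named in the CNA design can be illustrated with a short Python sketch. The slides do not say which measures or formulas the system uses, so degree and closeness centrality are assumed here as two standard choices, and the five-node network and its node names are hypothetical.

```python
from collections import deque


def degree_centrality(graph: dict) -> dict:
    """Number of direct links each node has."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness_centrality(graph: dict) -> dict:
    """(reachable nodes) / (sum of shortest-path lengths), via breadth-first search."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


# Hypothetical undirected network stored as adjacency sets.
network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
}

degree = degree_centrality(network)
closeness = closeness_centrality(network)
most_connected = max(degree, key=degree.get)
most_peripheral = min(closeness, key=closeness.get)
print(most_connected, most_peripheral)
```

Ranking the members of a group by such scores puts the best-connected node first and the most peripheral node last, which is one plausible way to pick the name that labels a group.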
Visual CNA: Member Ranking
• The rankings of each group member in terms of centrality measures
• The first one of each column is the leader, gatekeeper, and outlier, respectively
• The inner structure of a selected group
• Adjust the slider to do further blockmodeling

Meth World: Subgroup Verification
• Subgroups detected have different characteristics: the subgroups found are consistent with the groups’ specializations or responsibilities in a network
– White gang members who were involved in assaults and murders
– White gang members who were involved in crack cocaine
– Drug dealers
– Offenders who were responsible for stealing, counterfeiting, and cashing checks and providing money to other groups to carry out drug transactions

Visual CNA: Network Structure
A chain structure found in a 60-member network using blockmodel analysis

Temporal CNA: The Evolution of Meth World
The network in Year: 1995, 1996, 1998, 1999, 2002

Cross-Jurisdictional CNA: The Extended Meth World (TPD & PPD)
• Highlighted (red) nodes represent criminals who appear in both TPD and PPD databases
Figure labels: Tucson, Phoenix

Customs and Border Protection (CBP) Border Crossing Information
• CBP has provided the BorderSafe project with license plate numbers seen crossing the border. These can be integrated with local data to enhance the analysis.
• Video equipment automatically extracts license plate numbers of cars as they cross into or out of the country at the Douglas, AZ port of entry.
– 1,125,155 records: plate, state, date, time
– 226,207 distinct vehicles
– 209 days of information over an 18-month period
– 130,195 plates issued in AZ
– 5,546 plates issued in CA
– 90,466 plates issued in Mexico

Border Crossing Records and TPD
• Many of the vehicles found in the CBP data also show activity in the TPD database.
• The fact that a vehicle frequently crosses the border is of interest in criminal investigations.
• The TPD data provides a link between license plates and criminal activity networks.
– 8,300 distinct vehicles appear in both datasets
– 34,632 recorded crossings involve those vehicles

A Vehicle to Watch?
This network contains 5 border crossing plates (outlined in red). The large green dots were confirmed to be criminals of significant interest.
Legend:
• Shape indicates object type: circles are people, rectangles are vehicles
• Color denotes activity history: Gang related; Violent crimes; Narcotics crimes; Violent & Narcotics
• Larger size indicates higher levels of activity
• Border crossing plates are outlined in red

A Vehicle to Watch?
Truncated version of previous network (names removed)
• Plate ABC-123
– Crossed border 35 times
– No prior narcotics associations
• “Jane”
– Associated with vehicle
– No known narcotics activity
• “Joe”
– Related with Jane and vehicle in ‘Suspicious Activity’ report
– Some prior narcotics activity
• “Bob”
– Related with Joe in narcotics
– Involved in 11 narcotics incidents
– Connected to a big narcotics network
People and vehicles never previously linked to narcotics can be identified using such networks to focus and support investigations.

From COPLINK to BorderSafe to Terrorism Knowledge Portal
• Terrorism: Identify key terrorism literature, resources, and experts (web portal, meta searching, citation network analysis, knowledge maps, expert finder)
• Terrorist: Understand how the terrorist groups are revealed on the web and how they use the web (Dark Web, web spidering and mining, back-link analysis, terrorist network analysis, multilingual entity and event extraction)
• Terrorized: Assist citizens and victims responding to terrorism (pattern-based chatbots, system assessment, victim consultation and resources, scalable anonymous robot assistance)
• (In collaboration with Drs.
Reid, Sageman and Levine and Sandia National Lab)

The Web / Dark Web
Diagram labels:
• Dark Web groups: Hate Groups | Racial Supremacy | Suicidal Attackers | Activists / Extremists | Anti-Government | …
• Information Sources: Terrorist Group Web sites, Search Engines, Terrorism databases, Government information
• Collection Methods: Automatic Spidering, Personal Profile Search, Meta Searching, Back link search, Downloading from Gov’t Web sites
• Filtering
• Data Storage: International Terrorism, Domestic Terrorism, Terrorism research information

Sageman’s Global Salafi Jihad (GSJ) Data
• Data collected and cross-validated from open sources regarding 172 GSJ members (Dr. Sageman, U. Penn)
• Background
– From upper or middle class (3/4)
– Average age is 26
– Affiliation through friendship, kinship, discipleship, and worship
• Four clusters (based on geographical distribution): lieutenants and network structures
– Central Staff: Osama bin Laden
– Core Arabs: Khalid Sheikh Mohammed
– Maghreb Arabs: Zain al Abidin Mohd Hussein
– Indonesians: Abu Bakar Baasyir

Jihad CNA: “Combined” = “Link to GSJ” + “Operational” + “Family” (107 nodes)
Figure labels: Scale free network, Maghreb Arabs, Core Arabs, Cliques, A clique, Osama bin Laden, Hierarchical network, Indonesians

Jihad CNA: 9/11 Hijackers in “Combined” Network
Figure labels: Monitor New Open Sources, Mohamed Atta

Outline: Developing the Science of Intelligence and Security Informatics (ISI)

Develop the Science of Intelligence and Security Informatics (ISI)
• ISI: The study of the use and development of advanced information technologies, systems, algorithms, and databases for national security related applications, through an integrated technological, organizational, and policy based approach. Similar to “Biomedical Informatics” (information centric).
• National Security is a long-term mission. Need to develop long-term research agenda and partnership (researchers, practitioners, policy makers, industries, law enforcement and intelligence professionals, etc.)
• A bottom-up, success-driven approach. From selected demonstration sites, to regional partnership, and then to national deployment. Build small successes first.

Building an ISI Community
• Federal funding priority: Building community-based “Living Labs”
• Many disparate LE, intelligence, and industry meetings (vendor driven)
• Academic ISI special issues: JASIST, DSS, and ACM TOIT, forthcoming, 2004
• IEEE Intelligence and Security Informatics Conference: Sponsored by NSF, NIJ, CIA, and DHS, 2003 (Tucson), 2004 (Tucson), 2005 (Atlanta), 2006 (San Diego), 2007 (NJ), 2008 (Taiwan)

For project information: http://ai.arizona.edu/COPLINK