Data and Applications Security
Developments and Directions
Dr. Bhavani Thuraisingham
The University of Texas at Dallas
Data Warehousing, Data Mining and Security
September 2014

Outline
- Background on Data Warehousing
- Security Issues for Data Warehousing
- Data Mining and Security

What is a Data Warehouse?
A Data Warehouse is a:
- Subject-oriented
- Integrated
- Nonvolatile
- Time variant
- Collection of data in support of management’s decisions
- From: Building the Data Warehouse by W. H. Inmon, John Wiley and Sons
Integration of heterogeneous data sources into a repository
Summary reports, aggregate functions, etc.

Example Data Warehouse
[Figure: users query the warehouse. The data warehouse holds data correlating employees with medical benefits and projects, drawn from an Oracle DBMS for employees, a Sybase DBMS for projects, and an Informix DBMS for medical. The warehouse could be any DBMS; usually based on the relational data model.]

Some Data Warehousing Technologies
- Heterogeneous Database Integration
- Statistical Databases
- Data Modeling
- Metadata
- Access Methods and Indexing
- Language Interface
- Database Administration
- Parallel Database Management

Data Warehouse Design
- Appropriate data model is key to designing the warehouse
- Higher-level model in stages
  - Stage 1: Corporate data model
  - Stage 2: Enterprise data model
  - Stage 3: Warehouse data model
- Middle-level data model
  - A model possibly for each subject area in the higher-level model
- Physical data model
  - Include features such as keys in the middle-level model
- Need to determine appropriate levels of granularity of data in order to build a good data warehouse

Distributing the Data Warehouse
- Issues similar to distributed database systems
[Figure: Non-distributed warehouse: Branch A, Branch B, and the Central Bank share one central warehouse. Distributed warehouse: Branch A and Branch B each have their own warehouse in addition to the Central Bank's central warehouse.]

Multidimensional Data Model
[Figure: dimensions of a project: Project Name, Project Leader, Project Sponsor, Project Cost (Dollars, Pounds, Yen), Project Duration (Years, Months, Weeks).]

Indexing for Data Warehousing
- Bit-Maps
- Multi-level indexing
- Storing parts or all of the index files in main memory
- Dynamic indexing

Metadata Mappings
[Figure: Metadata for the Warehouse is linked, through Metadata for Mappings and Transformations, to the Metadata for Data source A, Data source B, and Data source C.]

Data Warehousing and Security
- Security for integrating the heterogeneous data sources into the repository
  - e.g., Heterogeneous Database System Security, Statistical Database Security
- Security for maintaining the warehouse
  - Query, Updates, Auditing, Administration, Metadata
- Multilevel Security
  - Multilevel Data Models, Trusted Components

Example Secure Data Warehouse
[Figure: a user reaches the Secure Warehouse through the Secure Data Warehouse Manager, which sits above Secure DBMS A, Secure DBMS B, and Secure DBMS C, each managing its own Secure Database.]

Secure Data Warehouse Technologies
Secure Data Warehousing Technologies:
- Secure data modeling
- Secure heterogeneous database integration
- Database security
- Secure access methods and indexing
- Secure query languages
- Secure database administration
- Secure high performance computing technologies
- Secure metadata management

Security for Integrating Heterogeneous Data Sources
- Integrating multiple security policies into a single policy for the warehouse
  - Apply techniques for federated database security?
  - Need to transform the access control rules
- Security impact on schema integration and metadata
  - Maintaining transformations and mappings
- Statistical database security
  - Inference and aggregation, e.g., average salary in the warehouse could be unclassified while the individual salaries in the databases could be classified
- Administration and auditing

Security Policy for the Warehouse
[Figure: policy layers, from top to bottom: Federated Policies for Federation F1 and Federation F2; Export Policies (one for Component A, two for Component B, one for Component C); Generic Policies for Components A, B, and C; Component Policies for Components A, B, and C. Labelled: Security Policy Integration and Transformation.]
- Federated policies become warehouse policies?

Security Policy for the Warehouse - II
[Figure: Policy for the Warehouse is linked, through Policy for Mappings and Transformations, to the Policy for Data Source A, Data Source B, and Data Source C.]

Secure Data Warehouse Model
- Project Name, U
- Project Leader, U
- Project Sponsor, S
- Project Cost, S
  - Dollars, S
  - Pounds, S
  - Yen, S
- Project Duration, U
  - Year, U
  - Months, U
  - Weeks, U
U = Unclassified
S = Secret

Methodology for Developing a Secure Data Warehouse
[Figure: process flow with the boxes: Secure data sources; Clean/modify data sources; Integrate policies; Integrate secure data sources; Build secure data model, schemas, access methods, and index strategies for the secure warehouse.]

Multi-Tier Architecture
- Tier N: Secure Data Warehouse; builds on Tier N-1
- ...
- Tier 2: Builds on Tier 1
- Tier 1: Secure Data Sources
Each layer builds on the previous layer (schemas/metadata/policies)

Administration
- Roles of Database Administrators, Warehouse Administrators, Database System Security Officers, and Warehouse System Security Officers?
- When databases are updated, can a trigger mechanism be used to automatically update the warehouse?
  - i.e., will the individual database administrators permit such a mechanism?

Auditing
- Should the warehouse be audited?
  - Advantages
    - Keep up-to-date information on access to the warehouse
  - Disadvantages
    - May need to keep unnecessary data in the warehouse
    - May need a lower level granularity of data
    - May cause changes to the timing of data entry to the warehouse as well as backup and recovery restrictions
- Need to determine the relationships between auditing the warehouse and auditing the databases

Multilevel Security
- Multilevel data models
  - Extensions to the data warehouse model to support classification levels
- Trusted Components
  - How much of the warehouse should be trusted?
  - Should the transformations be trusted?
- Covert channels, inference problem

Inference Controller
[Figure: a user's requests pass through the Inference Controller to the Secure Data Warehouse Manager and the Secure Warehouse, which sit above Secure DBMS A, Secure DBMS B, and Secure DBMS C, each managing its own Secure Database.]

Status and Directions
- Commercial data warehouse vendors are incorporating role-based security (e.g., Oracle)
- Many topics need further investigation
  - Building a secure data warehouse
  - Policy integration
  - Secure data model
  - Inference control

Data Mining for Counter-terrorism

Data Mining for Counterterrorism
- Data Mining for Non real-time Threats: gather data, build terrorist profiles, mine data, prune results
- Data Mining for Real-time Threats: gather data in real-time, build real-time models, mine data, report results

Data Mining Needs for Counterterrorism: Non-real-time Data Mining
- Gather data from multiple sources
  - Information on terrorist attacks: who, what, where, when, how
  - Personal and business data: place of birth, ethnic origin, religion, education, work history, finances, criminal record, relatives, friends and associates, travel history, . . .
  - Unstructured data: newspaper articles, video clips, speeches, emails, phone records, . . .
- Integrate the data, build warehouses and federations
- Develop profiles of terrorists, activities/threats
- Mine the data to extract patterns of potential terrorists and predict future activities and targets
- Find the “needle in the haystack” - suspicious needles?
- Data integrity is important
- Techniques have to SCALE

Data Mining for Non Real-time Threats
[Figure: process flow with the boxes: Data sources with information about terrorists and terrorist activities; Integrate data sources; Clean/modify data sources; Build profiles of terrorists and activities; Mine the data; Examine results/prune results; Report final results.]

Data Mining Needs for Counterterrorism: Real-time Data Mining
- Nature of data
  - Data arriving from sensors and other devices: continuous data streams
  - Breaking news, video releases, satellite images
  - Some critical data may also reside in caches
- Rapidly sift through the data and discard unwanted data for later use and analysis (non-real-time data mining)
- Data mining techniques need to meet timing constraints
- Quality of service (QoS) tradeoffs among timeliness, precision and accuracy
- Presentation of results, visualization, real-time alerts and triggers

Data Mining for Real-time Threats
[Figure: process flow with the boxes: Data sources with information about terrorists and terrorist activities; Integrate data sources in real-time; Rapidly sift through data and discard irrelevant data; Build real-time models; Mine the data; Examine results in real-time; Report final results.]

Data Mining Outcomes and Techniques for Counter-terrorism

Data Mining Outcomes and Techniques
- Classification: build profiles of terrorists and classify terrorists
- Association: John and James often seen together after an attack
- Link Analysis: follow chain from A to B to C to D
- Clustering: divide population; people from country X of a certain religion; people from country Y interested in airplanes
- Anomaly Detection: John registers at flight school, but does not care about takeoff or landing

Example Success Story - COPLINK
- COPLINK developed at University of Arizona
  - Research transferred to an operational system currently in use by Law Enforcement Agencies
- What does COPLINK do?
  - Provides integrated system for law enforcement; integrating law enforcement databases
  - If a crime occurs in one state, this information is linked to similar cases in other states
- It has been stated that the sniper shooting case may have been solved earlier if COPLINK had been operational at that time

Where are we now?
- We have some tools for
  - building data warehouses from structured data
  - integrating structured heterogeneous databases
  - mining structured data
  - forming some links and associations
  - information retrieval tools
  - image processing and analysis
  - pattern recognition
  - video information processing
  - visualizing data
  - managing metadata

What are our challenges?
- Do the tools scale for large heterogeneous databases and petabyte sized databases?
- Building models in real-time; need training data
- Extracting metadata from unstructured data
- Mining unstructured data
- Extracting useful patterns from knowledge-directed data mining
- Rapidly forming links and associations; get the big picture for real-time data mining
- Detecting/preventing cyber attacks
- Mining the web
- Evaluating data mining algorithms
- Conducting risk analysis / economic impact
- Building testbeds

IN SUMMARY:
- Data Mining is very useful to solve Security Problems
  - Data mining tools could be used to examine audit data and flag abnormal behavior
- Much recent work in Intrusion detection
  - e.g., Neural networks to detect abnormal patterns
- Tools are being examined to determine abnormal patterns for national security
  - Classification techniques, Link analysis
- Fraud detection
  - Credit cards, calling cards, identity theft etc.
- BUT CONCERNS FOR PRIVACY
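
The summary's point that data mining tools could examine audit data and flag abnormal behavior can be illustrated with a small Python sketch. It is not the technique named in the slides (which mention neural networks for intrusion detection); it swaps in a much simpler robust-statistics rule, the modified z-score based on the median absolute deviation. The function name `flag_abnormal_users`, the `(user, action)` event format, and the threshold of 3.5 are assumptions made for illustration only.

```python
from collections import Counter
from statistics import median


def flag_abnormal_users(audit_events, threshold=3.5):
    """Flag users whose audit-event count is an outlier.

    audit_events: iterable of (user, action) pairs (hypothetical log format).
    threshold: cutoff on the modified z-score (assumed value; tune per system).
    """
    # Count how many audit events each user generated.
    counts = Counter(user for user, _action in audit_events)
    if not counts:
        return []

    values = list(counts.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No measurable spread, so this simple rule cannot rank anyone.
        return []

    flagged = []
    for user, count in counts.items():
        # Modified z-score; 0.6745 rescales the MAD so it is comparable to a
        # standard deviation for normally distributed data.
        score = 0.6745 * (count - med) / mad
        if abs(score) > threshold:
            flagged.append(user)
    return sorted(flagged)


if __name__ == "__main__":
    # Synthetic audit log: most users issue a few queries, one issues many.
    events = (
        [("alice", "query")] * 4
        + [("bob", "query")] * 5
        + [("carol", "query")] * 6
        + [("dave", "query")] * 5
        + [("mallory", "query")] * 40
    )
    print(flag_abnormal_users(events))
```

On the synthetic log above, only `mallory` is flagged. The median-based rule is used here because a single heavy user would inflate a mean and standard deviation enough to hide itself; a real deployment would likely look at richer features than raw event counts, and would need to weigh the privacy concerns the slides raise.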