Data Warehousing, Data Mining, Privacy
Reading
Farkas, CSCE 824 - Spring 2011

Data Warehousing
  Repository of data providing organized and cleaned enterprise-wide data (obtained from a variety of sources) in a standardized format
    – Data mart (single subject area)
    – Enterprise data warehouse (integrated data marts)
    – Metadata

OLAP Analysis
  Aggregation functions
  Factual data access
  Complex criteria
  Visualization

Warehouse Evaluation
  Enterprise-wide support
  Consistency and integration across diverse domains
  Security support
  Support for operational users
  Flexible access for decision makers

Data Integration
  Data access
  Data federation
  Change capture
  Need ETL (extraction, transformation, load)

Data Warehouse Users
  Internal users
    – Employees
    – Managerial
  External users
    – Reporting and auditing
    – Research

Data Mining
  Databases to be mined
  Knowledge to be mined
  Techniques used
  Applications supported

Data Mining Task
  Prediction Tasks
    – Use some variables to predict unknown or future values of other variables
  Description Tasks
    – Find human-interpretable patterns that describe the data

Common Tasks
  Classification [Predictive]
  Clustering [Descriptive]
  Association Rule Mining [Descriptive]
  Sequential Pattern Mining [Descriptive]
  Regression [Predictive]
  Deviation Detection [Predictive]

Security for Data Warehousing
  Establish the organization's security policies and procedures
  Implement logical access control
  Restrict physical access
  Establish internal control and auditing

Security for Data Warehousing (cont.)
  Security Issues in Data Warehousing and Data Mining: Panel Discussion
  Panel discussion of Bhavani Thuraisingham (The MITRE Corporation), Linda Schlipper (The MITRE Corporation), Pierangela Samarati (SRI International), T. Y. Lin (San Jose State University), Sushil Jajodia (George Mason University), Chris Clifton (The MITRE Corporation), xanadu.cs.sjsu.edu/~tylin/publications/paperList/109_security.ps

Integrity
  Poor quality data: inaccurate, incomplete, missing meta-data
  Source data quality vs. derived data quality

Access Control
  Layered defense:
    – Access to processes that extract operational data
    – Access to data and process that transforms operational data
    – Access to data and meta-data in the warehouse

Access Control Issues
  Mapping from local to warehouse policies
  How to handle “new” data
  Scalability
  Identity Management

Inference Problem
  Data Mining: discover “new knowledge”; how to evaluate security risks?
  Example security risks:
    – Prediction of sensitive information
    – Misuse of information
  Assurance of “discovery”
  Interesting Read: C. C. Aggarwal and P. S. Yu, PRIVACY-PRESERVING DATA MINING: MODELS AND ALGORITHMS, http://charuaggarwal.net/toc.pdf

Privacy
  Large volume of private (personal) data
  Need:
    – Proper acquisition, maintenance, usage, and retention policy
    – Integrity verification
    – Control of analysis methods (aggregation may reveal sensitive data)

Privacy
  What is the difference between confidentiality and privacy?
  Identity, location, activity, etc.
  Anonymity vs. accountability

Legislations
  Privacy Act of 1974, U.S. Department of Justice (http://www.usdoj.gov/oip/04_7_1.html)
  Family Educational Rights and Privacy Act (FERPA), U.S. Department of Education (http://www.ed.gov/policy/gen/guid/fpco/ferpa/index.html)
  Health Insurance Portability and Accountability Act of 1996 (HIPAA) (http://en.wikipedia.org/wiki/Health_Insurance_Portability_and_Accountability_Act)
  Telecommunications Consumer Privacy Act (http://www.answers.com/topic/electroniccommunications-privacy-act)

Online Social Network

Social Relationship
  Communication context changes social relationships
  Social relationships maintained through different media grow at different rates and to different depths
  No clear consensus on which medium is the best

Internet and Social Relationships
  Internet bridges distance at a low cost
  New participants tend to “like” each other more
  Less stressful than face-to-face meeting
  People focus on communicating their “selves” (except a few malicious users)

Social Network
  Description of the social structure between actors
  Connections: various levels of social familiarity, e.g., from casual acquaintance to close familial bonds
  Support online interaction and content sharing

Social Network Analysis
  The mapping and measuring of relationships and flows between people, groups, organizations, computers or other information processing entities
  Behavioral Profiling
  Note: Social Network Signatures
    – User names may change; family and friends are more difficult to change

Interesting Read:
  M. Chew, D. Balfanz, B. Laurie, (Under)mining Privacy in Social Networks, http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.4468

Next
  Hippocratic Databases

Next Class
  Stream Data
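
Example: aggregation may reveal sensitive data
  The Privacy slide notes that control of analysis methods is needed because aggregation may reveal sensitive data. A minimal sketch of one way this can happen (a differencing attack) follows. The table, the names, and the helper functions total_salary and infer_individual are all hypothetical and invented for illustration; they are not taken from the course material.

```python
# Hypothetical salary table; names and values are invented for illustration.
salaries = {
    "alice": 52000,
    "bob": 61000,
    "carol": 58000,
    "dave": 75000,
}


def total_salary(records, exclude=()):
    """Aggregate query: sum of salaries, optionally leaving some people out."""
    return sum(value for name, value in records.items() if name not in exclude)


def infer_individual(records, target):
    """Differencing attack: subtract two aggregates to isolate one person's value."""
    everyone = total_salary(records)
    everyone_but_target = total_salary(records, exclude=(target,))
    return everyone - everyone_but_target


if __name__ == "__main__":
    # Neither query returns an individual row, yet the difference does.
    print(infer_individual(salaries, "dave"))
```

  Each query on its own returns only a sum over several people, so a policy that inspects queries one at a time would allow both. The sensitive value appears only when the two answers are combined, which is why the slide asks for control over analysis methods rather than over raw data access alone.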