Data Warehousing
Data Mining
Privacy
Reading
Farkas
CSCE 824 - Spring 2011
Data Warehousing

Repository of data providing
organized and cleaned enterprise-wide
data (obtained from a variety of
sources) in a standardized format
– Data mart (single subject area)
– Enterprise data warehouse (integrated
data marts)
– Metadata
OLAP Analysis




Aggregation functions
Factual data access
Complex criteria
Visualization
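
As a minimal sketch of the aggregation functions and complex criteria listed above, the following Python groups a small fact table by chosen dimensions and sums a measure. The table contents, the column names (region, quarter, amount), and the roll_up helper are illustrative assumptions, not part of the course material.

```python
from collections import defaultdict

# Hypothetical fact table: one row per sale (illustrative data only).
SALES = [
    {"region": "East", "quarter": "Q1", "amount": 100},
    {"region": "East", "quarter": "Q2", "amount": 150},
    {"region": "West", "quarter": "Q1", "amount": 200},
    {"region": "West", "quarter": "Q2", "amount": 50},
]

def roll_up(rows, dims, measure, where=lambda row: True):
    """Sum `measure` grouped by the dimensions in `dims`, keeping rows that pass `where`."""
    totals = defaultdict(int)
    for row in rows:
        if where(row):
            key = tuple(row[d] for d in dims)
            totals[key] += row[measure]
    return dict(totals)

# Aggregate by one dimension.
by_region = roll_up(SALES, ["region"], "amount")
# Same aggregation restricted by a selection criterion.
q1_by_region = roll_up(SALES, ["region"], "amount", where=lambda r: r["quarter"] == "Q1")
# Rolling up over every dimension leaves a single grand total under the empty key.
grand_total = roll_up(SALES, [], "amount")
```

Passing fewer dimensions rolls the data up to a coarser level; passing more drills down to finer detail.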
Warehouse Evaluation





Enterprise-wide support
Consistency and integration
across diverse domains
Security support
Support for operational users
Flexible access for decision
makers
Data Integration




Data access
Data federation
Change capture
Need ETL (extraction,
transformation, load)
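
The three ETL stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source layouts (CRM_ROWS, BILLING_ROWS), the field names, and the in-memory list standing in for the warehouse are all hypothetical.

```python
from datetime import datetime

# Hypothetical source records in two different layouts (illustrative data only).
CRM_ROWS = [{"Name": " Alice ", "Joined": "03/15/2010"}]
BILLING_ROWS = [{"customer": "BOB", "since": "2009-11-02"}]

def extract():
    """Extraction: pull raw rows from each source, tagged with their origin."""
    yield from (("crm", row) for row in CRM_ROWS)
    yield from (("billing", row) for row in BILLING_ROWS)

def transform(source, row):
    """Transformation: map each source layout onto one standardized record."""
    if source == "crm":
        name, joined = row["Name"], datetime.strptime(row["Joined"], "%m/%d/%Y")
    else:
        name, joined = row["customer"], datetime.strptime(row["since"], "%Y-%m-%d")
    return {
        "name": name.strip().title(),          # trim whitespace, normalize case
        "joined": joined.date().isoformat(),   # one date format for all sources
        "source": source,                      # keep provenance as metadata
    }

def load(warehouse, record):
    """Load: append the cleaned record to the (in-memory) warehouse."""
    warehouse.append(record)

warehouse = []
for source, row in extract():
    load(warehouse, transform(source, row))
```

Keeping the source tag on every record is a design choice made here so that source data quality can later be traced separately from derived data quality.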
Data Warehouse Users

Internal users
– Employees
– Managerial

External users
– Reporting and auditing
– Research
Data Mining




Databases to be mined
Knowledge to be mined
Techniques Used
Applications supported
Data Mining Task


Prediction Tasks
– Use some variables to predict
unknown or future values of other
variables
Description Tasks
– Find human-interpretable patterns
that describe the data
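
To make the two kinds of task concrete, here is a small Python sketch. The prediction half fits a straight line by ordinary least squares and uses it to estimate an unseen value; the description half simply reports the most frequent item. Both data sets and the helper names (fit_line, predict) are made up for illustration, and least squares is only one possible prediction technique.

```python
from collections import Counter

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(model, x):
    """Use the fitted line to estimate y for a new x."""
    intercept, slope = model
    return intercept + slope * x

# Prediction task: learn from known (x, y) pairs, then estimate an unknown y.
model = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
estimate = predict(model, 5)

# Description task: summarize the data in a human-interpretable way.
purchases = ["bread", "milk", "bread", "eggs", "bread", "milk"]
most_common_item, count = Counter(purchases).most_common(1)[0]
```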
Common Tasks






Classification [Predictive]
Clustering [Descriptive]
Association Rule Mining [Descriptive]
Sequential Pattern Mining [Descriptive]
Regression [Predictive]
Deviation Detection [Predictive]
Security for Data
Warehousing




Establish the organization's security
policies and procedures
Implement logical access control
Restrict physical access
Establish internal control and
auditing
Security for Data
Warehousing (cont.)


Security Issues in Data Warehousing and
Data Mining: Panel Discussion
Panelists: Bhavani Thuraisingham (The MITRE Corporation),
Linda Schlipper (The MITRE Corporation),
Pierangela Samarati (SRI International),
T. Y. Lin (San Jose State University),
Sushil Jajodia (George Mason University),
Chris Clifton (The MITRE Corporation)
xanadu.cs.sjsu.edu/~tylin/publications/paperList/109_security.ps
Integrity


Poor quality data: inaccurate,
incomplete, missing meta-data
Source data quality vs. derived
data quality
Access Control

Layered defense:
– Access to processes that extract
operational data
– Access to data and processes that
transform operational data
– Access to data and meta-data in the
warehouse
Access Control Issues




Mapping from local to warehouse
policies
How to handle “new” data
Scalability
Identity Management
Inference Problem




Data Mining: discover “new knowledge” → how to
evaluate security risks?
Example security risks:
– Prediction of sensitive information
– Misuse of information
Assurance of “discovery”
Interesting Read: C. C. Aggarwal and P.S. Yu,
PRIVACY-PRESERVING DATA MINING: MODELS
AND ALGORITHMS,
http://charuaggarwal.net/toc.pdf
Privacy


Large volume of private (personal) data
Need:
– Proper acquisition, maintenance,
usage, and retention policy
– Integrity verification
– Control of analysis methods
(aggregation may reveal sensitive
data)
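
The remark that aggregation may reveal sensitive data can be illustrated with a differencing attack: two aggregate queries that each look harmless, but whose difference exposes one person's value. The names, salary figures, and the total_salary helper below are hypothetical and serve only as a sketch.

```python
# Hypothetical personal records (illustrative data only).
SALARIES = {"alice": 52000, "bob": 61000, "carol": 58000}

def total_salary(names):
    """An aggregate query: returns only a sum, never an individual row."""
    return sum(SALARIES[name] for name in names)

# Two permitted aggregate queries...
everyone = total_salary(SALARIES)
everyone_but_carol = total_salary(name for name in SALARIES if name != "carol")

# ...whose difference is exactly one individual's salary.
inferred_carol = everyone - everyone_but_carol
```

Neither query returns an individual record, which is why controlling the analysis methods (and not only direct row access) matters.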
Privacy



What is the difference between
confidentiality and privacy?
Identity, location, activity, etc.
Anonymity vs. accountability
Legislation




Privacy Act of 1974, U.S. Department of Justice
(http://www.usdoj.gov/oip/04_7_1.html)
Family Educational Rights and Privacy Act (FERPA),
U.S. Department of Education
(http://www.ed.gov/policy/gen/guid/fpco/ferpa/index.html)
Health Insurance Portability and Accountability Act
of 1996 (HIPAA)
(http://en.wikipedia.org/wiki/Health_Insurance_Portability_and_Accountability_Act)
Telecommunications Consumer Privacy Act
(http://www.answers.com/topic/electroniccommunications-privacy-act)
Online Social Network

Social Relationship
– Communication context changes
social relationships
– Social relationships maintained
through different media grow at
different rates and to different
depths
– No clear consensus on which medium
is best
Internet and Social
Relationships
Internet
– Bridges distance at a low cost
– New participants tend to “like” each
other more
– Less stressful than face-to-face
meeting
– People focus on communicating
their “selves” (except a few
malicious users)
Social Network

Description of the social structure
between actors

Connections: various levels of social
familiarity, e.g., from casual
acquaintance to close familial bonds

Support online interaction and
content sharing
Social Network Analysis



The mapping and measuring of
relationships and flows between
people, groups, organizations,
computers or other information
processing entities
Behavioral Profiling
Note: Social Network Signatures
– User names may change, family and
friends are more difficult to change
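
The idea of a social network signature can be sketched as follows: even if a user name changes, the set of friends stays largely the same, so comparing friend sets can link the new account to an old one. Jaccard similarity is used here only as one plausible measure, chosen for this illustration; the account names and friend lists are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical friend lists of previously known accounts.
OLD_ACCOUNTS = {
    "jsmith": {"ann", "raj", "lee", "mia"},
    "kdoe": {"tom", "zoe", "raj"},
}

# Friend list observed for an account with a new, unrelated user name.
new_friends = {"ann", "raj", "lee", "sam"}

# The old account whose friend set overlaps most is the likeliest match.
scores = {name: jaccard(friends, new_friends) for name, friends in OLD_ACCOUNTS.items()}
best_match = max(scores, key=scores.get)
```

In this made-up example the renamed account shares three of five distinct friends with jsmith, so the friend set acts as the identifying signature.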
Interesting Read:

M. Chew, D. Balfanz, B. Laurie,
(Under)mining Privacy in Social
Networks,
http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.149.4468
Next Hippocratic
Databases
Next Class
Stream Data