Security Methods for Statistical Databases
by Karen Goodwin
Introduction



Statistical databases containing medical information are often used for research.
Some of the data is protected by law to safeguard the privacy of the patient.
Proper security precautions must be implemented to comply with the law and to respect the sensitivity of the data.
Accuracy vs. Confidentiality
Accuracy – Researchers want to extract accurate and meaningful data.
Confidentiality – Patients, laws, and database administrators want to maintain the privacy of patients and the confidentiality of their information.
Laws

Health Insurance Portability and Accountability Act – HIPAA (Privacy Rule)

Covered organizations must comply by April 14, 2003.
Designed to improve the efficiency of the healthcare system through electronic exchange of data while maintaining security.
Covered entities (health plans, healthcare clearinghouses, healthcare providers) may not use or disclose protected information except as permitted or required.
The Privacy Rule establishes a “minimum necessary standard” that makes covered entities evaluate their current regulations and security precautions.



HIPAA Compliance



Companies offer third-party certification of covered entities.
Such companies will check your company and its associated companies for compliance with HIPAA.
They can help with rapid implementation of, and compliance with, HIPAA regulations.
Types of Statistical Databases


Static – a static database is made once and never changes.
Example: U.S. Census

Dynamic – changes continuously to reflect real-time data.
Example: most online research databases
Security Methods







Access Restriction
Query Set Restriction
Microaggregation
Data Perturbation
Output Perturbation
Auditing
Random Sampling
Access Restriction


Databases normally have different access levels for different types of users.
User IDs and passwords are the most common method for restricting access.
In a medical database:
  Doctors/Healthcare Representatives – full access to information
  Researchers – access to partial information only (e.g. aggregate information)
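The access levels above can be sketched in a few lines of Python. This is only an illustration: the role names, the sample records, and the `view_for` function are hypothetical and are not taken from the presentation.

```python
from statistics import mean

# Hypothetical records and role names, for illustration only.
RECORDS = [
    {"patient": "A", "age": 34, "diagnosis": "asthma"},
    {"patient": "B", "age": 51, "diagnosis": "diabetes"},
    {"patient": "C", "age": 47, "diagnosis": "asthma"},
]

# Each role maps to the kind of view it is allowed to see.
ACCESS_LEVELS = {"doctor": "full", "researcher": "aggregate"}


def view_for(role):
    """Return the data a role may see, or raise if the role is unknown."""
    level = ACCESS_LEVELS.get(role)
    if level == "full":
        return RECORDS  # every field of every record
    if level == "aggregate":
        # Summary statistics only, no individual rows.
        return {"count": len(RECORDS),
                "mean_age": mean(r["age"] for r in RECORDS)}
    raise PermissionError(f"no access level defined for role {role!r}")


print(view_for("researcher"))
```

Authentication itself (checking the user ID and password) is left out of the sketch; it assumes the role has already been established.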
Query Set Restriction



A query-set size control sets a minimum number of records that must be in the result set.
The query results are displayed only if the size of the query set satisfies that condition.
Setting a minimum query-set size can help protect against the disclosure of individual data.
Query Set Restriction

Let K represent the minimum number of records that must be present in the query set.
Let R represent the size of the query set.

The query set can only be displayed if

K ≤ R
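A minimal sketch of this check, assuming the display condition is that R must be at least K. The function name `restricted_mean` and the sample ages are made up for illustration, and the sketch returns a statistic (the mean) rather than the records themselves.

```python
def restricted_mean(values, predicate, k):
    """Return the mean of the matching values, or None if fewer than k match.

    Assumes k >= 1, so an empty query set is always refused.
    """
    query_set = [v for v in values if predicate(v)]
    r = len(query_set)  # R: size of the query set
    if k <= r:          # release the statistic only when K <= R
        return sum(query_set) / r
    return None         # refused: the query set is too small


ages = [10, 12, 13, 57, 54, 59]
print(restricted_mean(ages, lambda a: a > 50, k=3))  # three ages match, so a mean is returned
print(restricted_mean(ages, lambda a: a > 58, k=3))  # one age matches, so the query is refused
```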
Query Set Restriction
[Diagram: Query 1 and Query 2 are issued against the Original Database; each query's results are compared with K before the query results are returned.]
Microaggregation




Raw (individual) data is grouped into small aggregates before publication.
The average value of the group replaces each individual value.
Data with the most similarities are grouped together to maintain data accuracy.
Helps to prevent disclosure of individual data.
Microaggregation



The National Agricultural Statistics Service (NASS) publishes data about farms.
To protect against data disclosure, data is only released at the county level.
Farms in each county are averaged together to preserve as much accuracy as possible while still protecting against disclosure.
Microaggregation
Age    Microaggregated Age
10     11.67
12     11.67
13     11.67
57     56.67
54     56.67
59     56.67

Average of 10, 12, 13 = 11.67
Average of 57, 54, 59 = 56.67
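The age example can be reproduced with a short sketch. It assumes fixed-size groups formed by sorting the values, which is one simple way to put similar values together; the function name `microaggregate` and the `group_size` parameter are illustrative, not part of the presentation.

```python
def microaggregate(values, group_size):
    """Replace each value with the mean of its group of similar values."""
    # Sort positions by value so that similar values land in the same group.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [None] * len(values)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]  # the last group may be smaller
        avg = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = round(avg, 2)  # rounded for display, as in the table
    return result


ages = [10, 12, 13, 57, 54, 59]
print(microaggregate(ages, group_size=3))
# [11.67, 11.67, 11.67, 56.67, 56.67, 56.67]
```

A production scheme would also need to handle a leftover group that is smaller than the minimum size, for example by merging it into its neighbor; the sketch does not do this.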
Microaggregation
[Diagram: the Original Data is averaged into the Microaggregated Data; the User's query is sent to the Microaggregated Data and the results are returned from it.]
Data Perturbation



Perturbed data is raw data with noise added.
Pro: With perturbed databases, if unauthorized data is accessed, the true value is not disclosed.
Con: Data perturbation runs the risk of presenting biased data.
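As a rough sketch of the idea, the snippet below builds a perturbed copy of the data once and answers every query from that copy. Zero-mean Gaussian noise, the `noise_scale` parameter, and the function name are assumptions made for illustration; the presentation does not say which noise distribution is used.

```python
import random


def perturb_database(values, noise_scale, seed=0):
    """Return a perturbed copy of the data: each value plus random noise."""
    rng = random.Random(seed)  # fixed seed so the perturbed copy is reproducible
    return [v + rng.gauss(0, noise_scale) for v in values]


original = [10, 12, 13, 57, 54, 59]
perturbed = perturb_database(original, noise_scale=2.0)

# Users only ever query the perturbed copy, never the original values.
print(sum(perturbed) / len(perturbed))  # close to, but not exactly, the true mean
```

Because the noise is stored with the data, every user sees the same perturbed values, and any distortion the noise introduces is carried into every statistic computed from them.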
Data Perturbation
[Diagram: noise is added to the Original Database to produce the Perturbed Database; User 1 and User 2 send queries to the Perturbed Database and receive their results from it.]
Output Perturbation


Instead of the raw data being transformed as in Data Perturbation, only the output or query results are perturbed.
The bias problem is less severe than with data perturbation.
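A minimal sketch of output perturbation, in contrast to perturbing the stored data: the database stays exact and noise is added to each answer. The Gaussian noise, the `noise_scale` parameter, and the function name `noisy_mean` are illustrative assumptions.

```python
import random


def noisy_mean(values, noise_scale, rng):
    """Return the true mean plus random noise; the stored values are untouched."""
    true_mean = sum(values) / len(values)
    return true_mean + rng.gauss(0, noise_scale)


database = [10, 12, 13, 57, 54, 59]
rng = random.Random(0)  # seeded only to make the sketch repeatable
print(noisy_mean(database, noise_scale=1.0, rng=rng))
print(noisy_mean(database, noise_scale=1.0, rng=rng))  # a repeated query gets fresh noise
```

One caveat of this naive version is that a user who repeats the same query many times can average the answers to reduce the noise, so a real system would need some guard against that.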
Output Perturbation
[Diagram: User 1 and User 2 send queries to the Original Database; noise is added to the results before they are returned to the users.]
Auditing



Auditing is the process of keeping track of all queries made by each user.
Usually done with up-to-date logs.
Each time a user issues a query, the log is checked to see if the user is querying the database maliciously.
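The presentation does not say how a malicious query is recognized, so the sketch below assumes one simple rule: refuse a query whose query set differs from one of the user's earlier query sets by only a few records, since subtracting the two answers could isolate those records. The class name, the `min_difference` parameter, and the rule itself are assumptions for illustration.

```python
from collections import defaultdict


class QueryAuditor:
    """Keeps a per-user log of query sets and refuses suspicious queries."""

    def __init__(self, min_difference):
        self.min_difference = min_difference
        self.log = defaultdict(list)  # user -> list of past query sets

    def allow(self, user, query_set):
        """Check the user's log, then record the query if it is allowed."""
        new = frozenset(query_set)
        for past in self.log[user]:
            # Records that appear in exactly one of the two query sets.
            diff = len(new ^ past)
            if 0 < diff < self.min_difference:
                return False  # the two answers could be differenced
        self.log[user].append(new)
        return True


auditor = QueryAuditor(min_difference=2)
print(auditor.allow("alice", {1, 2, 3, 4}))  # True: nothing in the log yet
print(auditor.allow("alice", {1, 2, 3}))     # False: differs from the first set by one record
print(auditor.allow("bob", {1, 2, 3}))       # True: logs are kept per user
```

The loop over the whole log on every query is also a reminder of why auditing tends to be expensive: the log grows with each query and must be consulted every time.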
Random Sampling



Only a sample of the records meeting the requirements of the query is shown.
Must maintain consistency by giving exactly the same results to the same query.
Weakness – Logically equivalent queries can result in a different query set.
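One way to get consistent samples is to derive the sampling decision from a hash of the query text and the record ID, so the same query always selects the same records. This is an assumed mechanism chosen for illustration, not the scheme from the presentation; `in_sample`, `sampled_query`, and the `rate` parameter are hypothetical names.

```python
import hashlib


def in_sample(record_id, query_text, rate):
    """Deterministically decide whether a record is sampled for a query."""
    digest = hashlib.sha256(f"{query_text}|{record_id}".encode()).digest()
    # Treat the first 8 bytes of the hash as an integer in [0, 2**64).
    return int.from_bytes(digest[:8], "big") < rate * 2**64


def sampled_query(records, predicate, query_text, rate=0.5):
    """Return the IDs of matching records that fall in the query's sample."""
    return [rid for rid, value in records.items()
            if predicate(value) and in_sample(rid, query_text, rate)]


records = {1: 10, 2: 12, 3: 13, 4: 57, 5: 54, 6: 59}  # hypothetical id -> age
first = sampled_query(records, lambda age: age < 20, "age < 20")
again = sampled_query(records, lambda age: age < 20, "age < 20")
other = sampled_query(records, lambda age: not age >= 20, "not age >= 20")
print(first == again)  # True: same query text, same sample
print(first, other)    # logically equivalent queries, but the samples may differ
```

The last line illustrates the weakness noted above: the two predicates select the same records, but because the query text differs, the sampling decisions are made independently.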
Comparison Methods
The following criteria are used to determine the most effective methods of statistical database security:

Security – possibility of exact disclosure, partial disclosure, robustness

Richness of Information – amount of nonconfidential information eliminated, bias, precision, consistency

Costs – initial implementation cost, processing overhead per query, user education
A Comparison of Methods
Method                   Security        Richness of Information   Costs
Query-set Restriction    Low             Low (1)                   Low
Microaggregation         Moderate        Moderate                  Moderate
Data Perturbation        High            High-Moderate             Low
Output Perturbation      Moderate        Moderate-Low              Low
Auditing                 Moderate-Low    Moderate                  High
Sampling                 Moderate        Moderate-Low              Moderate

(1) Quality is low because a lot of information can be eliminated if the query does not meet the requirements.
Sources


This presentation is posted on http://www.cs.jmu.edu/users/aboutams

Adam, Nabil R.; Wortmann, John C.; Security-Control Methods for Statistical Databases: A Comparative Study; ACM Computing Surveys, Vol. 21, No. 4, December 1989 (http://delivery.acm.org/10.1145/80000/76895/p515adam.pdf?key1=76895&key2=1947043301&coll=portal&dl=ACM&CFID=4702747&CFTOKEN=83773110)

Official HIPAA (http://cms.hhs.gov/hipaa/)

Bernstein, Stephen W.; Impact of HIPAA on BioTech/Pharma Research: Rules of the Road (http://www.privacyassociation.org/docs/3-02bernstein.pdf)

Service Bureau; 3rd Party Testing (http://hipaatesting.com/service_bureau.html)