Download Chapter 6 Database and Data Mining Security

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Oracle Database wikipedia , lookup

Microsoft Access wikipedia , lookup

IMDb wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Ingres (database) wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Concurrency control wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Database wikipedia , lookup

Relational model wikipedia , lookup

Clusterpoint wikipedia , lookup

ContactPoint wikipedia , lookup

Database model wikipedia , lookup

Transcript
Chapter
p 6
Database and Data Mining Security
Ch l P.
Charles
P Pfleeger
Pfl
& Sh
Sharii LLawrence Pfl
Pfleeger, SSecurity
i in
i Computing,
C
i
4th Ed., Pearson Education, 2007
1
y In this chapter
y Integrity
I
i for
f ddatabases:
b
recordd iintegrity,
i data
d correctness,
update integrity
y Security
S it for
f databases:
dtb
access control,
t l inference,
if
andd aggregation
ti
y Multilevel secure databases: partitioned, cryptographically sealed,
filt d
filtered
y Security in data mining applications
2
6.1. Introduction to Databases
y Concept of a Database
y A database
d b isi a collection
ll i off data
d andd a set off rules
l that
h organize
i
the data by specifying certain relationships among the data.
y
y
The user ddescribes
Th
ib a logical
l i l format
f
t for
f the
th data.
dt
The precise physical format of the file is of no concern to the user
y A database administrator is a person who defines the rules that
organize the data and also controls who should have access to
what parts of the data.
data
y The user interacts with the database through a program called
a database manager or a database management system (DBMS),
(DBMS)
informally known as a front end.
3
y Components of Databases
y Record
R d – contain
i one related
l d group off data
d
y Each record contains fields or elements
y The
Th llogical
i l structure
t t off a ddatabase
t b iis called
ll d a schema
h
y A particular user may have access to only part of the database,
called
ll d a subschema
b h
Adams
212 Market St.
Columbus
OH
43210
Benchly
501 Union St.
Chicago
IL
60603
Carter
411 Elm St.
Columbus
OH
43210
4
Related Parts of a Database
Schema of Database
Name
First
Address
City
State
Zip
Airport
Adams
Charles
212 Market St.
Columbus
OH
43210
CMH
Adams
Edward
212 Market St.
Columbus
OH
43210
CMH
Benchly
Zeke
501 Union St.
Chicago
IL
60603
ORD
C t
Carter
M l
Marlene
411 Elm St.
El St
C l b
Columbus
OH
43210
CMH
Carter
Beth
411 Elm St.
Columbus
OH
43210
CMH
Carter
Ben
411 Elm St.
Columbus
OH
43210
CMH
Carter
Elisabeth
411 Elm St.
Columbus
OH
43210
CMH
Carter
Mary
411 Elm St.
Columbus
OH
43210
CMH
y The name of each column is called an attribute of the database
y A relation
l ti isi a sett off columns
l
Table 6-3. Relation in a Database.
Namee
Zip
ADAMS
43210
BENCHLY
60603
CARTER
43210
y Queries
y Users
U interact
i
with
i h database
d b managers through
h h commands
d to the
h
DBMS that retrieve, modify, add, or delete fields and records of the
dtb
database.
y A command is called a query.
y
F example,
For
l
SELECT NAME = 'ADAMS'
8
y Queries (Cont’d)
y The
Th resultl off executing
i a query iis a subschema.
b h
y For example, we might select records in which ZIP=43210
Result of Select Query
Name
First
Address
City
State
Zip
Airport
ADAMS
Charles
212 Market St.
Columbus
OH
43210
CMH
ADAMS
Edward
212 Market St.
Columbus
OH
43210
CMH
CARTER
Marlene
411 Elm St.
Columbus
OH
43210
CMH
CARTER
Beth
411 Elm St.
Columbus
OH
43210
CMH
CARTER
Ben
411 Elm St.
Columbus
OH
43210
CMH
CARTER
Lisabeth
b h
411 Elm St.
l
Columbus
l b
OH
43210
CMH
CARTER
Mary
411 Elm St.
Columbus
OH
43210
CMH
9
y Queries (Cont’d)
y Other,
O h more complex,
l selection
l i criteria
i i are possible,
ibl with
i h logical
l i l
operators such as and (∧) and or (∨), and comparisons such as less
th (<).
than
( )
y An example of a select query is
SELECT (ZIP='43210') ∧ (NAME='ADAMS')
10
y Queries (Cont’d)
y After
Af hhaving
i selected
l d records,
d we may project
j these
h records
d onto
one or more attributes.
y
y
The select
Th
l t operation
ti id
identifies
tifi certain
t i rows ffrom th
the ddatabase
tb
A project operation extracts the values from certain fields (columns) of those
records.
y For example, we might
y Select records meeting the condition ZIP=43210
y Project the results onto the attributes NAME and FIRST,
11
Results of Select-Project
Select Project Query
ADAMS
Charles
ADAMS
Ed
Edward
d
CARTER
Marlene
CARTER
Beth
CARTER
Ben
CARTER
Lisabeth
CARTER
Mary
12
y Queries (Cont’d)
y Notice
N i that
h we do
d not have
h to project
j onto the
h same attribute(s)
ib ( ) on
which the selection is done. For example, we can build a query
using
i ZIP andd NAME bbutt project
j t th
the resultlt onto
t FIRST
FIRST:
SHOW FIRST WHERE (ZIP='43210')
(ZIP '43210') ∧ (NAME='ADAMS')
(NAME 'ADAMS')
y The
Th resultlt would
ld bbe a lilistt off th
the fifirstt names off people
l whose
h llastt
names are ADAMS and ZIP is 43210.
13
y Queries (Cont’d)
y We
W can also
l merge two subschema
b h
on a common element
l
by
b using
i
a join query.
14
y Advantage of Using Databases
y A ddatabase
b iis a single
i l collection
ll i off ddata, storedd andd maintained
i i d at
one central location, to which many people may have access as
needed
dd
y The users are unaware of the physical arrangements; the unified
l i l arrangementt iis allll they
logical
th see
15
y Advantage of Using Databases
y Shared
Sh d access – users use one common, centralized
li d set off data
d
y Minimal redundancy – users do not have to collect and maintain
theiri own sets
th
t off ddata
t
y Data consistency – change to a data value affects all users of the
d t value
data
l
y Data integrity – data values are protected against accidental or
malicious
li i undesirable
d i bl changes
h
y Controlled access – only authorized users are allowed to view or
t modify
to
dif data
d t values
l
16
6.2. Securityy Requirements
q
y A list of requirements for database security.
y Physical
Ph i l database
d b integrity.
i
i
y The data of a database are immune to physical problems, such as power
failures and someone can reconstruct the database if it is destroyed through
failures,
a catastrophe.
y Logical
g database integrity.
g y
y The structure of the database is preserved. With logical integrity of a
database, a modification to the value of one field does not affect other
fields, for example.
17
y A list of requirements for database security. (Cont’d)
y Element
El
integrity.
i
i
y The data contained in each element are accurate.
y Auditability.
Auditability
y It is possible to track who or what has accessed (or modified) the elements
in the database.
y Access control.
y A user is allowed to access only authorized data, and different users can be
restricted to different modes of access (such as read or write).
18
y A list of requirements for database security. (Cont’d)
y User
U authentication.
h i i
y Every user is positively identified, both for the audit trail and for permission
to access certain data.
data
y Availability.
y Users can access the database in ggeneral and all the data for which theyy are
authorized.
19
y Integrity of the Database
y Two
T situations
i i can affect
ff the
h iintegrity
i off a ddatabase:
b
y when the whole database is damaged
y when individual data items are unreadable.
unreadable
y Integrity of the database as a whole is the responsibility of
y The DBMS
y The operating system
y The (human) computing system manager.
20
y Integrity of the Database (Cont’d)
y Sometimes
S
i iti isi iimportant to bbe able
bl to reconstruct the
h database
d b at
the point of a failure.
y
y
The DBMS mustt maintain
Th
i t i a llog off ttransactions.
ti
The system can obtain accurate account balances by reverting to a backup
py of the database and reprocessing
p
g all later transactions from the log.g
copy
21
y Element Integrity
y The integrity of database elements is their correctness or accuracy.
y This corrective action can be taken in three ways .
y Field checks - activities that test for appropriate values in a position.
y Access control
y A change log - A change log lists every change made to the database; it
contains both original and modified values. Using this log, a database
administrator can undo any changes that were made in error.
22
y Auditability
y For
F some applications
li i iti may be
b desirable
d i bl to generate an audit
di
record of all access (read or write) to a database.
y Such
S h a recordd can help
h l tto maintain
i t i the
th ddatabase's
t b ' integrity,
i t it or att
least to discover after the fact who had affected which values and
when.
h
23
y Access Control
y Databases
D b are often
f separatedd logically
l i ll by
b user access privileges.
i il
y User Authentication
y The
Th DBMS can require
i rigorous
i
user authentication.
th ti ti
y A DBMS might insist that a user pass both specific password and
titime-of-day
f d checks.
h k
y This authentication supplements the authentication performed by
th operating
the
ti system.
t
24
y Availability
y Integrity/Confidentiality/Availability – Computer Security
y Integrity
I t it iis a major
j concern in
i the
th ddesign
i off ddatabase
t b managementt
systems.
y Confidentiality
C fid ti lit iis a kkey iissue with
ith ddatabases
t b bbecause off th
the
inference problem,
y availability
il bilit iis iimportant
t t bbecause off th
the shared
h d access motivation
ti ti
underlying database development.
25
6.3. Reliabilityy and Integrity
g y
y Databases amalgamate data from many sources, and users expect a
DBMS to provide
id access to the
h ddata iin a reliable
li bl way.
y Reliability - mean that the software runs for very long periods of time
without
ith t ffailing.
ili
26
y Database concerns about reliability and integrity can be viewed from
three
h dimensions:
di
i
y Database integrity: concern that the database as a whole is
protected
t t d against
i t ddamage
y Element integrity: concern that the value of a specific data
element
l
t iis written
itt or changed
h d only
l bby authorized
th i d users.
y Element accuracy: concern that only correct values are written
i t the
into
th elements
l
t off a database.
dtb
27
y Protection Features from the Operating System
y A responsible
ibl system administrator
d ii
backs
b k up the
h files
fil off a ddatabase
b
periodically along with other user files.
y The
Th files
fil are protected
t t d dduring
i normall execution
ti against
i t
outside access by the operating system's standard access control
f iliti
facilities.
y Finally, the operating system performs certain integrity checks for all
d t as a partt off normall readd andd write
data
it operations
ti for
f I/O devices.
d i
28
y Two-Phase Update
y A serious
i problem
bl for
f a ddatabase
b manager iis the
h failure
f il off the
h
computing system in the middle of modifying data.
y If the
th data
d t item
it tto bbe modified
difi d was a llong fifield,
ld hhalflf off th
the fifield
ld
might show the new value, while the other half would contain the
old.
ld
29
y Two-Phase Update (Cont’d)
y Update
U d Technique
T h i
y The intent phase - the DBMS gathers the resources it needs to perform the
update
y Committing, involves the writing of a commit flag to the database. The
commit flag means that the DBMS has passed the point of no return: After
committing, the DBMS begins making permanent changes.
30
y Two-Phase Update (Cont’d)
y Example
E
l
1.
The stockroom checks the database to determine that 50 boxes of paper
clips are on hand.
hand If not,
not the requisition is rejected and the transaction is
finished.
2.
If enough paper clips are in stock, the stockroom deducts 50 from the
inventory figure in the database (107 - 50 = 57).
3.
The stockroom charges accounting's supplies budget (also in the database)
for 50 boxes of paper clips.
31
y Two-Phase Update (Cont’d)
y Example
E
l (Cont’d)
(C ’d)
4.
The stockroom checks its remaining quantity on hand (57) to determine
whether the remaining quantity is below the reorder point
point. Because it is,
is a
notice to order more paper clips is generated, and the item is flagged as
"on order" in the database.
5.
A delivery order is prepared, enabling 50 boxes of paper clips to be sent
to accounting.
All five of these steps must be completed in the order listed for the database
to be accurate and for the transaction to be processed correctly.
32
y Two-Phase Update (Cont’d)
y Example
E
l (Cont’d)
(C ’d)
y When a two-phase commit is used, shadow values are maintained for key
data points.
points A shadow data value is computed and stored locally during the
intent phase, and it is copied to the actual database during the commit
phase.
33
y Two-Phase Update (Cont’d)
y Example
E
l (Cont’d)
(C ’d)
y Intent:
y
y
y
y
y
Check the value of COMMIT
COMMIT-FLAG
FLAG in the database. If it is set, this phase cannot be
performed. Halt or loop, checking COMMIT-FLAG until it is not set.
Compare number of boxes of paper clips on hand to number requisitioned; if more are
requisitioned than are on hand,
hand halt.
halt
Compute TCLIPS = ONHAND - REQUISITION.
Obtain BUDGET, the current supplies budget remaining for accounting department.
C
Compute
TBUDGET
B DG = BBUDGET
DG - COST,
COS where
h COST
COS is the
h cost off 500 boxes
b
off clips.
l
Check whether TCLIPS is below reorder point; if so, set TREORDER = TRUE; else set
TREORDER = FALSE
34
y Two-Phase Update (Cont’d)
y Example
E
l (Cont’d)
(C ’d)
y Commit:
y Set COMMIT-FLAG in database.
database
y Copy TCLIPS to CLIPS in database.
y Copy TBUDGET to BUDGET in database.
y Copy TREORDER to REORDER in database.
y Prepare notice to deliver paper clips to accounting department. Indicate
transaction completed in log.
y Unset COMMIT-FLAG.
35
y Redundancy/Internal Consistency
y Error
E Detection
D
i andd Correction
C
i Codes
C d
y Shadow Fields
y Entire
E ti attributes
tt ib t or entire
ti records
d can bbe dduplicated
li t d iin a ddatabase.
t b If th
the
data are irreproducible, this second copy can provide an immediate
p
if an error is detected.
replacement
36
y Recovery
y In
I addition
ddi i to these
h error correction
i processes, a DBMS can maintain
i i
a log of user accesses, particularly changes. In the event of a failure,
th database
the
d t b isi reloaded
l d d from
f a bbackup
k copy andd allll llater
t changes
h
are then applied from the audit log.
37
y Concurrency/Consistency
y Database
D b systems are often
f multiuser
l i systems.
y If both users try to modify the same data items, we often assume
thatt there
th
th iis no conflict
fli t because
b
eachh knows
k
what
h t tto write;
it th
the
value to be written does not depend on the previous value of the
d t item.
data
it HHowever, thi
this supposition
iti isi nott quite
it accurate.
t
38
y Concurrency/Consistency (Cont’d)
y E.g.,
E
y Agent A submits the update command
SELECT (SEAT-NO = '11D')
11D ) ASSIGN 'MOCK
MOCK, E'
E TO PASSENGER-NAME
y while Agent B submits the update sequence
SELECT (SEAT-NO = '11D') ASSIGN 'EHLERS, P' TO PASSENGER-NAME
y To resolve this problem, a DBMS treats the entire queryupdate cycle
as a single atomic operation.
39
y Monitors
y The
Th monitor
i isi the
h uniti off a DBMS responsible
ibl for
f the
h structurall
integrity of the database.
y Forms
F
off monitors
it
y
y
y
Range Comparisons
y A range comparison monitor tests each new value to ensure that the
value is within an acceptable range
y Filters or patterns are more general types of data form checks.
State constraints describe the condition of the entire database.
Transition constraints describe conditions necessary before changes can be
applied to a database.
40
6.4. Sensitive Data
y Sensitive data are data that should not be made public.
y There
Th exist
i cases that
h some but
b not allll off the
h elements
l
iin the
h
database are sensitive.
y There
Th may bbe varying
i ddegrees off sensitivity.
iti it
41
y Several factors can make data sensitive.
y Inherently
Ih
l sensitive.
ii
y The value itself may be so revealing that it is sensitive. Examples are the
locations of defensive missiles.
missiles
y From a sensitive source.
y The source of the data mayy indicate a need for confidentiality.
y An example
p
is information from an informer whose identity would be compromised if the
information were disclosed.
42
y Several factors can make data sensitive (Cont’d)
y Declared
D l d sensitive.
ii
y The database administrator or the owner of the data may have declared the
data to be sensitive.
sensitive
y Part of a sensitive attribute or a sensitive record.
y In a database,, an entire attribute or record mayy be classified as sensitive.
y Sensitive in relation to previously disclosed information.
y Some data become sensitive in the presence of other data.
y For example, the longitude coordinate of a secret gold mine reveals little,
but the longitude coordinate in conjunction with the latitude coordinate
pinpoints the mine.
43
y Access Decisions
y The
Th DBMS may consider
id severall factors
f
when
h ddeciding
idi whether
h h to
permit an access.
y
y
y
AAvailability
il bilit off the
th data
dt
Acceptability of the access
Authenticityy of the user.
44
y Access Decisions (Cont’d)
y Availability
A il bili off Data
D
y One or more required elements may be inaccessible.
y For example,
example if a user is updating several fields,
fields other users'
users accesses to
those fields must be blocked temporarily. This blocking ensures that users
do not receive inaccurate information
y Acceptability of Access
y One or more values of the record may be sensitive and not accessible by
the general user. A DBMS should not release sensitive data to unauthorized
individuals.
45
y Access Decisions (Cont’d)
y Assurance
A
off Authenticity
A h ii
y Certain characteristics of the user external to the database may also be
considered when permitting access.
access
y For example, to enhance security, the database administrator may permit
someone to access the database only at certain times, such as during
working hours.
46
y Access Decisions (Cont’d)
y Types
T
off Disclosures
Di l
y Exact Data - The most serious disclosure is the exact value of a sensitive
data item itself
y Bounds - Another exposure is disclosing bounds on a sensitive value; that is,
indicating that a sensitive value, y, is between two values, L and H.
y Negative Result - Sometimes we can word a query to determine a negative
result. That is, we can learn that z is not the value of y.
y Existence - The existence of data is itself a sensitive piece of data.
47
y Access Decisions (Cont’d)
y Types
T
off Di
Disclosures
l
(C
(Cont’d)
’d)
y Probable Value - it may be possible to determine the probability that a
certain element has a certain value.
value
48
y Security versus Precision
49
6.5. Inference
y Inference is a way to infer or derive sensitive data from nonsensitive
ddata.
Sample Database
Name
Sex
Race
Aid
Fines
Drugs
Dorm
Adams
M
C
5000
45.
1
Holmes
Bailey
M
B
0
0.
0
Grey
Chin
F
A
3000
20.
0
West
Dewitt
M
B
1000
35.
3
Grey
Earhart
F
C
2000
95.
1
Holmes
Fein
F
C
1000
5
15.
0
West
Groff
M
C
4000
0.
3
West
Hill
F
B
5000
10.
2
Holmes
Koch
F
C
0
0.
1
West
Liu
F
A
0
10.
2
Grey
Majors
M
C
2000
0.
2
Grey
50
y Direct Attack
y A user tries
i to ddetermine
i values
l off sensitive
i i fifields
ld by
b seeking
ki them
h
directly with queries that yield few records.
y A sensitive
iti query might
i ht be
b
List
NAME
where
h
SEX M ∧ DRUGS=1
SEX=M
DRUGS 1
This query di
Thi
discloses
l
th
thatt ffor recordd ADAMS
ADAMS, DRUGS
DRUGS=1.
1 HHowever, it isi an obvious
bi
attack because it selects people for whom DRUGS=1, and the DBMS might
reject
j the query
q y because it selects records for a specific
p
value of the sensitive
attribute DRUGS.
51
y Direct Attack (Cont’d)
y A lless obvious
b i query iis
List
NAME
where
h
(SEX M ∧ DRUGS=1)
(SEX=M
DRUGS 1) ∨
(SEX≠M ∧ SEX ≠ F) ∨
(DORM AYRES)
(DORM=AYRES)
This query still retrieves only one record, revealing a name that corresponds to
the sensitive DRUG value.
value The DBMS needs to know that SEX has only two
possible values so that the second clause will select no records. Even if that
were possible, the DBMS would also need to know that no records exist with
DORM=AYRES, even though AYRES might in fact be an acceptable value for
DORM.
52
y Direct Attack (Cont’d)
y Do
D not reveall results
l when
h a smallll number
b off people
l make
k up a
large proportion of a category.
y The
Th rule
l off "n
" items
it over k percent"t" means th
thatt ddata
t should
h ld bbe
withheld if n items represent over k percent of the result reported.
53
y Indirect Attack
y Sum
S - An
A attackk by
b sum tries
i to infer
i f a value
l from
f a reportedd sum.
y Count - The count can be combined with the sum to produce
some even more revealing
li results.
lt
y Mean - The arithmetic mean (average) allows exact disclosure if the
attacker
tt k can manipulate
i l t th
the subject
bj t population.
l ti
y Median
54
55
y Tracker Attacks
y A tracker
k attackk can fool
f l the
h database
d b manager iinto locating
l i the
h
desired data by using additional queries that produce small results.
y The
Th ttracker
k adds
dd additional
dditi l records
d tto bbe retrieved
t i d ffor ttwo diff
differentt
queries; the two sets of records cancel each other out, leaving only
th statistic
the
t ti ti or ddata
t ddesired.
i d Th
The approachh iis tto use iintelligent
t lli t
padding of two queries.
y In
I other
th words,
d iinstead
t d off ttrying
i tto id
identify
tif a unique
i value,
l we
request n - 1 other values (where there are n values in the
d t b ) Given
database).
Gi n andd n - 1,
1 we can easily
il compute
t th
the ddesired
i d
single element.
56
y Tracker Attacks (Cont’d)
y For
F iinstance, suppose we wish
i h to kknow hhow many female
f l
Caucasians live in Holmes Hall. A query posed might be
count ((SEX=F) (RACE=C) (DORM=Holmes))
The database management system might consult the database, find
th t the
that
th answer isi 1,
1 andd refuse
f to
t answer that
th t query bbecause one
record dominates the result of the query.
57
y Tracker Attacks (Cont’d)
y The
Th query
q=count((SEX=F) ∧ (RACE=C) ∧ (DORM=Holmes))
isi off th
the form
f
q = count(a ∧ b ∧ c)
y By
B using
i th
the rules
l off llogici andd algebra,
l b we can ttransform
f thi
this query
to
q = count(a
t( ∧ b ∧ c)) = count(a)
t( ) - count(a
t( ∧ ¬(b
(b ∧ c))))
58
y Tracker Attacks (Cont’d)
y Thus,
Th the
h original
i i l query iis equivalent
i l to
count (SEX=F)
minus
i
count ((SEX=F) ∧ ((RACE ≠ C) ∨ (DORM ≠ Holmes)))
59
y Controls for Statistical Inference Attacks
y Suppression
S
i - sensitive
i i data
d values
l are not provided;
id d the
h query isi
rejected without response.
y Concealing
C
li - the
th answer provided
id d isi close
l tto but
b t nott exactly
tl th
the
actual value.
60
y Controls for Statistical Inference Attacks (Cont’d)
y These
Th two controlsl reflect
fl the
h contrast bbetween security
i andd
precision.
y
y
With suppression,
i any results
lt provided
id d are correct,t yett many responses mustt
be withheld to maintain security.
With concealing,g, more results can be provided,
p
, but the precision
p
of the
results is lower.
The choice between suppression and concealing depends on the
context of the database.
61
y Random Sample
y With
Wi h random
d sample
l control,l a resultl iis not dderived
i d from
f the
h whole
hl
database; instead the result is computed on a random sample of
th database.
the
dtb
y The sample chosen is large enough to be valid.
y Random
R d DData
t PPerturbation
t b ti
y It is sometimes useful to perturb the values of the database by a
smallll error.
y Generate a small random error term εi and add it to xi for statistical
results.
lt
62
y Query Analysis
y A more complex
l form
f off security
i uses query analysis.
l i
y Here, a query and its implications are analyzed to determine
whether
h th a resultlt should
h ld be
b provided.
id d
63
y Conclusion on the Inference Problem
y No
N perfect
f solutions
l i to the
h inference
if
problem.
bl
y The approaches to controlling it
y Suppress
S
obviously
b i l sensitive
iti information.
if
ti
y Track what the user knows.
y Disguise
g the data.
64
y Aggregation
y Building
B ildi sensitive
i i results
l from
f less
l sensitive
i i inputs.
i
y Data mining is the process of sifting through multiple databases and
correlating
l ti multiple
lti l ddata
t elements
l
t tto fifindd useful
f l iinformation
f
ti
65
6.6. Multilevel Databases
y The Case for Differentiated Security
Name
Department
Salary
Phone
Performance
Rogers
g
training
g
43,800
43,
4‐5067
4
5 7
A2
Jenkins
research
62,900
6‐4281
D4
Poling
training
38,200
4‐4501
B1
Garland
user services
54,600
6‐6600
A4
Hilten
user services
5
44,500
4‐5351
535
B1
Davis
administration
51,400
4‐9505
A3
66
y The Case for Differentiated Security (Cont’d)
y Three
Th characteristics
h
i i off database
d b security
i emerge.
y The security of a single element may be different from the security of other
elements of the same record or from other values of the same attribute.
attribute
This situation implies that security should be implemented for each
individual element.
y Two levels sensitive and non-sensitive are inadequate to represent some
security situations. Several grades of security may be needed.
y The security of an aggregate a sum, a count, or a group of values in a
database may differ from the security of the individual elements. The
security of the aggregate may be higher or lower than that of the individual
elements.
67
6.7. Proposals
p
for Multilevel Securityy
y Separation
y Partitioning
P ii i
y The database is divided into separate databases, each at its own level of
sensitivity.
sensitivity
y This control destroys a basic advantage of databases: elimination of
redundancy and improved accuracy through having only one field to update.
y It does not address the problem of a high-level user who needs access to
some low-level data combined with high-level data.
68
y Separation (Cont’d)
y Encryption
E
i
y If sensitive data are encrypted, a user who accidentally receives them
cannot interpret the data.
data
69
Cryptographic Separation: Different Encryption Keys.
Cryptographic Separation: Block Chaining
70
y Separation (Cont’d)
y Integrity
I
i Lock
L k
y First proposed at the U.S. Air Force Summer Study on Data Base
Security [AFS83].
[AFS83]
y The lock is a way to provide both integrity and limited access for a database.
y The operation was nicknamed "spray paint" because each element is
figuratively painted with a color that denotes its sensitivity.
71
y Separation (Cont’d)
y Integrity
I
i Lock
L k (Cont’d)
(C ’d)
y The sensitivity label should be
y unforgeable,
unforgeable so that a malicious subject cannot create a new sensitivity
level for an element
y unique, so that a malicious subject cannot copy a sensitivity level from
another element
y concealed, so that a malicious subject cannot even determine the
sensitivity level of an arbitrary element
72
y Separation (Cont’d)
y Integrity
I
i Lock
L k (Cont’d)
(C ’d)
y The third piece of the integrity lock for a field is an error-detecting code,
called a cryptographic checksum.
checksum
73
y Separation (Cont’d)
y Sensitivity
S i i i Lock
L k
y A sensitivity lock is a combination of a
unique identifier (such as the record
number) and the sensitivity level.
y
y
y
Because the identifier is unique, each lock relates
to one particular record.
Many different elements will have the same
sensitivity level.
A malicious subject should not be able to
identify two elements having identical sensitivity
levels or identical data values jjust byy lookingg at
the sensitivity level portion of the lock.
74
y Designs of Multilevel Secure Database
y Integrity
I
i Lock
L k
y A short-term solution to the security problem for multilevel databases.
y The intention was to be able to use any (untrusted) database manager with
a trusted procedure that handles access control.
y The sensitive data were obliterated or concealed with encryption that
protected both a data item and its sensitivity.
75
Trusted Database Manager
76
y Designs of Multilevel Secure Database
y Integrity
I
i Lock
L k (Cont’d)
(C ’d)
y The efficiency of integrity locks is a serious drawback.
y The space needed for storing an element must be expanded to contain
the sensitivity label.
y The processing time efficiency of an integrity lock.
y The untrusted database manager sees all data
77
y Trusted Front End
y Trusted
T d front
f endd iis also
l kknown as a guardd andd operates muchh like
lik
the reference monitor
78
y Trusted Front End (Cont’d)
1.
2.
3.
4.
5.
6.
A user identifies
id ifi himself
hi lf or hherselflf to the
h ffront end;
d the
h ffront endd
authenticates the user's identity.
Th user iissues a query tto th
The
the front
f t end.
d
The front end verifies the user's authorization to data.
Th front
The
f t endd iissues a query tto th
the ddatabase
t b manager.
The database manager performs I/O access, interacting with lowl l access control
level
t l tto achieve
hi access tto actual
t l data.
dt
The database manager returns the result of the query to the
t t d ffrontt end.
trusted
d
79
y Trusted Front End (Cont’d)
The front
Th
f endd analyzes
l
the
h sensitivity
i i i llevelsl off the
h data
d items
i
iin
the result and selects those items consistent with the user's
security
it level.
l l
8. The front end transmits selected data to the untrusted front end
f formatting.
for
f
tti
9. The untrusted front end transmits formatted data to the user.
7.
80
y Commutative Filters
y A process that
h forms
f
an interface
i f between
b
the
h user andd a DBMS.
DBMS
y The filter reformats the query so that the database manager does as much
of the work as possible
possible, screening out many unacceptable records
records.
y The filter then provides a second screening to select only data to which the
user has access.
81
82
y Distributed Databases
y Distributed
Di ib d or federated
fd
d database
d b
y A trusted front end controls access to two unmodified commercial
DBMSs:
DBMS
y
y
one for all low-sensitivity data and
one for all high-sensitivity data.
data
y The distributed database design is not popular because the front
end, which must be trusted
end
trusted, is complex
complex, potentially including most
of the functionality of a full DBMS itself.
83
y Window/View
y One
O off the
h advantages
d
off using
i a DBMS ffor multiple
l i l users off
different interests (but not necessarily different sensitivity levels) is
th ability
the
bilit to
t create
t a different
diff t view
i ffor eachh user.
y Each user is restricted to a picture of the data reflecting only what
th user needs
the
d to
t see.
y A window (or a view) is a subset of a database, containing exactly
th information
the
if
ti that
th t a user isi entitled
titl d tto access.
y A view can represent a single user's subset database so that all of a
user's' queries
i access only
l that
th t database.
dtb
84
(a) Airline's View.
FLT#
ORIG
DEST
DEP
ARR
CAP
TYPE
PILOT
TAIL
362
JFK
BWI
0830
0950
114
PASS
Dosser 2463
397
JFK
ORD
0830
1020
114
PASS
Botto
ms
202
IAD
LGW
1530
0710
183
PASS
Jevins 2007
749
LGA
ATL
0947
1120
0
CARG Witt
O
286
STA
SFO
1020
1150
117
PASS
3621
3116
Gross 4026
…
…
85
( )
(b) Travel Agent's View.
g
FLT
ORIG DEST DEP
ARR
CAP
3362
JJFK
BWI
0830
3
0950
95
114
4
397
JFK
ORD
0830
1020
114
202
IAD
LGW
1530
0710
183
286
STA
SFO
1020
1150
117
…
…
86
Secure Database Decomposition
87
6.8. Data Miningg
y Databases are great repositories of data. More data are being collected
andd saved.
d
y But to find needles of information in those vast fields of haystacks of
d t requires
data
i iintelligent
t lli t analyzing
l i andd querying
i off the
th data.
dt
y Indeed, a whole specialization, called data mining, has emerged.
y In
I a llargely
l automated
t t d way, ddata
t mining
i i applications
li ti sortt andd searchh
thorough data.
88
y Data mining uses statistics, machine learning, mathematical models,
pattern recognition,
i i andd other
h techniques
hi
to di
discover patterns andd
relations on large datasets.
y Data
D t mining
i i tools
t l use association
i ti (one
( eventt often
ft goes with
ith another),
th )
sequences (one event often leads to another), classification (events
exhibit
hibit patterns,
tt
ffor example
l coincidence),
i id ) clustering
l t i ((some ititems hhave
similar characteristics), and forecasting (past events foretell future ones).
y Data
D t mining
i i presents
t probable
b bl relationships,
l ti hi bbutt th
these are nott
necessarily cause-and-effect relationships.
89
y Privacy and Sensitivity
y Because
B
the
h goall off ddata mining
i i iis summary results,
l not iindividual
di id l
data items, you would not expect a problem with sensitivity of
i di id l data
individual
d t items.
it
y Unfortunately that is not true. Why ??? ☺
90
6.9. Summaryy of Database Securityy
y Address three aspects of security for database management systems:
y Confidentiality
C fid i li andd iintegrity
i problems
bl specific
ifi to ddatabase
b
applications
y
y
CConfidentiality
fid ti lit can bbe broken
b k by
b indirect
i di t disclosure
di l
off a negative
ti resultlt or off
the bounds of a value.
Integrity
g y of the entire database is a responsibility
p
y of the DBMS software
y The inference problem for statistical databases
y Arise from the mathematical relationships between data elements and query
results.
y Problems of including users and data of different sensitivity levels in
one database.
91