* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Chapter 6 Database and Data Mining Security
Oracle Database wikipedia , lookup
Microsoft Access wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Ingres (database) wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Concurrency control wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Relational model wikipedia , lookup
Clusterpoint wikipedia , lookup
Chapter p 6 Database and Data Mining Security Ch l P. Charles P Pfleeger Pfl & Sh Sharii LLawrence Pfl Pfleeger, SSecurity i in i Computing, C i 4th Ed., Pearson Education, 2007 1 y In this chapter y Integrity I i for f ddatabases: b recordd iintegrity, i data d correctness, update integrity y Security S it for f databases: dtb access control, t l inference, if andd aggregation ti y Multilevel secure databases: partitioned, cryptographically sealed, filt d filtered y Security in data mining applications 2 6.1. Introduction to Databases y Concept of a Database y A database d b isi a collection ll i off data d andd a set off rules l that h organize i the data by specifying certain relationships among the data. y y The user ddescribes Th ib a logical l i l format f t for f the th data. dt The precise physical format of the file is of no concern to the user y A database administrator is a person who defines the rules that organize the data and also controls who should have access to what parts of the data. data y The user interacts with the database through a program called a database manager or a database management system (DBMS), (DBMS) informally known as a front end. 3 y Components of Databases y Record R d – contain i one related l d group off data d y Each record contains fields or elements y The Th llogical i l structure t t off a ddatabase t b iis called ll d a schema h y A particular user may have access to only part of the database, called ll d a subschema b h Adams 212 Market St. Columbus OH 43210 Benchly 501 Union St. Chicago IL 60603 Carter 411 Elm St. Columbus OH 43210 4 Related Parts of a Database Schema of Database Name First Address City State Zip Airport Adams Charles 212 Market St. Columbus OH 43210 CMH Adams Edward 212 Market St. Columbus OH 43210 CMH Benchly Zeke 501 Union St. Chicago IL 60603 ORD C t Carter M l Marlene 411 Elm St. El St C l b Columbus OH 43210 CMH Carter Beth 411 Elm St. Columbus OH 43210 CMH Carter Ben 411 Elm St. Columbus OH 43210 CMH Carter Elisabeth 411 Elm St. Columbus OH 43210 CMH Carter Mary 411 Elm St. Columbus OH 43210 CMH y The name of each column is called an attribute of the database y A relation l ti isi a sett off columns l Table 6-3. Relation in a Database. Namee Zip ADAMS 43210 BENCHLY 60603 CARTER 43210 y Queries y Users U interact i with i h database d b managers through h h commands d to the h DBMS that retrieve, modify, add, or delete fields and records of the dtb database. y A command is called a query. y F example, For l SELECT NAME = 'ADAMS' 8 y Queries (Cont’d) y The Th resultl off executing i a query iis a subschema. b h y For example, we might select records in which ZIP=43210 Result of Select Query Name First Address City State Zip Airport ADAMS Charles 212 Market St. Columbus OH 43210 CMH ADAMS Edward 212 Market St. Columbus OH 43210 CMH CARTER Marlene 411 Elm St. Columbus OH 43210 CMH CARTER Beth 411 Elm St. Columbus OH 43210 CMH CARTER Ben 411 Elm St. Columbus OH 43210 CMH CARTER Lisabeth b h 411 Elm St. l Columbus l b OH 43210 CMH CARTER Mary 411 Elm St. Columbus OH 43210 CMH 9 y Queries (Cont’d) y Other, O h more complex, l selection l i criteria i i are possible, ibl with i h logical l i l operators such as and (∧) and or (∨), and comparisons such as less th (<). than ( ) y An example of a select query is SELECT (ZIP='43210') ∧ (NAME='ADAMS') 10 y Queries (Cont’d) y After Af hhaving i selected l d records, d we may project j these h records d onto one or more attributes. y y The select Th l t operation ti id identifies tifi certain t i rows ffrom th the ddatabase tb A project operation extracts the values from certain fields (columns) of those records. y For example, we might y Select records meeting the condition ZIP=43210 y Project the results onto the attributes NAME and FIRST, 11 Results of Select-Project Select Project Query ADAMS Charles ADAMS Ed Edward d CARTER Marlene CARTER Beth CARTER Ben CARTER Lisabeth CARTER Mary 12 y Queries (Cont’d) y Notice N i that h we do d not have h to project j onto the h same attribute(s) ib ( ) on which the selection is done. For example, we can build a query using i ZIP andd NAME bbutt project j t th the resultlt onto t FIRST FIRST: SHOW FIRST WHERE (ZIP='43210') (ZIP '43210') ∧ (NAME='ADAMS') (NAME 'ADAMS') y The Th resultlt would ld bbe a lilistt off th the fifirstt names off people l whose h llastt names are ADAMS and ZIP is 43210. 13 y Queries (Cont’d) y We W can also l merge two subschema b h on a common element l by b using i a join query. 14 y Advantage of Using Databases y A ddatabase b iis a single i l collection ll i off ddata, storedd andd maintained i i d at one central location, to which many people may have access as needed dd y The users are unaware of the physical arrangements; the unified l i l arrangementt iis allll they logical th see 15 y Advantage of Using Databases y Shared Sh d access – users use one common, centralized li d set off data d y Minimal redundancy – users do not have to collect and maintain theiri own sets th t off ddata t y Data consistency – change to a data value affects all users of the d t value data l y Data integrity – data values are protected against accidental or malicious li i undesirable d i bl changes h y Controlled access – only authorized users are allowed to view or t modify to dif data d t values l 16 6.2. Securityy Requirements q y A list of requirements for database security. y Physical Ph i l database d b integrity. i i y The data of a database are immune to physical problems, such as power failures and someone can reconstruct the database if it is destroyed through failures, a catastrophe. y Logical g database integrity. g y y The structure of the database is preserved. With logical integrity of a database, a modification to the value of one field does not affect other fields, for example. 17 y A list of requirements for database security. (Cont’d) y Element El integrity. i i y The data contained in each element are accurate. y Auditability. Auditability y It is possible to track who or what has accessed (or modified) the elements in the database. y Access control. y A user is allowed to access only authorized data, and different users can be restricted to different modes of access (such as read or write). 18 y A list of requirements for database security. (Cont’d) y User U authentication. h i i y Every user is positively identified, both for the audit trail and for permission to access certain data. data y Availability. y Users can access the database in ggeneral and all the data for which theyy are authorized. 19 y Integrity of the Database y Two T situations i i can affect ff the h iintegrity i off a ddatabase: b y when the whole database is damaged y when individual data items are unreadable. unreadable y Integrity of the database as a whole is the responsibility of y The DBMS y The operating system y The (human) computing system manager. 20 y Integrity of the Database (Cont’d) y Sometimes S i iti isi iimportant to bbe able bl to reconstruct the h database d b at the point of a failure. y y The DBMS mustt maintain Th i t i a llog off ttransactions. ti The system can obtain accurate account balances by reverting to a backup py of the database and reprocessing p g all later transactions from the log.g copy 21 y Element Integrity y The integrity of database elements is their correctness or accuracy. y This corrective action can be taken in three ways . y Field checks - activities that test for appropriate values in a position. y Access control y A change log - A change log lists every change made to the database; it contains both original and modified values. Using this log, a database administrator can undo any changes that were made in error. 22 y Auditability y For F some applications li i iti may be b desirable d i bl to generate an audit di record of all access (read or write) to a database. y Such S h a recordd can help h l tto maintain i t i the th ddatabase's t b ' integrity, i t it or att least to discover after the fact who had affected which values and when. h 23 y Access Control y Databases D b are often f separatedd logically l i ll by b user access privileges. i il y User Authentication y The Th DBMS can require i rigorous i user authentication. th ti ti y A DBMS might insist that a user pass both specific password and titime-of-day f d checks. h k y This authentication supplements the authentication performed by th operating the ti system. t 24 y Availability y Integrity/Confidentiality/Availability – Computer Security y Integrity I t it iis a major j concern in i the th ddesign i off ddatabase t b managementt systems. y Confidentiality C fid ti lit iis a kkey iissue with ith ddatabases t b bbecause off th the inference problem, y availability il bilit iis iimportant t t bbecause off th the shared h d access motivation ti ti underlying database development. 25 6.3. Reliabilityy and Integrity g y y Databases amalgamate data from many sources, and users expect a DBMS to provide id access to the h ddata iin a reliable li bl way. y Reliability - mean that the software runs for very long periods of time without ith t ffailing. ili 26 y Database concerns about reliability and integrity can be viewed from three h dimensions: di i y Database integrity: concern that the database as a whole is protected t t d against i t ddamage y Element integrity: concern that the value of a specific data element l t iis written itt or changed h d only l bby authorized th i d users. y Element accuracy: concern that only correct values are written i t the into th elements l t off a database. dtb 27 y Protection Features from the Operating System y A responsible ibl system administrator d ii backs b k up the h files fil off a ddatabase b periodically along with other user files. y The Th files fil are protected t t d dduring i normall execution ti against i t outside access by the operating system's standard access control f iliti facilities. y Finally, the operating system performs certain integrity checks for all d t as a partt off normall readd andd write data it operations ti for f I/O devices. d i 28 y Two-Phase Update y A serious i problem bl for f a ddatabase b manager iis the h failure f il off the h computing system in the middle of modifying data. y If the th data d t item it tto bbe modified difi d was a llong fifield, ld hhalflf off th the fifield ld might show the new value, while the other half would contain the old. ld 29 y Two-Phase Update (Cont’d) y Update U d Technique T h i y The intent phase - the DBMS gathers the resources it needs to perform the update y Committing, involves the writing of a commit flag to the database. The commit flag means that the DBMS has passed the point of no return: After committing, the DBMS begins making permanent changes. 30 y Two-Phase Update (Cont’d) y Example E l 1. The stockroom checks the database to determine that 50 boxes of paper clips are on hand. hand If not, not the requisition is rejected and the transaction is finished. 2. If enough paper clips are in stock, the stockroom deducts 50 from the inventory figure in the database (107 - 50 = 57). 3. The stockroom charges accounting's supplies budget (also in the database) for 50 boxes of paper clips. 31 y Two-Phase Update (Cont’d) y Example E l (Cont’d) (C ’d) 4. The stockroom checks its remaining quantity on hand (57) to determine whether the remaining quantity is below the reorder point point. Because it is, is a notice to order more paper clips is generated, and the item is flagged as "on order" in the database. 5. A delivery order is prepared, enabling 50 boxes of paper clips to be sent to accounting. All five of these steps must be completed in the order listed for the database to be accurate and for the transaction to be processed correctly. 32 y Two-Phase Update (Cont’d) y Example E l (Cont’d) (C ’d) y When a two-phase commit is used, shadow values are maintained for key data points. points A shadow data value is computed and stored locally during the intent phase, and it is copied to the actual database during the commit phase. 33 y Two-Phase Update (Cont’d) y Example E l (Cont’d) (C ’d) y Intent: y y y y y Check the value of COMMIT COMMIT-FLAG FLAG in the database. If it is set, this phase cannot be performed. Halt or loop, checking COMMIT-FLAG until it is not set. Compare number of boxes of paper clips on hand to number requisitioned; if more are requisitioned than are on hand, hand halt. halt Compute TCLIPS = ONHAND - REQUISITION. Obtain BUDGET, the current supplies budget remaining for accounting department. C Compute TBUDGET B DG = BBUDGET DG - COST, COS where h COST COS is the h cost off 500 boxes b off clips. l Check whether TCLIPS is below reorder point; if so, set TREORDER = TRUE; else set TREORDER = FALSE 34 y Two-Phase Update (Cont’d) y Example E l (Cont’d) (C ’d) y Commit: y Set COMMIT-FLAG in database. database y Copy TCLIPS to CLIPS in database. y Copy TBUDGET to BUDGET in database. y Copy TREORDER to REORDER in database. y Prepare notice to deliver paper clips to accounting department. Indicate transaction completed in log. y Unset COMMIT-FLAG. 35 y Redundancy/Internal Consistency y Error E Detection D i andd Correction C i Codes C d y Shadow Fields y Entire E ti attributes tt ib t or entire ti records d can bbe dduplicated li t d iin a ddatabase. t b If th the data are irreproducible, this second copy can provide an immediate p if an error is detected. replacement 36 y Recovery y In I addition ddi i to these h error correction i processes, a DBMS can maintain i i a log of user accesses, particularly changes. In the event of a failure, th database the d t b isi reloaded l d d from f a bbackup k copy andd allll llater t changes h are then applied from the audit log. 37 y Concurrency/Consistency y Database D b systems are often f multiuser l i systems. y If both users try to modify the same data items, we often assume thatt there th th iis no conflict fli t because b eachh knows k what h t tto write; it th the value to be written does not depend on the previous value of the d t item. data it HHowever, thi this supposition iti isi nott quite it accurate. t 38 y Concurrency/Consistency (Cont’d) y E.g., E y Agent A submits the update command SELECT (SEAT-NO = '11D') 11D ) ASSIGN 'MOCK MOCK, E' E TO PASSENGER-NAME y while Agent B submits the update sequence SELECT (SEAT-NO = '11D') ASSIGN 'EHLERS, P' TO PASSENGER-NAME y To resolve this problem, a DBMS treats the entire queryupdate cycle as a single atomic operation. 39 y Monitors y The Th monitor i isi the h uniti off a DBMS responsible ibl for f the h structurall integrity of the database. y Forms F off monitors it y y y Range Comparisons y A range comparison monitor tests each new value to ensure that the value is within an acceptable range y Filters or patterns are more general types of data form checks. State constraints describe the condition of the entire database. Transition constraints describe conditions necessary before changes can be applied to a database. 40 6.4. Sensitive Data y Sensitive data are data that should not be made public. y There Th exist i cases that h some but b not allll off the h elements l iin the h database are sensitive. y There Th may bbe varying i ddegrees off sensitivity. iti it 41 y Several factors can make data sensitive. y Inherently Ih l sensitive. ii y The value itself may be so revealing that it is sensitive. Examples are the locations of defensive missiles. missiles y From a sensitive source. y The source of the data mayy indicate a need for confidentiality. y An example p is information from an informer whose identity would be compromised if the information were disclosed. 42 y Several factors can make data sensitive (Cont’d) y Declared D l d sensitive. ii y The database administrator or the owner of the data may have declared the data to be sensitive. sensitive y Part of a sensitive attribute or a sensitive record. y In a database,, an entire attribute or record mayy be classified as sensitive. y Sensitive in relation to previously disclosed information. y Some data become sensitive in the presence of other data. y For example, the longitude coordinate of a secret gold mine reveals little, but the longitude coordinate in conjunction with the latitude coordinate pinpoints the mine. 43 y Access Decisions y The Th DBMS may consider id severall factors f when h ddeciding idi whether h h to permit an access. y y y AAvailability il bilit off the th data dt Acceptability of the access Authenticityy of the user. 44 y Access Decisions (Cont’d) y Availability A il bili off Data D y One or more required elements may be inaccessible. y For example, example if a user is updating several fields, fields other users' users accesses to those fields must be blocked temporarily. This blocking ensures that users do not receive inaccurate information y Acceptability of Access y One or more values of the record may be sensitive and not accessible by the general user. A DBMS should not release sensitive data to unauthorized individuals. 45 y Access Decisions (Cont’d) y Assurance A off Authenticity A h ii y Certain characteristics of the user external to the database may also be considered when permitting access. access y For example, to enhance security, the database administrator may permit someone to access the database only at certain times, such as during working hours. 46 y Access Decisions (Cont’d) y Types T off Disclosures Di l y Exact Data - The most serious disclosure is the exact value of a sensitive data item itself y Bounds - Another exposure is disclosing bounds on a sensitive value; that is, indicating that a sensitive value, y, is between two values, L and H. y Negative Result - Sometimes we can word a query to determine a negative result. That is, we can learn that z is not the value of y. y Existence - The existence of data is itself a sensitive piece of data. 47 y Access Decisions (Cont’d) y Types T off Di Disclosures l (C (Cont’d) ’d) y Probable Value - it may be possible to determine the probability that a certain element has a certain value. value 48 y Security versus Precision 49 6.5. Inference y Inference is a way to infer or derive sensitive data from nonsensitive ddata. Sample Database Name Sex Race Aid Fines Drugs Dorm Adams M C 5000 45. 1 Holmes Bailey M B 0 0. 0 Grey Chin F A 3000 20. 0 West Dewitt M B 1000 35. 3 Grey Earhart F C 2000 95. 1 Holmes Fein F C 1000 5 15. 0 West Groff M C 4000 0. 3 West Hill F B 5000 10. 2 Holmes Koch F C 0 0. 1 West Liu F A 0 10. 2 Grey Majors M C 2000 0. 2 Grey 50 y Direct Attack y A user tries i to ddetermine i values l off sensitive i i fifields ld by b seeking ki them h directly with queries that yield few records. y A sensitive iti query might i ht be b List NAME where h SEX M ∧ DRUGS=1 SEX=M DRUGS 1 This query di Thi discloses l th thatt ffor recordd ADAMS ADAMS, DRUGS DRUGS=1. 1 HHowever, it isi an obvious bi attack because it selects people for whom DRUGS=1, and the DBMS might reject j the query q y because it selects records for a specific p value of the sensitive attribute DRUGS. 51 y Direct Attack (Cont’d) y A lless obvious b i query iis List NAME where h (SEX M ∧ DRUGS=1) (SEX=M DRUGS 1) ∨ (SEX≠M ∧ SEX ≠ F) ∨ (DORM AYRES) (DORM=AYRES) This query still retrieves only one record, revealing a name that corresponds to the sensitive DRUG value. value The DBMS needs to know that SEX has only two possible values so that the second clause will select no records. Even if that were possible, the DBMS would also need to know that no records exist with DORM=AYRES, even though AYRES might in fact be an acceptable value for DORM. 52 y Direct Attack (Cont’d) y Do D not reveall results l when h a smallll number b off people l make k up a large proportion of a category. y The Th rule l off "n " items it over k percent"t" means th thatt ddata t should h ld bbe withheld if n items represent over k percent of the result reported. 53 y Indirect Attack y Sum S - An A attackk by b sum tries i to infer i f a value l from f a reportedd sum. y Count - The count can be combined with the sum to produce some even more revealing li results. lt y Mean - The arithmetic mean (average) allows exact disclosure if the attacker tt k can manipulate i l t th the subject bj t population. l ti y Median 54 55 y Tracker Attacks y A tracker k attackk can fool f l the h database d b manager iinto locating l i the h desired data by using additional queries that produce small results. y The Th ttracker k adds dd additional dditi l records d tto bbe retrieved t i d ffor ttwo diff differentt queries; the two sets of records cancel each other out, leaving only th statistic the t ti ti or ddata t ddesired. i d Th The approachh iis tto use iintelligent t lli t padding of two queries. y In I other th words, d iinstead t d off ttrying i tto id identify tif a unique i value, l we request n - 1 other values (where there are n values in the d t b ) Given database). Gi n andd n - 1, 1 we can easily il compute t th the ddesired i d single element. 56 y Tracker Attacks (Cont’d) y For F iinstance, suppose we wish i h to kknow hhow many female f l Caucasians live in Holmes Hall. A query posed might be count ((SEX=F) (RACE=C) (DORM=Holmes)) The database management system might consult the database, find th t the that th answer isi 1, 1 andd refuse f to t answer that th t query bbecause one record dominates the result of the query. 57 y Tracker Attacks (Cont’d) y The Th query q=count((SEX=F) ∧ (RACE=C) ∧ (DORM=Holmes)) isi off th the form f q = count(a ∧ b ∧ c) y By B using i th the rules l off llogici andd algebra, l b we can ttransform f thi this query to q = count(a t( ∧ b ∧ c)) = count(a) t( ) - count(a t( ∧ ¬(b (b ∧ c)))) 58 y Tracker Attacks (Cont’d) y Thus, Th the h original i i l query iis equivalent i l to count (SEX=F) minus i count ((SEX=F) ∧ ((RACE ≠ C) ∨ (DORM ≠ Holmes))) 59 y Controls for Statistical Inference Attacks y Suppression S i - sensitive i i data d values l are not provided; id d the h query isi rejected without response. y Concealing C li - the th answer provided id d isi close l tto but b t nott exactly tl th the actual value. 60 y Controls for Statistical Inference Attacks (Cont’d) y These Th two controlsl reflect fl the h contrast bbetween security i andd precision. y y With suppression, i any results lt provided id d are correct,t yett many responses mustt be withheld to maintain security. With concealing,g, more results can be provided, p , but the precision p of the results is lower. The choice between suppression and concealing depends on the context of the database. 61 y Random Sample y With Wi h random d sample l control,l a resultl iis not dderived i d from f the h whole hl database; instead the result is computed on a random sample of th database. the dtb y The sample chosen is large enough to be valid. y Random R d DData t PPerturbation t b ti y It is sometimes useful to perturb the values of the database by a smallll error. y Generate a small random error term εi and add it to xi for statistical results. lt 62 y Query Analysis y A more complex l form f off security i uses query analysis. l i y Here, a query and its implications are analyzed to determine whether h th a resultlt should h ld be b provided. id d 63 y Conclusion on the Inference Problem y No N perfect f solutions l i to the h inference if problem. bl y The approaches to controlling it y Suppress S obviously b i l sensitive iti information. if ti y Track what the user knows. y Disguise g the data. 64 y Aggregation y Building B ildi sensitive i i results l from f less l sensitive i i inputs. i y Data mining is the process of sifting through multiple databases and correlating l ti multiple lti l ddata t elements l t tto fifindd useful f l iinformation f ti 65 6.6. Multilevel Databases y The Case for Differentiated Security Name Department Salary Phone Performance Rogers g training g 43,800 43, 4‐5067 4 5 7 A2 Jenkins research 62,900 6‐4281 D4 Poling training 38,200 4‐4501 B1 Garland user services 54,600 6‐6600 A4 Hilten user services 5 44,500 4‐5351 535 B1 Davis administration 51,400 4‐9505 A3 66 y The Case for Differentiated Security (Cont’d) y Three Th characteristics h i i off database d b security i emerge. y The security of a single element may be different from the security of other elements of the same record or from other values of the same attribute. attribute This situation implies that security should be implemented for each individual element. y Two levels sensitive and non-sensitive are inadequate to represent some security situations. Several grades of security may be needed. y The security of an aggregate a sum, a count, or a group of values in a database may differ from the security of the individual elements. The security of the aggregate may be higher or lower than that of the individual elements. 67 6.7. Proposals p for Multilevel Securityy y Separation y Partitioning P ii i y The database is divided into separate databases, each at its own level of sensitivity. sensitivity y This control destroys a basic advantage of databases: elimination of redundancy and improved accuracy through having only one field to update. y It does not address the problem of a high-level user who needs access to some low-level data combined with high-level data. 68 y Separation (Cont’d) y Encryption E i y If sensitive data are encrypted, a user who accidentally receives them cannot interpret the data. data 69 Cryptographic Separation: Different Encryption Keys. Cryptographic Separation: Block Chaining 70 y Separation (Cont’d) y Integrity I i Lock L k y First proposed at the U.S. Air Force Summer Study on Data Base Security [AFS83]. [AFS83] y The lock is a way to provide both integrity and limited access for a database. y The operation was nicknamed "spray paint" because each element is figuratively painted with a color that denotes its sensitivity. 71 y Separation (Cont’d) y Integrity I i Lock L k (Cont’d) (C ’d) y The sensitivity label should be y unforgeable, unforgeable so that a malicious subject cannot create a new sensitivity level for an element y unique, so that a malicious subject cannot copy a sensitivity level from another element y concealed, so that a malicious subject cannot even determine the sensitivity level of an arbitrary element 72 y Separation (Cont’d) y Integrity I i Lock L k (Cont’d) (C ’d) y The third piece of the integrity lock for a field is an error-detecting code, called a cryptographic checksum. checksum 73 y Separation (Cont’d) y Sensitivity S i i i Lock L k y A sensitivity lock is a combination of a unique identifier (such as the record number) and the sensitivity level. y y y Because the identifier is unique, each lock relates to one particular record. Many different elements will have the same sensitivity level. A malicious subject should not be able to identify two elements having identical sensitivity levels or identical data values jjust byy lookingg at the sensitivity level portion of the lock. 74 y Designs of Multilevel Secure Database y Integrity I i Lock L k y A short-term solution to the security problem for multilevel databases. y The intention was to be able to use any (untrusted) database manager with a trusted procedure that handles access control. y The sensitive data were obliterated or concealed with encryption that protected both a data item and its sensitivity. 75 Trusted Database Manager 76 y Designs of Multilevel Secure Database y Integrity I i Lock L k (Cont’d) (C ’d) y The efficiency of integrity locks is a serious drawback. y The space needed for storing an element must be expanded to contain the sensitivity label. y The processing time efficiency of an integrity lock. y The untrusted database manager sees all data 77 y Trusted Front End y Trusted T d front f endd iis also l kknown as a guardd andd operates muchh like lik the reference monitor 78 y Trusted Front End (Cont’d) 1. 2. 3. 4. 5. 6. A user identifies id ifi himself hi lf or hherselflf to the h ffront end; d the h ffront endd authenticates the user's identity. Th user iissues a query tto th The the front f t end. d The front end verifies the user's authorization to data. Th front The f t endd iissues a query tto th the ddatabase t b manager. The database manager performs I/O access, interacting with lowl l access control level t l tto achieve hi access tto actual t l data. dt The database manager returns the result of the query to the t t d ffrontt end. trusted d 79 y Trusted Front End (Cont’d) The front Th f endd analyzes l the h sensitivity i i i llevelsl off the h data d items i iin the result and selects those items consistent with the user's security it level. l l 8. The front end transmits selected data to the untrusted front end f formatting. for f tti 9. The untrusted front end transmits formatted data to the user. 7. 80 y Commutative Filters y A process that h forms f an interface i f between b the h user andd a DBMS. DBMS y The filter reformats the query so that the database manager does as much of the work as possible possible, screening out many unacceptable records records. y The filter then provides a second screening to select only data to which the user has access. 81 82 y Distributed Databases y Distributed Di ib d or federated fd d database d b y A trusted front end controls access to two unmodified commercial DBMSs: DBMS y y one for all low-sensitivity data and one for all high-sensitivity data. data y The distributed database design is not popular because the front end, which must be trusted end trusted, is complex complex, potentially including most of the functionality of a full DBMS itself. 83 y Window/View y One O off the h advantages d off using i a DBMS ffor multiple l i l users off different interests (but not necessarily different sensitivity levels) is th ability the bilit to t create t a different diff t view i ffor eachh user. y Each user is restricted to a picture of the data reflecting only what th user needs the d to t see. y A window (or a view) is a subset of a database, containing exactly th information the if ti that th t a user isi entitled titl d tto access. y A view can represent a single user's subset database so that all of a user's' queries i access only l that th t database. dtb 84 (a) Airline's View. FLT# ORIG DEST DEP ARR CAP TYPE PILOT TAIL 362 JFK BWI 0830 0950 114 PASS Dosser 2463 397 JFK ORD 0830 1020 114 PASS Botto ms 202 IAD LGW 1530 0710 183 PASS Jevins 2007 749 LGA ATL 0947 1120 0 CARG Witt O 286 STA SFO 1020 1150 117 PASS 3621 3116 Gross 4026 … … 85 ( ) (b) Travel Agent's View. g FLT ORIG DEST DEP ARR CAP 3362 JJFK BWI 0830 3 0950 95 114 4 397 JFK ORD 0830 1020 114 202 IAD LGW 1530 0710 183 286 STA SFO 1020 1150 117 … … 86 Secure Database Decomposition 87 6.8. Data Miningg y Databases are great repositories of data. More data are being collected andd saved. d y But to find needles of information in those vast fields of haystacks of d t requires data i iintelligent t lli t analyzing l i andd querying i off the th data. dt y Indeed, a whole specialization, called data mining, has emerged. y In I a llargely l automated t t d way, ddata t mining i i applications li ti sortt andd searchh thorough data. 88 y Data mining uses statistics, machine learning, mathematical models, pattern recognition, i i andd other h techniques hi to di discover patterns andd relations on large datasets. y Data D t mining i i tools t l use association i ti (one ( eventt often ft goes with ith another), th ) sequences (one event often leads to another), classification (events exhibit hibit patterns, tt ffor example l coincidence), i id ) clustering l t i ((some ititems hhave similar characteristics), and forecasting (past events foretell future ones). y Data D t mining i i presents t probable b bl relationships, l ti hi bbutt th these are nott necessarily cause-and-effect relationships. 89 y Privacy and Sensitivity y Because B the h goall off ddata mining i i iis summary results, l not iindividual di id l data items, you would not expect a problem with sensitivity of i di id l data individual d t items. it y Unfortunately that is not true. Why ??? ☺ 90 6.9. Summaryy of Database Securityy y Address three aspects of security for database management systems: y Confidentiality C fid i li andd iintegrity i problems bl specific ifi to ddatabase b applications y y CConfidentiality fid ti lit can bbe broken b k by b indirect i di t disclosure di l off a negative ti resultlt or off the bounds of a value. Integrity g y of the entire database is a responsibility p y of the DBMS software y The inference problem for statistical databases y Arise from the mathematical relationships between data elements and query results. y Problems of including users and data of different sensitivity levels in one database. 91