Data Mining
CS 341, Spring 2007
Lecture 2: Related Concepts (I)
© Prentice Hall

Related Concepts
Goal: Examine some areas which are related to data mining.
• Database Systems
• Decision Support System
• Data Warehousing
• Fuzzy Sets and Logic

Database Systems
• Database
  – A collection of related records
  – The conceptual view: table with rows and columns
• Schema
  – A structural description of the type of facts held in a database
  – E.g. (EmployeeID, Name, Address, Salary, JobNo)
  – Your own example?
• Database models: modeling database structure

Relational Database Models
• Based on the relational model developed by E.F. Codd in 1970
• Data and the relationships between them are organized in tables.
• Properties of Relational Tables:
  – Values Are Atomic
  – Each Row is Unique
  – Column Values Are of the Same Kind
  – The Sequence of Columns is Insignificant
  – The Sequence of Rows is Insignificant
  – Each Column Has a Unique Name

Relational Database Model
• Relation: A rectangular table
  – Attribute: A column in the table
  – Tuple: A row in the table
• Dominant in commercial data processing systems

A relation containing product information

ProdID  LocID       Date    Quantity  UnitPrice
123     Dallas      022900  5         25
123     Houston     020100  10        20
150     Dallas      031500  1         100
150     Dallas      031500  5         95
150     Fort Worth  021000  5         80
150     Chicago     012000  20        75
200     Seattle     030100  5         50
300     Rochester   021500  200       5
500     Bradenton   022000  15        20
500     Chicago     012000  10        25

[Figure: A relation containing employee information]
[Figure: A relation containing redundancy]
[Figure: An employee database consisting of three relations]

Relational Operations
• Select: Choose rows
• Project: Choose columns
• Join: Assemble information from two or more relations

[Figure: The SELECT operation]
[Figure: The PROJECT operation]
[Figure: The JOIN operation]
[Figure: Another example of the JOIN operation]
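The three relational operations listed above can be tried out on small in-memory tables. The following is only a minimal sketch in Python: relations are modeled as lists of dictionaries, and the function names (`select_rows`, `project_columns`, `join_on`) as well as the sample EMPLOYEE and JOB rows are made up for illustration, not taken from the lecture figures.

```python
# Hypothetical sample relations: each relation is a list of rows (tuples),
# and each row is a dict mapping attribute name -> atomic value.
EMPLOYEE = [
    {"EmplId": "E1", "Name": "Ann", "JobId": "J1"},
    {"EmplId": "E2", "Name": "Bob", "JobId": "J2"},
    {"EmplId": "E3", "Name": "Cy", "JobId": "J1"},
]
JOB = [
    {"JobId": "J1", "Dept": "Sales"},
    {"JobId": "J2", "Dept": "Personnel"},
]


def select_rows(relation, predicate):
    """SELECT: choose the rows for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]


def project_columns(relation, attributes):
    """PROJECT: choose the named columns and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def join_on(left, right, attribute):
    """JOIN: combine rows of two relations that agree on a shared attribute."""
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right
        if l_row[attribute] == r_row[attribute]
    ]


# Names of everyone working in the Sales department.
sales_names = project_columns(
    select_rows(join_on(EMPLOYEE, JOB, "JobId"), lambda r: r["Dept"] == "Sales"),
    ["Name"],
)
print(sales_names)  # [{'Name': 'Ann'}, {'Name': 'Cy'}]
```

The nested-loop join is the simplest possible formulation; a real database system would normally use indexes or hashing instead.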
[Figure: An application of the JOIN operation]

Structured Query Language (SQL)
• Operations to manipulate tuples
  – insert
  – update
  – delete
  – select

SQL Examples
• select EmplId, Dept
  from ASSIGNMENT, JOB
  where ASSIGNMENT.JobId = JOB.JobId
  and ASSIGNMENT.TermData = '*'
• insert into EMPLOYEE
  values ('43212', 'Sue A. Burt', '33 Fair St.', '444661111')

SQL Examples (continued)
• delete from EMPLOYEE
  where Name = 'G. Jerry Smith'
• update EMPLOYEE
  set Address = '1812 Napoleon Ave.'
  where Name = 'Joe E. Baker'

Maintaining Database Integrity
• Transaction: A sequence of operations that must all happen together
  – Example: transferring money between bank accounts
• Transaction log: A non-volatile record of each transaction's activities, built before the transaction is allowed to execute
  – Commit point: The point at which a transaction has been recorded in the log
  – Roll-back: The process of undoing a transaction

Maintaining database integrity (continued)
• Simultaneous access problems
  – Incorrect summary problem
  – Lost update problem
• Locking = preventing others from accessing data being used by a transaction
  – Shared lock: used when reading data
  – Exclusive lock: used when altering data

Other Database Models
• Hierarchical model
  » A tree structure, parent-child relationships, 1:m mapping
• Network model
  » More than 1 parent per child, m:m mapping
• Object-oriented model
  » Add database functionality to object programming language.

Database Systems
• A relational database management system
  – A software package used to create a database (Oracle, Microsoft SQL Server, MySQL)
• Database applications
  – Human resource management system
  – Sales management system
  – Inventory management system
  – Decision support system

[Figure: The conceptual layers of a database implementation]
[Figure: A database vs. a file]

Decision Support System
• Computer systems and related tools that assist managers in making decisions and solving problems.
• Built upon database systems; provide specific information needed by management
• More ad hoc and customized information
• DSS may use data mining tools on a data warehouse

What is a Data Warehouse
• Subject-oriented
  – Data related to the same event or object are linked together
• Time-variant
  – Changes to the data in the database are tracked and recorded
• Nonvolatile
  – Data in the warehouse are never deleted or changed
• Integrated
  – Contains data from all applications for an organization.
  – Kept consistent
DM: May access data in warehouse.

Operational Data vs. Informational Data
• Operational Data: Data used in day-to-day needs of company.
• Informational Data: Supports other functions such as planning and forecasting.
• Data mining tools often access data warehouses rather than operational data.

Operational vs. Informational

              Operational Data    Data Warehouse
Application   OLTP                OLAP
Use           Precise Queries     Ad Hoc
Temporal      Snapshot            Historical
Modification  Dynamic             Static
Orientation   Application         Business
Data          Operational Values  Integrated
Size          Gigabits            Terabits
Level         Detailed            Summarized
Access        Often               Less Often
Response      Few Seconds         Minutes
Data Schema   Relational          Star/Snowflake
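The "Detailed" versus "Summarized" contrast between operational data and a data warehouse can be illustrated with a single aggregate query. This is a minimal sketch only, assuming Python's built-in `sqlite3` module and an in-memory database; the table name SALES, its columns, and the four sample rows are hypothetical stand-ins for operational records, and a real warehouse load involves far more than one query.

```python
import sqlite3

# Hypothetical detailed (operational-style) rows: one row per sale.
detail_rows = [
    (123, "Dallas", 5, 25),
    (123, "Houston", 10, 20),
    (150, "Dallas", 1, 100),
    (150, "Dallas", 5, 95),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES (ProdID INTEGER, LocID TEXT, Quantity INTEGER, UnitPrice INTEGER)"
)
conn.executemany("INSERT INTO SALES VALUES (?, ?, ?, ?)", detail_rows)

# Summarized (warehouse-style) view: one row per product, giving the
# total quantity sold and the total revenue across all locations.
summary = conn.execute(
    "SELECT ProdID, SUM(Quantity), SUM(Quantity * UnitPrice) "
    "FROM SALES GROUP BY ProdID ORDER BY ProdID"
).fetchall()
conn.close()

print(summary)  # [(123, 15, 325), (150, 6, 575)]
```

The detailed table answers precise questions about individual sales, while the grouped result is the kind of condensed figure an ad hoc management query would typically ask for.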
What is a Data Warehouse
• The main repository of the organization's historical data
• Contains the raw material for management's decision support system
• The data warehouse is optimized for reporting and analysis

OLAP
• Online Analytic Processing (OLAP): provides more complex queries than OLTP.
• OnLine Transaction Processing (OLTP): traditional database/transaction processing.
• Dimensional data; cube view
• Visualization of operations:
  – Slice: examine sub-cube.
  – Dice: rotate cube to look at another dimension.
  – Roll Up/Drill Down
DM: May use OLAP queries.

Fuzzy Sets and Logic
• Fuzzy Set: Set membership function is a real valued function with output in the range [0,1].
• f(x): Probability x is in F (strictly, the degree to which x belongs to F).
• 1-f(x): Probability x is not in F.
• EX:
  – T = {x | x is a person and x is tall}
  – Let f(x) be the probability that x is tall
  – Here f is the membership function
DM: Prediction and classification are fuzzy.

Fuzzy Sets
[Figure: Fuzzy Sets]

Classification/Prediction is Fuzzy
[Figure: Loan amount (Amnt) with Accept and Reject regions; panels: Simple, Fuzzy]

Next Lecture:
• Information Retrieval, Question Answering, Web Search
• Reading assignments: Chapter 2
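The tall-person example from the Fuzzy Sets and Logic slide can be made concrete with a small membership function. The sketch below is illustrative only: the linear ramp between 160 cm and 190 cm and the crisp cut-off at 175 cm are assumed values chosen for the example, and the function names are hypothetical rather than anything defined in the lecture.

```python
def tall_membership(height_cm, low=160.0, high=190.0):
    """Fuzzy membership f(x) in [0, 1] for the set T of tall people.

    Assumed shape: 0 at or below `low`, 1 at or above `high`,
    and a straight-line ramp in between.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


def not_tall_membership(height_cm):
    """Membership in the complement of T, computed as 1 - f(x)."""
    return 1.0 - tall_membership(height_cm)


def crisp_tall(height_cm, threshold=175.0):
    """Ordinary (simple) set membership for comparison: fully in or fully out."""
    return 1.0 if height_cm >= threshold else 0.0


for h in (150, 170, 175, 185, 200):
    print(h, crisp_tall(h), round(tall_membership(h), 2))
```

With the crisp version a person of 174 cm is simply "not tall", whereas the fuzzy version assigns a partial degree of membership, which is the contrast drawn between the Simple and Fuzzy panels for the loan decision.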