* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Data and Knowledge Management
Survey
Document related concepts
Transcript
Chapter 10 Data and Knowledge Management Agenda • • • • • • Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data mining Data • Set of discrete, objective facts about events • Business - structured records of transactions • Little relevance or purpose Information • Message with sender and receiver • Meant to change way receiver perceives something • Have an impact on his judgment / behavior Data Processing • Contextualize - why was data gathered? • Categorize - what are its key components? • Calculate - analyze mathematically • Condense - summarize in more concise form Information Processing • Compare - in kind and in time • Consequences - how used in decisions / actions • Connections - relation to other information • Conversation - what other people think about this information Agenda • • • • • • Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data mining Database • • • • • • Element Types Structure Models Creation Topology Element • • • • Bit, byte, field, record, file, database Entity, attribute, key field Relation Class, object Database Types • • • • • • • • Business database Geographical information database Knowledge database / deductive database Multimedia database Data warehouse Data marts Multimedia and hypermedia database Object-oriented database Database Structure • Data definition language – Schema & subschema • Data Manipulation language – Structured Query Language (SQL) – Query By Example (QBE) • Data dictionary Database Models • Hierarchical – One to many – TPS or routine MIS • Network – Many to many – TPS or routine MIS • Relational – Normalization – Ad hoc reports or DSS • Object-oriented – E-commerce Database Creation • Conceptual design – Logical view – Entity-relationship (ER) diagram – Normalization Entity Relationship Diagram • Entity: object or concept • Relationship: meaning association between objects • Attribute: property of an object – Simple & Composite – Single-valued & multi-valued – Derived • Key – Primary key – Foreign key Normalization • A technique for identifying a true primary key for a relation • Types – First normal form: not repeating group – Second normal form: every non-primary-key attribute is fully functionally dependent on the entire primary key – Third normal form: no transit dependency Structured Query Language • Select • Join SQL DML - SELECT • SELECT [DISTINCT|ALL] {* | [colexpr [AS newname]][,...] FROM table-name [alias] [,...] [WHERE condition] [GROUP by colm [, colm] [HAVING condition]] ORDER BY colm [, colm] SQL DML - SELECT • SELECT attributes (or calculations: +, -, /, *) FROM relation • SELECT DISTINCT attributes FROM relation Examples • SELECT stuname FROM student; • SELECT stuid, stuname, credit FROM student; • SELECT stuid, stuname, credit+10 FROM student; • SELECT DISTINCT major FROM student; SQL DML - SELECT • SELECT attributes (or * wild card) FROM relation WHERE condition Examples • SELECT * FROM student; • SELECT stuname, major, credit FROM student WHERE stuid = ‘S114’; • SELECT * FROM faculty WHERE dept = ‘MIS’; SELECT - WHERE condition • • • • • • AND OR NOT IN NOT IN BETWEEN IS NULL IS NOT NULL LIKE '%' multiple characters LIKE ‘_’ single characters Examples • SELECT * FROM faculty WHERE dept = ‘MIS’ AND rank = ‘full professor’; • SELECT * FROM faculty WHERE dept = ‘MIS’ OR rank = ‘full professor’; • SELECT * FROM faculty WHERE dept = ‘MIS’ NOT rank = ‘full professor’; • SELECT * FROM class WHERE room LIKE ‘B_S%’; • SELECT * FROM class WHERE room NOT LIKE ‘BUS%’; • SELECT productid, productname FROM inventory WHERE onhand BETWEEN 50 and 100; • SELECT companyid, companyname FROM company WHERE companyname BETWEEN ‘G’ AND ‘K’; • SELECT productid, productname FROM inventory WHERE onhand NOT BETWEEN 50 and 100; • SELECT companyid, companyname FROM company WHERE companyname NOT BETWEEN ‘G’ AND ‘K’; • SELECT facname FROM faculty WHERE dept IN (‘MIS’, ‘ACT’); • SELECT facname FROM faculty WHERE rank NOT IN (‘assistant’, ‘lecture’); • SELECT customername FROM customer WHERE emailadd IS NOT NULL; • SELECT customername FROM customer WHERE creditlimit IS NULL; SELECT - aggregate functions • • • • • • COUNT (*) COUNT SUM AVG MIN MAX Examples • SELECT COUNT(*) FROM student; • SELECT COUNT(major) FROM student; • SELECT COUNT(DISTINCT major) FROM student; • SELECT COUNT(stuid), SUM(credit), AVG(credit), MAX(credit), MIN(credit) FROM student; SELECT - GROUP • • GROUP BY HAVING Examples • • SELECT major, AVG(credit) FROM student GROUP BY major HAVING COUNT(*) > 2; SELECT course#, COUNT(stuid) FROM enrollment GROUP BY course# HAVING COUNT(*) > 2; • SELECT major, AVG(credit) FROM student WHERE major IN (‘MIS’, ‘ACT’) GROUP BY major HAVING COUNT(*) > 2; SELECT - ORDER BY • • ORDER BY ORDER BY ... DESC Examples • • SELECT facname, rank FROM faculty ORDER BY facname; SELECT facname, rank FROM faculty ORDER BY rank DESC, facname; SELECT - JOIN Tables • • Multiple tables in FROM clause MUST have join conditions!!! Examples • SELECT stuname, grade FROM student, enrollment WHERE student.stuid = enrollment.stuid; • SELECT enrollment.course#, stuname, major FROM class, enrollment, student WHERE class.course# = enrollment.course# AND enrollment.stuid = student.stuid AND facid = ‘F114’ ORDER BY enrollment.course#; SUBQUERY, EXIST, NOT EXIST • SELECT s.stuname, major FROM student s WHERE EXIST (SELECT * FROM enrollment e WHERE s.stuid = e.stuid); • SELECT s.stuname, major FROM student s WHERE NOT EXIST (SELECT * FROM enrollment e WHERE s.stuid = e.stuid); Database Creation • Physical design – Physical view – Data topology (organization) • Centralized • Distributed database – Replicated database – Partitioned • Organization & access method – Sequential file – Indexed sequential file – Direct or random file • Security – Logical, physical, and transmitting Selection Criteria • • • • • • • • • User’ needs (type of application) Compatibility Portability Reliability Cost Feature Performance Vendor’s support Others? Agenda • • • • • • Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data mining Data Administrator • • • • Clean up data definitions Control shared data Manage distributed data Maintain data quality Clean Up Definitions • Synonyms / aliases • Standard data definitions – Names and formats • Data Dictionary – Active – Integrated Control Shared Data • Local - used by one unit • Shared - used by two or more activities • Impact of proposed program changes on shared data • Program-to-data element matrix Manage Distributed Data • Geographically dispersed – Whether shared data or not • Different levels of detail – Different management levels Maintain Data Quality • Put owners in charge of data – Verify data accuracy and quality • Purge old data Agenda • • • • • • Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data mining The DBMS Data Base Management System: software that permits a firm to: – Centralize data – Manage them efficiently – Provide access to applications • Such as payroll, inventory DBMS Components • • • • Data Definition Language (DDL) Data Manipulation Language (DML) Inquiry Language (IQL) Teleprocessing Interface (TP) Definitions • Views: – Physical - how stored – Logical - how viewed and used by users • Schema - Overall logical layout of records and fields in a database • Subschema: Individual user’s logical portion of database (view) Agenda • • • • • • Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data mining Distributing Data • Centralized files • Fragemented files – Distribute data without duplication – Users unaware of where data located Distributing Data • Replicated files – Data duplicated – One site has master file – Problem with data synchronization • Decentralized files – Local data autonomy Distributing Data • Distributed files – – – – – Client / server systems Stored centrally Portion downloaded to workstation Workstation can change data Changes uploaded to central computer Agenda • • • • • • Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data mining Data Warehousing • Collect large amounts of data from multiple sources over several years • Classify each record into multiple categories – Age – Location – Gender Data Warehousing • Rapidly select and retrieve by multiple dimensions – All females in Chicago under 25 years of age • Provide tailored, on-demand reports • Data mart: a replicated subset of the data warehouse – A functional or regional area Data Mining • Fitting models to, or determining patterns from, warehoused data • Purposes: – Analyze large amount of data – Find critical points of knowledge – Perform automatic analyses Data Mining Terms • Data Visualization • Drill-down Analysis – Hierarchical structure – Leads to increasing level of detail • Expert System (ES) methodology – e.g., neural networks Applications • • • • • Finance - fraud detection Stock Market - forecasting Real estate - property evaluation Airlines - customer retention Retail - customer targeting Data Mining Example • What type customers are buying specific products? • When are the times customers will most likely shop? • What types of products can be sold together? Points to Remember • • • • • • Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data mining Discussion Questions • How can a database help an organization? • Why normalization is very important for building a database? • Do you see any problem of the database in your organization? Discussion Questions • What kind of database model is most suitable for – School? – Department store? – Police? • Some organizations are hesitant to distribute data. These organizations feel that they may lose control. – Do they lose control? Why? – Could you suggest a “good” tactic? • Could Data Mining pose a threat to individual privacy? – Why or why not? – If so, how can we mitigate that threat? – Do the advantages outweigh the disadvantages? Assignment • • • • Review chapters 10 Read chapter 8, 9, and 11 Group assignment Research paper