Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Chapter 10 Data and Knowledge Management Agenda • • • • • • Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data mining Data • Set of discrete, objective facts about events • Business - structured records of transactions • Little relevance or purpose Information • Message with sender and receiver • Meant to change way receiver perceives something • Have an impact on his judgment / behavior Data Processing • Contextualize - why was data gathered? • Categorize - what are its key components? • Calculate - analyze mathematically • Condense - summarize in more concise form Information Processing • Compare - in kind and in time • Consequences - how used in decisions / actions • Connections - relation to other information • Conversation - what other people think about this information Agenda • • • • • • Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data mining Database • • • • • • Element Types Structure Models Creation Topology Element • • • • Bit, byte, field, record, file, database Entity, attribute, key field Relation Class, object Database Types • • • • • • • • Business database Geographical information database Knowledge database / deductive database Multimedia database Data warehouse Data marts Multimedia and hypermedia database Object-oriented database Database Structure • Data definition language – Schema & subschema • Data Manipulation language – Structured Query Language (SQL) – Query By Example (QBE) • Data dictionary Database Models • Hierarchical – One to many – TPS or routine MIS • Network – Many to many – TPS or routine MIS • Relational – Normalization – Ad hoc reports or DSS • Object-oriented – E-commerce Database Creation • Conceptual design – Logical view – Entity-relationship (ER) diagram – Normalization Entity Relationship Diagram • Entity: object or concept • Relationship: meaning association between objects • Attribute: property of an object – Simple & Composite – Single-valued & multi-valued – Derived • Key – Primary key – Foreign key Normalization • A technique for identifying a true primary key for a relation • Types – First normal form: not repeating group – Second normal form: every non-primary-key attribute is fully functionally dependent on the entire primary key – Third normal form: no transit dependency Structured Query Language • Select • Join SQL DML - SELECT • SELECT [DISTINCT|ALL] {* | [colexpr [AS newname]][,...] FROM table-name [alias] [,...] [WHERE condition] [GROUP by colm [, colm] [HAVING condition]] ORDER BY colm [, colm] SQL DML - SELECT • SELECT attributes (or calculations: +, -, /, *) FROM relation • SELECT DISTINCT attributes FROM relation Examples • SELECT stuname FROM student; • SELECT stuid, stuname, credit FROM student; • SELECT stuid, stuname, credit+10 FROM student; • SELECT DISTINCT major FROM student; SQL DML - SELECT • SELECT attributes (or * wild card) FROM relation WHERE condition Examples • SELECT * FROM student; • SELECT stuname, major, credit FROM student WHERE stuid = ‘S114’; • SELECT * FROM faculty WHERE dept = ‘MIS’; SELECT - WHERE condition • • • • • • AND OR NOT IN NOT IN BETWEEN IS NULL IS NOT NULL LIKE '%' multiple characters LIKE ‘_’ single characters Examples • SELECT * FROM faculty WHERE dept = ‘MIS’ AND rank = ‘full professor’; • SELECT * FROM faculty WHERE dept = ‘MIS’ OR rank = ‘full professor’; • SELECT * FROM faculty WHERE dept = ‘MIS’ NOT rank = ‘full professor’; • SELECT * FROM class WHERE room LIKE ‘B_S%’; • SELECT * FROM class WHERE room NOT LIKE ‘BUS%’; • SELECT productid, productname FROM inventory WHERE onhand BETWEEN 50 and 100; • SELECT companyid, companyname FROM company WHERE companyname BETWEEN ‘G’ AND ‘K’; • SELECT productid, productname FROM inventory WHERE onhand NOT BETWEEN 50 and 100; • SELECT companyid, companyname FROM company WHERE companyname NOT BETWEEN ‘G’ AND ‘K’; • SELECT facname FROM faculty WHERE dept IN (‘MIS’, ‘ACT’); • SELECT facname FROM faculty WHERE rank NOT IN (‘assistant’, ‘lecture’); • SELECT customername FROM customer WHERE emailadd IS NOT NULL; • SELECT customername FROM customer WHERE creditlimit IS NULL; SELECT - aggregate functions • • • • • • COUNT (*) COUNT SUM AVG MIN MAX Examples • SELECT COUNT(*) FROM student; • SELECT COUNT(major) FROM student; • SELECT COUNT(DISTINCT major) FROM student; • SELECT COUNT(stuid), SUM(credit), AVG(credit), MAX(credit), MIN(credit) FROM student; SELECT - GROUP • • GROUP BY HAVING Examples • • SELECT major, AVG(credit) FROM student GROUP BY major HAVING COUNT(*) > 2; SELECT course#, COUNT(stuid) FROM enrollment GROUP BY course# HAVING COUNT(*) > 2; • SELECT major, AVG(credit) FROM student WHERE major IN (‘MIS’, ‘ACT’) GROUP BY major HAVING COUNT(*) > 2; SELECT - ORDER BY • • ORDER BY ORDER BY ... DESC Examples • • SELECT facname, rank FROM faculty ORDER BY facname; SELECT facname, rank FROM faculty ORDER BY rank DESC, facname; SELECT - JOIN Tables • • Multiple tables in FROM clause MUST have join conditions!!! Examples • SELECT stuname, grade FROM student, enrollment WHERE student.stuid = enrollment.stuid; • SELECT enrollment.course#, stuname, major FROM class, enrollment, student WHERE class.course# = enrollment.course# AND enrollment.stuid = student.stuid AND facid = ‘F114’ ORDER BY enrollment.course#; SUBQUERY, EXIST, NOT EXIST • SELECT s.stuname, major FROM student s WHERE EXIST (SELECT * FROM enrollment e WHERE s.stuid = e.stuid); • SELECT s.stuname, major FROM student s WHERE NOT EXIST (SELECT * FROM enrollment e WHERE s.stuid = e.stuid); Database Creation • Physical design – Physical view – Data topology (organization) • Centralized • Distributed database – Replicated database – Partitioned • Organization & access method – Sequential file – Indexed sequential file – Direct or random file • Security – Logical, physical, and transmitting Selection Criteria • • • • • • • • • User’ needs (type of application) Compatibility Portability Reliability Cost Feature Performance Vendor’s support Others? Agenda • • • • • • Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data mining Data Administrator • • • • Clean up data definitions Control shared data Manage distributed data Maintain data quality Clean Up Definitions • Synonyms / aliases • Standard data definitions – Names and formats • Data Dictionary – Active – Integrated Control Shared Data • Local - used by one unit • Shared - used by two or more activities • Impact of proposed program changes on shared data • Program-to-data element matrix Manage Distributed Data • Geographically dispersed – Whether shared data or not • Different levels of detail – Different management levels Maintain Data Quality • Put owners in charge of data – Verify data accuracy and quality • Purge old data Agenda • • • • • • Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data mining The DBMS Data Base Management System: software that permits a firm to: – Centralize data – Manage them efficiently – Provide access to applications • Such as payroll, inventory DBMS Components • • • • Data Definition Language (DDL) Data Manipulation Language (DML) Inquiry Language (IQL) Teleprocessing Interface (TP) Definitions • Views: – Physical - how stored – Logical - how viewed and used by users • Schema - Overall logical layout of records and fields in a database • Subschema: Individual user’s logical portion of database (view) Agenda • • • • • • Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data mining Distributing Data • Centralized files • Fragemented files – Distribute data without duplication – Users unaware of where data located Distributing Data • Replicated files – Data duplicated – One site has master file – Problem with data synchronization • Decentralized files – Local data autonomy Distributing Data • Distributed files – – – – – Client / server systems Stored centrally Portion downloaded to workstation Workstation can change data Changes uploaded to central computer Agenda • • • • • • Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data mining Data Warehousing • Collect large amounts of data from multiple sources over several years • Classify each record into multiple categories – Age – Location – Gender Data Warehousing • Rapidly select and retrieve by multiple dimensions – All females in Chicago under 25 years of age • Provide tailored, on-demand reports • Data mart: a replicated subset of the data warehouse – A functional or regional area Data Mining • Fitting models to, or determining patterns from, warehoused data • Purposes: – Analyze large amount of data – Find critical points of knowledge – Perform automatic analyses Data Mining Terms • Data Visualization • Drill-down Analysis – Hierarchical structure – Leads to increasing level of detail • Expert System (ES) methodology – e.g., neural networks Applications • • • • • Finance - fraud detection Stock Market - forecasting Real estate - property evaluation Airlines - customer retention Retail - customer targeting Data Mining Example • What type customers are buying specific products? • When are the times customers will most likely shop? • What types of products can be sold together? Points to Remember • • • • • • Information processing Database Data Administrator The DBMS Distributing data Data warehousing and data mining Discussion Questions • How can a database help an organization? • Why normalization is very important for building a database? • Do you see any problem of the database in your organization? Discussion Questions • What kind of database model is most suitable for – School? – Department store? – Police? • Some organizations are hesitant to distribute data. These organizations feel that they may lose control. – Do they lose control? Why? – Could you suggest a “good” tactic? • Could Data Mining pose a threat to individual privacy? – Why or why not? – If so, how can we mitigate that threat? – Do the advantages outweigh the disadvantages? Assignment • • • • Review chapters 10 Read chapter 8, 9, and 11 Group assignment Research paper