Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Data Warehousing & Data Mining A review of Data Warehousing and Data Mining Aqeel Al-Jishi Nick Farley Masaki Osada Data Warehousing Defined: Data Warehouse – The main repository of an organization historical data, it corporate memory It contains the raw material for management’s decision support system Data Mining – (Knowledge Discovery) The process of analyzing data from different perspectives and summarizing it into useful information History In the late 1980s to early 1990s distinct computer databases were created These databases were designed to meet the data analysis needs that Operational Systems were failing to support Operational Systems failed for many reasons: Long report generation time Inability to handle loads and not optimized Many organizations had multiple Operation Systems which was a reporting nightmare Custom applications were required for reporting which slowed reporting and increased costs Goals of Data Warehousing Make information easily accessible Provide endless views and combinations of data (Slicing & Dicing) Query results returned with minimal wait time Be adaptive and resilient to change Designed with the correct users in mind (business users and management) Keep information secure but allow access to insiders Common Issues Problems with Data Acquisition may arise 80% of the time building a data warehouse will be spent on extracting, cleaning, and loading data Errors with data can be rampant: Incomplete Data (missing fields) Incorrect Data (wrong calculations) Readability Issues (strange formatting) Differing Design Views, Relational Relational Model - Bill Inmon The data warehouse is but one part of the Business Intelligence system An enterprise has one data warehouse and data marts source their information from it. Uses 3rd Normal Form to store information in the database Differing Design Views, Dimensional Dimensional model - Ralph Kimball Data warehouse is the conglomerate of all data marts within the enterprise Uses Star or Snowflake schema to emulate a multi-dimensional database Overall information is NOT normalized in the database Terminology Data Mart – A data mart is a subset of data from the data warehouse, typically used when the broad scope of the data warehouse isn’t needed Business departments commonly create, use, and alter their own data marts. Level of Granularity – The amount and level of data brought in to the data warehouse during acquisition Dimensional Model Dimension Table – A table with a single-part primary key and descriptive attribute columns. Describes the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations, and products Dimensional Model Fact Table – A table with numeric performance measures (metrics) characterized by a composite key The elements for the composite key come from the foreign keys from a dimensional table What is Data Mining? Basic Concept Getting “Useful Information” out of a large amount of “Data” In Business World Getting “Business Intelligence” out of a large amount of “Information” What is the difference between “Business Intelligence” and “Information”? Evolutionary Step Business Question Enabling Technologies Characteristics Data Collection (1960s) “What was my total Computers, tapes, Retrospective, revenue in the last disks static data five years?” delivery Data Access (1980s) “What were unit sales in New England last March?” Relational Databases, Structured Query Language Retrospective, dynamic data delivery at record level Data Warehousing & Decision Support (1990s) “What were unit sales in New England last March? Drill down to Boston.” On-line analytic processing, multidimensional databases Retrospective, dynamic data delivery at multiple levels Data Mining (Emerging Today) “What’s likely to happen to Boston unit sales next month? Why?” Advanced algorithms, multiprocessor computers, massive databases Prospective, proactive information delivery Focus of Data Mining Predict the future trends and behaviors using the past information Prospective analysis Uniqueness of Data Mining Heuristic in nature Capable of finding hidden patterns users would never think of Modeling is the key technology to find patterns Self-guiding An example of Modeling Yesterday Today Tomorrow Static Information Known Known Known Dynamic Information Known Known Target Build a model based on known combinations of static and dynamic information to see if it indicates the result of a dynamic information in the future. Real world examples of Data Mining “Diaper and Beer” VISA Blockbuster Entertainment Personalized Recommendation Suppliers to Wal-mart Fraud Detection Sales prediction NBA Teams Strategy analysis Data Warehouse Data Mart ERP DM ERP SCM DM CRM DM Questions? Sources • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • • BOOKS: Mastering Data Warehouse Design, Relational and Dimensional Techniques Claudia Imhoff Nicholas Galemmo Jonathan G Geiger © 2003, Wiley Publishing, Inc. Indianapolis, Indiana The Data Warehouse Toolkit second edition, The Complete Guide to Dimensional Modeling Ralph Kimball Margy Ross © 2002, Wiley and Songs, Inc. New York, NY Statistical Data Mining and Knowledge Discovery Edited by Hamparsum Bozdogan © 2004 CRC Press LLC, Boca Raton, Florida Data Mining and Business Intelligence: A Guide to Productivity Stephan Kudyba and Richard Hoptroff © 2001 Idea Group Publishing, London, England Modern Systems Analysis and Design, fourth edition Jeffrey A. Hoffer, et al © 2005, 2002 by Pearson Education, Inc. Upper Saddle River, New Jersey Database Systems An Application-Oriented Approach Michael Kifer, Arthur Bernstein, Philip Lewis © 2005 Pearson Education, Inc., New York, New York ELECTRONIC: http://www.1keydata.com/datawarehousing/inmon-kimball.html http://www.fortunecity.com/skyscraper/oracle/699/orahtml/dbmsmag/9807d05.html http://en.wikipedia.org/wiki/Data_warehouse http://en.wikipedia.org/wiki/Dimension_(data_warehouse) http://en.wikipedia.org/wiki/Data_mining http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm http://www.dwinfocenter.org http://www.thearling.com/text/dmwhite/dmwhite.htm http://www.learndatamodeling.com/dimension.htm http://www.learndatamodeling.com/fact.htm