Decision Support Systems for E-commerce

Working Definition of DSS
A DSS is an integrated, interactive computer system, consisting of analytical tools and information management capabilities, designed to aid decision makers in solving relatively large, unstructured problems.

Sample Decision-Making Questions
- What were the sales volumes by region and product category for the last year?
- How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?

Central Issue in DSS
Support and improvement of decision making.

Management Decision Making
- Strategic (CEO, board of directors, top executives): develop the overall strategies of the organization
- Tactical (regional managers, plant managers, division supervisors): carry out the strategic managers’ plans
- Operational (direct managers, team leaders): carry out the tactical managers’ plans
Different technologies are invented to meet different decision-making goals!

The Big Picture: DBs, Data Warehouse, OLAP, & Data Mining
[Diagram: operational DBs and other sources pass through Extract / Transform / Load / Refresh into the data warehouse (data storage); an OLAP server (OLAP engine) serves the front-end tools: analysis, query, reports, data mining]

Evolutionary Step | Technologies | Providers
Data Collection (1960s) | Computers, tapes, disks | IBM, CDC
Data Access (1980s) | Relational databases, SQL, ODBC | Oracle, Sybase, Informix, IBM, Microsoft
Data Warehousing & Decision Support Systems (1990s) | On-line analytic processing (OLAP), multidimensional databases (cubes) | Cognos, Arbor, Pilot, Microstrategy, Oracle, IBM
Data Mining (Present) | Statistics, machine learning, AI | SAS, SPSS, IBM, Oracle, Cognos, Microsoft

Why Build a Data Warehouse?
- Separate transactional and analysis systems, so that regional managers or CEOs can make tactical or even strategic decisions
- Easy formulation of complex queries
- Access to historical data (not in operational systems)
- Improved data quality (fewer errors and missing values)
- Access to data from multiple sources, giving a comprehensive data collection

Potential Applications of Data Warehousing and Mining in EC
- Analysis of user access patterns and buying patterns
- Customer segmentation and target marketing
- Cross-selling and improved Web advertisement
- Personalization
- Association (link) analysis
- Customer classification and prediction
- Time-series analysis
- Typical event sequence and user behavior pattern analysis
- Transition and trend analysis

Data Warehousing
- The phrase "data warehouse" was coined by William Inmon in 1990
- A data warehouse is a decision support database that is maintained separately from the organization’s operational database
- Definition: A DW is a repository of integrated information from distributed, autonomous, and possibly heterogeneous information sources for query, analysis, decision support, and data mining purposes

Characteristics (cont’d)
Integrated
- Application-oriented data from different legacy systems and heterogeneous data sources has no consistency in encoding, naming conventions, …
- When data is moved to the warehouse, it is consolidated, converted, and encoded

Characteristics (cont’d)
Non-volatile
- New data is always appended to the database, rather than replacing existing data
- The database continually absorbs new data, integrating it with the previous data
- In contrast, operational data is regularly accessed and manipulated a record at a time, and updates are made to the data in the operational environment

Characteristics (cont’d)
Time-variant
- Operational databases contain current-value data
- Operational data is valid only at the moment of access, capturing a moment in time
- The time horizon for the data warehouse is significantly longer than that of operational systems
- Data warehouse data is nothing more than a sophisticated series of snapshots, each taken as of some moment in time

System Architecture
[Diagram: sources (legacy, flat-file, ..., RDBMS, OODBMS), each with a detector, feed the data warehouse, which serves end-user analysis, query reports, and data mining]

Back-End Tools and Utilities
- Data extraction: extract data from multiple, heterogeneous, and external sources
- Data cleaning (scrubbing): detect errors in the data and rectify them when possible
- Data converting: convert data from legacy or host format to warehouse format
- Transforming: sort, summarize, compute views, check integrity, and build indices
- Refresh: propagate the updates from the data sources to the warehouse

On-Line Analytical Processing (OLAP)
- Front end to the data warehouse, allowing easy data manipulation
- Allows conducting inquiries over the data at various levels of abstraction
- Fast and easy because some aggregations are computed in advance
- No need to formulate the entire query

OLAP: Data Cube
OLAP uses data in multidimensional format (e.g., data cubes) to simplify queries and shorten response time.
[Figure: data cube with a Date axis (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), a product axis (TV, PC, VCR, sum), and a Country axis (U.S.A, Canada, Mexico, sum); one cell is labeled "Overall sales of TVs in the US in 3rd quarter"]

OLAP: Data Cube Operations
- Slicing: selecting the dimensions of the cube to be viewed.
  Example: view “Sales volume” as a function of “Product” by “Country” by “Quarter”
- Dicing: specifying the values along one or more dimensions.
  Example: view “Sales volume” for “Product=PC” by “Country” by “Quarter”

OLAP: Data Cube Operations
- Drilling down: moving from a higher-level aggregation to a lower-level aggregation or to detailed data (e.g., viewing by “state” after viewing by “region”)
- Rolling up: summarizing data by climbing up a hierarchy or by dimension reduction (e.g., viewing by “region” instead of by “state”)

Cube Operations Illustrated
[Figure: drilling down and rolling up]

Actual Application
Query: “overall & detail production performance”
- manufacturer: Com1
- products: all products
- date interval: 01-Jan-94 until 01-Jan-1999
- source: USDA
[Figure: Com.1 broken down into Com.1 Lot#1 (Contract Number 1), Com.1 Lot#2 (Contract Number 2), and Com.1 Lot#3 (Contract Number 3)]

Data Mining
“Data Mining is the exploration and analysis by automatic or semi-automatic means, of large or small quantities of data in order to discover meaningful patterns, trends and rules.”
[Diagram labels: Data Mining; Data Analysis; Statistics; AI & ML; Database; Data Warehouse; OLAP]

Data Analysis
- Classification
- Regression
- Clustering
- Association
- Sequence Analysis

Data Analysis (cont.)
Modeling
- Inputs X1, X2, X3: input variables, also called independent variables, attributes, or descriptors
- Model f: linear models, non-linear models, or a set of rules
- Outputs Y1, Y2, Y3: output variables, also called dependent variables, classes, or targets
- Variable types: numeric (3, 4.5, 102, …), categorical (hot, cold, high, low, …), crisp (0, 1, yes, no, …)
- A numeric output (Y1) makes the task regression; a categorical or crisp output (Y2, Y3) makes it classification

Data Analysis (cont.)
- Clustering: [Figure: scatter plot of Income against Age]
- Association: given the transactions
    1, chips, coke, chocolate
    2, gum, chips
    3, chips, coke
    4, …
  what is Probability (chips, coke)? What is Probability (chips, gum)?
- Sequence Analysis: …ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA…
  [Diagram: Xt-1 --T--> Xt]

Data Analysis (cont.)
Classification
- Linear Discriminant Analysis
- Naïve Bayes / Bayesian Network
- OneR
- Neural Networks
- Decision Tree (ID3, C4.5, …)
- K-Nearest Neighbors (IB)
- Support Vector Machines (SVM)
- …

Clustering
- K-Means Clustering
- Self-Organizing Map
- Bayesian Clustering
- COBWEB
- …

Regression
- Multiple Linear Regression
- Principal Components Regression
- Partial Least Squares
- Neural Networks
- Regression Tree (CART, MARS, …)
- K-Nearest Neighbors (LWR)
- Support Vector Machines (SVR)
- …

Association & Sequence Analysis
- Apriori
- Markov Chain
- Hidden Markov Models
- …

Challenges
- Faster, more accurate, and more scalable techniques
- Incremental, on-line, and real-time learning algorithms
- Parallel and distributed data processing techniques

Opportunities
- Data mining is a ‘top ten’ emerging technology
- Data mining is finding increasing acceptance in science and business areas that need to analyze large amounts of data to discover trends and patterns they could not otherwise find
- Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems
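
Worked Example: Association
The association slide asks for Probability (chips, coke) and Probability (chips, gum) over a list of shopping baskets. A minimal Python sketch of one way to read that question follows: it estimates each probability as the fraction of baskets that contain both items (often called the support of the pair). The function name pair_support is hypothetical, only the three baskets spelled out on the slide are used (the fourth is elided there and is left out), and this is only the pair-counting step, not the full Apriori algorithm.

```python
from collections import Counter
from itertools import combinations

# The three baskets listed on the association slide. The fourth basket is
# elided in the slides, so it is deliberately not invented here.
transactions = [
    {"chips", "coke", "chocolate"},
    {"gum", "chips"},
    {"chips", "coke"},
]


def pair_support(baskets):
    """Return {frozenset({a, b}): fraction of baskets containing both a and b}.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for basket in baskets:
        # Count every unordered pair of items that occurs in this basket.
        for pair in combinations(sorted(basket), 2):
            counts[frozenset(pair)] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}


support = pair_support(transactions)
print("P(chips, coke):", support[frozenset({"chips", "coke"})])
print("P(chips, gum): ", support[frozenset({"chips", "gum"})])
```

On these three baskets, chips and coke appear together in two of them and chips and gum in one, so the estimates are 2/3 and 1/3. With so few baskets the numbers are purely illustrative.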