University of Houston – Clear Lake
Management Information Systems
ISAM 5931 – Summer 2006

DATA MINING BASICS

Prepared for: Dr. Mohammad Rob
Prepared by: Fouad Alibrahim & Mohammad Monakes
June 22, 2006

ABSTRACT

Data mining is the process of analyzing data from different perspectives and summarizing it into useful information that can be used to increase revenue, cut costs, or both. Data mining software allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

1. INTRODUCTION

Wikipedia gives the following two definitions of data mining (DM). First, DM is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Second, DM is the science of extracting useful information from large data sets or databases. [2]

Data mining involves analyzing data to reveal patterns or relationships. A simple example is its use in a retail sales department. If a store tracks a customer's purchases and notices that the customer buys a lot of silk shirts, the data mining system will make a correlation between that customer and silk shirts. The sales department can then use that information to begin direct-mail marketing of silk shirts to that customer, or it may instead try to get the customer to buy a wider range of products. In this case, the data mining system used by the retail store discovered information about the customer that was previously unknown to the company.

Another widely used example is that of a very large North American chain of supermarkets. Through intensive analysis of the transactions and the goods bought over a period of time, analysts found that beer and diapers were often bought together.
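As a minimal sketch of the kind of co-occurrence counting behind the supermarket example, the following Python snippet computes the share of transactions in which each pair of items appears together (often called the support of the pair). The transactions and the helper name pair_support are hypothetical, invented purely for illustration; the sources cited here do not describe the analysts' actual method.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Return {(item_a, item_b): support} for every pair of items
    that occurs together in at least one transaction.

    Support is the fraction of all transactions containing both items.
    """
    n = len(transactions)
    pair_counts = Counter()
    for basket in transactions:
        # sorted(set(...)) removes duplicates and gives each unordered
        # pair a single canonical key, e.g. ("beer", "diapers").
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return {pair: count / n for pair, count in pair_counts.items()}


# Hypothetical toy transactions (assumed data, not from the paper).
transactions = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["milk", "bread"],
    ["beer", "diapers", "milk"],
    ["bread", "chips"],
]

support = pair_support(transactions)
top_pair = max(support, key=support.get)
print(top_pair, support[top_pair])
```

On this toy data the most frequent pair is beer and diapers, which appears in three of the five baskets. Real market basket tools go further than raw pair counts, but the counting step above is the basic idea.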
Though explaining the interrelation between beer and diapers might be difficult, taking advantage of it should not be hard (e.g., placing the high-profit diapers next to the high-profit beer). This technique is often referred to as Market Basket Analysis. [1]

2. WHAT IS DATA MINING?

Like all decision support systems, data mining delivers information. Data warehouses with query tools are driven by the users, whereas data mining is a data-driven approach that discovers knowledge by itself. Organizations gather enterprise data from operational systems and store it in data warehouses. Data mining takes the process a giant step further. [3]

Data mining (DM) and knowledge discovery are commonly seen as intelligent tools that help to accumulate and process data and make use of it [4]. DM bridges many technical areas, including databases, statistics, machine learning, and human-computer interaction. The set of DM processes used to extract and verify patterns in data is the core of the knowledge discovery process [4].

The idea of learning from data is far from new. However, interest in DM has become very intense, likely because of developments in database management and the huge increase in the volume of data accumulated in databases. Numerous DM algorithms have recently been developed to extract knowledge from large databases. Currently, most research in the DM area focuses on developing new algorithms or improving the speed or accuracy of existing ones [5].

3. OLAP VERSUS DATA MINING

With OLAP queries and analysis, users are able to obtain results from complex queries and derive interesting patterns. Data mining also enables users to uncover interesting patterns, but there is an essential difference in the way the results are obtained. OLAP helps the user analyze the past and gain insights, whereas data mining helps the user predict the future. [3]

Data mining and OLAP are complementary.
You may say that data mining picks up where OLAP leaves off. The analyst drives the process while using OLAP tools. In data mining, the analyst prepares the data and “sits back” while the tools drive the process. [3]

Features                              OLAP                                           DATA MINING
Information request                   What is happening in the enterprise?           Predict the future based on why this is happening.
Data granularity                      Summary data.                                  Detailed transaction-level data.
Number of business dimensions         Limited number of dimensions.                  Large number of dimensions.
Number of dimension attributes        Small number of attributes.                    Many dimension attributes.
Sizes of datasets for the dimensions  Not large for each dimension.                  Usually very large for each dimension.
Analysis approach                     User-driven, interactive analysis.             Data-driven automatic knowledge discovery.
Analysis techniques                   Multidimensional, drill-down, slice and dice.  Prepare data, launch mining tool, and sit back.
State of the technology               Widely used.                                   Still emerging.

4. DATA MINING APPLICATIONS

Customer Segmentation
This is one of the most widespread applications. Businesses use data mining to understand their customers. Cluster detection algorithms discover clusters of customers sharing the same characteristics.

Market Basket Analysis
This is a very useful application for retail. Link analysis algorithms uncover affinities between products that are bought together. Other businesses, such as upscale auction houses, use these algorithms to find customers to whom they can sell higher-value items.

Risk Management
Insurance companies and mortgage businesses use data mining to uncover risks associated with potential customers.

Fraud Detection
Credit card companies use data mining to discover abnormal spending patterns of customers. Such patterns can expose fraudulent use of the cards.

Delinquency Tracking
Loan companies use the technology to track customers who are likely to default on repayments.

Demand Prediction
Retail and other businesses use data mining to match demand and supply trends to forecast demand for specific products. [3]

5. WHERE DATA MINING IS AND WHERE TO GO

Wu notes that a successful new industry (such as DM) can pass through consecutive phases: (1) discovering a new idea, (2) ensuring its applicability, (3) producing small-scale systems to test the market, (4) better understanding the new technology, and (5) producing a fully scaled system. At present there are several dozen DM systems, none of which can be compared in scale to a DBMS. According to Lin, this indicates that the DM area is still in the third phase. [5]

REFERENCES

1. UCLA Anderson School of Management. Online at http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm.
2. Wikipedia Encyclopedia. Online at http://en.wikipedia.org/wiki/Data_mining.
3. Ponniah, P. Data Warehousing Fundamentals, 2001, pp. 399-426.
4. Fayyad, U. “Data Mining and Knowledge Discovery: Making Sense Out of Data”, IEEE Expert 11(5), 1996, pp. 20-25.
5. Wu, X., Yu, P., Piatetsky-Shapiro, G., et al. “Data Mining: How Research Meets Practical Development?”, Knowledge and Inf. Systems 5(2), 2000, pp. 248-261.