Speakers:
Prof Y V Hui, CityU
Dr H P Lo, CityU
Dr Sammy Yuen, CityU
Dr K W Cheng, SAS Institute
Mr Steven Parker, Standard Chartered

Knowledge Discovery Centre: CityU-SAS Partnership

The Art and Science of Data Mining
Y V Hui
City University of Hong Kong

The Driving Forces
• Specialization and focus in business
- To satisfy the needs of customers
- To improve and develop specific business strategies and processes
- Personalization through mass customization
• Challenges
- local and global competition
- distributed business operations
- product innovation
• Technology development
• Benefit, cost and risk on a product or customer basis

Data Mining
• Also known as knowledge discovery in databases. Data mining digs out valuable information from large and messy data. (Computer scientist’s definition)
• Data mining is a knowledge discovery process. It’s the integration of business knowledge, people, information, statistics and computing technology.

Data Mining is Hot
• Ten Hottest Jobs, Time, 22 May 2000
• 10 emerging areas of technology, MIT’s Magazine of Technology Review, Jan/Feb 2001

Data Mining Philosophy
• A powerful enabler of competitive advantage.
• Data mining is driven from business knowledge.
• Data mining is about enabling people to discover actionable information about their business.
• Return of profit isn’t about algorithms

Scope of Data Mining
Management’s Decision World: Business outlook, Industry conditions, Product offering, Customer analysis, Strategic options, Competitive actions, etc.
Interface
Data Miner’s Analytical World: Problem development, Project design and management, Data collection and preparation, Model building, Reporting, Validation and evaluations

Project Management
• Cross-functional team
• System architecture

Successful applications
• Business transaction
- risks and opportunities
• Customer relationship management
- personalization, target marketing
• Electronic commerce & web
- web mining
• Science & engineering
• Health care
• Multi-media
• Others

Data Mining Process
Understanding of business
Problem identification

Understanding Your Business
• Do we have a problem?
- What is the current situation? Are there any undesirable situations that need attention?
- Are there any conditions, processes, etc, that could be improved?
- Are any problems foreseeable that could affect the business?
- Are there any potential opportunities that the company may capitalize on?
A problem is a learning opportunity

Understanding Your Problem
• Operational or analytical
• Convention rule or knowledge discovery
• Product based or customer based
• Market research or data mining
• Ownership of the information
• Privacy
• Added value

Data Mining Process
Collecting relevant information
Understanding of business
Problem identification

Collecting Relevant Information
• Data Search
• Data Collection
• Data Preparation
• Data Mining Database

Data Search
• Exploring the problem space. Don’t let the data drive the problem.
• Measurement
• Exploring the data sources

Data Collection
• Data retrieval
• Data audit
• Data set assembly and data warehouse
• Survey

Data Preparation
• Data representation
• Data exploration
• Data normalization
• Data transformation
• Imputation of missing data
• Data tuning

Data Mining Database
• Variable selection
• Record selection
• Data set partition

Data Mining Process
Learning
Collecting relevant information
Model building
Understanding of business
Problem identification

Model Building
• Model based vs non-model based
Inputs: $x_1, \dots, x_q$
Outputs: $y_1, \dots, y_p$
$(y_1, y_2, \dots, y_p) = f(x_1, \dots, x_q)$
• Parametric vs nonparametric
• Estimation vs trial and error
• Directed vs undirected
• Multidimensional analysis
• Large data set vs small data set

Data Mining Algorithms
[Diagram: Online Analytical Processing (SQL, Query Tools) and Discovery Driven Methods (Description, Prediction). Techniques shown: Classification, Regressions, Visualization, Clustering, Association, Decision Trees, Neural Networks, Sequential Analysis]

Online Analytical Processing
• Query and reporting
Example of SQL query: How many credit-card customers who made purchases of over $1,000 on sporting goods in December have at least $20,000 of available credit?
• Manual and validation driven

Estimation and Prediction
• Statistical models
• Neural network
Example: Housing price valuation model

Classification Algorithms
• Statistical techniques
• Neural networks
• Genetic algorithms
• Nearest neighbor method
• Rule induction and decision tree
Example: Customer segmentation and buying behavior description

Association Rules
• Apriori algorithm
Example: Market basket analysis, cross selling analysis

Sequential Analysis
• Count-all algorithm
• Count-some algorithm
Example: Attached mailing, add-on sales

Algorithms Comparison
• No single data mining algorithm outperforms all the others. Try different algorithms and draw conclusions from the results. Use your business knowledge.
• Neural networks do no better than statistical models when the underlying structure is known. However, neural networks can detect hidden interactions and nonlinearity. Use the prior information if available.
• Data mining algorithms cannot handle dependent records. Use the prior information. Statistical models help.
• Data tuning and dimension reduction enhance data mining before and after the analysis. Statistical techniques help.
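The Association Rules slide names the Apriori algorithm for market basket analysis. A minimal Python sketch of its level-wise search for frequent itemsets is given here. The function name `apriori`, the count-based `min_support` threshold and the sample baskets are all hypothetical, invented for illustration and not taken from the talk.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least
    min_support transactions (hypothetical helper, illustration only)."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count how many baskets contain each candidate, keep the frequent ones.
        counts = {c: 0 for c in candidates}
        for basket in transactions:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {item for basket in transactions for item in basket}
    level = count({frozenset([item]) for item in items})
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical baskets, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent = apriori(baskets, min_support=3)

# Confidence of the rule {diapers} -> {beer}: support(both) / support(diapers).
confidence = frequent[frozenset({"diapers", "beer"})] / frequent[frozenset({"diapers"})]
```

The prune step is what keeps the search tractable: a candidate is counted against the data only if all of its smaller subsets were already frequent. A cross-selling rule such as {diapers} -> {beer} is then scored from the stored counts without another pass over the baskets.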
Data Mining Process
Learning
Collecting relevant data
Model building
Understanding of business
Problem identification
Business strategy and evaluation
Action

Trends that Affect Data Mining
• Data trends
- data explosion
- data types
• Hardware trends
- memory
- processing speed
- storage
• Network trends
- network connectivity
- distributed databases
• Wireless communication
• Scientific computing trends
- theory, experiment and simulation
• Business trends
- total quality management
- customer relationship management
- business process reengineering
- enterprise resources planning
- supply chain management
- business intelligence and knowledge management
- e-business and m-business
• Privacy and Security

Pot of Gold
• The benefits of knowing one’s business and customers become so critical that technologies are coming together to support data mining.
• Data mining is not cybernetic magic that will turn your data into gold. It’s the process and result of knowledge production, knowledge discovery and knowledge management.