1/23/2010

Data Mining in Business Intelligence
Professor Hui Xiong
I/UCRC Center for Dynamic Data Analytics
Rutgers University

Data Mining Tasks

Data:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

11   No      Married         60K             No
12   Yes     Divorced        220K            No
13   No      Single          85K             Yes
14   No      Married         75K             No
15   No      Single          90K             Yes

Financial Fraud Detection
• Insider Trading, Market Manipulation, Fraud

Spoiled by One Very Rotten Apple - Rogue Trader's $7.14 Billion Loss
• Biggest Bank Fraud in History
  – 2008: Bank Societe Generale
  – $7.14 Billion Loss
• A single futures trader, Jerome Kerviel, ran a scheme of fictitious transactions
• China Aviation Oil (CAO): Chen Jiulin's trading led to a loss of $550 million

Business & Economic Networks
Example: eBay bidding
  vertices: eBay users
  links: represent bidder-seller or buyer-seller relationships
  fraud detection: bidding rings
Example: corporate boards
  vertices: corporations
  links: between companies that share a board member
Example: corporate partnerships
  vertices: corporations
  links: represent formal joint ventures
Example: goods exchange networks
  vertices: buyers and sellers of commodities
  links: represent "permissible" transactions

A Sample Network of Board of Directors

Financial Fraud Detection
• Cross-account/channel Fraud Detection
  – Money transfer (ring of traders, multiple accounts)
  – Price manipulations (incoming or outgoing stars, potentially with losses)
• Fraud Risk Propagation in Corporation Networks

Deliverables
• First 6 months
  – Building a database of bankrupt companies with associated information, such as boards of directors
• 12 months and associated knowledge transfer
  – A demo system for detecting fraud / short signals

Cab Location Traces
• 500 taxi drivers
• About 30 days of data in San Francisco
• Spatial-temporal sequence
  – Latitude
  – Longitude
  – Identifier of business
    • 1 indicates with passenger
    • 0 indicates no passenger
  – Time stamp

Profiling Driver Behaviors
Profiling driver behaviors to identify transportation-related green knowledge, i.e. highly effective use of energy, safe driving, and the driving patterns that affect gasoline consumption
Method
• Driver Segmentation
• Trajectory Clustering
Ref: Transecurity

Understanding Cab Driver Behaviors
Energy-related Knowledge Discovery
• Driver segmentation based on effective driving time
  – Ratio between driving time with customers and driving time without customers
• Clustering of effective pick-up points
• Frequent trajectory with customers
• Frequent trajectory without customers
• Moving pattern of most profitable drivers

Energy-Efficient Mobile Recommender Systems
• Recommend routes
  – Suggest a sequence of pick-up points for cab drivers in real time, based on knowledge learnt from historical data
  – Suggest avoiding areas that may lead to less effective use of gasoline
• Knowledge for Safe Driving Training
• Pattern for Cab Driver Coaching and Feedback

Context-Aware Customer Service Support
Customer service support: an integral part of most companies

Customer Service Problem Log
• Structured attributes: limited information
• Unstructured attributes
• A Sample Problem Log Entry

Context-Aware Customer Service Support
• User behaviors identified from problem logs
• Demographic information of customers
• Multi-focal Learning

Multi-focal Learning: An Illustration
• Multi-focal learning: partition the training data into several different focal groups and build a prediction model within each focal group

Deliverables
• First 6 months
  – Context-aware feature selection
  – Multi-source demographic customer data collection
• 12 months and associated knowledge transfer
  – A software package for context-aware multi-focal learning for customer service support

NSF Industry/University Center for Dynamic Data Analytics (CDDA)
Project Summary

Project Name: An Energy-Efficient Mobile Recommender System
Project Investigators: Hui Xiong

Description:
The increasing availability of large-scale location traces creates unprecedented opportunities to change the paradigm for knowledge discovery in transportation systems. A particularly promising area is the extraction of energy-efficient transportation patterns (green knowledge), which can guide the reduction of inefficiencies in the energy consumption of the transportation sector. However, extracting green knowledge from location traces is not a trivial task. Conventional data analysis tools are usually not designed to handle the massive quantity and the complex, dynamic, and distributed nature of location traces. To that end, in this project, we will provide a focused study of extracting energy-efficient transportation patterns from location traces. Specifically, our initial focus is on a sequence of mobile recommendations.
As a case study, we will develop a mobile recommender system that can recommend a sequence of pick-up points for taxi drivers. The goal of this mobile recommender system is to maximize the probability of business success.

Experimental Plan:
- Sept. 10: Data Preprocessing
- Dec 10: Algorithm Design
- Spring 11: Testing of algorithms
- Fall 11: Performance Evaluation

Related Work Elsewhere:
- Classic recommender systems are focused on traditional application domains, such as commercial item recommendation

How Ours Is Different:
- Mobile recommender systems are under-explored
- Recommendation based on business success instead of user ratings

Related Work in Center:
- Vision and data analysis applications
- DHS work on camera networks

Milestones:
- 2010-2011: Focus on algorithm development
- 2011: Implementation of a demo system and evaluation of the performance of Energy-Efficient Mobile Recommendation

Budget: $50,000

Deliverables:
- Technical demonstration along with a technical report resulting in a publication

Potential Benefits to Member Companies:
- Ideas for developing energy-efficient location-based services

NSF Industry/University Center for Dynamic Data Analytics (CDDA)
Project Summary

Project Name: Mobile Web Usage Profiling for System Performance Tuning
Project Investigators: Hui Xiong

Description:
The objective of this proposed research is to profile the behaviors of mobile web users. Due to differences in age, profession, gender, and cultural background, mobile users may exhibit a large degree of diversity in how they access the mobile Internet. Understanding this diversity, as well as extracting similarity in user patterns, is thus critical to designing and developing future mobile applications that are centered on mobile search. To address this need, we have obtained web usage logs from a mobile service provider, and we propose to perform a detailed analysis of the logs.
Specifically, we propose to analyze the logs using user segmentation, which clusters users with similar behaviors based on their demographic data, search keywords, and click histories. This research poses challenges in, as well as advances the development of, both data mining and mobile computing. By the end of the project, we expect to develop a set of techniques that can effectively characterize users' usage patterns and a list of observations that can be leveraged to improve the performance of mobile Web sites.

Experimental Plan:
- Sept. 10: Data Preprocessing
- Dec 10: Algorithm Design
- Spring 11: Testing of algorithms
- Fall 11: Performance Evaluation

Related Work Elsewhere:
- Customer Segmentation
- Customer Profiling

Related Work in Center:
- Vision and data analysis applications

How Ours Is Different:
- Cross-information-source collaborative customer analysis

Milestones:
- 2010-2011: Focus on algorithm development
- 2011: Testing of a demo system for customer analysis; evaluation of its performance

Budget: $50,000

Deliverables:
- Technical demonstration along with a technical report resulting in a publication

Potential Benefits to Member Companies:
- Techniques for multi-source and context-aware customer analysis
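
Illustration: user segmentation by clustering

The user segmentation step named in the Mobile Web Usage Profiling description (clustering users with similar behaviors) could be sketched roughly as below. This is only a minimal illustration, not the project's method: the summary does not name a clustering algorithm, so plain k-means is an assumption, and the three features, the sample users, and the function name kmeans are all hypothetical. Real demographic data, search keywords, and click histories would first need to be encoded as numeric vectors.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, k, iterations=20):
    """Plain k-means. Centroids start at the first k points, so runs are repeatable."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each user to the nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Hypothetical per-user features, each already scaled to [0, 1]:
# (age, share of queries containing a news keyword, clicks per session)
users = {
    "u1": (0.20, 0.10, 0.90),
    "u2": (0.25, 0.15, 0.85),
    "u3": (0.70, 0.80, 0.20),
    "u4": (0.75, 0.90, 0.10),
    "u5": (0.22, 0.05, 0.95),
}

labels, centroids = kmeans(list(users.values()), k=2)
segments = dict(zip(users, labels))  # user id -> segment index
print(segments)
```

With these sample values, u1, u2, and u5 fall into one segment and u3 and u4 into the other. The number of segments k is a free parameter here; in practice it would have to be chosen from the data.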