1/23/2010
Data Mining in Business Intelligence
Professor Hui Xiong
I/UCRC Center for Dynamic Data Analytics Rutgers University
Data Mining Tasks
Data:
Tid   Refund   Marital Status   Taxable Income   Cheat
1     Yes      Single           125K             No
2     No       Married          100K             No
3     No       Single           70K              No
4     Yes      Married          120K             No
5     No       Divorced         95K              Yes
6     No       Married          60K              No
7     Yes      Divorced         220K             No
8     No       Single           85K              Yes
9     No       Married          75K              No
10    No       Single           90K              Yes
11    No       Married          60K              No
12    Yes      Divorced         220K             No
13    No       Single           85K              Yes
14    No       Married          75K              No
15    No       Single           90K              Yes
Financial Fraud Detection
• Insider Trading, Market Manipulation, Fraud
Spoiled by One Very Rotten Apple ‐ Rogue Trader’s $7.14 Billion Loss
• Biggest Bank Fraud in History
– 2008: Bank Societe Generale – $7.14 Billion Loss
• A single futures trader, Jerome Kerviel, ran a scheme of fictitious transactions
• China Aviation Oil (CAO): trading under Chen Jiulin led to a loss of $550 million
Business & Economic Networks
• Example: eBay bidding
– vertices: eBay users
– links: represent bidder-seller or buyer-seller relationships
– fraud detection: bidding rings
• Example: corporate boards
– vertices: corporations
– links: between companies that share a board member
• Example: corporate partnerships
– vertices: corporations
– links: represent formal joint ventures
• Example: goods exchange networks
– vertices: buyers and sellers of commodities
– links: represent “permissible” transactions
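The corporate boards example can be sketched in a few lines: companies are vertices, and a link is created whenever two companies share a board member. The function name `board_interlock_links` and the company and director names are hypothetical, chosen only for illustration; this is a minimal sketch, not the method used in the talk.

```python
from collections import defaultdict
from itertools import combinations


def board_interlock_links(boards):
    """Map each linked company pair to the directors they share.

    boards: dict of company name -> set of director names (hypothetical input shape).
    """
    # Invert the mapping: which companies does each director sit on?
    companies_by_director = defaultdict(set)
    for company, directors in boards.items():
        for director in directors:
            companies_by_director[director].add(company)

    # Every pair of companies sharing a director gets a link.
    links = defaultdict(set)
    for director, companies in companies_by_director.items():
        for a, b in combinations(sorted(companies), 2):
            links[(a, b)].add(director)
    return dict(links)


# Made-up example data.
boards = {
    "AlphaCorp": {"Ann", "Bob"},
    "BetaInc": {"Bob", "Cid"},
    "GammaLtd": {"Dee"},
}
links = board_interlock_links(boards)
```

Here AlphaCorp and BetaInc end up linked through their shared director, while GammaLtd stays isolated.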
A Sample Network of Board of Directors
Financial Fraud Detection
• Cross‐account/channel Fraud Detection
– Money transfer (ring of traders, multiple accounts)
– Price manipulations (in or outgoing stars, potentially with losses)
• Fraud Risk Propagation in Corporation Networks
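One way to look for a "ring of traders" is to treat money transfers as a directed graph over accounts and enumerate short directed cycles. The sketch below assumes transfers are given as (sender, receiver) pairs; the function name `find_transfer_rings` and the `max_len` cutoff are assumptions made for illustration, and a real system would also weigh amounts and timing.

```python
from collections import defaultdict


def find_transfer_rings(transfers, max_len=4):
    """Enumerate simple directed cycles with 2..max_len accounts.

    transfers: iterable of (sender, receiver) account ids.
    Each ring is reported once, starting at its smallest account id.
    """
    graph = defaultdict(set)
    for sender, receiver in transfers:
        graph[sender].add(receiver)

    rings = []

    def extend(start, path):
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt == start and len(path) >= 2:
                rings.append(tuple(path))  # closed a cycle back to the start
            elif nxt > start and nxt not in path and len(path) < max_len:
                extend(start, path + [nxt])

    for start in sorted(graph):
        extend(start, [start])
    return rings


# Made-up transfers: A -> B -> C -> A forms a ring; C -> D does not.
transfers = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
rings = find_transfer_rings(transfers)
```

Restricting extensions to accounts larger than the start is a simple way to avoid reporting the same ring once per member.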
Deliverables
• First 6 months
– Build a database of bankrupt companies with associated information, such as boards of directors
• 12 months and associated knowledge transfer
– A demo system for detecting fraud / short signals
Cab Location Traces
• 500 taxi drivers
• About 30 days of data in San Francisco
• Spatial‐temporal sequence
– Latitude
– Longitude
– Identifier of business (1 indicates with passenger; 0 indicates no passenger)
– Time stamp
Profiling Driver Behaviors
• Profiling driver behaviors to identify transportation‐related green knowledge
– i.e., highly effective use of energy; safe driving; driving patterns that affect gasoline consumption
Method
• Driver Segmentation
• Trajectory Clustering (Ref: Transecurity)
Understanding Cab Driver Behaviors
Energy‐related Knowledge Discovery
• Driver segmentation based on effective driving time
– Ratio between driving time with customers and driving time without customers
• Clustering of effective pick‐up points
• Frequent trajectory with customers
• Frequent trajectory without customers
• Moving pattern of most profitable drivers
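The effective driving time ratio can be computed directly from the location traces. The sketch below assumes each record is a (driver id, time stamp in seconds, occupancy flag) triple and that the interval between two consecutive records carries the occupancy state of the earlier one; both the record layout and the function name `effective_driving_ratio` are assumptions for illustration.

```python
from collections import defaultdict


def effective_driving_ratio(trace):
    """Return {driver_id: time_with_customers / time_without_customers}.

    trace: iterable of (driver_id, timestamp_seconds, occupied) records,
    where occupied is 1 with a passenger and 0 without.
    """
    by_driver = defaultdict(list)
    for driver, timestamp, occupied in trace:
        by_driver[driver].append((timestamp, occupied))

    ratios = {}
    for driver, points in by_driver.items():
        points.sort()  # order each driver's records by time
        busy = idle = 0
        for (t0, occupied), (t1, _) in zip(points, points[1:]):
            if occupied:
                busy += t1 - t0
            else:
                idle += t1 - t0
        ratios[driver] = busy / idle if idle else float("inf")
    return ratios


# Made-up records for two drivers.
trace = [
    ("d1", 0, 1), ("d1", 60, 1), ("d1", 120, 0), ("d1", 180, 0),
    ("d2", 0, 0), ("d2", 100, 1), ("d2", 150, 0),
]
ratios = effective_driving_ratio(trace)
```

Drivers could then be segmented by thresholding or clustering these ratios; the thresholds themselves are not given in the slides.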
Energy‐Efficient Mobile Recommender Systems
• Recommend routes
– Suggest a sequence of pick‐up points for cab drivers in real time, based on knowledge learned from historical data
– Suggest avoiding areas that may lead to less effective use of gasoline
• Knowledge for Safe Driving Training
• Pattern for Cab Driver Coaching and Feedback
Context‐Aware Customer Service Support
• Customer service support: an integral part of most companies
Customer Service Problem Log
• Structured attributes: limited information
• Unstructured attributes
• A Sample Problem Log Entry
Context‐Aware Customer Service Support
• User behaviors identified from problem logs
• Demographic information of Customers
• Multi‐focal Learning
Multi‐focal Learning: An illustration
• Multi‐focal learning: partition the training data into several focal groups and build a prediction model within each focal group
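The partition-then-model idea can be illustrated with a deliberately simple stand-in. In the sketch below, the focal group is assumed to be given by one attribute of each record, and the per-group "model" is just the majority label in that group; how focal groups are actually formed and which learner is used are not specified in the slides, so `train_multifocal`, `predict`, and the field names are all hypothetical.

```python
from collections import Counter, defaultdict


def train_multifocal(records, focal_key, label_key):
    """Fit one majority-label model per focal group, plus a global fallback."""
    labels_by_group = defaultdict(list)
    for record in records:
        labels_by_group[record[focal_key]].append(record[label_key])

    # Placeholder per-group model: the most common label within the group.
    models = {
        group: Counter(labels).most_common(1)[0][0]
        for group, labels in labels_by_group.items()
    }
    # Fallback for records whose focal group was never seen in training.
    fallback = Counter(r[label_key] for r in records).most_common(1)[0][0]
    return models, fallback


def predict(models, fallback, record, focal_key):
    """Route the record to its focal group's model, else use the fallback."""
    return models.get(record[focal_key], fallback)


# Made-up problem-log records.
records = [
    {"segment": "novice", "resolution": "walkthrough"},
    {"segment": "novice", "resolution": "walkthrough"},
    {"segment": "novice", "resolution": "escalate"},
    {"segment": "expert", "resolution": "escalate"},
    {"segment": "expert", "resolution": "escalate"},
]
models, fallback = train_multifocal(records, "segment", "resolution")
novice_guess = predict(models, fallback, {"segment": "novice"}, "segment")
unknown_guess = predict(models, fallback, {"segment": "partner"}, "segment")
```

Swapping the majority-label placeholder for a real classifier per group keeps the same structure: one model per focal group, selected at prediction time by the record's group.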
Deliverables
• First 6 months
– Context‐aware feature selection
– Multi‐source demographic customer data collection
• 12 months and associated knowledge transfer
– A software package for context‐aware multi‐focal learning for customer service support
NSF Industry/University Center for Dynamic Data Analytics (CDDA)
Project Summary
Project Name: An Energy-Efficient Mobile Recommender System
Project Investigators: Hui Xiong
Description:
The increasing availability of large-scale location traces creates unprecedented opportunities to
change the paradigm for knowledge discovery in transportation systems. A particularly
promising area is extracting energy-efficient transportation patterns (green knowledge), which
can guide efforts to reduce inefficiencies in the energy consumption of the transportation
sector. However, extracting green knowledge from location traces is not a trivial task.
Conventional data analysis tools are usually not designed to handle the massive, complex,
dynamic, and distributed nature of location traces.
To that end, this project will provide a focused study of extracting energy-efficient
transportation patterns from location traces. Specifically, our initial focus is on a
sequence of mobile recommendations. As a case study, we will develop a mobile recommender
system that can recommend a sequence of pick-up points for taxi drivers. The
goal of this mobile recommendation system is to maximize the probability of business success.
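As a rough illustration of "maximizing the probability of business success", suppose each candidate pick-up point has an estimated pick-up probability learned from historical traces, and suppose these events are independent. Under those assumptions, a route succeeds if at least one of its points yields a customer. The independence assumption, the function names, and the probabilities below are all hypothetical; the summary does not define the actual objective in this detail.

```python
from math import prod


def route_success_probability(pickup_probs):
    """P(at least one pick-up along the route), assuming independent points."""
    return 1.0 - prod(1.0 - p for p in pickup_probs)


def best_route(candidate_routes, pickup_prob):
    """Pick the candidate route with the highest success probability."""
    return max(
        candidate_routes,
        key=lambda route: route_success_probability(pickup_prob[p] for p in route),
    )


# Made-up pick-up probabilities per point and two candidate routes.
pickup_prob = {"airport": 0.5, "station": 0.2, "mall": 0.3, "hotel": 0.3}
candidate_routes = [("airport", "station"), ("mall", "hotel")]
chosen = best_route(candidate_routes, pickup_prob)
```

This objective ignores travel distance between points; an energy-aware recommender would also need a cost term for the driving required to visit the sequence.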
Experimental Plan :
- Sept. 10: Data Preprocessing
- Dec 10: Algorithm Design
- Spring 11: Testing of algorithms
- Fall 11: Performance Evaluation
Related Work Elsewhere:
- Classic recommender systems are focused on traditional application domains, such as
commercial item recommendation
How Ours Is Different:
- Mobile recommender systems are under-explored
- Recommendation based on business success instead of user ratings
Related Work in Center:
- Vision and data analysis applications
- DHS work on camera networks
Milestones:
- 2010-2011: Focus on algorithm development
- 2011: Implementation of a demo system and evaluation of the performance of
energy-efficient mobile recommendation
Budget: $50,000
Deliverables:
- Technical demonstration along with a
technical report resulting in a publication;
Potential Benefits to Member Companies:
- Ideas for developing energy-efficient location based services
NSF Industry/University Center for Dynamic Data Analytics (CDDA)
Project Summary
Project Name: Mobile Web Usage Profiling for System Performance Tuning
Project Investigators: Hui Xiong
Description:
The objective of this proposed research is to profile the behaviors of mobile web users. Due to
differences in age, profession, gender, and cultural background, mobile users may exhibit a
large degree of diversity in how they access the mobile Internet. Understanding this diversity, as
well as extracting similarities in user patterns, is thus critical to designing and developing
future mobile applications that are centered on mobile search. To address this need, we
have obtained web usage logs from a mobile service provider and propose to perform a detailed
analysis of the logs. Specifically, we propose to analyze the logs using user
segmentation, which clusters users with similar behaviors based on their demographic data,
search keywords, and click histories. This research both poses challenges for and advances the
development of data mining and mobile computing. By the end of the project, we expect
to develop a set of techniques that can effectively characterize users’ usage patterns and a list of
observations that can be leveraged to improve the performance of mobile Web sites.
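User segmentation of this kind is often prototyped with k-means clustering over per-user feature vectors. The sketch below is a plain k-means with a deterministic start (the first k users as initial centroids); the choice of k-means, the two numeric features, and the function name `kmeans` are assumptions for illustration, since the summary does not name a specific clustering algorithm.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each user to the nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up features per user: (searches per day, share of clicks on news pages).
users = [(1.0, 0.0), (9.0, 1.0), (1.2, 0.1), (8.8, 0.9)]
labels, centroids = kmeans(users, k=2)
```

In practice, demographic data and search keywords would first need to be encoded as numeric features, and the features scaled so that no single one dominates the distance.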
Experimental Plan :
- Sept. 10: Data Preprocessing
- Dec 10: Algorithm Design
- Spring 11: Testing of algorithms
- Fall 11: Performance Evaluation
Related Work Elsewhere:
- Customer Segmentation
- Customer Profiling
Related Work in Center:
- Vision and data analysis applications
How Ours Is Different:
- Cross-information-source collaborative customer analysis
Milestones:
- 2010-2011: Focus on algorithm development
- 2011: Testing of a demo system for customer analysis; evaluation of its performance
Budget: $50,000
Deliverables:
- Technical demonstration along with a
technical report resulting in a publication;
Potential Benefits to Member Companies:
- Techniques for multi-source and context-aware customer analysis