Data Mining Methods - socialcomputing-iba

Social Mining
Social Computing
2
Data Mining
• Data mining is an important new information technology used
to identify significant data in vast numbers of records.
• It is also part of a process called knowledge discovery in
databases (KDD), which presents and processes data to obtain
knowledge.
3
Goals and Usefulness of Data Mining
• Goals:
  ◦ Improve the quality of interaction between the system and its users.
  ◦ Improve decision making.
• Usefulness:
  ◦ An automatic analysis and discovery tool for extracting useful
    knowledge from huge amounts of valuable information.
4
Knowledge Discovery Process
5
Data Mining Tasks
1. Association Rules
2. Clustering
3. Classification
4. Forecasting
Data Mining Methods
1. Decision trees and rules
2. Non-linear regression and classification methods
3. Example-based methods
4. Probabilistic graphical dependency models
5. Relational learning models
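Association rules, the first task listed above, are usually judged by two standard measures: support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both on a small, hypothetical set of market-basket transactions; the item names and the helper functions `support` and `confidence` are illustrative assumptions, not part of the original slides.

```python
# Hypothetical market-basket transactions (illustrative data only).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Evaluate the candidate rule {bread} -> {milk}.
rule_support = support({"bread", "milk"}, TRANSACTIONS)
rule_confidence = confidence({"bread"}, {"milk"}, TRANSACTIONS)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule is typically kept only if both values exceed thresholds chosen by the analyst.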
6
Statistical Inference vs Data Mining
Formal statistical inference is “assumption-driven”:
a hypothesis is first formed and then validated against the data.
Data mining is “discovery-driven”:
patterns and hypotheses are extracted automatically
from the data.
7
Data Mining – Practical Usage
• Direct marketing
• Fraud control
• Credit analysis
• Outlier analysis
8
Effective Implementation of Data Mining
A. Development of a Data Warehouse
A data warehouse functions in three layers:
staging, integration and access.
Together, these layers exist to meet the users' reporting needs.
• The staging layer stores raw data for use by developers
(analysis and support).
• The integration layer integrates the data and provides a level
of abstraction from users.
• The access layer gets data out to users.
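The three layers can be pictured as a small pipeline. The sketch below is only a toy illustration under stated assumptions: the records, field names, and the functions `stage`, `integrate`, and `access_total_by_customer` are hypothetical, and a real warehouse would use a database rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical raw records as they might arrive from source systems.
RAW_SALES = [
    {"customer": " alice ", "amount": "120.50", "region": "north"},
    {"customer": "BOB", "amount": "80", "region": "South"},
    {"customer": "Alice", "amount": "30.00", "region": "North"},
]


def stage(raw_records):
    """Staging layer: keep an untouched copy of the raw data."""
    return [dict(record) for record in raw_records]


def integrate(staged_records):
    """Integration layer: clean the data and unify names and types."""
    integrated = []
    for record in staged_records:
        integrated.append({
            "customer": record["customer"].strip().title(),
            "amount": float(record["amount"]),
            "region": record["region"].strip().title(),
        })
    return integrated


def access_total_by_customer(integrated_records):
    """Access layer: a report-style view for end users."""
    totals = defaultdict(float)
    for record in integrated_records:
        totals[record["customer"]] += record["amount"]
    return dict(totals)


staged = stage(RAW_SALES)
integrated = integrate(staged)
report = access_total_by_customer(integrated)
print(report)
```

Users only ever see the output of the access layer; the inconsistent spellings in the raw data stay hidden behind the integration step.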
9
Contd.
B. Ease and Simplicity of Data Mining Tools
The tools should produce automated, real-time detection of patterns or
anomalies.
Decision Support Systems
Knowledge Discovery in Databases
Data Warehouse
10
Contd.
C. Knowledge of Data Analysis
Database specialists and computer scientists can contribute
the most in this area.
11
Three Chief Facilities of Search Engines
1. They gather a set of web pages that forms the universe from
which users can retrieve information.
2. They represent the pages in this universe in a fashion that attempts
to capture their content.
3. They allow searchers to issue queries, employing
information retrieval algorithms that attempt to find the most
relevant pages in the universe.
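The three facilities above (gather, represent, query) can be sketched with a tiny inverted index. This is a minimal illustration, assuming a hard-coded set of hypothetical pages and a simple term-count score; real search engines crawl the web and use far more elaborate ranking.

```python
import re
from collections import Counter, defaultdict

# 1. Gather: a hypothetical, hard-coded "universe" of pages.
PAGES = {
    "page1": "Data mining finds patterns in large data sets",
    "page2": "Search engines retrieve relevant web pages",
    "page3": "Data warehouse supports data mining and reporting",
}


def tokenize(text):
    """Split text into lowercase alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages):
    """2. Represent: map each term to the pages containing it, with counts."""
    index = defaultdict(Counter)
    for page_id, text in pages.items():
        for term in tokenize(text):
            index[term][page_id] += 1
    return index


def search(index, query):
    """3. Query: rank pages by how often they contain the query terms."""
    scores = Counter()
    for term in tokenize(query):
        for page_id, count in index.get(term, {}).items():
            scores[page_id] += count
    # Highest score first; ties are broken by page id.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page_id for page_id, _ in ranked]


index = build_index(PAGES)
results = search(index, "data mining")
print(results)  # ['page1', 'page3']
```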
12
Data Mining and Web Search Engines
• A customer service database stores two types of service
information:
1. Unstructured customer service reports.
2. Structured data on sales, employees, and customers.
Most search engines have advanced search capabilities that
allow the user to specify additional search parameters to
obtain more refined results.
A DBMS provides the access point for using search engines in a data
warehouse environment.
13
Differences between Web Search and Data Mining
• Web searches usually start with a query issued to
a search engine.
• Data mining, by contrast, bases its search on the data
itself, the data mining tools, and the specified output format.
14
Role of Social Scientists
• Contribute to:
1. Research.
2. Development of rules for flagging anomalous behavior.
3. Identifying and understanding elements in the data sets.
4. Developing guidelines and methods to ascertain which data
mining techniques are the most effective in a particular case.
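As one illustration of what a rule for flagging anomalous behavior might look like, the sketch below marks values that lie more than a chosen number of standard deviations from the mean. The data, the threshold of 2.0, and the function name `flag_anomalies` are assumptions made for the example; the slides do not prescribe a particular rule.

```python
from statistics import mean, pstdev

# Hypothetical daily transaction amounts for one account (illustrative only).
AMOUNTS = [52, 48, 50, 51, 49, 47, 53, 500]


def flag_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold
    ]


flagged = flag_anomalies(AMOUNTS)
print(flagged)
```

Choosing the threshold, and deciding whether a flagged value is actually suspicious, is the kind of judgment the slide assigns to domain experts.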
15
Assael’s Consumer Information Acquisition and Processing Model
16
Conceptual Model of Information and Source Utilization
17
Model of Information Needs
18
Consumer-Oriented Information Search Model
19
Contd.
20
Contd.
21
Contd.
22
Examples from the Economist
• According to the Economist, there is a big market for such
software.
• “By one estimate there are more than 100 programs for network
analysis, also known as link analysis or predictive analysis. The
raw data used may extend far beyond phone records to
encompass information available from private and governmental
entities, and internet sources such as Facebook. IBM, the
supplier of the system used by Bharti Airtel, says its annual
sales of such software, now growing at double-digit rates, will
exceed $15 billion by 2015. In the past five years IBM has spent
more than $11 billion buying makers of network-analysis
software. Gartner, a market-research firm, ranks the technology
at number two in its list of strategic business operations meriting
significant investment this year.”
23
For Example
• The article also touches on more sophisticated systems that
integrate additional information, including V.S. Subrahmanian’s
work on STOP (the SOMA Terror Organization Portal).
• SOMA (Stochastic Opponent Modeling Agents) is a
formal, logical-statistical reasoning framework that uses data
about the past behavior of terror groups to learn rules about
the probability of an organization, community, or person taking
certain actions in different situations.
• “Called SOMA Terror Organization Portal, it analyses a wide range of
information about politics, business and society in Lebanon to
predict, with surprising accuracy, rocket attacks by the country’s
Hizbullah militia on Israel. Attacks tend to increase, for example,
as more money from Islamic charities flows into Lebanon.
Attacks decrease during election years, particularly as more
Hizbullah members run for office and campaign energetically. By
the middle of 2010 SOMA was sucking up data from more than
200 sources, many of them newspaper websites. The number of
sources will have more than doubled by the end of the year.”
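The idea of learning rules about the probability of an action in a given situation can be illustrated with a simple conditional-frequency estimate. The sketch below is not the SOMA framework itself: the records, the condition labels, and the function `learn_rules` are hypothetical and serve only to show the general shape of such a rule.

```python
from collections import defaultdict

# Hypothetical toy history: (condition observed, whether the action occurred).
HISTORY = [
    ("election_year", False),
    ("election_year", False),
    ("election_year", True),
    ("non_election_year", True),
    ("non_election_year", True),
    ("non_election_year", False),
    ("non_election_year", True),
]


def learn_rules(history):
    """Estimate P(action | condition) as a relative frequency."""
    counts = defaultdict(lambda: [0, 0])  # condition -> [actions, total]
    for condition, acted in history:
        counts[condition][1] += 1
        if acted:
            counts[condition][0] += 1
    return {
        condition: actions / total
        for condition, (actions, total) in counts.items()
    }


rules = learn_rules(HISTORY)
for condition, probability in rules.items():
    print(f"P(action | {condition}) = {probability:.2f}")
```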
24
References
• www.emeraldinsight.com/0264-0473.htm
• www.emeraldinsight.com/0263-5577.htm
• www.emeraldinsight.com/0968-5227.htm
• www.economist.com/node/16910031
• Journal of Financial Crime, Vol. 12, No. 1
25
Thank You
Mohd. Ali Khan
Murtaza Marvi
Musa Bin Hamid
Syed Mohsin Hussain