Data Mining II - Computer Science Department

Lluis Belanche + Alfredo Vellido
Data Mining II
DM2
2009/2010. Alfredo Vellido
An Introduction to Mining (1)
www.lsi.upc.edu/~avellido/teaching/data_mining.html
Office 319, Omega Building, BCN
Office 107, TR-2 Building, Terrassa
[email protected]
skype avellido, gtalk
Phone: 934137796, 937398090
Contents of the course (disclaimer: but who knows…)
1. Introduction to DM and its methodologies
2. Visual DM: Exploratory DM through visualization
3. Pattern recognition 1
4. Pattern recognition 2
Feature extraction
5. Feature Selection & Extraction
6. Feature selection
SML & Kernel Methods
7. Error estimation
8. Linear classifiers, kernels and SVMs
9. Probability in Data Mining
10. Nonlinear Dimensionality Reduction (NLDR)
11. Applications of NLDR: from medicine to ecology
12. DM Case studies
Sorry guys! … no fuzzy systems …
What is DATA MINING? (1)
“Data Mining is the process of discovering
actionable and meaningful patterns, profiles, and
trends by sifting through your data using pattern
recognition technologies (…) is a hot new
technology about one of the oldest processes of
human endeavour: pattern recognition (…) It is an
iterative process of extracting knowledge from
business transactions (…) DM is the automatic
discovery of usable knowledge from your stored
data.”
Jesús Mena: Data Mining your Website
(Digital Press, 1999; available on Google Books)
What is DATA MINING? (2)
“Data Mining, by its simplest definition, automates the
detection of relevant patterns in a database (…) For many
years, statisticians have manually “mined” databases (…)
DM uses well-established statistical and machine learning
techniques to build models that predict customer behaviour.
Today, technology automates the mining process, integrates it
with commercial data warehouses, and presents it in a
relevant way for business users (…) the leading DM products
address the broader business and technical issues, such as
their integration into complex IT environments.”
Berson, Smith, & Thearling: Building Data Mining
Applications for CRM (McGraw-Hill, 2000)
What is DATA MINING? (3)
WIKIPEDIA DIXIT:
“Data mining has been defined
as "The nontrivial extraction of implicit, previously
unknown, and potentially useful information from
data" (1) and "The science of extracting useful
information from large data sets or databases" (2).
Although it is usually used in relation to analysis of data,
data mining, like artificial intelligence, is an umbrella term
and is used with varied meaning in a wide range of
contexts.”
(1) W. Frawley, G. Piatetsky-Shapiro, C. Matheus: Knowledge Discovery in Databases: An Overview. AI Magazine, 1992, pp. 213-228.
(2) D. Hand, H. Mannila, P. Smyth: Principles of Data Mining. MIT Press, 2001.
wikipedia 2005: en.wikipedia.org/wiki/Data_mining
What is DATA MINING? (4)
WIKIPEDIA’06 DIXIT:
“Data mining (DM), also
called Knowledge-Discovery in Databases (KDD) or
Knowledge-Discovery and Data Mining, is the process
of automatically searching large volumes of data for
patterns such as association rules. It is a fairly recent topic
in computer science but applies many older
computational techniques from statistics, information
retrieval, machine learning and pattern recognition.”
wikipedia 2006: en.wikipedia.org/wiki/Data_mining
What is DATA MINING? (5)
In 1996, in the proceedings of the 1st International Conference on
KDD, Fayyad gave one of the best-known definitions of Knowledge
Discovery from Data:
“The non-trivial process of identifying valid, novel, potentially useful,
and ultimately understandable patterns in data.”
KDD quickly gathered strength as an interdisciplinary research field
in which a combination of advanced techniques from Statistics, Artificial
Intelligence, Information Systems, and Visualization is used to tackle
knowledge acquisition from large databases. The term Knowledge
Discovery from Data appeared in 1989, referring to the
“[...] overall process of finding and interpreting patterns from data,
typically interactive and iterative, involving repeated application of
specific data mining methods or algorithms and the interpretation of
the patterns generated by these algorithms.”
What is DATA MINING? (6)
WIKIPEDIA’08 DIXIT:
“Data mining is the process of
sorting through large amounts of data and picking out
relevant information. It is usually used by business
intelligence organizations, and financial analysts, but is
increasingly being used in the sciences to extract information
from the enormous data sets generated by modern
experimental and observational methods. It has been described
as "the nontrivial extraction of implicit, previously unknown, and
potentially useful information from data" and "the science of
extracting useful information from large data sets or databases."
Data mining in relation to enterprise resource planning is the
statistical and logical analysis of large sets of transaction data,
looking for patterns that can aid decision making.”
wikipedia 2008: en.wikipedia.org/wiki/Data_mining
What is DATA MINING? (7)
WIKTIONARY’08 summarizes:
“a technique for searching large-scale databases for
patterns; used mainly to find previously unknown
correlations between variables that may be commercially
useful”
wiktionary 2008:
http://en.wiktionary.org/wiki/data_mining
Homework:
1. Look for DM in Wikipedia’09
2. Write your own WIKTIONARY entry for the term “Data Mining”
What to expect from a DM conference…
15-17 September ’04: Wessex Institute of Technology (W.I.T.)
What to find in a DM conference…
Sessions 1 & 2: Text Mining
Session 3: Web Mining
Session 4: Clustering Techniques
Session 5: Data Preparation Techniques
Sessions 6 & 7: Applications in Business, Industry and Government
Session 8: Customer Relationship Management (CRM)
Sessions 9 & 10: Applications in Science and Engineering
What to find in a DM conference (three years later)…
Session 1: Categorisation Methods
Session 2: Data Preparation
Session 3: Enterprise Information Systems
Session 4: Clustering Techniques
Session 5: National Security
Session 6: Data and Text Mining
Session 7: Mining Environmental and Geospatial Data
Session 8: Applications in Business, Industry and Government
What to find these days …
Investigative Data Mining for Security and Criminal Detection, First Edition
Jesus Mena
Butterworth-Heinemann 2003
A different conference, a different take …
IEEE CIDM 2009
2009 Symposium on Computational Intelligence and Data Mining
• CI/probabilistic/statistical and other methods
• Data understanding, rule extraction, logical models
• Feature extraction, selection, aggregation, construction
• Multimedia data mining, recognition and interpretation of image and video sequences
• Mining of signals and data streams
• Mining spatial and spatio-temporal data
• Mining of very large datasets, scalability
• Text, graph and web mining
• Meta-learning, predictive data mining
• Visual data mining
• Case studies.
• Applications to biometrics, biomedicine, chemistry, drug design, e-commerce, engineering, finance and marketing research, intelligence, industry, remote sensing, scientific data mining, security, sensory networks and others.
Starved for ca$h?: ask your TIA
The T.I.A.
“The Total Information Awareness (TIA) program may have
been killed by congressional decree, but key elements of
the program have survived at other intelligence agencies,
according to congressional, federal, and research officials.
TIA's goal was to employ data-mining to sift through
public and private databases to track terrorists, which
stirred up fears that the program would be used to spy on
millions of innocent Americans.”
“Congressional officials have not disclosed which TIA programs
were eliminated and which were retained, but insiders
report that TIA's Evidence Extraction and Link
Discovery projects, collectively encompassing 18 data-mining initiatives, are among the surviving
components.”
“Despite the death of TIA, Capitol Hill is still paying for the
development of software designed to collect foreign
intelligence on terrorists: a $64 million research program
run by the Advanced Research and Development
Activity (ARDA), which has employed some of the same
researchers as TIA, was left untouched by Congress.”
www.darpa.mil/ipto/programs/programs.htm
What’s DATA MINING?: A historicist viewpoint