• Data mining
https://store.theartofservice.com/the-data-mining-toolkit.html
The overall goal of the data mining
process is to extract information from a
data set and transform it into an
understandable structure for further use
Even the popular book "Data mining:
Practical machine learning tools and
techniques with Java" (which covers
mostly machine learning material) was
originally to be named just "Practical
machine learning", and the term "data
mining" was only added for marketing
reasons
Neither the data collection, data
preparation, nor result interpretation
and reporting are part of the data
mining step, but do belong to the
overall KDD process as additional
steps.
The related terms data dredging, data
fishing, and data snooping refer to the
use of data mining methods to sample
parts of a larger population data set
that are (or may be) too small for
reliable statistical inferences to be
made about the validity of any patterns
discovered. These methods can,
however, be used in creating new
hypotheses to test against the larger
data populations.
Data mining turns data into analysis, in
some cases in real time, that a company can
use to increase sales, promote new products,
or drop products that do not add value to
the company.
Data mining Etymology
Currently, Data Mining
and Knowledge Discovery
are used interchangeably.
Data mining Background
Data mining is the process of applying
these methods with the intention of
uncovering hidden patterns in large data
sets
Data mining Research and evolution
The premier professional body in the
field is the Association for Computing
Machinery's (ACM) Special Interest
Group (SIG) on Knowledge Discovery
and Data Mining (SIGKDD). Since
1989 this ACM SIG has hosted an
annual international conference and
published its proceedings, and since
1999 it has published a biannual
academic journal titled "SIGKDD
Explorations".
Computer science conferences on data mining include:
DMKD Conference – Research Issues on Data Mining and Knowledge Discovery
ECDM Conference – European Conference on Data Mining
ECML-PKDD Conference – European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
EDM Conference – International Conference on Educational Data Mining
PAKDD Conference – The annual Pacific-Asia Conference on Knowledge Discovery and Data Mining
SSTD Symposium – Symposium on Spatial and Temporal Databases
Data mining topics are also present on
many data management/database
conferences such as the ICDE
Conference, SIGMOD Conference and
International Conference on Very Large
Data Bases
Data mining Process
(5) Interpretation/Evaluation.
It exists, however, in many variations on
this theme, such as the Cross Industry
Standard Process for Data Mining
(CRISP-DM) which defines six phases:
(5) Evaluation
or a simplified process such as (1) pre-processing, (2) data mining,
and (3) results validation.
Polls conducted in 2002, 2004, and 2007
show that the CRISP-DM methodology is
the leading methodology used by data
miners. The only other data mining
standard named in these polls was
SEMMA. However, 3-4 times as many
people reported using CRISP-DM.
Several teams of researchers have
published reviews of data mining
process models, and Azevedo and Santos
conducted a comparison of CRISP-DM
and SEMMA in 2008.
Data mining Pre-processing
Before algorithms can be used,
a target data set must be
assembled
Anomaly detection
(Outlier/change/deviation detection) –
The identification of unusual data
records that might be interesting, or of
data errors that require further
investigation.
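
One simple way to flag unusual records can be sketched as follows. This is a minimal illustration that assumes a z-score rule on a single numeric column; the function name `find_outliers`, the threshold of 2.0, and the sample readings are all hypothetical and not taken from the slides.

```python
from statistics import mean, stdev

def find_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings with one suspicious value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1]
outliers = find_outliers(readings)
print(outliers)
```

A flagged record is only a candidate: it may be a genuinely interesting event or a data error, which is why further investigation is needed.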
Association rule learning (Dependency
modeling) – Searches for relationships
between variables. For example, a
supermarket might gather data on
customer purchasing habits. Using
association rule learning, the
supermarket can determine which
products are frequently bought together
and use this information for marketing
purposes. This is sometimes referred to
as market basket analysis.
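
The supermarket example can be sketched as a count of item pairs that occur together. This is a minimal illustration, not a full association rule learner; the function name `frequent_pairs`, the `min_count` cutoff, and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_count=2):
    """Count how often each pair of items appears in the same basket."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair has one canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical purchase records.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "eggs"],
    ["bread", "butter", "milk"],
]
pairs = frequent_pairs(baskets)
print(pairs)
```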
Clustering – is the task of discovering
groups and structures in the data that are
in some way or another "similar", without
using known structures in the data.
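
As an illustration of grouping without known labels, the sketch below runs a basic k-means procedure on one-dimensional data. K-means is only one of many clustering methods and is chosen here as an assumption; the function name `kmeans_1d`, the starting centers, and the points are hypothetical.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers, clusters)
```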
Classification – is the task of
generalizing known structure to apply
to new data. For example, an e-mail
program might attempt to classify an
e-mail as "legitimate" or as "spam".
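
The e-mail example can be sketched with a small naive Bayes text classifier. Naive Bayes is an assumption made for illustration (the slides do not name a method), and the function names `train` and `classify` and the sample messages are hypothetical.

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (text, label). Returns word counts per label
    and the number of examples per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest log-probability,
    using add-one (Laplace) smoothing for unseen words."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total)
        n_words = sum(counts.values())
        for w in text.lower().split():
            score += math.log((counts[w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labeled messages (the "known structure").
examples = [
    ("win free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("lunch on monday", "legitimate"),
]
word_counts, label_counts = train(examples)
print(classify("free prize offer", word_counts, label_counts))
print(classify("monday meeting", word_counts, label_counts))
```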
Regression – Attempts to find a function which
models the data with the least error.
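
For a straight-line model, "least error" is commonly taken to mean least squared error, which has a closed-form solution. The sketch below assumes that interpretation; the function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```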
Summarization – providing a more
compact representation of the data
set, including visualization and report
generation.
Data mining Results validation
For example, a data mining algorithm
trying to distinguish "spam" from
"legitimate" emails would be trained on
a training set of sample e-mails
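
A common way to validate such a model is to hold back some labeled e-mails as a test set and compare accuracy on both sets. The sketch below assumes a deliberately naive keyword classifier so that the gap is visible; `make_keyword_classifier`, `accuracy`, and all messages are hypothetical.

```python
def make_keyword_classifier(training):
    """Learn the words that appear only in spam training messages."""
    spam_words, legit_words = set(), set()
    for text, label in training:
        words = set(text.lower().split())
        if label == "spam":
            spam_words |= words
        else:
            legit_words |= words
    spam_only = spam_words - legit_words

    def predict(text):
        hit = any(w in spam_only for w in text.lower().split())
        return "spam" if hit else "legitimate"
    return predict

def accuracy(predict, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for text, label in examples if predict(text) == label)
    return correct / len(examples)

training = [
    ("free prize inside", "spam"),
    ("claim your free money", "spam"),
    ("project meeting today", "legitimate"),
    ("your invoice for today", "legitimate"),
]
# Messages the classifier has never seen.
test = [
    ("free tickets", "spam"),
    ("meeting moved", "legitimate"),
    ("cheap pills", "spam"),
    ("invoice attached", "legitimate"),
]
predict = make_keyword_classifier(training)
train_acc = accuracy(predict, training)
test_acc = accuracy(predict, test)
print(train_acc, test_acc)
```

Accuracy on the training set overstates how well the learned patterns generalize, so the score on the unseen test set is the one to compare against the desired standard.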
If the learned patterns do not meet the
desired standards, it is necessary to
re-evaluate and change the pre-processing and data mining steps. If the
learned patterns do meet the desired standards,
then the final step is to interpret the
learned patterns and turn them into
knowledge.
Data mining Standards
There have been some efforts to define
standards for the data mining process, for
example the 1999 European Cross Industry
Standard Process for Data Mining (CRISP-DM 1.0) and the 2004 Java Data Mining
standard (JDM 1.0). Development on
successors to these processes (CRISP-DM
2.0 and JDM 2.0) was active in 2006, but has
stalled since. JDM 2.0 was withdrawn without
reaching a final draft.
As the name suggests, it only covers
prediction models, a particular data
mining task of high importance to
business applications
Data mining Games
for 3x3-chess) with any beginning
configuration, small-board dots-and-boxes,
small-board-hex, and certain endgames in
chess, dots-and-boxes, and hex; a new
area for data mining has been opened
Data mining Business
If Walmart analyzed their point-of-sale
data with data mining techniques they
would be able to determine sales
trends, develop marketing
campaigns, and more accurately
predict customer loyalty
Once the results from data mining
(potential prospect/customer and
channel/offer) are determined, this
"sophisticated application" can either
automatically send an e-mail or a
regular mail
In order to maintain this quantity of
models, they need to manage model
versions and move on to automated data
mining.
Data mining can also be helpful to
human resources (HR) departments in
identifying the characteristics of their
most successful employees. Information
obtained – such as universities attended
by highly successful employees – can
help HR focus recruiting efforts
accordingly. Additionally, Strategic
Enterprise Management applications
help a company translate corporate-level
goals, such as profit and margin share
targets, into operational decisions, such
as production plans and workforce
levels.
If a clothing store records the purchases of
customers, a data mining system could
identify those customers who favor silk
shirts over cotton ones
Market basket analysis has also been
used to identify the purchase patterns of
the Alpha Consumer. Alpha Consumers
are people that play a key role in
connecting with the concept behind a
product, then adopting that product, and
finally validating it for the rest of society.
Analyzing the data collected on this type
of user has allowed companies to predict
future buying trends and forecast supply
demands.
Data mining is a highly effective tool in the
catalog marketing industry. Catalogers
have a rich database of their customers'
transaction history, covering millions of
customers and dating back a number of years.
Data mining tools can identify patterns
among customers and help identify the
most likely customers to respond to
upcoming mailing campaigns.
Data mining for business applications
is a component that needs to be
integrated into a complex modeling
and decision making process.
Reactive business intelligence (RBI)
advocates a "holistic" approach that
integrates data mining, modeling, and
interactive visualization into an end-to-end discovery and continuous
innovation process powered by human
and automated learning.
The relation between the quality of a data mining
system and the amount of investment that the
decision maker is willing to make was formalized
by providing an economic perspective on the value
of “extracted knowledge” in terms of its payoff to
the organization. This decision-theoretic
classification framework was applied to a real-world semiconductor wafer manufacturing line,
where decision rules for effectively monitoring and
controlling the semiconductor wafer fabrication line
were developed.
Another implication is that on-line
monitoring of the semiconductor
manufacturing process using data mining
may be highly effective.
Data mining Science and engineering
In recent years, data mining has been
used widely in the areas of science and
engineering, such as bioinformatics,
genetics, medicine, education and
electrical power engineering.
The data mining method that is used to
perform this task is known as multifactor
dimensionality reduction.
In the area of electrical power
engineering, data mining methods
have been widely used for condition
monitoring of high voltage electrical
equipment
Data mining methods have also been
applied to dissolved gas analysis
(DGA) in power transformers. DGA, as
a diagnostic for power transformers,
has been available for many years.
Methods such as SOM have been
applied to analyze generated data and
to determine trends which are not
obvious to the standard DGA ratio
methods (such as Duval Triangle).
In this way, data mining can
facilitate institutional
memory.
Other examples of application of data
mining methods are biomedical data
facilitated by domain ontologies, mining
clinical trial data, and traffic analysis using
SOM.
In adverse drug reaction surveillance, the
Uppsala Monitoring Centre has, since 1998,
used data mining methods to routinely screen
for reporting patterns indicative of emerging
drug safety issues in the WHO global
database of 4.6 million suspected adverse
drug reaction incidents. Recently, similar
methodology has been developed to mine
large collections of electronic health records
for temporal patterns associating drug
prescriptions to medical diagnoses.
Data mining Human rights
Data mining of government records –
particularly records of the justice system
(i.e., courts, prisons) – enables the
discovery of systemic human rights
violations in connection to generation and
publication of invalid or fraudulent legal
records by various government agencies.
Data mining Medical data mining
In 2011, in Sorrell v. IMS Health, Inc.,
the Supreme Court of the United States
ruled that pharmacies may share
information with outside companies.
The Court held that this practice is
protected under the First Amendment of
the Constitution, which protects
"freedom of speech."
Data mining Spatial data mining
So far, data mining and Geographic
Information Systems (GIS) have existed
as two separate technologies, each
with its own methods, traditions, and
approaches to visualization and data
analysis
Data mining offers great potential benefits
for GIS-based applied decision-making.
Recently, the task of integrating these two
technologies has become of critical
importance, especially as various public
and private sector organizations
possessing huge databases with thematic
and geographically referenced data begin
to realize the huge potential of the
information contained therein. Among
those organizations are:
offices requiring analysis or dissemination of geo-referenced statistical data
public health services searching for explanations of disease clustering
environmental agencies assessing the impact of changing land-use patterns on climate change
geo-marketing companies doing customer segmentation based on spatial location.
Challenges in Spatial mining: Geospatial data
repositories tend to be very large
Developing and supporting geographic
data warehouses (GDWs): Spatial
properties are often reduced to simple
aspatial attributes in mainstream data
warehouses. Creating an integrated
GDW requires solving issues of spatial
and temporal data interoperability –
including differences in semantics,
referencing systems, geometry,
accuracy, and position.
Geographic data mining methods
should recognize more complex
geographic objects (i.e., lines and
polygons) and relationships (i.e., non-Euclidean distances, direction,
connectivity, and interaction through
attributed geographic space such as
terrain)
Geographic knowledge discovery using
diverse data types: GKD methods should
be developed that can handle diverse data
types beyond the traditional raster and
vector models, including imagery and geo-referenced multimedia, as well as dynamic
data types (video streams, animation).
Data mining Sensor data mining
By measuring the spatial correlation
between data sampled by different
sensors, a wide class of specialized,
more efficient spatial data mining
algorithms can be developed.
Data mining Visual data mining
In the process of turning from analog
to digital, large data sets have been
generated, collected, and stored.
Discovering the statistical patterns, trends
and information hidden in these data makes
it possible to build predictive patterns. Studies
suggest visual data mining is faster and
much more intuitive than traditional data
mining. See also Computer vision.
Data mining Music data mining
Data mining techniques, and in particular
co-occurrence analysis, have been used to
discover relevant similarities among music
corpora (radio lists, CD databases) for the
purpose of classifying music into genres in
a more objective manner.
Data mining Surveillance
Data mining has been
used to fight
terrorism by the U.S
In the context of combating terrorism,
two particularly plausible methods of
data mining are "pattern mining" and
"subject-based data mining".
Data mining Pattern mining
"Pattern mining" is a data mining
method that involves finding existing
patterns in data. In this context patterns
often means association rules. The
original motivation for searching
association rules came from the desire
to analyze supermarket transaction data,
that is, to examine customer behavior in
terms of the purchased products. For
example, an association rule "beer ⇒
potato chips (80%)" states that four out of
five customers that bought beer also
bought potato chips.
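
The 80% figure in a rule like this is usually called its confidence: the share of transactions containing the left-hand side that also contain the right-hand side. A minimal sketch of that arithmetic follows, with hypothetical function names `support` and `confidence` and made-up transactions in which four of five beer buyers also bought potato chips.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical supermarket transactions.
transactions = [
    {"beer", "potato chips"},
    {"beer", "potato chips", "salsa"},
    {"beer", "potato chips"},
    {"beer", "potato chips", "bread"},
    {"beer", "bread"},
    {"milk", "bread"},
    {"milk"},
    {"potato chips"},
    {"bread", "eggs"},
    {"eggs"},
]
conf = confidence(transactions, {"beer"}, {"potato chips"})
print(f"beer => potato chips ({conf:.0%})")
```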
In the context of pattern mining as a tool
to identify terrorist activity, the National
Research Council provides the following
definition: "Pattern-based data mining
looks for patterns (including anomalous
data patterns) that might be associated
with terrorist activity — these patterns
might be regarded as small signals in a
large ocean of noise." Pattern mining
includes new areas such as Music
Information Retrieval (MIR), where
patterns seen in both the temporal and
non-temporal domains are imported to
classical knowledge discovery search
Data mining Subject-based data mining
"Subject-based data mining" is a data
mining method involving the search for
associations between individuals in data.
In the context of combating terrorism,
the National Research Council provides
the following definition: "Subject-based
data mining uses an initiating individual
or other datum that is considered, based
on other information, to be of high
interest, and the goal is to determine
what other persons or financial
transactions or movements, etc., are
related to that initiating datum."
Data mining Knowledge grid
Knowledge discovery "On the Grid"
generally refers to conducting knowledge
discovery in an open environment using
grid computing concepts, allowing users to
integrate data from various online data
sources, as well as make use of remote
resources, for executing their data mining
tasks
Data mining Reliability / Validity
Data mining can be misused, and can also
unintentionally produce results which
appear significant but which do not
actually predict future behavior and cannot
be reproduced on a new sample of data.
See Data dredging.
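
The effect can be illustrated with a small simulation: search enough unrelated variables and one of them will appear to predict the target by chance. This is a sketch under assumed settings (20 rows, 200 random features, a fixed seed); the helper `pearson` and all variable names are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
n_rows, n_features = 20, 200

# Target and candidate features are independent noise: no real pattern exists.
target = [rng.gauss(0, 1) for _ in range(n_rows)]
features = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_features)]

# "Dredge": keep whichever feature happens to correlate best with the target.
in_sample = [abs(pearson(f, target)) for f in features]
best = max(range(n_features), key=lambda i: in_sample[i])

# Draw a fresh sample of the winning feature and of the target.
new_target = [rng.gauss(0, 1) for _ in range(n_rows)]
new_feature = [rng.gauss(0, 1) for _ in range(n_rows)]
out_of_sample = abs(pearson(new_feature, new_target))

print(f"best in-sample |r|: {in_sample[best]:.2f}")
print(f"same feature on new data |r|: {out_of_sample:.2f}")
```

The selected feature typically looks strongly correlated in the sample it was chosen from, while on fresh data the correlation is usually much weaker, because the pattern was never real.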
Data mining Privacy concerns and ethics
In particular, data mining government or
commercial data sets for national security
or law enforcement purposes, such as in
the Total Information Awareness Program
or in ADVISE, has raised privacy
concerns.
This is not data mining per se, but a result
of the preparation of data before – and for
the purposes of – the analysis
It is recommended that an individual is made
aware of the following before data are
collected:
the purpose of the data collection and any (known) data mining projects
how the data will be used
who will be able to mine the data and use the data and their derivatives
the status of security surrounding access to the data
In America, privacy concerns have
been addressed to some extent by the
US Congress via the passage of
regulatory controls such as the Health
Insurance Portability and
Accountability Act (HIPAA)
Data may also be modified so as to
become anonymous, so that individuals
may not readily be identified. However,
even "de-identified"/"anonymized" data
sets can potentially contain enough
information to allow identification of
individuals, as occurred when journalists
were able to find several individuals based
on a set of search histories that were
inadvertently released by AOL.
Data mining Free open-source data mining software and applications
Carrot2: Text and search results clustering framework.
Chemicalize.org: A chemical structure miner and web search engine.
ELKI: A university research project with advanced cluster analysis and outlier detection methods written in the Java language.
GATE: A natural language processing and language engineering tool.
KNIME: The Konstanz Information Miner, a user-friendly and comprehensive data analytics framework.
ML-Flex: A software package that enables users to integrate with third-party machine-learning packages written in any programming language, execute classification analyses in parallel across multiple computing nodes, and produce HTML reports of classification results.
NLTK (Natural Language Toolkit): A suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python language.
SenticNet API: A semantic and affective resource for opinion mining and sentiment analysis.
Orange: A component-based data mining and machine learning software suite written in the Python language.
R: A programming language and software environment for statistical computing, data mining, and graphics. It is part of the GNU Project.
UIMA: The UIMA (Unstructured Information Management Architecture) is a component framework for analyzing unstructured content such as text, audio and video – originally developed by IBM.
Weka: A suite of machine learning software applications written in the Java programming language.
Data mining Commercial data-mining software and applications
Angoss KnowledgeSTUDIO: data mining tool provided by Angoss.
BIRT Analytics: visual data mining and predictive analytics tool provided by Actuate Corporation.
Clarabridge: enterprise class text analytics solution.
IBM DB2 Intelligent Miner: in-database data mining platform provided by IBM, with modeling, scoring and visualization services based on the SQL/MM - PMML framework.
LIONsolver: an integrated software application for data mining, business intelligence, and modeling that implements the Learning and Intelligent OptimizatioN (LION) approach.
NetOwl: suite of multilingual text and entity analytics products that enable data mining.
SAS Enterprise Miner: data mining software provided by the SAS Institute.
Data mining Marketplace surveys
Several researchers and organizations
have conducted reviews of data
mining tools and surveys of data
miners. These identify some of the
strengths and weaknesses of the
software packages. They also provide
an overview of the behaviors,
preferences and views of data miners.
Some of these reports include:
Forrester Research 2010 Predictive Analytics and Data Mining Solutions report
Gartner 2008 "Magic Quadrant" report
Haughton et al.'s 2003 Review of Data Mining Software Packages in The American Statistician
Data mining Further reading
1
M.S. Chen, J. Han, P.S. Yu (1996) "Data
mining: an overview from a database
perspective". Knowledge and data
Engineering, IEEE Transactions on 8
(6), 866-883
https://store.theartofservice.com/the-data-mining-toolkit.html
Data mining Further reading
1
Feldman, Ronen; and Sanger, James; The Text Mining Handbook, Cambridge University Press, ISBN 978-0-521-83657-9
Guo, Yike; and Grossman, Robert (editors) (1999); High Performance Data Mining: Scaling Algorithms, Applications and Systems, Kluwer Academic Publishers
Han, Jiawei; Kamber, Micheline; and Pei, Jian (2006); Data Mining: Concepts and Techniques, Morgan Kaufmann
Liu, Bing (2007); Web Data Mining: Exploring Hyperlinks, Contents and Usage Data, Springer, ISBN 3-540-37881-2
Murphy, Chris (16 May 2011); "Is Data Mining Free Speech?", InformationWeek (UMB): 12
Poncelet, Pascal; Masseglia, Florent; and Teisseire, Maguelonne (editors) (October 2007); Data Mining Patterns: New Methods and Applications, Information Science Reference, ISBN 978-1-59904-162-9
Tan, Pang-Ning; Steinbach, Michael; and Kumar, Vipin (2005); Introduction to Data Mining, ISBN 0-321-32136-7
Theodoridis, Sergios; and Koutroumbas, Konstantinos (2009); Pattern Recognition, 4th Edition, Academic Press, ISBN 978-1-59749-272-0
Weiss, Sholom M.; and Indurkhya, Nitin (1998); Predictive Data Mining, Morgan Kaufmann
Witten, Ian H.; Frank, Eibe; and Hall, Mark A. (30 January 2011); Data Mining: Practical Machine Learning Tools and Techniques (3 ed.), Elsevier, ISBN 978-0-12-374856-0 (see also the free Weka software)
Ye, Nong (2003); The Handbook of Data Mining, Mahwah, NJ: Lawrence Erlbaum
Data Mining Extensions
Data Mining Extensions (DMX) is a query language for data mining models supported by Microsoft's SQL Server Analysis Services product. DMX is used to create and train data mining models, and to browse, manage, and predict against them.
Data Mining Extensions - DMX Queries
DMX queries are formulated using the SELECT statement. They can extract information from existing data mining models in various ways.
Data Mining Extensions - Data Definition Language
The Data Definition Language (DDL) part of DMX can be used to:
* Create new data mining models and mining structures - CREATE MINING STRUCTURE, CREATE MINING MODEL
* Delete existing data mining models and mining structures - DROP MINING STRUCTURE, DROP MINING MODEL
* Export and import mining structures - EXPORT, IMPORT
Data Mining Extensions - Data Manipulation Language
The Data Manipulation Language (DML) part of DMX can be used to:
* Make predictions using a mining model - SELECT ... FROM PREDICTION JOIN
Data Mining Extensions - Example: a prediction query
This example is a singleton prediction query, which predicts for the given customer whether she will be interested in home loan products.
NATURAL PREDICTION JOIN
18 AS [Total Years of Education]
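The overall shape of a singleton prediction query can be sketched as follows. This is only an illustration in Python that assembles the query text; it does not connect to Analysis Services. The model name "Home Loan Model", the target column "Loan Seeker", the alias t, and the quote-doubling rule for string literals are assumptions made for the example and are not taken from the slides.

```python
def quote_identifier(name):
    """Wrap a model or column name in square brackets, as DMX identifiers are written."""
    if "[" in name or "]" in name:
        raise ValueError("brackets inside identifiers are not handled in this sketch")
    return f"[{name}]"


def literal(value):
    """Render a Python value as a query literal (assumed rule: strings in single quotes)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def singleton_prediction_query(model, target, case):
    """Build the text of a query that predicts `target` for one hand-written case."""
    columns = ",\n  ".join(
        f"{literal(value)} AS {quote_identifier(name)}" for name, value in case.items()
    )
    return (
        f"SELECT {quote_identifier(target)}\n"
        f"FROM {quote_identifier(model)}\n"
        "NATURAL PREDICTION JOIN\n"
        f"(SELECT\n  {columns}\n) AS t"
    )


# Hypothetical model and column names, used only to show the structure.
query = singleton_prediction_query(
    "Home Loan Model", "Loan Seeker", {"Total Years of Education": 18}
)
print(query)
```

The single customer is described inline by the inner SELECT, which is what makes the query a singleton query rather than a join against a table of cases.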
OAuth - Abuse of OAuth for Internet data mining
A growing number of social networking services promote OAuth logins to the dominant social networks (Facebook, Twitter, etc.) as the primary authentication method, over "traditional" email confirmation type processes.
The use of OAuth logins to social networks for "authentication" permits the application provider to legitimately circumvent the often significant restrictions on API use put in place by social network providers to prevent large-scale data extraction.
Social networking service - Data mining
Through data mining, companies are able to improve their sales and profitability.
United States Department of Homeland Security - Data mining (ADVISE)
The Associated Press reported on September 5, 2007, that DHS had scrapped an anti-terrorism data mining tool called ADVISE (Analysis, Dissemination, Visualization, Insight and Semantic Enhancement) after the agency's Privacy Office and Office of Inspector General (OIG) found that pilot testing of the system had been performed using data on real people without having
Multitenancy - Data aggregation/data mining
One of the most compelling reasons for vendors/ISVs to utilize multitenancy is for the inherent data aggregation benefits.
Machine learning - Machine learning and data mining
These two terms are commonly confused, as they often employ the same methods and overlap significantly. They can be roughly defined as follows:
* Machine learning focuses on prediction, based on known properties learned from the training data.
* Data mining focuses on the discovery of (previously) unknown properties in the data. This is the analysis step of Knowledge Discovery in Databases.
Much of the confusion between these two research communities (which do often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in Knowledge Discovery and Data Mining (KDD) the key
Surveillance - Data mining and profiling
Data mining is the application of statistical techniques and programmatic algorithms to discover previously unnoticed relationships within the data.
Economic (such as credit card purchases) and social (such as telephone calls and emails) transactions in modern society create large amounts of stored data and records. In the past, this data was documented in paper records, leaving a paper trail, or was simply not documented at all. Correlation of paper-based records was a laborious
But today many of these records are electronic, resulting in an electronic trail.
Information relating to many of these individual transactions is often easily available because it is generally not guarded in isolation, since the information, such as the title of a movie a person has rented, might not seem sensitive.
In addition to its own aggregation and profiling tools, the government is able to access information from third parties (for example, banks, credit companies or employers) by requesting access informally, by compelling access through the use of subpoenas or other procedures, or by purchasing data from commercial data aggregators or data brokers.
Under [http://caselaw.lp.findlaw.com/scripts/getcase.pl?court=usvol=425invol=435 United States v. Miller] (1976), data held by third parties is generally not subject to Fourth Amendment warrant requirements.
Criticism of Facebook - Data mining
There have been some concerns expressed regarding the use of Facebook as a means of surveillance and data mining.
The possibility of data mining by private individuals unaffiliated with Facebook has been a concern, as evidenced by the fact that two Massachusetts Institute of Technology (MIT) students were able to download, using an automated script, over 70,000 Facebook profiles from four schools (MIT, NYU, the University of Oklahoma, and Harvard University) as part of a research project on Facebook privacy.
A second clause that brought criticism from some users allowed Facebook the right to sell users' data to private companies, stating "We may share your information with third parties, including responsible companies with which we have a relationship." This concern was addressed by spokesman Chris Hughes, who said Simply put, we have never provided our users'
Previously, third party applications had access to almost all user information. Facebook's privacy policy previously stated: "Facebook does not screen or approve Platform Developers and cannot control how such Platform Developers use any personal information." However, that language has since been removed. Regarding use of user data by third party applications, the 'Preapproved Third-Party
In the United Kingdom, the Trades Union Congress (TUC) has encouraged employers to allow their staff to access Facebook and other social-networking sites from work, provided they proceed with caution.
In September 2007, Facebook drew a fresh round of criticism after it began allowing non-members to search for users, with the intent of opening limited public profiles up to search engines such as Google in the following months. Facebook's privacy settings, however, allow users to block their profiles from search engines.
Concerns were also raised on the BBC's Watchdog program in October 2007 when Facebook was shown to be an easy way in which to collect an individual's personal information in order to facilitate identity theft. However, there is barely any personal information presented to non-friends - if users leave the privacy controls on their default settings, the only personal information
A New York Times article in February 2008 pointed out that Facebook did not actually provide a mechanism for users to close their accounts, and raised the concern that private user data would remain indefinitely on Facebook's servers. Facebook has since given users the options to deactivate or delete their accounts.
Deactivating an account allows it to be restored later, while deleting it will remove the account permanently, although some data submitted by that account (like posting to a group or sending someone a message) will remain.
A third party site, uSocial, was involved in a controversy surrounding the sale of fans and friends. uSocial received a cease-and-desist letter from Facebook and has stopped selling friends.
Data visualization - Data mining
Data mining is the process of sorting through large amounts of data and picking out relevant information. It is usually used by business intelligence organizations and financial analysts, but is increasingly being used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods.
It has been described as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" and "the science of extracting useful information from large data sets or databases". In relation to enterprise resource planning, according to Monk (2006), data mining is the statistical and logical analysis of large sets of transaction data, looking for patterns that can aid decision making.
Mass surveillance in the United States - Data mining of subpoenaed records
The FBI collected nearly all hotel, airline, rental car, gift shop, and casino records in Las Vegas during the last two weeks of 2003.
Oracle Data Mining
Oracle Data Mining provides means for the creation, management and operational deployment of data mining models inside the database environment.
Oracle Data Mining - Overview
These operations include functions to create, apply, test, and manipulate data mining models.
In data mining, the process of using a model to derive predictions or descriptions of behavior that is yet to occur is called scoring.
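Scoring can be illustrated with a small, self-contained Python sketch. It is not Oracle Data Mining code: the "model" here is just a single income cut-off learned from past customers, and the field names income and bought are hypothetical. The point is the separation between building a model from historical rows and applying it to rows whose outcome is not yet known.

```python
from statistics import mean


def fit_cutoff(history):
    """Build a toy model: the midpoint between the mean income of buyers and non-buyers."""
    buyers = [row["income"] for row in history if row["bought"]]
    others = [row["income"] for row in history if not row["bought"]]
    return (mean(buyers) + mean(others)) / 2


def score(cutoff, new_rows):
    """Apply the model to unseen rows, adding a predicted outcome to each."""
    return [{**row, "predicted_bought": row["income"] >= cutoff} for row in new_rows]


# Historical cases with known outcomes (hypothetical data).
history = [
    {"income": 20, "bought": False},
    {"income": 30, "bought": False},
    {"income": 60, "bought": True},
    {"income": 70, "bought": True},
]
cutoff = fit_cutoff(history)

# New cases: behavior that has not happened yet, so there is no "bought" field.
scored = score(cutoff, [{"income": 50}, {"income": 40}])
print(cutoff, scored)
```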
Most Oracle Data Mining functions also allow text mining by accepting text (unstructured data) attributes as input.
Oracle Data Mining - History
Oracle Data Mining was first introduced in 2002 and its releases are named according to the corresponding Oracle database release:
* Oracle Data Mining 10gR1 (10.1.0.2.0 - February 2004)
* Oracle Data Mining 10gR2 (10.2.0.1.0 - July 2005)
Oracle Data Mining is a logical successor of the Darwin data mining toolset developed by Thinking Machines Corporation in the mid-1990s and later distributed by Oracle after its acquisition of Thinking Machines in 1999. However, the product itself is a complete redesign and rewrite from the ground up: while Darwin was a classic GUI-based analytical workbench, ODM offers a data mining development/deployment platform integrated into the Oracle database, along with the Oracle Data Miner GUI.
The Oracle Data Miner 11gR2 New Workflow GUI was previewed at Oracle Open World 2009. An updated Oracle Data Miner GUI was released in 2012. It is free, and is available as an extension to Oracle SQL Developer 3.1.
Oracle Data Mining - Functionality
As of release 11gR1 Oracle Data Mining contains the following data mining functions:
** Model exploration, evaluation and analysis.
* Feature selection (Attribute Importance).
** Support Vector Machine (SVM).
** One-class Support Vector Machine (SVM).
** Generalized linear model (GLM) for multiple regression.
** Orthogonal Partitioning Clustering (O-Cluster).
* Association rule learning:
** Itemsets and association rules (AM).
* Feature extraction.
** Combined text and non-text columns of input data.
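To make the association rule entry in the list above concrete, the following Python sketch computes the two standard measures for itemsets and rules, support and confidence, on a handful of made-up shopping baskets. It illustrates the general definitions only and is not a description of how Oracle Data Mining implements association rule learning.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for transaction in transactions if itemset <= set(transaction))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

s = support(baskets, {"bread", "milk"})          # itemset {bread, milk}
c = confidence(baskets, {"bread"}, {"milk"})     # rule bread -> milk
print(s, c)
```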
Oracle Data Mining - Input sources and data preparation
Most Oracle Data Mining functions accept as input one relational table or view. Flat data can be combined with transactional data through the use of nested columns, enabling mining of data involving one-to-many relationships (e.g. a star schema). The full functionality of SQL can be used when preparing data for data mining, including dates and spatial data.
Oracle Data Mining distinguishes numerical, categorical, and unstructured (text) attributes. The product also provides utilities for data preparation steps prior to model building such as outlier treatment, discretization, normalization and binning (sorting, in general terms).
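The preparation steps named above can be sketched in a few lines of Python. This is a generic illustration, not Oracle's utilities: outlier treatment is shown as simple clipping to a fixed range, normalization as min-max scaling, and binning as equal-width intervals. The function names, the clipping limits, and the sample ages are all assumptions chosen for the example.

```python
def clip_outliers(values, low, high):
    """Outlier treatment by clipping: pull values outside [low, high] back to the limits."""
    return [min(max(v, low), high) for v in values]


def min_max_normalize(values):
    """Rescale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Discretize values into n_bins intervals of equal width; returns a bin index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid dividing by zero when all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 25, 40, 62, 250]            # 250 is an obvious data-entry outlier
clipped = clip_outliers(ages, 0, 100)
normalized = min_max_normalize(clipped)
bins = equal_width_bins(clipped, 4)
print(clipped, normalized, bins)
```

Clipping is applied first so that the outlier does not stretch the range used for normalization and binning.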
Oracle Data Mining - Graphical user interface: Oracle Data Miner
There is also an independent interface: the Spreadsheet Add-In for Predictive Analytics, which enables access to the Oracle Data Mining Predictive Analytics PL/SQL package from Microsoft Excel.
Oracle Data Mining - PL/SQL and Java interfaces
Oracle Data Mining provides a native PL/SQL package (DBMS_DATA_MINING) to create, destroy, describe, apply, test, export and import models. A typical use of the package is a call that builds a classification model.
Oracle Data Mining - PMML
In Release 11gR2 (11.2.0.2), ODM supports the import of externally created PMML for some of the data mining models. PMML is an XML-based standard for representing data mining models.
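Because PMML is plain XML, a document can be inspected with any XML parser. The Python sketch below reads the data dictionary of a stripped-down, hand-written PMML fragment using the standard library. The fragment, its field names, and the choice of the PMML 4.1 namespace are assumptions made for illustration; a real exported file would also contain one or more model elements.

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical PMML fragment: header and data dictionary only, no model.
PMML_TEXT = """
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="hypothetical example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="buys" optype="categorical" dataType="string"/>
  </DataDictionary>
</PMML>
""".strip()

NAMESPACES = {"pmml": "http://www.dmg.org/PMML-4_1"}


def list_fields(pmml_text):
    """Return (name, optype, dataType) for every field declared in the data dictionary."""
    root = ET.fromstring(pmml_text)
    fields = root.findall("pmml:DataDictionary/pmml:DataField", NAMESPACES)
    return [(f.get("name"), f.get("optype"), f.get("dataType")) for f in fields]


fields = list_fields(PMML_TEXT)
print(fields)
```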
Oracle Data Mining - Predictive Analytics MS Excel Add-In
The PL/SQL package DBMS_PREDICTIVE_ANALYTICS automates the data mining process including data preprocessing, model building and evaluation, and scoring of new data.
Oracle Data Mining - References and further reading
* T. H. Davenport, [http://www.lbl.gov/BLI/BLI_Library/assets/articles/OM/OM_PSDM_Competing_Analytics.pdf Competing on Analytics], Harvard Business Review, January 2006.
* I. Ben-Gal, [http://www.eng.tau.ac.il/~bengal/outlier.pdf Outlier detection], In: Maimon O. and Rokach L. (Eds.) Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers, Kluwer Academic Publishers, 2005, ISBN 0-387-24435-2.
* M. M. Campos, P. J. Stengard, and B. L. Milenova, Data-centric Automated Data Mining. In proceedings of the Fourth International Conference on Machine Learning and Applications 2005, 15–17 December 2005. pp8, ISBN 0-7695-2495-8.
* M. F. Hornick, Erik Marcade, and Sunil Venkayala. Java Data Mining: Strategy, Standard, and Practice. Morgan Kaufmann, 2006, ISBN 0-12-370452-9.
* B. L. Milenova, J. S. Yarmus, and M. M. Campos. SVM in Oracle database 10g: removing the barriers to widespread adoption of support vector machines. In Proceedings of the 31st International Conference on Very Large Data Bases (Trondheim, Norway, August 30 - September 2, 2005). pp1152–1163, ISBN 1-59593-154-6.
* B. L. Milenova and M. M. Campos. O-Cluster: scalable clustering of large high dimensional data sets. In proceedings of the 2002 IEEE International Conference on Data Mining: ICDM 2002. pp290–297, ISBN 0-7695-1754-4.
* P. Tamayo, C. Berger, M. M. Campos, J. S. Yarmus, B. L. Milenova, A. Mozes, M. Taft, M. Hornick, R. Krishnan, S. Thomas, M. Kelly, D. Mukhin, R. Haberstroh, S. Stephens and J. Myczkowski. Oracle Data Mining - Data Mining in the Database Environment. In Part VII of Data Mining and Knowledge Discovery Handbook, Maimon, O.; Rokach, L. (Eds.) 2005, p315-1329, ISBN 0-387-24435-2.
* Brendan Tierney, Predictive Analytics using Oracle Data Miner: for the data scientist, Oracle analyst, Oracle developer and DBA, Oracle Press, McGraw Hill, Spring 2014.
Computational sociology - Data mining and social network analysis
Independent from developments in computational models of social systems, social network analysis emerged in the 1970s and 1980s from advances in graph theory, statistics, and studies of social structure as a distinct analytical method and was articulated and employed by sociologists like James S
Department of Homeland Security - Data mining (ADVISE)
found that pilot testing of the system had been performed using data on real people without having done a Privacy Impact Assessment, a required privacy safeguard for the various uses of real personally identifiable information required by section 208 of the e-Government Act of 2002
List of free and open-source software packages - Data mining
* Environment for DeveLoping KDD-Applications Supported by Index-Structures (ELKI): data mining software framework written in Java with a focus on clustering and outlier detection methods.
* Orange: data visualization and data mining for novices and experts, through visual programming or Python scripting. Extensions for bioinformatics and text mining.
* RapidMiner: data mining software written in Java, fully integrating Weka, featuring 350+ operators for preprocessing, machine learning, visualization, etc.
* Scriptella ETL: ETL (Extract-Transform-Load) and script execution tool. Supports integration with J2EE and Spring. Provides connectors to CSV, LDAP, XML, JDBC/ODBC and other data sources.
* Weka: data mining software written in Java featuring machine learning operators for classification, regression, and clustering.
List of open-source software packages - Data mining
* OpenNN: open source neural networks software library written in the C++ programming language.
Learning analytics - Differentiating Learning Analytics and Educational Data Mining
They go on to attempt to disambiguate educational data mining from academic analytics based on whether the process is hypothesis driven or not, though Brooks C
Regardless of the differences between the LA and EDM communities, the two areas have significant overlap both in the objectives of investigators and in the methods and techniques that are used in the investigation.
Customer analytics - Data mining
There are two categories of data mining. Predictive models use previous customer interactions to predict future events, while segmentation techniques are used to place customers with similar behaviors and attributes into distinct groups. This grouping can help marketers to optimize their campaign management and targeting processes.
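As one possible illustration of segmentation, the Python sketch below groups customers by monthly spend using a very small one-dimensional k-means. The slides do not name a particular segmentation technique, so k-means is an assumption here, as are the spend figures and the choice of two segments.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Tiny 1-D k-means (k >= 2): returns a segment label per value and the segment centers."""
    lo, hi = min(values), max(values)
    # Start with centers spread evenly between the smallest and largest value.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins the segment with the nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


# Hypothetical monthly spend per customer.
monthly_spend = [10, 12, 11, 95, 100, 105]
labels, centers = kmeans_1d(monthly_spend, k=2)
print(labels, centers)
```

Customers that share a label form one segment, which a marketer could then target with the same campaign.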
Conference on Knowledge Discovery and Data Mining
SIGKDD is the Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining. It became an official ACM SIG in 1998. The official web page of SIGKDD can be found on www.KDD.org.
Conference on Knowledge Discovery and Data Mining - Conferences
SIGKDD has hosted an annual conference - ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) - since 1995. KDD Conferences grew from KDD (Knowledge Discovery and Data Mining) workshops at AAAI conferences, which were started by Gregory Piatetsky-Shapiro in
Conference papers of each Proceedings of the SIGKDD International Conference on Knowledge Discovery and Data Mining are published through ACM (http://dl.acm.org/event.cfm?id=RE329; see also http://www.sigkdd.org/conferences.php).
KDD-2012 took place in Beijing, China (http://kdd2012.sigkdd.org/), KDD-2013 took place in Chicago, USA, and KDD-2014 will take place in New York City, USA, August 24–27, 2014. A full list of past KDD meetings is available at http://www.kdnuggets.com/meetings/past-meetings-kdd.html.
Conference on Knowledge Discovery and Data Mining - KDD-Cup
SIGKDD sponsors the [http://www.kdd.org/kddcup/ KDD Cup] competition every year in conjunction with the annual conference. It is aimed at members of the industry and academia, particularly students, interested in KDD.
Conference on Knowledge Discovery and Data Mining - Awards
The group also annually recognizes members of the KDD community with its [http://www.kdd.org/sigkdd-innovation-award Innovation Award] and [http://www.kdd.org/innovation-service-awards Service Award]. Additionally, KDD presents a Best Paper Award to recognize the highest quality paper at each conference.
Conference on Knowledge Discovery and Data Mining - SIGKDD Explorations
SIGKDD has also published a biannual academic journal titled [http://www.kdd.org/explorations/ SIGKDD Explorations] since June 1999.
Conference on Knowledge Discovery and Data Mining - Leadership
The new SIGKDD leadership team took office on July 1, 2013.
* Gregory Piatetsky-Shapiro (http://www.kdnuggets.com/gps.html) (2005-2008)
* David D. Jensen (http://kdl.cs.umass.edu/people/jensen/)
Conference on Knowledge Discovery and Data Mining - Information Directors
* [http://faculty.washington.edu/ankurt/ Ankur Teredesai] (2011-)
Quantitative structure–activity relationship - Data mining approach
Computer SAR models typically calculate a relatively large number of features. Because those lack structural interpretation ability, the preprocessing steps face a feature selection problem (i.e., which structural features should be interpreted to determine the structure-activity relationship). Feature selection can be
A typical data mining based prediction uses, for example, support vector machines, decision trees, or neural networks for inducing a predictive learning model.
Molecule mining approaches, a special case of structured data mining approaches, apply a similarity matrix based prediction or an automatic fragmentation scheme into molecular substructures. Furthermore, there are also approaches using maximum common subgraph searches or graph kernels.
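A similarity matrix based prediction can be sketched as follows. The example represents each molecule as a set of substructure labels, measures similarity with the Tanimoto (Jaccard) coefficient, and predicts the activity of a new molecule from its most similar training molecule. The choice of Tanimoto similarity, the nearest-neighbor rule, and the toy substructure sets are assumptions for illustration; the slides do not specify which measure or predictor is used.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two substructure sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_matrix(fingerprints):
    """Pairwise similarities between all molecules in the list."""
    return [[tanimoto(x, y) for y in fingerprints] for x in fingerprints]


def predict_activity(query, training):
    """Return the activity label of the training molecule most similar to `query`."""
    best_fingerprint, best_label = max(training, key=lambda item: tanimoto(query, item[0]))
    return best_label


# Hypothetical training molecules: (substructure set, known activity).
training = [
    ({"ring", "OH", "N"}, "active"),
    ({"chain", "Cl"}, "inactive"),
]
matrix = similarity_matrix([fingerprint for fingerprint, _ in training])
prediction = predict_activity({"ring", "OH"}, training)
print(matrix, prediction)
```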
Data mining in meteorology
1
Meteorology is the interdisciplinary
scientific study of the atmosphere. It
observes the changes in temperature,
air pressure, moisture and wind
direction. Usually, temperature,
pressure, wind measurements and
humidity are the variables that are
measured by a thermometer,
barometer, anemometer, and
hygrometer, respectively. There are
https://store.theartofservice.com/the-data-mining-toolkit.html
Data mining in meteorology
1
Weather forecasts are made by collecting
quantitative data about the current state of
the atmosphere. The main issue arise in
this prediction is, it involves highdimensional characters. To overcome this
issue, it is necessary to first analyze and
simplify the data before proceeding with
other analysis. Some data mining
techniques are appropriate in this context.
https://store.theartofservice.com/the-data-mining-toolkit.html
Data mining in meteorology - What is Data mining?
Consequently, data mining consists of
more than collecting and analyzing data, it
also includes analyze and predictions
1
https://store.theartofservice.com/the-data-mining-toolkit.html
Data mining in meteorology - What is Data mining?
1
The network architecture and signal
process used to model nervous
systems can roughly be divided into
three categories, each based on a
different philosophy.
https://store.theartofservice.com/the-data-mining-toolkit.html
Data mining in meteorology - What is Data mining?
1
#Feedforward neural network: the input
information defines the initial signals into
set of output signals.
https://store.theartofservice.com/the-data-mining-toolkit.html
Data mining in meteorology - What is Data mining?
1
#Feedback network: the input information
defines the initial activity state of a
feedback system, and after state
transitions, the asymptotic final state is
identified as the outcome of the
computation.
#Neighboring cells in a neural network
compete in their activities by means of
mutual lateral interactions, and develop
adaptively into specific detectors of
different signal patterns. In this category,
learning is called competitive,
unsupervised learning or self-organizing.
Data mining in meteorology - Self-organizing Maps
The Self-Organizing Map (SOM) is one of the
most popular neural network models. It is
especially suitable for visualizing,
clustering and modeling high-dimensional
data.
The Self-Organizing Map projects
high-dimensional input data onto a
low-dimensional (usually two-dimensional)
space.
Given an input vector, the system
chooses the output neuron (the
winning neuron) that most closely
matches that input vector.
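The winner selection described above can be sketched in a few lines. This is only a minimal illustration, not the method of any particular SOM package: it assumes Euclidean distance as the matching criterion (the slide does not name one), and the function names `best_matching_unit` and `update_winner` are hypothetical. The neighborhood update that a full SOM applies to cells near the winner is left out.

```python
import math

def best_matching_unit(weights, x):
    """Return the index of the neuron whose weight vector is closest to x.

    Assumption: closeness is measured with Euclidean distance.
    """
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

def update_winner(weights, x, rate=0.5):
    """Move only the winning neuron toward x and return its index.

    A full SOM would also move the winner's map neighbors; that is omitted here.
    """
    i = best_matching_unit(weights, x)
    weights[i] = [w + rate * (xi - w) for w, xi in zip(weights[i], x)]
    return i

# Three output neurons with two-dimensional weight vectors (made-up values).
weights = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
winner = update_winner(weights, [0.9, 0.8])
print(winner, weights[winner])
```

Repeating this step over many input vectors, with a shrinking learning rate, is what lets individual neurons specialize on different regions of the input space.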
Police-enforced ANPR in the UK - Data mining
A major feature of the National ANPR Data
Centre for car numbers is the ability to
data mine. Advanced versatile
automated data mining software trawls
through the vast amounts of data
collected, finding patterns and meaning in
the data. Data mining can be used on the
records of previous sightings to build up
intelligence of a vehicle's movements on
the road network or can be used to find
We can use ANPR on investigations or we can
use it looking forward in a proactive,
intelligence way
Multifactor dimensionality reduction - Data mining with MDR
Another approach is to generate many
random permutations of the data to see
what the data mining algorithm finds when
given the chance to overfit
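A permutation test of this kind can be sketched as follows. The sketch is generic and is not the MDR software: `best_single_feature_accuracy` is a hypothetical stand-in for "the data mining algorithm", and the data set is made up. The idea is to shuffle the labels many times, rerun the same search on each shuffled copy, and count how often chance alone does as well as the real result.

```python
import random

def best_single_feature_accuracy(rows, labels):
    """Hypothetical mining step: try every binary feature (and its inverse)
    as a predictor of the label and return the best accuracy found."""
    n = len(labels)
    best = 0.0
    for j in range(len(rows[0])):
        agree = sum(1 for r, y in zip(rows, labels) if r[j] == y)
        best = max(best, agree / n, (n - agree) / n)
    return best

def permutation_p_value(rows, labels, n_perm=200, seed=0):
    """Compare the observed score with scores obtained on shuffled labels."""
    rng = random.Random(seed)
    observed = best_single_feature_accuracy(rows, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between rows and labels
        if best_single_feature_accuracy(rows, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Made-up data: feature 0 copies the label, feature 1 just alternates.
labels = [0] * 10 + [1] * 10
rows = [[y, i % 2] for i, y in enumerate(labels)]
observed, p_value = permutation_p_value(rows, labels)
print(observed, p_value)
```

A small p-value suggests the pattern is unlikely to be a product of overfitting alone; a large one means the algorithm finds equally good "patterns" in noise.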
Educational data mining
Baker (2010) Data Mining for
Education
Educational data mining - Definition
Educational Data Mining refers to
techniques, tools, and research designed
for automatically extracting meaning from
large repositories of data generated by or
related to people's learning activities in
educational settings
In other cases, the data
is less fine-grained
Educational data mining - History
Educational Data Mining:
A Review of the State-of-the-Art
As interest in EDM continued to
increase, EDM researchers
established an academic journal in
2009, the
[http://www.educationaldatamining.org/JEDM/ Journal of Educational Data
Mining], for sharing and
disseminating research results. In
2011, EDM researchers established
the
With the introduction of public
educational data repositories in 2008,
such as the Pittsburgh Science of
Learning Centre’s (PSLC) DataShop
and the National Center for Education
Statistics (NCES), public data sets
have made educational data mining
more accessible and feasible,
contributing to its growth.
Educational data mining - Goals
Baker and Yacef
identified the following
four goals of EDM:
#'Predicting students' future learning
behavior' – With the use of student
modeling, this goal can be achieved
by creating student models that
incorporate the learner’s
characteristics, including detailed
information such as their knowledge,
behaviours and motivation to learn.
The user experience of the learner
and their overall
#'Discovering or improving domain
models' – Through the various
methods and applications of EDM,
discovery of new and improvements to
existing models is possible. Examples
include illustrating the educational
content to engage learners and
determining optimal instructional
sequences to support the student’s
learning style.
#'Studying the effects of educational support'
that can be achieved through learning systems.
#'Advancing scientific knowledge
about learning and learners' by
building and incorporating student
models, the field of EDM research and
the technology and software used.
Educational data mining - Users and Stakeholders
There are four main users and stakeholders
involved with educational data mining. These
include:
JEDM-Journal of
Educational Data Mining 5.2
(2013): 102-126.
* 'Educators' - Educators attempt to
understand the learning process and
the methods they can use to improve
their teaching methods
* 'Researchers' - Researchers focus on
the development and the evaluation of
data mining techniques for effectiveness. A
yearly international conference for
researchers began in 2008, followed by
the establishment of the
[http://www.educationaldatamining.org/JEDM/index.php/JEDM Journal of
Educational Data Mining] in 2009. The
wide range of topics in EDM ranges from
* 'Administrators' - Administrators are
responsible for
allocating the resources for
implementation in institutions
Educational data mining - Phases of Educational Data Mining
As research in the field of educational
data mining has continued to grow, a
myriad of data mining techniques have
been applied to a variety of educational
contexts. In each case, the goal is to
translate raw data into meaningful
information about the learning process
in order to make better decisions about
the design and trajectory of a learning
environment. Thus, EDM generally
# The first phase of the EDM process (not
counting pre-processing) is discovering
relationships in data.
# Discovered relationships must then
be validated in
order to avoid overfitting.
# Validated relationships are applied to
make predictions about future events in
the learning environment.
# Predictions are used to support
decision-making processes and policy decisions.
During phases 3 and 4, data is often
visualized or in some other way distilled
for human judgment. A large amount of
research has been conducted in best
practices for visualizing
data.
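The first two phases, discovering a relationship and then validating it, can be illustrated with a hold-out split. This is a rough sketch under stated assumptions: the records are invented (minutes practiced, quiz score) pairs, the relationship is a plain Pearson correlation, and `discover_and_validate` is a hypothetical name. The point is only that the relationship is measured again on records that played no part in discovering it.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def discover_and_validate(records, holdout=0.3, seed=0):
    """Phase 1: measure the relationship on a training split.
    Phase 2: measure it again on held-out records."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout))
    train, test = shuffled[:cut], shuffled[cut:]
    return pearson(*zip(*train)), pearson(*zip(*test))

# Invented data: score rises with practice time, plus a small deterministic wobble.
records = [(m, 40 + 0.5 * m + (m // 5 * 3 % 5) - 2) for m in range(10, 110, 5)]
r_train, r_test = discover_and_validate(records)
print(round(r_train, 3), round(r_test, 3))
```

If the held-out correlation were far weaker than the training one, the "discovered" relationship would be a candidate for overfitting and should not be carried into the prediction phase.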
Educational data mining - Main Approaches
Of the general categories of methods
mentioned, prediction, clustering and
relationship mining
are considered universal methods across
all types of data mining; however,
'Discovery with Models' and 'Distillation of
Data for Human Judgment' are considered
more prominent approaches within
educational data mining.
Educational data mining - Discovery with Models
In the Discovery with Models method, a
model is developed via prediction,
clustering, or knowledge engineering
by human reasoning, and is then used as
a component in another analysis, namely
in prediction and relationship mining.
Key applications of this method
include discovering relationships
between student behaviors,
characteristics and contextual
variables in the learning environment.
Further discovery of broad and
specific research questions across a
wide range of contexts can also be
explored using this method.
Educational data mining - Distillation of Data for Human Judgment
Humans can make inferences about
data that may be beyond the scope of
what an automated data mining
method provides. In educational
data mining, data is
distilled for human judgment for two
key purposes: identification and
classification.
For the purpose of
identification, data is
distilled to enable humans to identify
well-known patterns, which may
otherwise be difficult to interpret. For
example, the learning curve, classic
to educational studies, is a pattern
that clearly reflects the relationship
between learning and experience over
time.
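As an illustration of distilling data into such a pattern, the sketch below aggregates raw attempt logs into an error rate per practice opportunity, which is one common way to draw a learning curve. The log format (student, opportunity number, correct) and the function name `learning_curve` are assumptions made for this example, not part of any specific EDM tool.

```python
from collections import defaultdict

def learning_curve(attempts):
    """Aggregate (student, opportunity, correct) tuples into
    [(opportunity, error_rate)] pairs sorted by opportunity number."""
    totals = defaultdict(lambda: [0, 0])  # opportunity -> [errors, attempts]
    for _student, opportunity, correct in attempts:
        totals[opportunity][1] += 1
        if not correct:
            totals[opportunity][0] += 1
    return [(k, errors / n) for k, (errors, n) in sorted(totals.items())]

# Made-up log: three students, three opportunities to practice one skill.
log = [
    ("s1", 1, False), ("s2", 1, False), ("s3", 1, True),
    ("s1", 2, False), ("s2", 2, True),  ("s3", 2, True),
    ("s1", 3, True),  ("s2", 3, True),  ("s3", 3, True),
]
curve = learning_curve(log)
for opportunity, error_rate in curve:
    print(opportunity, round(error_rate, 2))
```

Plotted, the falling error rate is the familiar curve a human reader can recognize at a glance, even though no single log row shows it.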
Data is also distilled for the purpose
of classifying
features of data, which in educational
data mining is used to support the
development of the prediction model.
Classification helps expedite the
development of the prediction model
considerably.
The goal of this method is to
summarize and present the information
in a useful, interactive and visually
appealing way in order to understand
the large amounts of education data
and to support decision making
Educational data mining - Applications
A list of the primary applications of EDM is
provided by Cristobal Romero and
Sebastian Ventura. In their taxonomy, the
areas of EDM application are:
* Providing feedback for supporting instructors
* Recommendations for students
* Predicting student performance
* Detecting undesirable student behaviors
* Constructing courseware - EDM can be
applied to course management systems
such as open source Moodle. Moodle
contains usage data that includes various
activities by users such as test results,
amount of readings completed and
participation in discussion forums. Data
mining tools can be used to customize
learning activities for each user and adapt
the pace in which the student completes
New research on mobile
learning environments also suggests
that data mining can be useful. Data
mining can be used to help provide
personalized content to mobile users,
despite the differences in managing
content between mobile devices and
standard PCs and
web browsers.
New EDM applications will focus on
allowing non-technical users to use and
engage in data mining tools and activities,
making data collection and processing
more accessible for all users of EDM.
Examples include statistical and
visualization tools that analyze social
networks and their influence on learning
outcomes and productivity.
Educational data mining - Courses
In October 2013, Coursera offered a
free online course on “Big Data in
Education” that teaches how and when
to use key methods for EDM. A course
archive is now available online.
Teachers College, Columbia University
offers a Learning Analytics focus as
part of its Cognitive Studies Masters.
http://catalog.tc.columbia.edu/tc/departments/humandevelopment/cognitivestudiesineducation/
Educational data mining - Publication Venues
Considerable amounts of EDM work are
published at the peer-reviewed
International Conference on Educational
Data Mining, organized by the
[http://www.educationaldatamining.org/
International Educational Data Mining
Society].
* [http://www.educationaldatamining.org/EDM2008 1st International Conference on Educational Data Mining] (2008) -- Montreal, Canada
* [http://www.educationaldatamining.org/EDM2009 2nd International Conference on Educational Data Mining] (2009) -- Cordoba, Spain
* [http://www.educationaldatamining.org/EDM2010 3rd International Conference on Educational Data Mining] (2010) -- Pittsburgh, USA
* [http://www.educationaldatamining.org/EDM2011 4th International Conference on Educational Data Mining] (2011) -- Eindhoven, Netherlands
* [http://www.educationaldatamining.org/EDM2012 5th International Conference on Educational Data Mining] (2012) -- Chania, Greece
* [http://www.educationaldatamining.org/EDM2013 6th International Conference on Educational Data Mining] (2013) -- Memphis, USA
EDM papers are also published in the
[http://www.educationaldatamining.org/JEDM/ Journal of Educational Data Mining]
(JEDM).
Many EDM papers are routinely published
in related conferences, such as Artificial
Intelligence and Education, Intelligent
Tutoring Systems, and User Modeling and
Adaptive Personalization.
In 2011, Chapman Hall/CRC Press,
Taylor and Francis Group published
the first Handbook of Educational
Data Mining. This resource was
created for those that are interested in
participating in the educational data
mining community.
Educational data mining - Contests
In 2010, the Association for Computing
Machinery's
[http://www.kdd.org/kdd2010/kddcup.shtml
KDD Cup] was conducted using data from
an educational setting
Educational data mining - Costs and Challenges
Along with technological advancements
are costs and challenges associated with
implementing EDM applications
Educational data mining - Criticisms
Research also indicates that the field
of educational data mining is
concentrated in North America and
western cultures and subsequently,
other countries and cultures may not
be represented in the research and
findings
As users become savvy in their
understanding of online privacy,
administrators of
educational data mining tools need to
be proactive in protecting the privacy
of their users and be transparent about
how and with whom the information
will be used and shared
* 'Plagiarism' - Plagiarism detection is an
ongoing challenge for educators and
faculty whether in the classroom or online.
However, due to the complexities
associated with detecting and preventing
digital plagiarism in particular, educational
data mining tools are not currently
sophisticated enough to accurately
address this issue. Thus, the development
of predictive capability in plagiarism
* 'Adoption' - It is unknown how
widespread the adoption of EDM is and
the extent to which institutions have
applied and considered implementing an
EDM strategy. As such, it is unclear
whether there are any barriers that prevent
users from adopting EDM in their
educational settings.
Java Data Mining
JDM enables applications to integrate
data mining technology for
developing predictive analytics
applications and tools
Various data mining functions and
techniques like statistical classification and
association,
regression analysis, data clustering, and
attribute importance are covered by the
1.0 release of this standard.
Cross Industry Standard Process for Data Mining
In Proceedings of the IADIS European Conference
on Data Mining 2008, pp 182-185.
Cross Industry Standard Process for Data Mining - Major phases
The lessons learned during the process
can trigger new, often more focused
business questions and subsequent data
mining processes will benefit from the
experiences of previous ones.
;Business Understanding: This initial
phase focuses on understanding the
project objectives and requirements
from a business perspective, and then
converting this knowledge into a data
mining problem definition, and a
preliminary plan designed to achieve
the objectives.
;Data Understanding: The data
understanding phase starts with an initial
data collection and proceeds with activities
in order to get familiar with the data, to
identify data quality problems, to discover
first insights into the data, or to detect
interesting subsets to form hypotheses for
hidden information.
;Data Preparation: The data preparation
phase covers all activities to construct
the final dataset (data that will be fed
into the modeling tool(s)) from the
initial raw data. Data preparation tasks
are likely to be performed multiple
times, and not in any prescribed order.
Tasks include table, record, and
attribute selection as well as
transformation and cleaning of data for
;Modeling: In this phase, various modeling
techniques are selected and applied, and
their parameters are calibrated to optimal
values. Typically, there are several
techniques for the same data mining
problem type. Some techniques have
specific requirements on the form of data.
Therefore, stepping back to the data
preparation phase is often needed.
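Parameter calibration in the modeling phase can be pictured as a small search over candidate values. The sketch below is an illustration only and is not prescribed by CRISP-DM: the "model" is a one-parameter threshold rule, the validation data is made up, and the names `accuracy` and `calibrate` are hypothetical.

```python
def accuracy(threshold, samples):
    """Fraction of (value, label) samples for which the rule
    'value >= threshold' predicts the label correctly."""
    return sum((v >= threshold) == label for v, label in samples) / len(samples)

def calibrate(candidates, validation):
    """Return the candidate threshold with the best validation accuracy."""
    return max(candidates, key=lambda t: accuracy(t, validation))

# Made-up validation set: (score, belongs_to_positive_class)
validation = [(0.2, False), (0.4, False), (0.55, True), (0.7, True), (0.9, True)]
best = calibrate([0.1, 0.3, 0.5, 0.7], validation)
print(best, accuracy(best, validation))
```

If no candidate performs acceptably, that is often a sign the data is not in the form the technique needs, which is the point at which a project steps back to data preparation.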
At the end of this phase, a decision on the use of
the data mining results should be reached.
Depending on the requirements, the
deployment phase can be as simple as
generating a report or as complex as
implementing a repeatable data
mining process
Cross Industry Standard Process for Data Mining - History
CRISP-DM was conceived in 1996. In
1997 it got underway as a European
Union project under the European
Strategic Program on Research in
Information Technology|ESPRIT
funding initiative. The project was led
by five companies: SPSS Inc.|SPSS,
Teradata, Daimler AG, NCR
Corporation and OHRA, an insurance
company.
This core consortium brought different
experiences to the project: ISL, later
acquired and merged into SPSS Inc.
The computer giant NCR Corporation
produced the Teradata data warehouse
and its own data mining software.
Daimler-Benz had a significant data
mining team. OHRA was just starting to
explore the potential use of data
mining.
and published as a step-by-step data
mining guide later that year. Pete
Chapman, Julian Clinton, Randy Kerber,
Thomas Khabaza, Thomas Reinartz, Colin
Shearer, and Rüdiger Wirth (2000);
[ftp://ftp.software.ibm.com/software/analytics/spss/support/Modeler/Documentation/14/UserManual/CRISP-DM.pdf CRISP-DM
1.0 Step-by-step data mining guides].
Between 2006 and 2008 a CRISP-DM 2.0
SIG was formed and there were
discussions about updating the CRISP-DM
process model. Colin Shearer (2006);
[http://www.kdnuggets.com/news/2006/n19/4i.html First CRISP-DM 2.0 Workshop
Held] The current status of these efforts is
not known. However, the original
crisp-dm.org website cited in the reviews, and
the CRISP-DM 2.0 SIG website are both
While many non-IBM data mining
practitioners use CRISP-DM, IBM is the
primary corporation that currently
embraces the CRISP-DM process model.
It makes some of the old CRISP-DM
documents available for download and it
has incorporated it into its SPSS Modeler
product.
Data mining in agriculture
'Data mining in agriculture' is a very
recent research topic. It consists of
the application of data mining
techniques to agriculture. Recent
technologies can
provide a lot of information on
agriculture-related activities, which
can then be analyzed in order to find
important information. A related, but
not equivalent term is precision
Data mining in agriculture - Prediction of problematic wine fermentations
Wine is widely produced all
around the world
Data mining in agriculture - Detection of diseases from sounds issued by animals
Detecting animal diseases on
farms can positively affect the
productivity of the farm, because sick
animals can cause contamination
Data mining in agriculture - Sorting apples by watercores
For this reason, a computational
system is under study which takes
X-ray photographs of the fruit while it
runs on conveyor belts, and which is
also able to analyse (by data mining
techniques) the pictures taken and
estimate the probability that the fruit
contains watercores.
Data mining in agriculture - Optimizing pesticide use by data mining
By data mining cotton pest
scouting data along with
meteorological recordings, it was
shown how pesticide use can be
optimized (reduced)
Data mining in agriculture - Explaining pesticide abuse by data mining
By creating a novel Pilot Agriculture
Extension Data Warehouse, followed by
analysis through querying and data
mining, some interesting discoveries were
made, such as pesticides sprayed at the
wrong time, wrong pesticides used for the
right reasons, and a temporal relationship
between pesticide usage and day of the
week.
Data mining in agriculture - Literature
There are a few precision agriculture
journals, such as Springer's
[http://www.springerlink.com/content/103317/ Precision Agriculture] or
Elsevier's
[http://www.sciencedirect.com/science/journal/01681699 Computers and
Electronics in Agriculture], but those
are not exclusively devoted to data
mining in agriculture.
Data mining in agriculture - Conferences
There are many conferences organized
every year on data mining techniques and
applications, but rather few of them
consider problems arising in the
agricultural field. To date, there is only one
example of a conference completely
devoted to applications in agriculture of
data mining. It is organized by Georg Ruß.
This is the conference [http://dmaworkshop.de/ web page].
Dependent variables - Data mining
In data mining tools (for multivariate
statistics and machine learning), the
dependent variable is assigned a role as
'target variable' (or in some tools as label
attribute), while an independent variable may
be assigned a role as regular
variable. [http://1xltkxylmzx3z8gd647akcdvov.wpengine.netdna-cdn.com/wp-content/uploads/2013/10/rapidminer-5.0manual-english_v1.0.pdf English Manual
version 1.0] for RapidMiner 5.0, October
2013
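The role assignment can be pictured as splitting each record into a target value and the remaining regular attributes. The following is a generic sketch, not the API of RapidMiner or any other tool; the helper `split_roles` and the column names are invented for illustration.

```python
def split_roles(rows, target):
    """Separate records into regular attributes (X) and target values (y).

    rows:   list of dicts, one per record
    target: name of the column given the 'target' (label) role
    """
    X = [{k: v for k, v in row.items() if k != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y

# Made-up records: 'passed' plays the target role, the rest are regular.
rows = [
    {"hours_studied": 2, "attendance": 0.6, "passed": False},
    {"hours_studied": 9, "attendance": 0.9, "passed": True},
]
X, y = split_roles(rows, target="passed")
print(X)
print(y)
```

A learner is then fitted to predict `y` from `X`; swapping which column holds the target role changes the question being asked of the same data.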
Learning algorithms - Machine learning and data mining
* Machine learning focuses on prediction, based
on known properties learned from the training
data.
* Data mining focuses on the
discovery of
(previously) unknown properties in
the data. This is the analysis step of
Knowledge
Discovery in Databases.
Much of the confusion between these two
research communities (which do often
have separate conferences and separate
journals, ECML PKDD being a major
exception) comes from the basic
assumptions they work with: in machine
learning, performance is usually
evaluated with respect to the ability to
reproduce known knowledge, while in
Knowledge Discovery and Data Mining
(KDD) the key task is the discovery of
previously unknown knowledge
Activity recognition - Data mining based approach to activity recognition
They proposed a data mining
approach based on discriminative
patterns which describe significant
changes between any two activity
classes of data to recognize
sequential, interleaved and
concurrent activities in a unified
solution.
Gilbert et al. (Gilbert A, Illingworth J,
Bowden R. Action Recognition using
Mined Hierarchical Compound Features.
IEEE Trans Pattern Analysis and
Machine Learning) use 2D corners in
both space and time. These are grouped
spatially and temporally using a
hierarchical process, with an increasing
search area. At each stage of the
hierarchy, the most distinctive and
descriptive features are learned
efficiently through data mining (Apriori
Covert surveillance - Data mining and profiling
Data mining is the application of statistical
techniques and programmatic algorithms
to discover previously unnoticed
relationships within the data
Economic (such as credit card
purchases) and social (such as
telephone calls and emails)
transactions in modern society create
large amounts of stored data and
records. In the past, this data was
documented in paper records, leaving
a paper trail, or was simply not
documented at all. Correlation of
paper-based records was a laborious
But today many of these records are electronic,
resulting in an electronic trail
For More Information, Visit:
• https://store.theartofservice.com/the-data-mining-toolkit.html
The Art of Service
https://store.theartofservice.com