Download O Online Data Mining

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
427
Online Data Mining
O
Héctor Oscar Nigro
Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina
Sandra Elizabeth González Císaro
Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina
INTRODUCTION
Several approaches for intelligent data analysis are not
only available but also tried and tested. Online analytical
processing (OLAP) and data mining represent two of the
most important approaches. They mainly emphasize different aspects of the data and allow deriving of different
kinds of information. So far, these approaches have mainly
been used in isolation (Schwarz, 2002).
OLAP is a powerful data analysis method for multidimensional analysis of data warehouses. Data mining is the
extraction of interesting (i.e., nontrivial, implicit, previously unknown, and potentially useful) information or
patterns from data in large databases (Fayad, PiatetskyShiapiro, Smyth, & Uthurusamy, 1996; Han & Kamber, 2001).
OLAP is more frequently used for verification of the
existing hypothesis. Data mining tries to generate such a
hypothesis by uncovering hidden patterns. It is essentially an inductive process. It means that OLAP and data
mining should be used together to complement each other
(Schwarz, 2002).
Motivated by the popularity of OLAP technology,
Han developed an online analytical mining (OLAM)
mechanism for multidimensional data mining.
This paper is organized in the following way: First, the
background section, in which OLAP and data mining are
introduced. Second, the main section is divided into the
following subsections: Definition, Online Mining—Expected Characteristics, Olam Architecture, Olam and Complex Data Types, and OLAM System—Essential Functionality. Then, Future Trends in the area. Finally the
Conclusions, References, Key Terms.
BACKGROUND
Basically, an OLAP system must show multidimensional
data with dimensions, measures, and hierarchies through
the cube’s paradigm. The multidimensional model simplifies for users the process of making complex queries,
modifying information in a report, swapping between
aggregated and detailed data, selecting part of the data, and
so forth (Han & Kamber, 2001; Lenz & Shoshani, 1997).
Selected functionalities include the following:
•
•
•
•
•
drilling down
rolling up
dice & slice
pivoting
cubing
There are three types of OLAP systems:
•
•
•
Rolap: Relational OLAP works on normalized data;
the classic models are star and snowflake. Its
characteristic is having few storing spaces but
with the disadvantage of presenting the information more slowly.
Molap: Multidimensional OLAP works with
precalculated data for the optimization of response time. It is very fast but it uses more space
than Rolap.
Hybrid: Hybrid OLAP combines the benefits
of both.
Data mining is an especially interesting and very
complex task, as defined in the previous section, and it is
also a confluence of multiple disciplines (see Table 1).
A good data mining query language will support ad
hoc and interactive data mining. Ad hoc query-based data
mining is better when users want to examine various data
portions with different constraints. Database queries can
be answered intelligently using concept hierarchies, data
mining results, or data mining techniques.
The major functions of data mining follow (Han, Chee
Sonny, & Chiang Jenny, 1999; Han & Kamber, 2001):
1.
2.
Characterization: Generalizes a set of task-relevant
data into a generalized data cube, which can then be
used to extract different kinds of rules or be viewed
at multiple levels of abstraction from different angles.
In particular, it derives a set of characteristic rules,
which summarize the general characteristics of a set
of user’s specified data (called the target class).
Comparison: Mines a set of discriminant rules
that summarize the features, which distinguish the
Copyright © 2006, Idea Group Inc., distributing in print or electronic forms without written permission of IGI is prohibited.
Online Data Mining
Table 1. A list of topics related to data mining
•
•
•
•
•
•
•
•
3.
4.
5.
6.
7.
Database systems, data warehouse, and OLAP
Statistics
Machine learning
Visualization
Information science
High performance computing
Business and application domain knowledge expertise
Other disciplines: neuronal networks, mathematical modeling, information
retrieval, pattern recognition, and others (Han, 2000)
class being examined (the target class) from other
classes (called contrasting classes).
Classification: Analyzes a set of training data
(i.e., a set of objects whose class label is known)
and constructs a model for each class based on the
features in the data. A set of classification rules is
generated by such a classification process, which
can be used to classify future data and develop a
better understanding of each class in the database.
Association: Discovers a set of association rules at
multiple levels of abstraction from the relevant set(s)
of data in a database.
Prediction: Predicts the possible values of some
missing data or the value distribution of certain
attributes in a set of objects. This involves finding
the set of attributes relevant to the attribute of
interest (by some statistical analysis) and predicting the value distribution based on the set of data
similar to the selected object(s).
Cluster analysis: Groups a selected set of data in
the database or data warehouse into a set of clusters
to ensure the interclass similarity is low and the
intra-class similarity is high.
Time series analysis: Performs data analyses for
time-related data in databases or data warehouses,
including similarity analysis, periodicity analysis,
sequential pattern analysis, and trend and deviation analysis.
ONLINE DATA MINING:
MAIN FEATURES
Online Mining:
Expected Characteristics
The following features are important for successful OLAM
(Han, 1998a):
•
Ability to mine anywhere
•
•
•
Carve any portion of a database by drilling,
dicing, filtering, and so forth, for mining
Drill through to raw data and mining
Support of multifeature cubes and cubes with complex dimensions and measures
•
•
•
Discovery-driven exploration of multifeature cubes
Spatial and multimedia dimensions/measures
Cube-based mining method
•
•
Multimining tasks and multilevel mining
Selection and addition of data mining algorithms
•
•
•
Different algorithms may generate different results
Standard APIs allow users to develop their own
algorithms
Integration of data mining functions
•
First dicing, then classification, then association, and so forth
Definition
Online analytical mining (OLAM; also called OLAP
mining) is among the many different paradigms and
architectures for data mining systems. It integrates OLAP
with data mining and mining knowledge in
multidimensional databases. (Han, 1997)
•
•
•
•
•
428
Ad-hoc query-based (constraint-based) mining
Fast response and high performance online
Visualization tools
Data and knowledge visualization tools
Extensibility
4 more pages are available in the full version of this document, which may be
purchased using the "Add to Cart" button on the publisher's webpage:
www.igi-global.com/chapter/online-data-mining/11184
Related Content
Electronic Tools for Online Assessments: An Illustrative Case Study from Teacher Education
Jon Margerum-Leys and Kristin M. Bass (2009). Database Technologies: Concepts, Methodologies, Tools,
and Applications (pp. 1291-1308).
www.irma-international.org/chapter/electronic-tools-online-assessments/7973/
Modeling of Business Rules for Active Database Application Specification
Youssef Amghar, Madjid Meziane and Andre Flory (2002). Advanced Topics in Database Research,
Volume 1 (pp. 135-156).
www.irma-international.org/chapter/modeling-business-rules-active-database/4326/
Is Extreme Programming Just Old Wine in New Bottles: A Comparison of Two Cases
Hilkka Merisalo-Rantanen, Tuure Tuunanen and Matti Rossi (2005). Journal of Database Management (pp.
41-61).
www.irma-international.org/article/extreme-programming-just-old-wine/3341/
Location-Aware Query Resolution for Location-Based Mobile Commerce: Performance
Evaluation and Optimization
James E. Wyse (2006). Journal of Database Management (pp. 41-65).
www.irma-international.org/article/location-aware-query-resolution-location/3357/
On the Versatility of Fuzzy Sets for Modeling Flexible Queries
P Bosc, A Hadjali and O Pivert (2008). Handbook of Research on Fuzzy Information Processing in
Databases (pp. 143-166).
www.irma-international.org/chapter/versatility-fuzzy-sets-modeling-flexible/20352/