DBMiner [1]:
A data mining tool for large relational databases [1]
DBMiner, a data mining system for interactive mining of multiple-level knowledge in
large relational databases, has been developed based on our years of research. The
system implements a wide spectrum of data mining functions, including
generalization, characterization, discrimination, association, classification, and
prediction. By incorporating several interesting data mining techniques, including
attribute-oriented induction, progressive deepening for mining multiple-level rules, and
meta-rule guided knowledge mining, the system provides a user-friendly, interactive
data mining environment with good performance.
Project Overview [1]
A data mining system, DBMiner, has been developed for interactive mining of
multiple-level knowledge in large relational databases. It is based on studies of data
mining techniques and experience in the development of an early system prototype,
DBLearn. The system implements a wide spectrum of data mining functions, including
generalization, characterization, association, classification, and prediction. By
incorporating several interesting data mining techniques, including attribute-oriented
induction, statistical analysis, progressive deepening for mining multiple-level
knowledge, and meta-rule guided mining, the system provides a user-friendly,
interactive data mining environment with good performance.
Project Description [1]
Figure: General architecture of DBMiner
The system has the following distinct features:

It incorporates several interesting data mining techniques, including
attribute-oriented induction, progressive deepening for mining multiple-level
rules and meta-rule guided knowledge mining, etc., and implements a wide
spectrum of data mining functions including generalization, characterization,
association, classification, and prediction.

It performs interactive data mining at multiple concept levels on any
user-specified set of data in a database using an SQL-like Data Mining Query
Language, DMQL, or a graphical user interface. Users may interactively set
and adjust various thresholds, control a data mining process, perform roll-up or
drill-down at multiple concept levels, and generate different forms of outputs,
including generalized relations, generalized feature tables, multiple forms of
generalized rules, visual presentation of rules, charts, curves, etc.

Efficient implementation techniques have been explored using different
data structures, including generalized relations and multiple-dimensional data
cubes, and have been integrated with relational database techniques. The data
mining process may utilize user- or expert-defined set-grouping or schema-level
concept hierarchies which can be specified flexibly, adjusted dynamically
based on data distribution, and generated automatically for numerical
attributes.

Both UNIX and PC (Windows/NT) versions of the system adopt a
client/server architecture. The latter communicates with various commercial
database systems for data mining using the ODBC technology.
Major functional modules [1]:
Figure: Knowledge discovery modules of DBMiner
DBMiner characterizer
The characterizer generalizes a set of task-relevant data into a generalized relation
which can then be viewed at multiple concept levels from different angles. In
particular, it derives a set of characteristic rules which summarize the general
characteristics of a set of user-specified data (called the target class). For example,
the symptoms of a specific disease can be summarized by a characteristic rule.
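The generalization step can be illustrated with a minimal sketch, assuming a one-level concept hierarchy per attribute. The hierarchy contents, the attribute names, and the `generalize` helper are hypothetical and are not taken from DBMiner itself; the sketch only shows the general idea of attribute-oriented induction, in which raw values are replaced by higher-level concepts and identical tuples are merged with a count.

```python
from collections import Counter

# Hypothetical one-level concept hierarchies, one dict per attribute.
HIERARCHIES = {
    "temperature": {"38.5C": "fever", "39.2C": "fever", "36.8C": "normal"},
    "cough": {"dry cough": "cough", "wet cough": "cough", "none": "no cough"},
}

def generalize(rows, attributes, hierarchies):
    """Climb each attribute one level and merge identical tuples with a count."""
    counts = Counter()
    for row in rows:
        # Values without a parent concept are kept unchanged.
        key = tuple(hierarchies[a].get(row[a], row[a]) for a in attributes)
        counts[key] += 1
    return counts

# Illustrative task-relevant data (the target class: patients with one disease).
patients = [
    {"temperature": "38.5C", "cough": "dry cough"},
    {"temperature": "39.2C", "cough": "wet cough"},
    {"temperature": "39.2C", "cough": "dry cough"},
    {"temperature": "36.8C", "cough": "none"},
]

generalized_relation = generalize(patients, ["temperature", "cough"], HIERARCHIES)
for concept, count in generalized_relation.most_common():
    print(concept, count)
```

Each entry of the resulting generalized relation, together with its count, can be read as one candidate characteristic rule for the target class.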
DBMiner discriminator
A discriminator discovers a set of discriminant rules which summarize the features
that distinguish the class being examined (the target class) from other classes (called
contrasting classes). For example, to distinguish one disease from others, a
discriminant rule summarizes the symptoms that discriminate this disease from
others.
DBMiner association rule finder
An association rule finder discovers a set of association rules (in the form of
"[formula missing]") at multiple concept levels from the relevant set(s)
of data in a database. For example, one may discover a set of symptoms frequently
occurring together with certain kinds of diseases and further study the reasons
behind them.
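Association rules are usually judged by their support and confidence, the two thresholds that this report later describes as adjustable in DBMiner. The following is a minimal sketch of how these two measures can be computed for one candidate rule. The records and the helper names are illustrative assumptions, not DBMiner's implementation.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent); assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Illustrative records: each set holds the symptoms and diagnosis of one patient.
records = [
    {"fever", "cough", "flu"},
    {"fever", "cough", "flu"},
    {"fever", "rash", "measles"},
    {"cough", "cold"},
]

# Candidate rule: fever AND cough -> flu
rule_support = support(records, {"fever", "cough", "flu"})
rule_confidence = confidence(records, {"fever", "cough"}, {"flu"})
print(rule_support, rule_confidence)
```

A rule would be reported only if both values reach the user-chosen thresholds, so raising either threshold shrinks the set of discovered rules.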
DBMiner data classifier
A classifier analyzes a set of training data (i.e., a set of objects whose class label is
known) and constructs a model for each class based on the features in the data. A set
of classification rules is generated by such a classification process, which can be
used to classify future data and develop a better understanding of each class in the
database. For example, one may classify diseases and provide the symptoms which
describe each class or subclass of diseases.
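As a rough illustration of building a model for each class from labeled training data, the sketch below keeps one feature-frequency profile per class and assigns new data to the class whose profile matches best. This simple scheme is a stand-in chosen for brevity; it is not the classification method used by DBMiner, and all names and data are hypothetical.

```python
from collections import Counter, defaultdict

def train(examples):
    """Build one feature-frequency profile per class label."""
    profiles = defaultdict(Counter)
    for features, label in examples:
        profiles[label].update(features)
    return profiles

def classify(profiles, features):
    """Pick the class whose profile gives the observed features the most weight."""
    return max(profiles, key=lambda label: sum(profiles[label][f] for f in features))

# Illustrative training data: (set of symptoms, known class label).
training = [
    ({"fever", "cough"}, "flu"),
    ({"fever", "aches"}, "flu"),
    ({"rash", "fever"}, "measles"),
    ({"rash", "spots"}, "measles"),
]

model = train(training)
predicted = classify(model, {"cough", "aches"})
print(predicted)
```

The per-class profiles also serve the descriptive purpose mentioned above: the most frequent features of a class give a summary of that class.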
DBMiner predictor
A predictor predicts the possible values of some missing data or the value
distribution of certain attributes in a set of objects. This involves finding the set of
attributes relevant to the attribute of interest (by some statistical analysis) and
predicting the value distribution based on the set of data similar to the selected
object(s). For example, an employee's potential salary can be predicted based on the
salary distribution of similar employees in the company.
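The salary example can be sketched as follows, assuming the relevant attributes have already been identified. In this sketch they are simply supplied by hand, whereas the text above says DBMiner finds them by statistical analysis. The records, attribute names, and the `predict_distribution` helper are hypothetical.

```python
from collections import Counter

def predict_distribution(records, target, query, relevant):
    """Distribution of `target` among records that match `query` on the relevant attributes."""
    similar = [r for r in records if all(r[a] == query[a] for a in relevant)]
    counts = Counter(r[target] for r in similar)
    total = sum(counts.values())
    # An empty dict signals that no similar records were found.
    return {value: n / total for value, n in counts.items()} if total else {}

# Illustrative employee records with already-generalized attribute values.
employees = [
    {"title": "engineer", "years": "5-10", "salary": "60-80K"},
    {"title": "engineer", "years": "5-10", "salary": "60-80K"},
    {"title": "engineer", "years": "5-10", "salary": "80-100K"},
    {"title": "manager", "years": "5-10", "salary": "80-100K"},
]

new_hire = {"title": "engineer", "years": "5-10"}
salary_distribution = predict_distribution(employees, "salary", new_hire, ["title", "years"])
print(salary_distribution)
```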
DBMiner meta-rule guided miner
A meta-rule guided miner is a data mining mechanism which takes a user-specified
meta-rule form, such as "[formula missing]", as a pattern to confine the
search for desired rules. For example, one may specify the discovered rules to be in
the form of "[formula missing]" in order to find the relationships between a student's major and his/her
GPA in a university database.
DBMiner evolution evaluator
A data evolution evaluator evaluates the data evolution regularities for certain
objects whose behavior changes over time. This may include characterization,
classification, association, or clustering of time-related data. For example, one may
find the general characteristics of the companies whose stock price has gone up by over
20% in the last year or evaluate the trend or particular growth patterns of certain stocks.
DBMiner deviation evaluator
A deviation evaluator evaluates the deviation patterns for a set of task-relevant data
in the database. For example, one may discover and evaluate a set of stocks whose
behavior deviates from the trend of the majority of stocks during a certain period of
time. The module contains the following three functions:
1. recognizes or identifies the general trend and/or behavior for data in
   the database,
2. detects the set of data which deviates from such a trend or behavior,
   and
3. summarizes the general characteristics of deviation data.
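The three functions listed above can be sketched on the stock example, assuming that the trend is the mean return and that a deviation is a return more than a chosen number of standard deviations away from it. Both choices, the threshold value, and the data are illustrative assumptions rather than DBMiner's actual method.

```python
from statistics import mean, pstdev

def find_deviations(returns, z_threshold=1.5):
    """Return (trend, names of items whose value deviates strongly from the trend)."""
    values = list(returns.values())
    trend = mean(values)        # step 1: general trend
    spread = pstdev(values)
    if spread == 0:
        return trend, []
    # step 2: items far from the trend, measured in standard deviations
    deviating = [name for name, v in returns.items()
                 if abs(v - trend) / spread > z_threshold]
    return trend, deviating

# Illustrative returns of six stocks over one period.
stock_returns = {"AAA": 0.04, "BBB": 0.05, "CCC": 0.03,
                 "DDD": 0.05, "EEE": 0.04, "FFF": 0.40}

trend, outliers = find_deviations(stock_returns)

# step 3: a very small summary of the deviating data
summary = {"count": len(outliers),
           "mean_return": mean(stock_returns[n] for n in outliers) if outliers else None}
print(trend, outliers, summary)
```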
DBMiner user interfaces
Three GUIs (UNIX-based, Windows/NT-based, and WWW/Netscape-based)
have been developed to allow users to interactively discover multiple-level
knowledge in large relational databases. The system integrates well with existing commercial
database systems, delivers high performance, and is robust at handling noise and
exceptional data.
Further Development of DBMiner [1]
The DBMiner system is currently being extended in several directions, as illustrated
below.

Further enhancement of the power and efficiency of data mining in relational
database systems, including the improvement of system performance and rule
discovery quality for the existing functional modules, and the development of
techniques for mining new kinds of rules, especially on time-related data.

Integration, maintenance and application of discovered knowledge, including
incremental update of discovered rules, removal of redundant or less interesting rules,
merging of discovered rules into a knowledge-base, intelligent query answering using
discovered knowledge, and the construction of multiple layered databases.

Extension of data mining techniques towards advanced and/or special purpose
database systems, including extended-relational, object-oriented, text, spatial, temporal,
and heterogeneous databases. Currently, two such data mining systems, GeoMiner and
WebMiner, for mining knowledge in spatial databases and the Internet information-base
respectively, are under design and construction.
Methodology [2,3]
We have developed a list of 14 criteria for evaluating DBMiner. These criteria can be put into
four categories: Capability, Learnability/Usability, Interoperability, and Flexibility. Capability
measures what a desktop tool can do, and how well it does it; Learnability/Usability means
how easy a tool is to learn and use; Interoperability means a tool’s ability to interface with
other computer applications; and Flexibility is the ease with which one can alter critical
guiding parameters, or create a customized environment.
Results [2,3]
We used the FoodMart database, which ships with MS SQL Server, for testing.
FoodMart is made up of two cubes, Sales and Warehouse. The Sales cube consists of 13
dimensions, such as “Customers” and “Educational Level”, and 7 fact tables (measurements),
such as “Profit”, “Sales Average”, and “Sales Count”. The Warehouse cube consists of 7
dimensions and 7 measurements. The database contains enough data for our
evaluations.
We ran DBMiner on a 166 MHz Pentium with 64 MB of RAM, running Windows 2000.
Table 1 summarizes the results.
Capability
The criteria we selected for capability are whether the tool scales to larger databases, has a
programming language for automation, provides useful output reports, and has
visualization capabilities.
Given the training data set, we found that the software scaled efficiently.
The software does not provide a programming language for automation; however, it
has many wizards, which guide the end user through the tasks. The software uses DMQL
(Data Mining Query Language) internally; however, the user is not able to manipulate the
DMQL.
The visualization part of the software uses many graphics, including ball graph, ball chart, grid,
and frequent item sets, for visualizing Association, Classification, and Clustering; however, pie
charts and correlation plots were missing. In addition, tree browsing was in graph view, which
was confusing. There is another part of visualization, the OLAP browser, which uses the
visualization capabilities of MS Excel 2000 and cannot function without it.
DBMiner shows a statistics report, which it calls “mining results statistics.” This report
gives the number of items identified; however, it neither describes the characteristics of the
results nor analyzes the statistical results. In addition, we were not able to print any of the
results from Associations, Classifications, and Clustering, or the statistics results: the
printed page was blank.
Learnability/Usability
There are six criteria for this category: tutorials, wizards, ease of learning, user’s manual,
online help, and interface.
DBMiner is not a complex program for people familiar with data mining. However, for users
new to data mining, the software does not include a tutorial that walks through an
example.
Wizards are built in for automating the tasks of data mining. The wizards let the user select
appropriate options for the tasks.
The user interface is very simple and standard. The menus and toolbars are appropriate.
However, we found that some toolbar tools did not perform very well when enabled, such as
the tools in the visualization pane and the magnifier. In addition, some of the commands under
the menus do not have any function associated with them, such as the “Export” command under
the File menu.
The user’s manual is well constructed and helps a user find an appropriate way to explore the tool; however,
its style is dated and not web-oriented. Furthermore, the user’s manual does
not contain links to other relevant topics. In addition, DBMiner’s online help is average.
Overall, we think the software is easy to learn and interact with, provided the user knows
data mining techniques.
Interoperability
We use three criteria for this category: importing data, exporting data, and whether it has links
to other applications.
DBMiner does not support importing and exporting of data. However, it communicates with
MS OLAP Server and has MS Excel 2000 embedded as a visualization tool for OLAP
browsing.
Flexibility
We define two criteria for the flexibility of the application: whether the work
environment is customizable and whether it is possible to write or change the code.
DBMiner uses DMQL for its internal functionality; however, it is not possible to change or
write DMQL.
DBMiner has the flexibility to let the user change the values of settings after each task is done.
For example, it is possible to increase/decrease the support threshold or the confidence
threshold if the user is not happy with the current level.
Other Limitations
DBMiner depends only on MS SQL Server as its back end and uses MS Excel 2000 as its
visualization tool for OLAP browsing. In addition, the data dispersion module, the
time-series analysis module, and the prediction module are unavailable.
Conclusions
DBMiner is a good data mining tool, as it offers a user-friendly environment for users of all
categories. The discussion above substantiates our evaluation of the
software, though there is considerable room for improvement in the commercial version.
References
[1] Copied and pasted from “DBMiner: A data mining tool for large relational databases,”
http://db.cs.sfu.ca/sections/projects/dbminer.html
[2] Bhavani Thuraisingham, Data Mining: Technologies, Techniques, Tools, and Trends, CRC
Press, 1999.
[3] John F. Elder IV and Dean W. Abbott (Elder Research), A Comparison of Leading Data Mining Tools, 1998.
Appendix A
Table 1: Capability, Learnability/Usability, Interoperability, and Flexibility
Rating scale (columns): Excellent, Good, Average, Needs Improvement, Poor, Does Not Exist
Criteria (rows; the rating marks are not preserved):
Scalability
Has programming language
Provides useful output reports
Visualization
Wizards
Easy to learn
User’s manual
Online help
Interface
Importing data
Exporting data
Links to other applications
Customizable work environment
Ability to write or change codes
Overall