The manager of a company may ask the staff what customers mostly buy in Gaza
and in Khan Younis. Answering this kind of question means discovering knowledge that is
stored in the database, and it may require a complex SQL statement.
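As a rough illustration of the kind of SQL the manager's question calls for, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. The `sales` table, its columns, the sample rows, and the helper `top_product_per_city` are all hypothetical; a real schema would differ.

```python
import sqlite3

# Hypothetical schema: one row per sale, tagged with the branch city.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, product TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Gaza", "bread", 30), ("Gaza", "milk", 12), ("Gaza", "bread", 5),
        ("Khan Younis", "rice", 20), ("Khan Younis", "milk", 25),
    ],
)

def top_product_per_city(conn):
    """Return {city: (product, total_quantity)} for the best seller in each city."""
    rows = conn.execute(
        """
        SELECT city, product, SUM(quantity) AS total
        FROM sales
        WHERE city IN ('Gaza', 'Khan Younis')
        GROUP BY city, product
        ORDER BY city, total DESC
        """
    ).fetchall()
    best = {}
    for city, product, total in rows:
        # Rows are sorted by descending total within each city, so keep the first.
        best.setdefault(city, (product, total))
    return best

print(top_product_per_city(conn))
```

The aggregation happens in SQL; the small Python loop only picks the first (largest) row per city, which avoids relying on window functions.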
Definitions of Data Mining
• The discovery of new information in terms of patterns or rules from vast amounts of
data is called Data Mining. It employs one or more machine learning techniques to
automatically analyze data and extract knowledge from it in order to find interesting
structure. Data mining is actually one step of a larger process known as
knowledge discovery in databases (KDD).
• The KDD process model comprises six phases:
   1. Data selection
   2. Data cleansing
   3. Enrichment
   4. Data transformation or encoding
   5. Data mining
   6. Reporting and displaying discovered knowledge
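The six phases can be sketched as a chain of small Python functions, one per phase. Everything here is hypothetical: the raw records, the age lookup used for enrichment, the age-band encoding, and the mining step, which simply counts how often each token appears (a stand-in for a real algorithm such as Apriori).

```python
from collections import Counter

# Hypothetical raw records: (customer_id, city, comma-separated basket).
RAW = [
    (1, "Gaza", "bread,milk"),
    (2, "gaza ", "bread, milk ,eggs"),
    (3, "Khan Younis", ""),           # empty basket, dropped during cleansing
    (4, "Khan Younis", "rice,milk"),
    (5, "Rafah", "bread"),            # outside the cities of interest
]
# Hypothetical second source used for enrichment: customer age by id.
CUSTOMER_AGE = {1: 34, 2: 51, 4: 27}

def select(records, cities):
    # 1. Data selection: keep only the records relevant to the question.
    return [r for r in records if r[1].strip().title() in cities]

def cleanse(records):
    # 2. Data cleansing: normalize city names, trim item names, drop empty baskets.
    cleaned = []
    for cid, city, basket in records:
        items = [i.strip() for i in basket.split(",") if i.strip()]
        if items:
            cleaned.append({"id": cid, "city": city.strip().title(), "items": items})
    return cleaned

def enrich(records, ages):
    # 3. Enrichment: attach the age from the second source (None if unknown).
    return [dict(r, age=ages.get(r["id"])) for r in records]

def transform(records):
    # 4. Transformation/encoding: one token set per record, age coded as a band.
    encoded = []
    for r in records:
        tokens = set(r["items"])
        if r["age"] is not None:
            tokens.add("age<40" if r["age"] < 40 else "age>=40")
        encoded.append(frozenset(tokens))
    return encoded

def mine(token_sets):
    # 5. Data mining (stand-in): count how many records contain each token.
    counts = Counter()
    for tokens in token_sets:
        counts.update(tokens)
    return counts

def report(counts, n_records):
    # 6. Reporting: each count as a fraction of records, most frequent first.
    return [(token, count / n_records) for token, count in counts.most_common()]

def run_kdd(raw):
    records = enrich(cleanse(select(raw, {"Gaza", "Khan Younis"})), CUSTOMER_AGE)
    return report(mine(transform(records)), len(records))

for token, share in run_kdd(RAW):
    print(f"{token}: {share:.2f}")
```

Keeping each phase as a separate function makes it easy to swap one step, for example replacing the counting stand-in with a proper association-rule miner, without touching the others.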
Data Warehousing
The data warehouse is a historical database designed for decision support. The relation
between data warehousing and data mining is that data mining can be applied to the data in a
warehouse to support certain types of decisions, based on the rules or hidden patterns it discovers.
Data Mining Applications
1. Marketing
   • Marketing strategies and consumer behavior
2. Finance
   • Fraud detection, creditworthiness and investment analysis
3. Manufacturing
   • Resource optimization
4. Health
   • Image analysis, side effects of drugs, and treatment effectiveness
5. Security
   • Identifying criminals and detecting similar crimes
Types of Discovered Knowledge
1. Association Rules: these rules are typically generated from market-basket
data. Common algorithms include the Apriori algorithm, the sampling algorithm and the Frequent Pattern Tree (FP-tree) algorithm.
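A minimal sketch of the Apriori algorithm, assuming transactions are sets of item names and support is measured as a fraction of transactions. It finds frequent itemsets level by level, prunes any candidate that has an infrequent subset, and then derives rules X → Y whose confidence, support(X ∪ Y) / support(X), meets a threshold. The function names and the toy baskets are illustrative, not taken from the notes.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level-1 candidates: every individual item.
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate's support with one scan over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(level), 2):
            u = a | b
            # Prune step (Apriori property): all (k-1)-subsets must be frequent.
            if len(u) == k and all(frozenset(s) in level for s in combinations(u, k - 1)):
                candidates.add(u)
    return frequent

def rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules lhs -> rhs above the threshold."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                conf = supp / frequent[lhs]  # every subset of a frequent set is frequent
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), supp, conf))
    return out

TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_sets = apriori(TRANSACTIONS, min_support=0.6)
for lhs, rhs, supp, conf in rules(frequent_sets, min_confidence=0.7):
    print(f"{sorted(lhs)} -> {sorted(rhs)}  support={supp:.2f} confidence={conf:.2f}")
```

The pruning step is what makes Apriori cheaper than brute force: a candidate is never counted if one of its subsets already failed the support threshold.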
2. Classification Hierarchies: classification is the process of learning a model that
describes different classes of data. Learning is supervised because the classes to be
learned are predetermined.
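One common way to learn a classification hierarchy is a decision tree. The sketch below is an ID3-style learner, assuming categorical attributes and splitting on the attribute with the highest information gain; the notes do not name a specific algorithm, and the credit-risk rows (echoing the creditworthiness application listed earlier) are made up.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(rows, labels, attributes):
    """A leaf is a class label; an inner node is (attribute, {value: subtree})."""
    # Stop when the node is pure or no attributes remain; predict the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def remaining_entropy(attr):
        # Weighted entropy of the class labels after splitting on attr.
        total = 0.0
        for value in set(r[attr] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[attr] == value]
            total += len(subset) / len(rows) * entropy(subset)
        return total

    # Highest information gain is the same as lowest remaining entropy.
    best = min(attributes, key=remaining_entropy)
    branches = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attributes if a != best],
        )
    return (best, branches)

def classify(tree, row):
    # Walk down the hierarchy until a leaf is reached (unseen values raise KeyError).
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree

# Hypothetical training data: the classes "good"/"bad" are fixed in advance.
ROWS = [
    {"income": "high", "debt": "no"}, {"income": "high", "debt": "yes"},
    {"income": "low", "debt": "no"}, {"income": "low", "debt": "yes"},
    {"income": "low", "debt": "yes"}, {"income": "high", "debt": "yes"},
]
LABELS = ["good", "good", "good", "bad", "bad", "good"]

tree = build_tree(ROWS, LABELS, ["income", "debt"])
print(tree)
print(classify(tree, {"income": "low", "debt": "yes"}))
```

The nested tuples make the hierarchy explicit: the root tests income, and only the low-income branch needs a second test on debt.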
3. Sequential Patterns:
4. Patterns Within Time Series
5. Clustering: unsupervised learning, or clustering, builds models from data without
predefined classes. The goal is to place records into groups where the records in a
group are highly similar to each other and dissimilar to records in other groups. The k-means algorithm is a simple yet effective clustering technique.
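A minimal k-means sketch for 2-D points, assuming Euclidean distance, initialization from k randomly chosen data points with a fixed seed, and an iteration cap. It alternates between assigning each record to its nearest centroid and moving each centroid to the mean of its group, stopping when no assignment changes. The sample points are hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster points into k groups; returns (centroids, assignment list)."""
    rng = random.Random(seed)
    # Initialize centroids with k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment

POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(POINTS, k=2)
print(centroids, labels)
```

The number of clusters k must be chosen in advance, and the result can depend on the initial centroids; running several seeds and keeping the tightest clustering is a common remedy.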