University of Houston – Clear Lake
Management Information Systems
ISAM 5931 – Summer 2006
DATA MINING BASICS
Prepared for: Dr. Mohammad Rob
Prepared By: Fouad Alibrahim & Mohammad Monakes
ABSTRACT
Data mining is the process of analyzing data from different perspectives and summarizing
it into useful information that can be used to increase revenue, cut costs, or both. Data
mining software allows users to analyze data from many different dimensions or angles,
categorize it, and summarize the relationships. Technically, data mining is the process of
finding correlations or patterns among dozens of fields in large relational databases.
June 22, 2006
1. INTRODUCTION
Wikipedia gives the following two definitions of Data Mining (DM).
First, DM is the nontrivial extraction of implicit, previously unknown, and potentially
useful information from data. Second, DM is the science of extracting useful
information from large data sets or databases. [2]
Data mining involves analyzing data to reveal patterns or relationships.
A simple example of data mining is its use in a retail sales department. If a store
tracks a customer's purchases and notices that the customer buys a lot of silk shirts,
the data mining system will associate that customer with silk shirts.
The sales department will look at that information and may begin direct-mail
marketing of silk shirts to that customer, or it may instead try to get the
customer to buy a wider range of products. In this case, the data mining system used
by the retail store discovered new information about the customer that was previously
unknown to the company. Another widely cited example is that of a very large North
American supermarket chain. Through intensive analysis of transactions and
the goods bought over a period of time, analysts found that beer and diapers were
often bought together. Although explaining this relationship might be difficult, taking
advantage of it should not be hard (e.g., placing the high-profit
diapers next to the high-profit beer). This technique is often referred to as Market
Basket Analysis. [1]
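The beer-and-diapers finding can be made concrete with two standard association-rule measures, support and confidence. The sketch below is a minimal pairwise version, not a full algorithm such as Apriori; the baskets and the 0.4 support threshold are hypothetical values chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets; each set is one checkout transaction.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
]


def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support(A, B)    = share of baskets containing both A and B
    confidence(A->B) = baskets with A and B / baskets with A
    Only pairs whose support reaches min_support are kept.
    """
    n = len(transactions)
    item_counts = Counter(item for basket in transactions for item in basket)
    pair_counts = Counter(
        pair for basket in transactions for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        # A rule has a direction: beer -> diapers and diapers -> beer
        # share the same support but can differ in confidence.
        rules.append((a, b, support, both / item_counts[a]))
        rules.append((b, a, support, both / item_counts[b]))
    # Highest confidence first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


if __name__ == "__main__":
    for a, b, support, confidence in pair_rules(transactions):
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data, beer -> diapers has support 0.60 and confidence 0.75: three of the five baskets contain both items, and three of the four beer baskets also contain diapers.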
2. WHAT IS DATA MINING?
Like all decision support systems, data mining delivers information. However, whereas data
warehouses with query tools are driven by the users, data mining is a data-driven approach
that discovers knowledge on its own. Organizations gather enterprise data from operational
systems and store it in data warehouses; data mining takes the process a giant step
further. [3]
Data mining (DM) and knowledge discovery are commonly seen as intelligent tools
that help to accumulate and process data and make use of it [4]. DM bridges many
technical areas, including databases, statistics, machine learning, and human-computer
interaction. The set of DM processes used to extract and verify patterns in
data is the core of the knowledge discovery process [4]. The idea of learning from
data is far from new. However, interest in DM has become very intense, likely because
of developments in database management and the huge increase in the volumes of data
being accumulated in databases. Numerous DM algorithms have recently
been developed to extract knowledge from large databases. Currently, most research
in the DM area focuses on developing new algorithms or improving the
speed or accuracy of existing ones [5].
3. OLAP VERSUS DATA MINING
With OLAP queries and analysis, users are able to obtain results from complex
queries and derive interesting patterns. Data mining also enables users to uncover
interesting patterns, but there is an essential difference in the way the results are
obtained. OLAP helps the user analyze the past and gain insights; data mining, on the
other hand, helps the user predict the future. [3]
Data mining and OLAP are complementary. You may say that data mining picks up
where OLAP leaves off. The analyst drives the process while using OLAP tools. In
data mining, the analyst prepares the data and “sits back” while the tools drive the
process. [3]
| Features | OLAP | Data Mining |
|---|---|---|
| Information request | What is happening in the enterprise? | Predict the future based on why this is happening. |
| Data granularity | Summary data. | Detailed transaction-level data. |
| Number of business dimensions | Limited number of dimensions. | Large number of dimensions. |
| Number of dimension attributes | Small number of attributes. | Many dimension attributes. |
| Sizes of datasets for the dimensions | Not large for each dimension. | Usually very large for each dimension. |
| Analysis approach | User-driven, interactive analysis. | Data-driven, automatic knowledge discovery. |
| Analysis techniques | Multidimensional, drill-down, slice and dice. | Prepare data, launch mining tool, and sit back. |
| State of the technology | Widely used. | Still emerging. |
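The OLAP techniques in the table (drill-down, slice and dice) are user-driven aggregations over summary data. A minimal sketch, assuming a tiny hypothetical sales fact table held in memory rather than in a real OLAP server; the function names `roll_up` and `slice_rows` are illustrative, not from any product.

```python
from collections import defaultdict

# Hypothetical fact table: (region, quarter, product, sales amount).
SALES = [
    ("North", "Q1", "shirts", 120.0),
    ("North", "Q2", "shirts", 150.0),
    ("North", "Q1", "shoes", 80.0),
    ("South", "Q1", "shirts", 90.0),
    ("South", "Q2", "shoes", 60.0),
]
DIMENSIONS = {"region": 0, "quarter": 1, "product": 2}
MEASURE = 3  # column index of the sales amount


def roll_up(rows, *dims):
    """Total the sales measure grouped by the named dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in dims)
        totals[key] += row[MEASURE]
    return dict(totals)


def slice_rows(rows, dim, value):
    """Keep only rows where one dimension equals a fixed value (a 'slice')."""
    return [row for row in rows if row[DIMENSIONS[dim]] == value]


if __name__ == "__main__":
    print(roll_up(SALES, "region"))             # summary by region
    print(roll_up(SALES, "region", "quarter"))  # drill down to region x quarter
    print(roll_up(slice_rows(SALES, "product", "shirts"), "region"))  # shirts only
```

Every call answers a question the analyst has already posed; nothing here discovers a pattern unprompted, which is the contrast the table draws with data mining.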
4. DATA MINING APPLICATIONS
Customer Segmentation
This is one of the most widespread applications. Businesses use data mining to
understand their customers. Cluster detection algorithms discover clusters of
customers sharing the same characteristics.
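Cluster detection can be illustrated with k-means, one common clustering algorithm (the text does not name a specific one). A minimal sketch, assuming each customer is described by two hypothetical numeric features, annual spend and store visits per month:

```python
import math

# Hypothetical customers: (annual spend in $1000s, store visits per month).
CUSTOMERS = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.2, 1.4), (8.5, 9.0)]


def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means on 2-D points; returns (labels, centroids).

    Seeds the centroids with the first k points so the example is
    deterministic; practical tools use better seeding such as k-means++.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    labels, centers = kmeans(CUSTOMERS, k=2)
    print(labels)  # [0, 0, 1, 1, 0, 1]
```

The two groups correspond to low-spend, infrequent shoppers and high-spend, frequent shoppers. With real data the features would normally be rescaled first, since k-means uses raw distances.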
Market Basket Analysis
This is a very useful application for retail. Link analysis algorithms uncover
affinities between products that are bought together. Other businesses such as
upscale auction houses use these algorithms to find customers to whom they can
sell higher-value items.
Risk Management
Insurance companies and mortgage businesses use data mining to uncover risks
associated with potential customers.
Fraud Detection
Credit card companies use data mining to discover abnormal spending patterns of
customers. Such patterns can expose fraudulent use of the cards.
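One simple way to spot an "abnormal spending pattern" is to compare a new charge with the customer's own history. The sketch below uses a z-score test, a deliberately basic stand-in for the models card issuers actually deploy; the past charges and the 3-standard-deviation threshold are hypothetical.

```python
import statistics


def is_unusual(past_amounts, new_amount, threshold=3.0):
    """Flag a charge whose z-score against the customer's past charges
    exceeds `threshold` (needs at least two past amounts)."""
    mean = statistics.mean(past_amounts)
    spread = statistics.stdev(past_amounts)  # sample standard deviation
    if spread == 0:
        # Every past charge was identical; any different amount stands out.
        return new_amount != mean
    return abs(new_amount - mean) / spread > threshold


if __name__ == "__main__":
    history = [42.0, 55.0, 38.0, 61.0, 47.0, 50.0]  # hypothetical past charges
    print(is_unusual(history, 52.0))   # False: close to the usual amounts
    print(is_unusual(history, 900.0))  # True: far outside the usual range
```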
Delinquency Tracking
Loan companies use the technology to track customers who are likely to default
on repayments.
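Predicting which customers are likely to default is a classification task. As one illustrative choice (the text names no algorithm), here is a k-nearest-neighbour sketch; the two features, the six past borrowers, and k = 3 are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical past borrowers:
# ((debt-to-income ratio, missed payments last year), defaulted?)
PAST_LOANS = [
    ((0.20, 0), False),
    ((0.25, 1), False),
    ((0.30, 0), False),
    ((0.55, 3), True),
    ((0.60, 4), True),
    ((0.50, 2), True),
]


def predict_default(past_loans, applicant, k=3):
    """Majority vote among the k past borrowers closest to the applicant."""
    nearest = sorted(past_loans, key=lambda loan: math.dist(loan[0], applicant))[:k]
    votes = Counter(defaulted for _, defaulted in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(predict_default(PAST_LOANS, (0.58, 3)))  # True
    print(predict_default(PAST_LOANS, (0.22, 0)))  # False
```

An odd k avoids tied votes between the two labels. Because the features are unscaled, the missed-payment count dominates the distance here; a real model would normalize them.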
Demand Prediction
Retail and other businesses use data mining to match demand and supply trends to
forecast demand for specific products. [3]
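A basic form of demand prediction fits a straight-line trend to past sales of a product and extends it forward. The sketch below uses ordinary least squares; the weekly figures are hypothetical, and practical forecasting would typically also model seasonality and promotions.

```python
def linear_trend_forecast(demand, periods_ahead=1):
    """Fit demand ~ a + b*t by ordinary least squares, with t = 0, 1, 2, ...,
    and extend the fitted line `periods_ahead` periods past the last one."""
    n = len(demand)
    if n < 2:
        raise ValueError("need at least two past periods to fit a trend")
    t_mean = (n - 1) / 2  # mean of 0, 1, ..., n-1
    y_mean = sum(demand) / n
    # Sums of cross-products and squares about the means.
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(demand))
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    slope = s_ty / s_tt
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + periods_ahead)


if __name__ == "__main__":
    weekly_units = [100, 110, 120, 130]  # hypothetical sales history
    print(linear_trend_forecast(weekly_units))  # 140.0
```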
5. WHERE DATA MINING STANDS AND WHERE IT IS GOING
Wu observes that a successful new industry (such as DM) can pass through these consecutive phases:
(1) Discovering a new idea,
(2) Ensuring its applicability,
(3) Producing small-scale systems to test the market,
(4) Better understanding of new technology and
(5) Producing a fully scaled system.
At present there are several dozen DM systems, none of which is comparable
in scale to a DBMS. According to Lin, this indicates that the DM area is
still in the third phase. [5]
REFERENCES
1. UCLA Anderson School of Management. Online at http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm.
2. Wikipedia, "Data mining." Online at http://en.wikipedia.org/wiki/Data_mining.
3. Ponniah, P. Data Warehousing Fundamentals, 2001, pp. 399-426.
4. Fayyad, U. "Data Mining and Knowledge Discovery: Making Sense Out of Data." IEEE Expert 11(5), 1996, pp. 20-25.
5. Wu, X., Yu, P., Piatetsky-Shapiro, G., et al. "Data Mining: How Research Meets Practical Development?" Knowledge and Information Systems 5(2), 2000, pp. 248-261.