Chapter 1 - Data Miners Inc

Chapter 1
Why & What is Data Mining?
Note: This slide set includes both Chapter 1
material and additional material from the instructor.
Data Mining is a subset of
Business Intelligence (BI)
2
Topics to Discuss in Session #1
• What is Data Mining (DM)?
• Who uses DM?
• Why DM?
• Where DM?
• When DM?
• How DM?
• Why study DM?
3
What, Who
Data Mining – Definition & Goal
• Definition
– DM is the exploration and analysis of large quantities
of data in order to discover meaningful patterns and
rules.
• Goal
– To allow an “enterprise”* to IMPROVE its ______
through better understanding of its ______ .
– Potential for Competitive Advantage.
* Synonyms include: corporation, firm, non-profit organization, government agency
4
Foundations of Data Mining
• Data mining is the process of using “raw” data to infer
important “business” relationships.
• Despite a consensus on the value of data mining, a
great deal of confusion exists about what it is.
• Data Mining is a collection of powerful techniques
intended for analyzing large amounts of data.
• There is no single data mining approach, but rather a set
of techniques that can be used stand-alone or in
combination with each other.
5
Why, Where, When
Data Mining – Why now?
1. Data are being produced
2. Data are being warehoused
3. Computing power is more affordable
4. Competitive pressures are enormous
5. Data Mining software is available
6
How
Customer Relationship Management (CRM)
7
Customer Relationship Management (CRM)
How
In order to form a learning relationship with its
customers, an enterprise (firm) must be able to:
1. Notice – what its customers are doing
2. Remember – what it and its customers have
done over time
3. Learn – from what it has remembered
4. Act On – what it has learned to make
customers more profitable
8
How
Based on “Transaction” Data
9
How
Based on “Transaction” Data
10
Identifying and Remembering
Relationships is the Key!
How
11
Group Exercise #1
• Time Box = 15 minutes
• Teams of 4 or fewer
• Discuss DM situations among yourselves and
pick one to report to the class
• What to report (verbally – 5-minute max):
– Describe the DM situation
– How does it help the enterprise?
• Presentations…another 15 to 30 minutes
12
Why Study Data Mining?
Open discussion to identify these
13
Topics to Discuss in Session #2
• Data Mining History
• Data Warehouse
• Data Mart
14
Data Mining History
• The approach has roots in practice dating back
over 40 years.
• In the late 1960s, data mining was called
statistical analysis, and the pioneers were
statistical software packages such as SAS and
SPSS.
• By the late 1980s, the traditional techniques had
been augmented by new methods such as fuzzy
logic, heuristics and neural networks.
15
Definitions of a Data Warehouse
1. “A subject-oriented, integrated, time-variant and
non-volatile collection of data in support of
management's decision making process”
- W.H. Inmon
2. “A copy of transaction data, specifically
structured for query and analysis”
- Ralph Kimball
16
Data Warehouse
• For organizational learning to take place, data
from many sources must be gathered together
and organized in a consistent and useful way –
hence, Data Warehousing (DW)
• DW allows an organization (enterprise) to
remember what it has noticed about its data
• Data Mining techniques make use of the data in
a DW
17
Data Warehouse
[Diagram: the Enterprise “Database” (Customers, Orders, Transactions, Vendors, etc.) is copied, organized and summarized into the Data Warehouse, which feeds Data Mining]
Data Miners:
• “Farmers” – they know
• “Explorers” – unpredictable
18
Data Warehouse
• A data warehouse is a copy of transaction data
specifically structured for querying, analysis and
reporting – hence, data mining.
• Note that the data warehouse contains a copy of the
transactions which are not updated or changed later by
the transaction system.
• Also note that this data is specially structured, and may
have been transformed when it was copied into the data
warehouse.
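The copy-and-transform idea above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (orders, dw_orders), the column names and the sample rows are all hypothetical and do not come from the chapter.

```python
import sqlite3

# Hypothetical transaction system; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders "
    "(order_id INTEGER, customer TEXT, amount_cents INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "alice", 1250, "2024-01-05"),
        (2, "bob", 800, "2024-01-05"),
        (3, "alice", 4300, "2024-02-11"),
    ],
)

# Copy into a warehouse table, transforming on the way in:
# cents become dollars and the date is reduced to a month for analysis.
conn.execute(
    """
    CREATE TABLE dw_orders AS
    SELECT order_id,
           customer,
           amount_cents / 100.0 AS amount_dollars,
           substr(order_date, 1, 7) AS order_month
    FROM orders
    """
)

# A later change in the transaction system does not touch the warehouse copy.
conn.execute("UPDATE orders SET amount_cents = 0 WHERE order_id = 1")

warehouse_rows = conn.execute(
    "SELECT order_id, customer, amount_dollars, order_month "
    "FROM dw_orders ORDER BY order_id"
).fetchall()
```

After the update, the first warehouse row still holds the original amount, which is the "not updated or changed later" property described above.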
19
Data Mart
• A Data Mart is a smaller, more focused
Data Warehouse – a mini-warehouse.
• A Data Mart typically reflects the business
rules of a specific business unit within an
enterprise.
20
Data Warehouse to Data Mart
[Diagram: the Data Warehouse feeds three Data Marts, each providing Decision Support Information]
21
Data Warehouse & Mart
• Set of “Tables” – 2 or more dimensions
• Designed for Aggregation
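As a rough sketch of what "designed for aggregation" can mean, the snippet below sums one measure along each of two dimensions. The dimensions (region, month), the sales figures and the aggregate helper are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows with two dimensions and one measure: (region, month, sales).
sales_facts = [
    ("West", "Jan", 100),
    ("West", "Feb", 150),
    ("East", "Jan", 200),
    ("East", "Feb", 50),
]


def aggregate(rows, key_index):
    """Sum the sales measure along one dimension (0 = region, 1 = month)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)


sales_by_region = aggregate(sales_facts, 0)
sales_by_month = aggregate(sales_facts, 1)
```

The same rows roll up by region or by month without restructuring, which is the practical benefit of keeping the dimensions explicit.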
22
Group Exercise #2
• Time Box = 15 minutes
• Teams of 4 or fewer
• Discuss Data Warehouse to Data Mart situations
among yourselves and pick one to report to the
class
• What to report (verbally – 5-minute max):
– Describe the DW to Data Mart situation
– How does it help the enterprise’s “business” unit?
• Presentations…another 15 to 30 minutes
23
Topics to Discuss in Session #3
• Data Mining Flavors
• Data Mining Examples
• Data Mining Tasks
• Data Mining’s Biggest Challenge
• What does all of this mean?
24
Data Mining Flavors
• Directed – Attempts to explain or
categorize some particular target field
such as income or response.
• Undirected – Attempts to find patterns or
similarities among groups of records
without the use of a particular target field
or collection of predefined classes.
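The contrast between the two flavors can be sketched on a toy data set. Everything here is an assumption for illustration: the records are made up, the directed part is a simple one-cutoff rule, and the undirected part is a one-dimensional two-means grouping that needs at least two distinct values. Neither is presented as the chapter's own method.

```python
# Hypothetical customer records: (income in $1000s, responded to an offer?).
records = [(20, False), (25, False), (30, False), (70, True), (80, True), (90, True)]


def directed_threshold(rows):
    """Directed: use the target field (response) to pick the best income cutoff."""
    incomes = sorted(income for income, _ in rows)
    candidates = [(a + b) / 2 for a, b in zip(incomes, incomes[1:])]

    def accuracy(cut):
        hits = sum((income > cut) == responded for income, responded in rows)
        return hits / len(rows)

    return max(candidates, key=accuracy)


def undirected_groups(values, iterations=10):
    """Undirected: split values into two groups by similarity alone; no target field."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return sorted(group_low), sorted(group_high)


cutoff = directed_threshold(records)
clusters = undirected_groups([income for income, _ in records])
```

The directed function cannot run without the response column, while the undirected one never looks at it; that difference is the point of the two definitions above.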
25
Data Mining Examples in Enterprises
• US Government
– FBI – track down criminals (SD Police also)
– Treasury Dept – suspicious int’l funds transfer
• Phone companies
• Supermarkets & Superstores (Vons, Albertsons, WalMart, Costco)
• Mail-Order, On-Line Order (L.L. Bean, Victoria’s Secret,
Lands End)
• Financial Institutions (BofA, Wells Fargo, Charles
Schwab)
• Insurance Companies (USAA, Allstate, State Farm)
• Tons of others…
26
Data Mining Tasks
• Classification – example: class standing (freshman, sophomore, junior, senior)
• Estimation – example: household income
• Prediction – example: predict credit card
balance transfer average amount
• Affinity Grouping – example: people who buy
X often also buy Y, with probability Z%
• Clustering – similar to classification but no
predefined classes
• Description and Profiling – behavior begets an
explanation such as “More guys prefer In-n-Out
Burger than do gals.”
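For the affinity grouping task, the "Z%" can be read as the share of baskets containing X that also contain Y. A minimal sketch follows, assuming made-up baskets and a hypothetical helper named affinity_percent:

```python
# Hypothetical market baskets; item names are made up for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def affinity_percent(transactions, x, y):
    """Of the baskets containing x, return the percentage that also contain y."""
    with_x = [basket for basket in transactions if x in basket]
    if not with_x:
        return 0.0  # x never appears, so there is nothing to measure
    with_both = sum(1 for basket in with_x if y in basket)
    return 100.0 * with_both / len(with_x)


z = affinity_percent(baskets, "bread", "milk")
```

Here three baskets contain bread and two of those also contain milk, so z is about 66.7. The measure is not symmetric in general: swapping X and Y changes the denominator.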
27
Data Mining’s Biggest Challenge
• The largest challenge a data miner may face is
the sheer volume of data in the data warehouse.
• It is quite important, then, that summary data
also be available to get the analysis started.
• A major problem is that this sheer volume may
mask the important relationships the data miner
is interested in.
• The ability to overcome the volume and be able
to interpret the data is quite important.
28
What Does All of This Mean?
• On a regular basis, “farmers” and “explorers” utilize their
data warehouses to give guidance for and/or answer a
limitless variety of questions.
• Nothing is free, however, and the benefits do come with
a cost.
• The value of a data warehouse and subsequent data
mining comes from the new and changed business
processes they enable, and from the resulting competitive advantage.
• There are limitations, though: a Data Warehouse cannot
correct problems with its data, although it may help to
identify them more clearly.
29
End of Chapter 1
30