Download Data mining

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
Data Wharehousing OLAP
Data Mining
S. Costantini
Università degli Studi di L’Aquila
[email protected]
Ringraziamenti
(Acknowledgment)
• Part of this material is taken from: Database
Systems: The Complete Book, by Hector
Garcia-Molina, Jeff Ullman, and Jennifer
Widom, edited by Prentice-Hall.
• URL: http://www-db.stanford.edu/~ullman/dscb.html
S. Costantini / Data Wharehousing
2
Cos’è in sostanza un Data
Wharehouse?
• E’ una vista materializzata
• Aggiornata a intervalli stabiliti (a
seconda dell’applicazione)
• E’ un cosiddetto “sistema di integrazione
di dati” perché può contenere dati
provenienti da vari database (detti
“sorgenti”)
S. Costantini / Data Wharehousing
3
Perché i Data Warehouse?
• Perché le query di analisi statistica ed
esame dei dati per estrarne varie
informazioni (dette query “OLAP”, vedi
seguito) sono pesanti e diminuiscono
troppo la performance del sistema. Però
non necessitano della versione più
aggiornata dei dati.
S. Costantini / Data Wharehousing
4
Perché i Data Wharehouse
• Allora conviene separare le query usuali
dalle query OLAP, creando per queste
ultime un Data Wharehouse
• Per le query OLAP il modello relazionale
non è ottimale, quindi nel creare un Data
Wharehouse il modello dei dati viene
modificato.
S. Costantini / Data Wharehousing
5
Observation
• Traditional database systems are tuned
to many, small, simple queries.
• Some new applications use fewer, more
time-consuming, complex queries.
• New architectures have been developed
to handle complex “analytic” queries
efficiently.
S. Costantini / Data Wharehousing
6
The Data Warehouse
• The most common form of data
integration.
– Copy sources into a single DB (warehouse)
and try to keep it up-to-date.
– Usual method: periodic reconstruction of
the warehouse, perhaps overnight.
– Frequently essential for analytic queries.
S. Costantini / Data Wharehousing
7
OLTP
• Most database operations involve OnLine Transaction Processing (OTLP).
– Short, simple, frequent queries and/or
modifications, each involving a small number
of tuples.
– Examples: Answering queries from a Web
interface, sales at cash registers, selling
airline tickets.
S. Costantini / Data Wharehousing
8
OLAP
• Of increasing importance are On-Line
Application Processing (OLAP) queries.
– Few, but complex queries --- may run for
hours.
– Queries do not depend on having an
absolutely up-to-date database.
S. Costantini / Data Wharehousing
9
OLAP Examples
1. Amazon analyzes purchases by its
customers to come up with an individual
screen with products of likely interest
to the customer.
2. Analysts at Wal-Mart look for items
with increasing sales in some region.
S. Costantini / Data Wharehousing
10
Data Warehouses
• Doing OLTP and OLAP in the same database
system is often impractical
– Different performance requirements
– Analysis queries require data from many sources
• Solution: Build a “data warehouse”
–
–
–
–
Copy data from various OLTP systems
Optimize data organization, system tuning for OLAP
Transactions aren’t slowed by big analysis queries
Periodically refresh the data in the warehouse
S. Costantini / Data Wharehousing
11
Common Architecture
• Relational Databases handle OLTP.
• Local databases copied to a central
warehouse overnight.
• Analysts use the warehouse for OLAP.
S. Costantini / Data Wharehousing
12
Definition of data warehousing
A data warehouse is a subject-oriented,
integrated, time-variant and nonvolatile collection of data in support of
management’s decision making process.
S. Costantini / Data Wharehousing
13
Loading the Data Warehouse
Data is periodically
extracted
Data is cleansed and
transformed
Data Staging Area
Source Systems
(OLTP)
Users query the data
warehouse
Data Warehouse
S. Costantini / Data Wharehousing
14
Data Mining
•
•
Data mining is a popular term for
queries that summarize big data sets in
useful ways.
Examples:
1. Clustering all Web pages by topic.
2. Finding characteristics of fraudulent
credit-card use.
S. Costantini / Data Wharehousing
15
Data Warehouse
Enterprise
“Database”
Customers
Orders
Transactions
Etc…
Vendors
Data Miners:
• “Farmers” – they know
• “Explorers” - unpredictable
Etc…
Copied,
organized
summarized
Data
Warehouse
Data Mining
S. Costantini / Data Wharehousing
16
Market-Basket Data
• An important form of mining from
relational data involves market baskets =
sets of “items” that are purchased
together as a customer leaves a store.
• Summary of basket data is frequent
itemsets = sets of items that often
appear together in baskets.
S. Costantini / Data Wharehousing
17
Data Mining Flavors
• Directed – Attempts to explain or
categorize some particular target field
such as income or response.
• Undirected – Attempts to find patterns or
similarities among groups of records
without the use of a particular target field
or collection of predefined classes.
S. Costantini / Data Wharehousing
18
Data Mining Examples in
Enterprises
• Government
– Track down criminals (Police also)
– Treasury Dept – suspicious int’l funds
transfer
• Phone companies
• Supermarkets & Superstores
• Mail-Order, On-Line Order
S. Costantini / Data Wharehousing
19
Data Mining Examples in
Enterprises
•
•
•
•
Financial Institutions
Insurance Companies
Web sites
Many others…
S. Costantini / Data Wharehousing
20