Download Set 2

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
CSE 636
Data Integration
Overview
Data Warehouse Architecture

Users

Applications
Relational Database
(Warehouse)
OLAP / Decision Support
Data Cubes / Data Mining
ETL Tools
(Extract-Transform-Load)
Data Cleaning
Data
Source
Data
Source
Data
Source
2
Virtual Integration Architecture
• Leave the data in the sources
• When a query comes in:
– Determine the relevant sources to the query
– Break down the query into sub-queries for the sources
– Get the answers from the sources, filter them if needed
and combine them appropriately
• Data is fresh
• Otherwise known as
On Demand Integration
3
Virtual Integration Architecture
Design-Time

End Users

Applications
Global
Schema
Schema
Mappings
Local
Data
Schema
Source
Local
Data
Schema
Source
Sources can be:
• Relational DBs
• Excel Files
• Web Sites
• Web Services
Local
Data
Schema
Source
4
Schema Mappings
• Differences in:
– Names in schema
– Attribute grouping
Inventory Database A
Inventory Database B
Books
Title
ISBN
Price
DiscountPrice
Edition
ISBN
FirstName
LastName
BookCategories
ISBN
Category
BooksAndMusic
Title
Author
Publisher
ItemID
ItemType
SuggestedPrice
Categories
Keywords
Authors
CDCategories
ASIN
Category
CDs
Album
ASIN
Price
DiscountPrice
Studio
– Coverage of databases
– Granularity and format of attributes
Artists
ASIN
ArtistName
GroupName
5
Issues for Schema Mappings
Design-Time

End Users

Applications
Global
Schema
Schema
Mappings
Local
Data
Schema
Source
Local
Data
Schema
Source
• What formalisms to
express them?
• How to create them?
• Can we discover them
somehow?
• How do we use them?
Local
Data
Schema
Source
6
Virtual Integration Architecture
Run-Time
Query
Result
Mediator
Reformulation
Optimization
Global
Schema
Execution
Wrapper
Local
Data
Schema
Source
Wrapper
Local
Data
Schema
Source
Local
Data
Schema
Source
7
Issues for Query Processing
Reformulation
Query
Mediator
Reformulation
Global
Schema
Local
Data
Schema
Source
Local
Data
Schema
Source
• User queries refer to
the global schema
• Data is stored in the
sources in a local
schema
• Rewriting algorithms
Local
Data
Schema
Source
8
Issues for Query Processing
Reformulation
Global Schema
Books
Title
ISBN
Price
DiscountPrice
Edition
SELECT ISBN, Price
FROM Books
WHERE Title = ‘on the road’
Local Schema A
BooksAndMusic
Title
Author
Publisher
ItemID
ItemType
SuggestedPrice
Categories
Keywords
SELECT ItemID, SuggestedPrice
FROM BooksAndMusic
WHERE Title = ‘on the road’
AND ItemType = ‘Books’
9
Issues for Query Processing
Query Translation
Query
• Different query
languages
Mediator
Reformulation
Optimization
Global
Schema
Execution
Wrapper
Local
Data
Schema
Source
Local
Data
Schema
Source
Local
Data
Schema
Source
10
Issues for Query Processing
Query Translation
Global Schema
Books
Title
ISBN
Price
DiscountPrice
Edition
SELECT ISBN, Price
FROM Books
WHERE Title = ‘on the road’
Local Source A
http://www.amazon.com/homepage.html?ItemType=Books&Title=on+the+road
11
Issues for Query Processing
Data Translation
Query
• Different data models
Mediator
Reformulation
Optimization
Global
Schema
Execution
Wrapper
Local
Data
Schema
Source
Local
Data
Schema
Source
Local
Data
Schema
Source
12
Issues for Query Processing
Data Translation
Global Schema
Books
Title
ISBN
Price
DiscountPrice
Edition
Title
ISBN Price
On the Road 123 10.86
Local Result A
…
…
…
…
<table>
<tr>
<td>
<a href=/details?isbn=123>
<b>On the Road</b>
</a>
-- by Jack Kerouac; Paperback
<br>
<a href=/details?isbn=123>
Buy new
</a>
:<b class=price>$10.86</b>
</td>
</tr>
</table>
13
Issues for Query Processing
Query Execution
Query
Mediator
Reformulation
Optimization
Global
Schema
Execution
Wrapper
Local
Data
Schema
Source
• Access as many data
sources as needed
• Duplicate/redundant
and irrelevant data
• Limited query
capabilities
Wrapper
Local
Data
Schema
Source
Local
Data
Schema
Source
14
Issues for Query Processing
Limited Query Capabilities
Global Schema
Books
E
DiscountPrice
8.86
Local Schema A
Local Schema B
BooksAndMusic
DiscountBooks
Title
B
Title
ISBN
ISBN Price
Price
123
10.86
DiscountPrice
Edition
SELECT ISBN, Price, DiscountPrice
FROM Books
WHERE Title = ‘on the road’
ItemID
Author
SuggestedPrice
ItemID
123ItemType 10.86
Title
D
GreatPrice
Edition
ISBN
8.86
GreatPrice
SuggestedPrice
A
SELECT ItemID, SuggestedPrice
FROM BooksAndMusic
WHERE Title = ‘on
? the road’
C
SELECT GreatPrice
FROM DiscountBooks
WHERE ISBN = 123
?
15
Issues for Query Processing
Query Answering
Query
Result
Mediator
Reformulation
Optimization
Global
Schema
Execution
Wrapper
Local
Data
Schema
Source
• Combine the results and
further process them if
needed
• Mainly union and merge
• Inconsistencies
Wrapper
Local
Data
Schema
Source
Local
Data
Schema
Source
16
Issues for Query Processing
Query Answering (Union)
ISBN
Price
123
10.86
456
8.86
ItemID
SuggestedPrice
ISBN
GreatPrice
123
10.86
456
8.86
17
Issues for Query Processing
Query Answering (Merge)
Primary
Key
ISBN
Title
Edition Price
123
On the Road
2nd
8.86
ItemID
Title
ISBN
Edition
Price
123
On the Road
123
2nd
8.86
Primary
Key
Primary
Key
18
Issues for Query Processing
Query Answering (Inconsistencies)
Primary
Key
ISBN
Title
Edition Price
123
On the Road
???
8.86
ItemID
Title
Edition
ISBN
Edition
Price
123
On the Road
1st
123
2nd
8.86
Primary
Key
Primary
Key
19
Peer-Based Integration
Query
Peer 4
Query
Peer 5
Peer 2
Peer 1
Peer 3
21
Peer-Based Integration
•
•
•
•
No need for a central mediated schema
Peers serve as mediators for other peers
A peer can be both a server and a client
Semantic relationships are specified locally
(between small sets of peers)
• Queries are posed using the peer’s schema
• Answers come from anywhere in the system
• This is not P2P file sharing.
– Data has rich semantics
22
References
• Information integration
– Maurizio Lenzerini
– Eighteenth International Joint Conference on Artificial
Intelligence, IJCAI 2003
– Invited Tutorial
• Data Integration: a Status Report
– Alon Halevy
– German Database Conference (BTW), 2003
– Invited Talk
23
Related documents