Databases Unplugged: Challenges in Ubiquitous Data Management
Michael Franklin
UC Berkeley

“Gazillions of Gizmos”
- “In ten years, billions of people will be using the Web, but a trillion "gizmos" will also be connected to the Web.” Asilomar Rep. on DB Research, Dec. 1998
- You’ve heard it before…
  - Smartphones, PDAs, Smartcards, badges, wearables, lightswitches, toasters, …
  - Worldwide sales of Internet-enabled appliances projected to grow from 5.9M units in 1998 to 55.7M units in 2002. IDC via H&Q report
M. Franklin, 12/17/99

An Explosion in Scale
(Picture is by way of Randy Katz)
[Figure: computing eras plotted on axes that run from Less to More. Labels recovered from the figure: Distribution; Personalization; Batch; RJE; Time Sharing; WS/Server; PC + Network; Scaled down PCs, desktop metaphor; Information Appliances; Many people per computer; One person per computer; Many computers per person.]

Technical Challenges
- Disconnection/Weak Connection
  - Standard distributed database techniques break down.
- Limited resources
  - Memory, CPU, Power, User Interface, Bandwidth
- Movement/Location
  - Killer Mobile apps use current and future locations.
- Scale
  - Number and diversity of devices.
- Reliability - Palm Pilots don’t bounce.

But, is Mobile Data Mgmt Needed?
- “Fundamentally, the ability to access all information from anywhere and have ONE unified and synchronized information repository is critical to making appliances useful.” Hambrecht and Quist, iWord, March 1999
- “All these information appliances have internal data that "docks" with other data stores. Each gizmo is a candidate for database system technology, because most will store and manage some information.” Asilomar Report

Road Map
- Motivation
- Alternative scenarios for mobile Databases
- Technical/Research challenges
- Some solutions
  - Consistency
  - Data Dissemination
  - Data Recharging
- Conclusions

How Will it Happen?
Alternatives
- SQL engine on the device (largely standalone)
- Extension of enterprise infrastructure
- Data Collection (device to infrastructure)
- Data Dissemination (infrastructure to device)
- PIM-driven information assistant

SQL Engine on the Device
- Reasonable for Palmtop, but probably not the toaster or light-switch…
- Stand-alone with occasional synchronization.
- Footprint versus functionality
  - Engine can be made surprisingly small (10s-100s of KB).
  - Sybase uses “take what you need” library approach
- All major vendors are playing in this space:
  - Oracle Lite, Sybase SQL Anywhere, Informix/Cloudscape, DB2 for the Workpad, SQL Server for Windows CE
- But, what is the killer app???

Extension of Enterprise
- Logical Progression?
  - Mainframe->Desktop->Palm
  - ERP->Palm
- Device becomes the endpoint of the enterprise infrastructure (queries and updates).
- This is happening but must take into account fundamental limitations of the mobile platforms.
- Again, examples exist, but the killer app has not yet emerged here.

Data Collection Devices
- Inventory Management/Tracking/Sensors/Census
- Examples: Symbol Technologies (Palm with a bar code scanner); more futuristic: smart dust.
- Asymmetric (device to server) data flow/usage dictates system architecture.
- Many applications exist, but no clear need for full-function DBMS on the device.
- Server-side DB must handle data streams

Data Dissemination
- Many Potential Apps
  - stock and sports tickers
  - traffic information systems
  - software distribution
  - news and/or entertainment delivery
- Asymmetric (server to devices) data flow/usage dictates system architecture.
- No clear need for full-function DBMS on the device, but intelligent caching and filtering on device is crucial.

Personal Information Management
- PIM is the killer app for mobile devices.
- So, use PIM to drive the data management architecture.
- Example: IBM’s Active Calendar
  - Calendar provides semantic information on what information will be needed when (and where).
  - Use this information to pre-stage information from the fixed infrastructure.
- This seems to be the most promising approach for driving device DB functionality.
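
The calendar-driven pre-staging idea can be illustrated with a minimal sketch. It is not a description of IBM's Active Calendar; the `Appointment` record, the `items_to_prestage` function, and the fixed lead-time policy are all hypothetical names and assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Appointment:
    """Hypothetical calendar entry: when, where, and which topics it needs."""
    start: datetime
    location: str
    topics: List[str]


def items_to_prestage(calendar: List[Appointment], now: datetime,
                      lead_time: timedelta) -> List[Tuple[str, str]]:
    """Return (topic, location) pairs for appointments starting within lead_time."""
    due = [a for a in calendar if now <= a.start <= now + lead_time]
    due.sort(key=lambda a: a.start)  # soonest appointment first
    return [(topic, a.location) for a in due for topic in a.topics]


now = datetime(1999, 12, 17, 9, 0)
calendar = [
    Appointment(now + timedelta(hours=2), "Room A", ["budget", "roadmap"]),
    Appointment(now + timedelta(days=3), "Room B", ["hiring"]),
]
plan = items_to_prestage(calendar, now, timedelta(days=1))
```

Here only the appointment two hours away falls inside the one-day lead time, so only its topics are selected for pre-staging from the fixed infrastructure.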

Research Issues
- Transactions (not likely) and Consistency.
- Distribution of function
  - how to split query functionality?
  - adaptive??
- New Querying and Access Models
  - info filtering and dissemination
  - location centric/movement
  - triggers/pervasive (invasive?) computing
  - Evidence Accrual - killer app: dating game
- Availability and Recovery

Data Caching and Consistency
- How to keep distributed data consistent?
- Centralized algorithms require connectivity at specific times.
- Alternative: Epidemic Algorithms (Peer-to-peer)
  - Conflict detection: timestamps, version vectors,…
  - Conflict Handling (update commitment):
    - Optimistic (resolution) - Manual except in limited domains,
    - Pessimistic (avoidance) - primary copy, write-all or voting-based.
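
Conflict detection with version vectors can be sketched as follows. This is a minimal illustration of the general technique, not code from the talk; the `compare` function, its return labels, and the site names are assumptions made for the example.

```python
from typing import Dict

# A version vector maps a site id to the number of updates seen from that site.
VersionVector = Dict[str, int]


def compare(a: VersionVector, b: VersionVector) -> str:
    """Return 'equal', 'a_newer', 'b_newer', or 'conflict'."""
    sites = set(a) | set(b)
    a_ahead = any(a.get(s, 0) > b.get(s, 0) for s in sites)
    b_ahead = any(b.get(s, 0) > a.get(s, 0) for s in sites)
    if a_ahead and b_ahead:
        return "conflict"  # concurrent updates: each copy has changes the other lacks
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


pda = {"pda": 2, "server": 1}
server = {"pda": 1, "server": 2}
result = compare(pda, server)
```

In this example both replicas were updated while disconnected, so neither vector dominates the other and the pair is reported as a conflict that must then be resolved (optimistic) or should have been avoided (pessimistic).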
Epidemic Protocol Illustration
(Picture is by way of Ugur Cetintemel)

Deno - Cetintemel and Keleher
- Pessimistic, Asynchronous (epidemic), voting-based
- “Bounded” weighted-voting:
  - Each replica is assigned a currency $c_i$ s.t. $0 \le c_i \le 1.0$
  - Total currency in the system is bounded, i.e., $\sum_i c_i = 1.0$
  - Currency can be re-distributed for optimization or planned disconnection.
- An update’s life:
  - Sites issue tentative updates
  - Updates and votes are propagated in a pair-wise fashion
  - Updates gather votes as they pass through sites
  - An update commits when it gathers plurality of votes

Decentralized Update Commitment
- An update u wins an election with plurality
- A site s maintains:
  - votes(u): the sum of votes u gained so far
  - unknown: the sum of votes unknown to s (i.e., $1.0 - \sum_{u} \mathrm{votes}(u)$, over all u)
- u commits iff for all u' <> u, votes(u) > votes(u') + unknown and votes(u) > unknown
- Issues: time to commit; abort rates
[Figure: animated election example. Each site holds a tuple of the form (site, currency, update), such as (s1, 0.20, u1); as votes propagate between sites, the running values of votes(u1), votes(u2), votes(u3), and unknown are updated until an update commits.]
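
The commit condition a site evaluates locally can be sketched in a few lines. This is an illustration of the rule stated on the slide, not Deno's implementation; the function name `committed_update`, the `eps` tolerance for floating-point comparison, and the vote values are assumptions made for the example.

```python
from typing import Dict, Optional


def committed_update(votes: Dict[str, float], total: float = 1.0,
                     eps: float = 1e-9) -> Optional[str]:
    """Return the update this site can commit, or None if the election is open.

    votes maps each update known at this site to the currency it has gathered.
    """
    unknown = total - sum(votes.values())  # currency not yet seen by this site
    for u, v in votes.items():
        # u must beat every rival even if the rival received all unknown votes...
        beats_rivals = all(v > w + unknown + eps
                           for rival, w in votes.items() if rival != u)
        # ...and must also beat an update this site has not heard of yet.
        if beats_rivals and v > unknown + eps:
            return u
    return None


still_open = committed_update({"u1": 0.40, "u2": 0.35})
decided = committed_update({"u1": 0.20, "u2": 0.55})
```

In the first call 0.25 of the currency is still unknown, so neither update is safe. In the second call u2 holds 0.55, which exceeds both u1's 0.20 plus the unknown 0.25 and the unknown share alone, so the site can commit u2 without hearing from the remaining replicas.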

Semantic Caching - Dar et al.
- Idea: Maintain description of cache contents as a set of logical predicates rather than a list of items.
- Potential advantages:
  - Less overhead with no need for static clustering (reduces bandwidth requirements).
  - Describe missing items with logical remainder query.
  - Application/Environment specific replacement functions, e.g. considering direction and velocity.
- Issues:
  - controlling complexity of cache descriptions
  - interacting with real database systems
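
The remainder-query idea can be sketched for the simplest case, where each cached predicate is a range over a single attribute. This is a simplified illustration under that assumption, not the algorithm of Dar et al.; the `split_query` function, the half-open interval representation, and the "probe" naming for the cache-answerable part are choices made for this example.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open interval [lo, hi) over one attribute


def split_query(query: Range, cached: List[Range]) -> Tuple[List[Range], List[Range]]:
    """Split a range query into a probe part (answerable from the cache)
    and a remainder part (must be fetched from the server)."""
    q_lo, q_hi = query
    probe: List[Range] = []
    remainder: List[Range] = []
    cursor = q_lo  # everything in [q_lo, cursor) has already been classified
    for lo, hi in sorted(cached):
        lo, hi = max(lo, q_lo), min(hi, q_hi)  # clip the cached range to the query
        if lo >= hi or hi <= cursor:
            continue  # nothing new contributed by this cached range
        if lo > cursor:
            remainder.append((cursor, lo))  # gap not covered by the cache
        probe.append((max(lo, cursor), hi))
        cursor = hi
    if cursor < q_hi:
        remainder.append((cursor, q_hi))
    return probe, remainder


probe, remainder = split_query((10, 50), [(0, 20), (30, 40)])
```

With the cache described by the two ranges above, the query over [10, 50) is answered locally for [10, 20) and [30, 40), and only the two gaps are requested from the server.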

Dissemination-Based Info Sys (DBIS)
1) Push vs. Pull is just one dimension along which to compare data delivery mechanisms.
   - We’ve identified three.
2) Different mechanisms for data delivery can (and should) be applied at different points in the system.
   - Select components from toolkit.
Franklin and Zdonik - Framework in OOPSLA 97, Toolkit description and demo in SIGMOD 99.

DBIS Framework
- An architecture that combines data delivery techniques for responsive client access.
- 3 types of nodes:
  - Data sources
  - Clients
  - Information brokers (can add value)
- Any data delivery mode can be used.
  - Network transparency
  - Possibly dynamic.

Delivery Options

Mode   Timing      Unicast               1-to-n
Pull   Aperiodic   request/response      request/response w/snoop
Pull   Periodic    polling               polling w/snoop
Push   Aperiodic   Email lists           publish/subscribe
Push   Periodic    Email list digests    publish/subscribe, Broadcast disks

Network Transparency
[Figure: Clients, Brokers, and Sources connected by links.]
The type of a link matters only to nodes on each end

DBIS Example
[Figure: an example configuration with a DB Server and several proxy caches; links are labeled Unicast pull and 1-to-n push, with the note "Can vary dynamically".]

DBIS Research Issues
- Each data delivery mechanism has unique aspects
  - Broadcast Disks - scheduling, caching, prefetching, updates
  - On-demand Broadcast - scheduling, data staging
  - Publish/Subscribe - large-scale filtering, channelization
- Security/Fault-tolerance/Reliability
- End-to-End network design and control
- Fundamental performance tradeoffs
- Exploiting existing and emerging technologies

“Data Recharging”
- Mobile devices require 2 resources: power and data
  - It is impractical to be continuously connected to fixed sources of these.
- Devices cope with disconnection using caching:
  - Power cached in rechargeable batteries
  - Data cached in hot-synched memory
- Ideal: make recharging data as simple as power:
  - Anywhere (with adapters), anytime, flexible connection duration
- Joint work w/ Mitch Cherniack and Stan Zdonik getting underway

Data Recharging - Research Agenda
- Profile Definition and Maintenance
- Update Storage and Preparation
- Efficient integration of "recharge" updates with existing cached data.
  - Recharge, Trickle Charge, Jump Start...
- Consistency Guarantees
- Global Data Staging
- Approaches will be driven by (mostly PIM) applications.

Conclusions
- Lots of plausible/useful Mobile data architectures.
  - For many, the applications exist today
  - Each has its own set of fascinating research opportunities.
- PIM is the killer app for mobile data access.
  - It can be used to drive the integration with enterprise and Internet data sources.
- Successful MDA work lies at the intersection of communications and data management rather than exclusively in either camp.