The Lowell Database
Research Self Assessment
2005-09-21
Tamkang University (淡江大學), 周清江
Summary
 Senior database researchers' meeting
 Senior database researchers have gathered every
few years to assess the state of database research
and to recommend problems and problem areas
that deserve additional focus.
     Laguna Beach, Calif. in 1989 [1]
     Palo Alto, Calif. (“Lagunita”) in 1990 [2] and 1995 [3]
     Cambridge, Mass. in 1996 [4]
     Asilomar, Calif. in 1998 [5]
     Lowell, Mass. in 2003
Focus
 information storage, organization,
management, and access
 Database research is driven by new applications,
technology trends, new synergies with related fields,
and innovation within the field itself
Sources of information and
information-processing demands
 Internet and web
 Cross enterprise vs. intra-enterprise
 Require stronger facilities for security and information
integration
 Science
 Large and complex data sets
 Pipeline of data products produced by data analysis
 Storing and querying “ordered” data
 Integrating with the world-wide data grid
 eCommerce
 To come: cheap micro-sensor technology that will
enable most things to report their status in real time
Major changes in the traditional
DBMS topics
 Technology advances require us to re-assess:
 Data models, access methods, query processing
algorithms, concurrency control, recovery, query
language, user interface
 Ex: Storage is improving in capacity and cost. Thus,
storage management and query-processing
algorithms have to be re-assessed.
 Cache-aware algorithms and data structures
 Maturation of related technologies, like data mining,
web search engines, artificial intelligence (speech,
natural language, reasoning with uncertainty,
machine learning)
 Personal information manager
Next Generation Infrastructure
1. Integration of Text, Data, Code, and Streams
2. Information Fusion
3. Sensor Data and Sensor Networks
4. Multimedia Queries
5. Reasoning about Uncertain Data
6. Personalization
7. Data Mining
8. Self Adaptation
9. Privacy
10. Trustworthy Systems
11. New User Interfaces
12. One-Hundred-Year Storage
13. Query Optimization
Integration of Text, Data, Code, and Streams
 The Web has demonstrated the importance of more
sophisticated data types, like text, temporal, spatial,
sound, image, or video data.
 Make the following “first-class citizens” of the DBMS:
     Uncertainty management (like information retrieval)
     User-defined procedure data
     Text, space, time, image, and multimedia data
     Structured data
     Triggers (scalable)
     Data streams (from micro-sensor devices) and queues
     Scientific data sets
 XML Schema and XQuery are too complex to be the
basis for this sort of new architecture
Information Fusion
 The typical approach to information integration uses an
extract-transform-load (ETL) tool to build a data
warehouse and data marts for a single corporation
 With Internet-enabled integration among different
enterprises, data must stay at the sources and be
accessed at query time
 ETL tools cannot be applied to sensor data sets
 A solution to semantic heterogeneity on the Web is
still elusive
 At web scale, query execution must move to
probabilistic evidence accumulation
 When integrating among autonomous enterprises,
query processing must reveal only the minimal
information necessary, in conformance with security
policies
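The "data stays at the sources" point can be illustrated with a minimal sketch. It assumes two autonomous sources modeled as in-memory SQLite databases; the table and column names (`orders`, `customer`, `amount`) and the helpers `make_source` and `federated_total` are hypothetical and chosen only for illustration. Each source answers a sub-query at query time and returns a single aggregate rather than its raw rows.

```python
import sqlite3

def make_source(rows):
    """Create an in-memory SQLite database standing in for one autonomous source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn

def federated_total(sources, customer):
    """Answer a query at query time by pushing a sub-query to every source.

    Each source returns only one aggregate value, and the mediator adds up
    the partial results. Nothing is copied into a central warehouse first.
    """
    total = 0.0
    for conn in sources:
        (partial,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()
        total += partial
    return total

source_a = make_source([("acme", 100.0), ("zenith", 40.0)])
source_b = make_source([("acme", 25.5)])
print(federated_total([source_a, source_b], "acme"))
```

An ETL pipeline would instead copy both `orders` tables into one warehouse before any query runs. The sketch ignores schema differences between sources, which is exactly the semantic heterogeneity problem the slide calls elusive.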
Sensor Data and Sensor Networks
 Self-powered, wireless device
 Draws more power when communicating than
when computing
 It is preferable to distribute query computation
to the individual nodes
 Query execution on sensor networks requires
the ability to adapt to rapidly changing
configurations
 How to deduce high-level facts from very low-level signals
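Why distributing query computation to the nodes saves communication can be sketched as follows. This is an illustrative model only: the `SensorNode` class, the routing tree, and the message-counting rule (one message per hop) are assumptions, not part of the original slides. Each node merges its own reading with its children's partial results and sends one small (sum, count) message to its parent.

```python
from dataclasses import dataclass, field

@dataclass
class SensorNode:
    """A hypothetical sensor node in a routing tree rooted at the base station."""
    reading: float
    children: list = field(default_factory=list)

def partial_aggregate(node):
    """Return (sum, count, messages) for the subtree rooted at `node`.

    Every child sends exactly one (sum, count) message to its parent,
    no matter how many readings its subtree contains.
    """
    total, count, messages = node.reading, 1, 0
    for child in node.children:
        child_sum, child_count, child_msgs = partial_aggregate(child)
        total += child_sum
        count += child_count
        messages += child_msgs + 1  # one message from this child to this node
    return total, count, messages

def raw_forwarding_messages(node, depth=0):
    """Count messages if every raw reading were relayed hop by hop to the root."""
    return depth + sum(raw_forwarding_messages(c, depth + 1) for c in node.children)

# A chain of four nodes: root <- a <- b <- c
c = SensorNode(24.0)
b = SensorNode(22.0, [c])
a = SensorNode(20.0, [b])
root = SensorNode(18.0, [a])

total, count, msgs = partial_aggregate(root)
print("average:", total / count, "messages:", msgs)
print("messages with raw forwarding:", raw_forwarding_messages(root))
```

In this chain the in-network average needs 3 messages while relaying raw readings needs 6, and the gap widens as the tree gets deeper.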
Multimedia Queries
 How to create easy ways to analyze,
summarize, search, and view the “electronic
shoebox” of a person’s multimedia
information
     Ex: how to prepare a multimedia presentation
      about a child
Reasoning about Uncertain Data
 Non-business data is essentially uncertain or
imprecise
     Scientific measurements have standard errors
     GPS data involves uncertainty in current position
     Sequence, image, and text similarity are approximate
      metrics
 The “lineage” of the data must be tracked
 Query processing must move to a stochastic model,
where evidence accumulation is performed to obtain
a better and better answer
 Must handle imprecise queries
 Must be able to characterize the accuracy offered
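One simple way to picture evidence accumulation over measurements with standard errors is inverse-variance weighting. The sketch below is not taken from the slides; it assumes independent measurements of the same quantity, and the function name `combine` and the sample readings are made up for illustration. It reports an estimate together with its accuracy, which tightens as more readings arrive.

```python
import math

def combine(measurements):
    """Combine independent (value, standard_error) measurements of one quantity.

    Uses inverse-variance weighting: more precise measurements get more
    weight, and the combined standard error shrinks as evidence accumulates.
    """
    weights = [1.0 / (se * se) for _, se in measurements]
    total_weight = sum(weights)
    estimate = sum(w * v for w, (v, _) in zip(weights, measurements)) / total_weight
    return estimate, math.sqrt(1.0 / total_weight)

# Hypothetical readings: (value, standard error)
readings = [(10.2, 0.5), (9.8, 0.5), (10.1, 0.25)]
for n in range(1, len(readings) + 1):
    estimate, se = combine(readings[:n])
    print(f"after {n} reading(s): {estimate:.3f} +/- {se:.3f}")
```

Returning the standard error alongside the value is one way a system could "characterize the accuracy offered" for each answer.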
Personalization
 Query answers should depend on personal
profiles
 Relevance and relevance feedback should
also depend on the person and the context
 A framework for including and exploiting
appropriate metadata for mass
personalization
 Personalization and uncertainty leave one
with a need to verify that the answer is
“correct”
Data Mining
 Historically, data mining has focused on efficient
ways to discover models of existing data sets
 Data warehouse users have only one data
mining query: “something interesting”
 Need to develop algorithms and structures to
look for “unexpected pearls”, while running in
the background and consuming excess
system resources
 Need to integrate data mining with querying,
optimization, and other DB facilities such as
triggers
Self Adaptation
 Modern DBMSs are too complex
 To simplify DB administration
1. It should be possible to perform tuning using
a combination of a rule-based system and a
database of knob settings and configuration
data. This needs more sophisticated models
of user behaviors and workloads.
2. DBMSs need to recognize internal
malfunctions and malfunctions of
communicating components, identify data
corruption, detect application failures, and do
something about them
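The first point, tuning from rules plus a database of knob settings, might look like the following toy sketch. Everything here is hypothetical: the knob names (`buffer_pool_mb`, `max_connections`), the workload statistics, the thresholds, and the `tune` function are assumptions for illustration and do not correspond to any particular DBMS.

```python
# Hypothetical knob database: current settings for one DBMS instance.
knobs = {"buffer_pool_mb": 256, "max_connections": 100}

# Hypothetical rules: (name, condition on workload statistics, adjustment to the knobs).
rules = [
    (
        "grow buffer pool on low cache hit ratio",
        lambda stats: stats["cache_hit_ratio"] < 0.90,
        lambda k: {**k, "buffer_pool_mb": k["buffer_pool_mb"] * 2},
    ),
    (
        "raise connection limit when clients are refused",
        lambda stats: stats["refused_connections"] > 0,
        lambda k: {**k, "max_connections": k["max_connections"] + 50},
    ),
]

def tune(knobs, stats):
    """Apply every rule whose condition holds; return new knobs and the rules that fired."""
    fired = []
    for name, condition, adjust in rules:
        if condition(stats):
            knobs = adjust(knobs)  # builds a new dict, the input is not mutated
            fired.append(name)
    return knobs, fired

observed = {"cache_hit_ratio": 0.82, "refused_connections": 0}
new_knobs, fired = tune(knobs, observed)
print(new_knobs, fired)
```

Fixed thresholds like these are the weak spot the slide points at: choosing them well requires better models of user behavior and workloads.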
Privacy
 Data-oriented security needs to be revitalized
 Need to address the concerns, policies and
mechanisms to support multiple individual
options and controls on information held by
third parties
 Access decisions should be based not only
on who is requesting the data but also on the
use to which it will be put.
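A minimal sketch of an access check that considers both the requester and the intended use. The policy table, the role and purpose names, and the `may_access` function are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical policy chosen by the data subject: for each data item,
# the (role, purpose) pairs that are allowed to read it.
policy = {
    "email": {("support_agent", "account_recovery"), ("billing", "invoicing")},
    "purchase_history": {("billing", "invoicing")},
}

def may_access(item, role, purpose):
    """Grant access only if this role is allowed to use this item for this purpose."""
    return (role, purpose) in policy.get(item, set())

print(may_access("email", "billing", "invoicing"))  # allowed role and purpose
print(may_access("email", "billing", "marketing"))  # same role, different purpose
```

A purely identity-based check would give the same answer to both requests, because the requester is the same in each.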
Trustworthy Systems
 Safely store data, protect it from unauthorized
disclosure, protect it from loss, and make it
always available to authorized users.
 Digital rights management
 Ensuring the correctness of query results and
data-intensive computation for embedded
systems
 Use logical inference technology in validating
correctness
New User Interfaces
 Sophisticated visualization systems
 Keyword-based query and browsing
 Query by speech or natural language, supported
by the Semantic Web and ontologies
One-Hundred-Year Storage
 A need for indefinite electronic storage of
information
     Requires mechanisms for migration and for emulation
     Requires metadata for lineage and context
Query Optimization
 Optimization for information integrators, for
semi-structured query languages such as XQuery,
for stream processors, for sensor networks,
and for other domains
 Inter-query optimization involving large
numbers of queries
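One illustrative form of inter-query optimization is to index many standing queries together so that shared work is done once. The sketch below is an assumption, not a technique named in the slides: it handles only queries of the form `value > threshold`, and the class name `ThresholdQueries` and the query ids are made up.

```python
import bisect

class ThresholdQueries:
    """Index many standing queries of the form `value > threshold`.

    Instead of evaluating each query separately for every incoming value,
    the thresholds are kept sorted so one binary search finds all matches.
    """

    def __init__(self, thresholds_by_query):
        pairs = sorted((t, q) for q, t in thresholds_by_query.items())
        self.thresholds = [t for t, _ in pairs]
        self.query_ids = [q for _, q in pairs]

    def matching(self, value):
        """Return the ids of all queries whose threshold is strictly below `value`."""
        cut = bisect.bisect_left(self.thresholds, value)
        return self.query_ids[:cut]

queries = ThresholdQueries({"q1": 30.0, "q2": 10.0, "q3": 20.0})
print(queries.matching(25.0))
```

With n registered queries, each incoming value costs one O(log n) search plus the size of the result, rather than n separate predicate evaluations.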
Next steps and Discussions
 Generate a test bed and collection of
integration tasks
     Classroom scheduling
 At which level should information integration
occur?
     DB or application
 Will web services make any progress on
addressing semantic heterogeneity?