Department of Computer Science, National Tsing Hua University
Database Research:
The Past, The Present, and The Future
Yi-Shin Chen
Department of Computer Science
National Tsing Hua University
[email protected]
http://www.cs.nthu.edu.tw/~yishin/
Outline
• Motivation
• The Past
  • Evolution of Data Management [Gray 1996]
• The Lowell Database Research Self Assessment Report
  • Where did it come from?
  • What does it say?
• The Present
• The Future
Motivation

Database research is driven by new applications, technology trends, new
synergies with related fields, and innovation within the field itself.
[Figure labels: "New Stuff"; "The Database Community"]
Evolution of Data Management
• Before 1900: Manual Record Managers
• 1900: Punched-Card Record Managers
• 1950: Univac had developed a magnetic tape
• 1951: Univac I delivered to the US Census Bureau
• 1955: Programmed Record Managers
  • Birth of high-level programming languages
  • Batch processing
  • Cons:
    • Transaction errors cannot be detected in time
    • The business did not know the current state
• 1965-1980: On-line Network Databases
  • Indexed sequential records
  • Data independence
  • Concurrent access
  • Cons:
    • Navigational programming interfaces are too low-level
    • Need to use very primitive and procedural database operations
Evolution of Data Management (Contd.)
Timeline, 1970 to 2000:
• 1970: E.F. Codd outlined the relational model
  • Gives database users high-level, set-oriented data access operations
• 1980: Relational Databases and Client-Server Computing
  • Uniform representation
  • 1985: SQL first standardized
  • Oracle, Informix, Ingres
  • Unexpected benefits
    • Client-server: because of SQL, ODBC
    • Parallel processing: relational operators naturally support pipeline and partition parallelism
    • Graphical user interface: easy to render a relation
• 1995: Multimedia Databases
  • Richer data types
  • OO databases
  • Unifying procedures and data
  • (Universal Server)
  • Projects that push the limits
    • NASA EOS/DIS projects
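The contrast between low-level, record-at-a-time access and the set-oriented access of the relational model can be sketched in Python with the standard-library sqlite3 module. This is only an illustration: the table `emp`, its columns, and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical sample data: (name, department, salary).
ROWS = [("ann", "db", 120), ("bob", "db", 100), ("cy", "os", 90)]


def total_navigational(rows, dept):
    """Record-at-a-time style: the program walks every record itself."""
    total = 0
    for _name, department, salary in rows:
        if department == dept:
            total += salary
    return total


def total_set_oriented(rows, dept):
    """Set-oriented style: one declarative query; the DBMS picks the plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(salary), 0) FROM emp WHERE dept = ?", (dept,)
    ).fetchone()
    conn.close()
    return total
```

Both functions return the same total; the difference is that the second states what is wanted and leaves the access path to the database engine.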
Research Self Assessment
• A group of senior database researchers gathers every few years to assess the state of database research and point out some potential research problems
  • Laguna Beach, Calif. in 1989
  • Palo Alto, Calif. in 1990 and 1995
  • Cambridge, Mass. in 1996
  • Asilomar, Calif. in 1998
  • Lowell, Mass. in 2003
• The sixth ad-hoc meeting
  • Lasted two days
  • 25 senior database researchers
  • Output: the Lowell database research self assessment report
  • More information: http://research.microsoft.com/~gray/lowell/
Attendees

Serge Abiteboul, Martin Kersten, Rakesh Agrawal, Michael Pazzani, Phil
Bernstein, Mike Lesk, Mike Carey, David Maier, Stefano Ceri, Jeff Naughton,
Bruce Croft, Hans Schek, David DeWitt, Timos Sellis, Mike Franklin, Avi
Silberschatz, Hector Garcia-Molina, Rick Snodgrass, Dieter Gawlick, Mike
Stonebraker, Jim Gray, Jeff Ullman, Laura Haas, Gerhard Weikum, Alon
Halevy, Jennifer Widom, Joe Hellerstein, Stan Zdonik, Yannis Ioannidis
Photos captured from http://www.research.microsoft.com/~gray/lowell/Photos.htm
The Main Driving Forces
• The focus of database research
  • Information storage, organization, management, and access
• The main driving forces
  • Internet
    • Particularly by enabling "cross enterprise" applications
    • Requires stronger facilities for security and information integration
  • Sciences
    • Generate large and complex data sets
    • Need support for information integration, managing the pipeline of data products produced by data analysis, storing and querying "ordered" data, and integrating with the world-wide data grid
The Main Driving Forces (Contd.)
  • Traditional DBMS topics
    • Technology keeps changing the rules → reassessment
    • E.g., the ratios of capacity to bandwidth change → reassess storage management and query-processing algorithms
  • Maturation of related technologies, for example:
    • Data-mining technology → DB component
    • Information retrieval → integrate with DB search techniques
    • Reasoning with uncertainty → fuzzy data
    • NLP querying
Next Generation Infrastructure
Discuss the various infrastructure components that require new solutions or are novel in some other way
1. Integration of Text, Data, Code and Streams
2. Information Fusion
3. Sensor Data and Sensor Networks
4. Multimedia Queries
5. Reasoning about Uncertain Data
6. Personalization
7. Data Mining
8. Self Adaptation
9. Privacy
10. Trustworthy Systems
11. New User Interfaces
12. One-Hundred-Year Storage
13. Query Optimization
Integration of Text, Data, Code and Streams
• Rethink basic DBMS architecture supporting:
  • Structured data → traditional DBMS
  • Text → information retrieval
  • Space and time → spatial and temporal DB
  • Image and multimedia data → image retrieval/multimedia DB
  • Procedural data → user-defined functions
  • Triggers → make facilities scalable
  • Data streams and queues → data stream management
• Start with a clean sheet of paper
  • SQL, XML Schema, XQuery
    • Too complex
    • Vendors will pursue the extend-XML/SQL strategies
    • Research community should explore a reconceptualization
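As one small illustration of the "data streams and queues" item, the sketch below maintains a sliding-window average over an unbounded stream, the kind of continuous query a data stream manager evaluates. The function name and the window semantics (the last `window` arrivals) are assumptions made for this example, not something the slides specify.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` values after each arrival."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque()
    total = 0.0
    for value in stream:
        recent.append(value)
        total += value
        if len(recent) > window:
            # Evict the oldest value so memory stays bounded by the window.
            total -= recent.popleft()
        yield total / len(recent)
```

Because the function is a generator, it never needs the whole stream in memory; it emits an updated answer as each value arrives.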
Information Fusion
• The typical approach
  • Extract-transform-load (ETL) tools
  • Data warehouse
• Because of the Internet
  • Millions of information sources
  • Some data can only be accessed at query time
• Perform information integration on the fly
• Work with the "Semantic Web" people
  • Need a semantic-heterogeneity solution
• Other challenges
  • Security policy: information in each database is not free
  • Probabilistic world of evidence accumulation
  • Web-scale
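On-the-fly integration, as opposed to loading everything into a warehouse first, can be sketched as a mediator that contacts each source only at query time and renames its fields to a shared schema. The helper names `make_wrapper` and `query`, the field names, and the hand-written mappings are hypothetical; real semantic heterogeneity is far harder than a rename.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


def make_wrapper(fetch: Callable[[], Iterable[Record]],
                 mapping: Dict[str, str]) -> Callable[[], Iterator[Record]]:
    """Wrap a source so its fields are renamed to the mediated schema."""
    def wrapped() -> Iterator[Record]:
        for rec in fetch():  # the source is contacted only at query time
            yield {target: rec[source] for source, target in mapping.items()}
    return wrapped


def query(wrappers: List[Callable[[], Iterator[Record]]],
          predicate: Callable[[Record], bool]) -> List[Record]:
    """Evaluate one predicate over all sources without building a warehouse."""
    return [rec for w in wrappers for rec in w() if predicate(rec)]
```

Each mapping is read as source field to mediated field, so two sources that call the same attribute `cost` and `price` can be queried together.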
Sensor Data and Sensor Networks
• Characteristics
  • Draw more power when communicating than when computing
  • Rapidly changing configurations
  • Might not be completely calibrated
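Because communication costs more power than computation, one common response is to aggregate inside the network and send small summaries instead of raw readings. The sketch below assumes a simple AVG query and a flat topology; the `Summary` type and the message counts are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Summary:
    """Partial aggregate a node sends upstream instead of raw readings."""
    count: int
    total: float


def summarize(readings: List[float]) -> Summary:
    """Compute locally (cheap) so only one small message is sent (costly)."""
    return Summary(count=len(readings), total=sum(readings))


def merge(parts: List[Summary]) -> Summary:
    """Combine partial aggregates at a parent node or base station."""
    return Summary(sum(p.count for p in parts), sum(p.total for p in parts))


def network_average(nodes: List[List[float]]) -> Tuple[float, int, int]:
    """Return (average, messages with aggregation, messages without)."""
    parts = [summarize(r) for r in nodes]
    merged = merge(parts)
    avg = merged.total / merged.count if merged.count else 0.0
    raw_messages = sum(len(r) for r in nodes)  # one message per raw reading
    return avg, len(parts), raw_messages
```

With aggregation each node sends one message regardless of how many readings it holds, which is where the power saving would come from.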
Multimedia Queries
• Challenges
  • Create easy ways to:
    • Analyze
    • Summarize
    • Search
    • View
  • Require better facilities for managing multimedia information
Reasoning about Uncertain Data
• Traditional DBMSs have no facilities for either approximate data or imprecise queries
  • (Almost) all data are uncertain or imprecise
• DBMSs need built-in support for data imprecision
  • The "lineage" of the data must be tracked
  • Query processing must move to a stochastic model
    • The query answers will get better
    • The system should characterize the accuracy offered
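One way to read "answers get better" and "characterize the accuracy offered" is online aggregation: the system samples rows in random order and reports a running estimate together with an error bound. The sketch below is an assumption-laden illustration (an AVG query, a normal-approximation interval, no finite-population correction), not the method proposed in the report.

```python
import math
import random
from typing import Iterator, List, Tuple


def online_average(values: List[float], seed: int = 0,
                   z: float = 1.96) -> Iterator[Tuple[int, float, float]]:
    """Yield (rows_seen, estimate, half_width) after each sampled row.

    Rows are visited in random order. The half-width is a normal-approximation
    confidence interval (z = 1.96 for roughly 95%).
    """
    order = list(values)
    random.Random(seed).shuffle(order)
    n, mean, m2 = 0, 0.0, 0.0
    for x in order:
        # Welford's update for the running mean and squared deviations.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1:
            half_width = z * math.sqrt(m2 / (n - 1) / n)
        else:
            half_width = float("inf")  # one row gives no spread estimate
        yield n, mean, half_width
```

A caller can stop consuming the generator as soon as the half-width is small enough for its purpose, trading accuracy for response time.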
Personalization
• Query answers should depend on the user
  • Relevance feedback should also depend on the person and the context
• A framework for including and exploiting appropriate metadata for personalization is needed
  • Need to verify that the information system is producing a "correct" answer
Data Mining
• Focus on efficient ways to discover models of existing data sets
  • Algorithms developed so far include classification, clustering, association-rule discovery, summarization, etc.
• Challenges:
  • Data-mining research to develop algorithms for seeking unexpected "pearls of wisdom"
  • Integrate data mining with querying, optimization, and other database facilities such as triggers
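Association-rule discovery, one of the algorithm families listed above, can be illustrated with a brute-force pass over item pairs that computes support and confidence. This is a teaching sketch only: the function name and thresholds are hypothetical, and practical miners prune candidates rather than counting every pair.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple


def pair_rules(transactions: List[Set[str]], min_support: float,
               min_confidence: float) -> List[Tuple[str, str, float, float]]:
    """Return rules (lhs, rhs, support, confidence) over item pairs."""
    n = len(transactions)
    item_count: Counter = Counter()
    pair_count: Counter = Counter()
    for t in transactions:
        item_count.update(t)
        pair_count.update(frozenset(p) for p in combinations(sorted(t), 2))
    rules = []
    for pair, c in pair_count.items():
        support = c / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        a, b = sorted(pair)
        for lhs, rhs in ((a, b), (b, a)):
            confidence = c / item_count[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)
```

Running this inside the DBMS, next to the query optimizer and triggers, rather than on exported data is the integration challenge the slide points at.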
Self Adaptation
• Modern DBMSs are more complex
  • Must understand disk partitioning, parallel query execution, thread pools, and user-defined data types
  • Shortage of competent database administrators
• Goals
  • Perform tuning using a combination of a rule-based system, a database of knob settings, and configuration data
  • No knobs: all tuning decisions are made automatically
• Need user behaviors and workloads
  • Recognize internal malfunctions, identify data corruption, detect application failures, and do something about them
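The first goal, tuning from rules plus knob settings plus observed data, can be sketched as a tiny rule engine. The knob names, statistics, and thresholds below are hypothetical and chosen only to make the example concrete; they are not taken from any real DBMS.

```python
from typing import Callable, Dict, List, Tuple

Knobs = Dict[str, int]
Stats = Dict[str, float]
Rule = Tuple[str, Callable[[Stats, Knobs], bool], Callable[[Knobs], Knobs]]

# Hypothetical knob names and thresholds, for illustration only.
RULES: List[Rule] = [
    ("grow buffer pool on low cache hit ratio",
     lambda s, k: s["cache_hit_ratio"] < 0.90 and k["buffer_pool_mb"] < 4096,
     lambda k: {**k, "buffer_pool_mb": k["buffer_pool_mb"] * 2}),
    ("add worker threads when the request queue is long",
     lambda s, k: s["avg_queue_len"] > 10 and k["worker_threads"] < 64,
     lambda k: {**k, "worker_threads": k["worker_threads"] + 4}),
]


def tune(stats: Stats, knobs: Knobs) -> Tuple[Knobs, List[str]]:
    """Apply every rule whose condition holds; return new knobs and a log."""
    applied = []
    for name, condition, action in RULES:
        if condition(stats, knobs):
            knobs = action(knobs)  # actions build a new dict, no mutation
            applied.append(name)
    return knobs, applied
```

A "no knobs" system would run a loop like this continuously against live workload statistics instead of waiting for an administrator.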
Privacy
• Security systems
  • Revitalize data-oriented security research
• Specify the purpose of the data request
  • Access decisions should be based on
    • Who is requesting the data
    • To what use it will be put
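An access decision that depends on both the requester and the stated purpose can be sketched as a lookup keyed on the pair. The roles, purposes, and column names below are hypothetical, and a real system would also need to audit whether the stated purpose is honest.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical policy: (role, purpose) -> columns that may be read.
POLICY: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("physician", "treatment"): frozenset({"name", "diagnosis"}),
    ("analyst", "research"): frozenset({"diagnosis"}),
}


def allowed_columns(role: str, purpose: str, requested: Set[str]) -> Set[str]:
    """Return the requested columns this role may read for this purpose."""
    permitted = POLICY.get((role, purpose), frozenset())
    return requested & permitted
```

The same analyst gets different answers for different purposes, which a purely identity-based access check cannot express.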
Trustworthy Systems
• Trustworthy systems
  • Safely store data
  • Protect data from unauthorized disclosure
  • Protect data from loss
  • Make it always available to authorized users
  • Ensure the correctness of query results and data-intensive computations
• Digital rights management
  • Protect intellectual property rights
  • Allow private conversation
New User Interfaces
• How best to render data visually?
  • During the 1980s, we had QBE and VisiCalc
  • Since then, nothing...
  • Need new, better ideas in this area
• Query languages
  • SQL and XQuery are not for end users
  • Possible choices?
    • Keyword-based query → information-retrieval community
    • Browsing → increasingly popular
    • Ontology + speech on NL → Semantic Web + NLP
One-Hundred-Year Storage
• Archived information is disappearing
  • Captured on a deteriorating medium
  • Captured on a medium requiring obsolete devices
  • The application that can interpret the information no longer works
• A DBMS can
  • Keep content accessible in a useful form
  • Automate the process of migrating content between formats
  • Maintain the hardware and software that each document needs
  • Manage the metadata along with the stored document
Query Optimization
• Optimization of information integrators
  • For semi-structured query languages, e.g., XQuery
  • For stream processors
  • For sensor networks
• Inter-query optimization involving large numbers of queries
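A very simple form of inter-query optimization is a shared scan: many selection queries are answered in one pass over the data instead of one pass each. The sketch below assumes in-memory rows and predicate functions; the names are hypothetical, and real systems would also share common subexpressions and indexes.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


def shared_scan(rows: Iterable[Row],
                queries: Dict[str, Predicate]) -> Dict[str, List[Row]]:
    """Answer many selection queries with a single pass over the data."""
    results: Dict[str, List[Row]] = {name: [] for name in queries}
    for row in rows:  # each row is read once, however many queries exist
        for name, predicate in queries.items():
            if predicate(row):
                results[name].append(row)
    return results
```

The I/O cost stays at one scan as the number of registered queries grows; only the per-row predicate work increases.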
Next Steps
• A test bed for information-integration research
• Revisit the solved problems → sea changes
• Avoid drawing too narrow a box around what we do → explore opportunities for combining database and related technologies
Thank You.
Any Questions?
References
• Jim Gray. "Evolution of Data Management." Computer v29 n10 (October 1996): 38-46.
• http://www.research.microsoft.com/~gray/lowell/