Download Database Systems - Department of Computer Engineering

Database Systems
“Breaking Out of the Box”
Avi Silberschatz
Bell Laboratories
Stan Zdonik
Brown University
July 7, 1997
Mehmet Uner
The Paper’s Theme (Strategic Directions)
1) Database Research should be devoted to the
problems of data management no matter where
and in what form the data might be found.
2) Database management skills should be applied
to new data management environments that
potentially require radically new software
architectures.
Outline
 Introduction
 Background
 Our Skills
 Scenarios
 Barriers
 Research
 Conclusions
 References
Introduction
 The field of database systems research and development has
been very successful over its 30-year history.
 It has led to a $10 billion industry that touches virtually every
major company in the world.
 It is unthinkable to manage the large volumes of valuable information
that keep corporations running without support from commercial
database management systems (DBMS).
 A DBMS is a very complex system incorporating a rich set of
technologies.
 It is suited to solving problems of large-scale data management in
the corporate setting.
DBMS
DBMS Requirements:
 Execution Overhead.
 High level of expertise to install and
maintain.
 Only manages data in fairly specific file
formats.
Solution
At the same time:
 Data is changing rapidly.
 Data is stored in different places (e.g. files)
 Data is obtained in large volumes from external sources
like sensors.
Solution:
 Not a full-blown DBMS, but a lighter-weight solution
 Instead of using an existing tool in a new application, it is
better to embed reusable components.
 Use database system components, techniques and
experience in new ways.
Examples
 Some examples that could benefit from data
management techniques but that typically do
not make heavy use of database products:
– World Wide Web
– Personal Information Systems (e-mail)
– News Services
– Scientific Applications
Background
 The database field was born with the release of IMS in the 1960s.
– IBM product
– Managed data as hierarchies
– Data has value and should be managed independently of the application
 CODASYL, the best-known successor
– Based on a graph structure.
 Ted Codd published a paper in 1970
– Proposed the relational model.
Background
 Object Oriented Principles in 80’s
– Allow users to create their own application-specific
types that can be managed by the DBMS.
 Hybrid model in 90’s
– Embeds object-oriented features in a relational context.
Our Skills
 Database Management Systems have been
concerned with the following problems:
– High Performance
– Correctness
– Maintainability
– Reliability
 These problems are viewed from the standpoint of slow memory
devices that must be shared by multiple concurrent users
 This approach leads to a set of skills and
techniques that can be applied and extended to
other problems.
Skills and Techniques
 Data Modeling
– Language for defining structure of database
– Language for manipulating those structures.
 Query Languages
– High-level language to retrieve data from the
database. (SQL)
 Query Optimization and evaluation
 State-based views
– Restricted and reorganized view of database.
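
As a rough illustration of the query-language and state-based-view bullets, the sketch below uses Python's built-in sqlite3 module. The table `employee`, the view `eng_staff`, and the sample rows are hypothetical and do not come from the paper; they only show how a view exposes a restricted, reorganized slice of the stored data through ordinary SQL.

```python
import sqlite3

# In-memory database with a small, made-up table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ada", "eng", 120), ("Bob", "sales", 90), ("Cy", "eng", 100)],
)

# The view is restricted (engineering rows only, salary hidden)
# and reorganized (the name column is exposed as "engineer").
conn.execute(
    "CREATE VIEW eng_staff AS "
    "SELECT name AS engineer FROM employee WHERE dept = 'eng'"
)

# Users query the view exactly as they would query a table.
view_rows = conn.execute(
    "SELECT engineer FROM eng_staff ORDER BY engineer"
).fetchall()

# A view is computed from the current state, so later inserts show through.
conn.execute("INSERT INTO employee VALUES ('Dee', 'eng', 110)")
rows_after_insert = conn.execute(
    "SELECT engineer FROM eng_staff ORDER BY engineer"
).fetchall()

print(view_rows)
print(rows_after_insert)
```

Because the view is defined over the current database state, the second query reflects the newly inserted row without the view being redefined.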
Skills and Techniques
 Data Management
– Automatic maintenance of data structures
– Efficient Movement of data
 Transactions
– A response to correctness problems introduced by
concurrent access and update
 Distributed Systems
 Scalable Systems
– Database systems have been tuned to efficiently and
reliably handle data volumes that exceed the size of
physical memory by several orders of magnitude.
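
The transactions bullet above can be made concrete with a minimal sketch, again assuming Python's built-in sqlite3 module. The `account` table, the `transfer` helper, and the balances are hypothetical. The example shows only the all-or-nothing property of a single transaction; it does not exercise concurrent access, which is the harder problem the slide refers to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint stands in for an application correctness rule.
conn.execute(
    "CREATE TABLE account ("
    "id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 50), ("b", 10)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "a", "b", 30)       # succeeds: a=20, b=40
failed = transfer(conn, "a", "b", 100)  # would overdraw a, so it is rolled back
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer the credit to `b` has already been applied when the debit violates the constraint; the rollback undoes it, so no partial update is visible afterwards.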
Scenarios
 The way for future data management systems
 The technology that would support these scenarios
constitutes a research agenda for the next decade.
1) Instant Virtual Enterprise
2) Personal Information Systems
Instant Virtual Enterprise
 An “instant virtual enterprise” (IVE) is a group of
companies that do not routinely function as a unit.
 They come together to respond to a customer order or request
for proposal.
 Computer integrated manufacturing (CIM) is an example
of an environment requiring IVE cooperation.
 Engineering side
– Design, Production, Quality Assurance
 Administrative side
– Planning, Production Control, Resource Management
Instant Virtual Enterprise
 Companies in an IVE need to exchange and
manage large amounts of data
 Companies will have many heterogeneous
databases
 Sharing and exchanging data with
coordinating information is critical
IVE Scenario
Building an oil pipeline
Company A
Company Q
Engineering Firm (IVE)
License their design
Company R
Engineering Analysis
Company S
IVE Scenario
Actual Fabrication
Company T
Company U
Casting
Design file conversion service
Company V
Documentation and Archiving
Company W
IVE Scenario
 Database Capabilities Needed:
– Executing a query for the design
– Data translation services for engineering analysis
– Coordination and configuration management
– Changes to an object in one subsystem require changes
to one or more related objects in other subsystems.
– Security and access control over the information
– Archiving of information, even after the IVE disbands
Personal Information Systems Scenario
 Provides information to an individual
 Uses PID (Personal Information Device)
– PDA
– Handheld PC
– Laptop
 Equipped with wireless network connection
 Access to internet Anywhere, Anytime.
Personal Information Systems Scenario
 Tightly integrated with individual’s activities.
From morning to bed time.
 In the morning
– Local Weather Report
– List of Reminders
– List of Morning Meetings
– Best Route from home to work
– Personalized Headlines
– Personalized Investment Report
Personal Information Systems Scenario
 Throughout the day
– Tasks for the day
– List of customers to contact
– Summary of breaking news
– Best Driving Routes in the city
 At the end of the day
– Next day’s activities
– Appointments
Personal Information Systems Scenario
 The PID must continuously query remote
databases and monitor broadcast
information
 The PID will magnify today’s client-server
performance, scalability and reliability
problems
 Where should data reside, PID or Server?
Barriers
 DBMS provides a tightly controlled and
highly uniform environment
 For the new applications, database
functionality should be provided outside of
the limits of a DBMS.
 For the vision represented in the scenarios,
a number of technical barriers must be
removed.
Barriers
 Overhead
– System requirements, expertise, planning, monetary cost
– Builders of a personalized newspaper service do not use a DBMS
because they do not need many of its advanced features.
– Only a subset of the traditional database services is needed by many
new applications
 Scale
– Greater volume of data (petabytes)
– Hundreds of servers, client population even larger
Barriers
 Schema Organization
– Traditionally, a schema describing the structure of the database is
created first, and then the database is populated
– Many applications currently create data independently of a
database system (scientific applications, web sites).
– For such data, the schema is often incomplete or inconsistent.
– Schema management facilities are needed to adapt to the dynamic
nature of foreign data.
 Data Quality
– Information accessed from a WAN may be of varying quality.
– Future information systems must be able to react to the quality of
the data source.
Barriers
 Heterogeneity
– Data exists in many forms
– These dissimilar formats must be integrated to allow
applications to access data in a high-level and uniform
way
 Query Complexity
– Different characteristics in future environments
• Conventional: minimize the number of disk accesses
• Future: minimize the total “information bill”
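
The shift in optimization goal can be sketched with a toy cost model. Everything here is an assumption made for illustration: the `Plan` fields, the per-megabyte price, and the two candidate plans are hypothetical, and the slides do not define how an “information bill” would actually be computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A candidate way of answering a query (hypothetical fields)."""
    name: str
    disk_accesses: int  # the conventional cost measure
    bytes_moved: int    # data shipped over the network
    access_fee: float   # flat fee charged by the data source


def information_bill(plan, price_per_mb=0.05):
    """Assumed billing model: source fee plus a charge per megabyte moved."""
    return plan.access_fee + price_per_mb * plan.bytes_moved / 1_000_000


plans = [
    # Few disk accesses, but a paid remote source and a large transfer.
    Plan("remote-index", disk_accesses=5, bytes_moved=10_000_000, access_fee=2.0),
    # Many disk accesses on a free local copy, with little data shipped.
    Plan("local-scan", disk_accesses=200, bytes_moved=1_000_000, access_fee=0.0),
]

conventional_choice = min(plans, key=lambda p: p.disk_accesses).name
bill_choice = min(plans, key=information_bill).name
print(conventional_choice, bill_choice)
```

Under these made-up numbers the two objectives disagree: counting disk accesses favors the remote index, while the billing model favors the local scan.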
Barriers
 Ease of Use
– Highly-trained, full-time staff is assumed to manage a
DBMS
– Yet most users have no training in database technology.
– A simple set of interfaces is needed.
 Security
– As the amount of shared information grows, the need to
restrict access to specific users or for specific uses
arises.
Barriers
 Guaranteeing Acceptable Outcomes
– Transaction management is a barrier to both system
performance and the ability to specify acceptable outcomes
– New or enhanced transaction technology is needed
– Making data unavailable is not acceptable
– Aborting transactions is unacceptable
 Technology Transfer
– Barrier between research and industry
• Insufficient knowledge of each other
Research
 In order to achieve the vision and overcome these
barriers, a number of central research topics must
be addressed:
– Extensibility and Componentization
– Imprecise Results
– Schemaless Databases
– Ease of Use
– New Transaction Models
– Query Optimization
– Data Movement
– Security
– Database Mining
Research
– Extensibility and Componentization
• DBMS in a modular way
• Lighter-weight applications
– Imprecise Results
• On the Web, search engines do not provide 100%
accuracy
• A general theory of imprecision must be developed
– Schemaless Databases
• Able to work with unstructured data
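
One way to picture the schemaless idea is the minimal sketch below, under stated assumptions: the records, the `select` helper, and the `fields_seen` helper are hypothetical and are not proposed in the slides. Records of different shapes sit in one collection, queries tolerate missing fields, and a rough schema is discovered after the fact rather than declared up front.

```python
# Heterogeneous records with no shared, predeclared schema (made-up data).
records = [
    {"type": "email", "from": "ann@example.org", "subject": "pipeline design"},
    {"type": "news", "headline": "Pipeline approved", "source": "wire"},
    {"type": "reading", "sensor": 7, "value": 3.2},
]


def select(records, **conditions):
    """Return records matching every condition.

    A record that lacks a requested field simply does not match.
    """
    return [
        r for r in records
        if all(k in r and r[k] == v for k, v in conditions.items())
    ]


def fields_seen(records):
    """List the field names actually present, a schema inferred from the data."""
    return sorted({k for r in records for k in r})


news_items = select(records, type="news")
sensor_items = select(records, sensor=7)
discovered = fields_seen(records)
print(news_items, sensor_items, discovered)
```

The query on `sensor` skips the e-mail and news records instead of failing on them, which is the behavior a fixed schema would normally rule out by construction.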
Research
– Ease-of-use
• Better database interfaces are required.
– New transaction Models
• Overcome blocking.
• Provides Correctness.
– Query Optimization
• New indexing methods, query processing strategies.
• Cheaper but slower response time.
• Sensitive to bandwidth and power considerations.
Research
– Data Movement
• In a distributed environment, the cost of moving data can be
extremely high
• Asymmetric communication channels (low-bandwidth lines)
– Security
• Formulation of an authorization model
• Interoperability between different security policies
– Database Mining
• Machine Learning
• Statistical Analysis
• Database Technologies
Conclusions
 Database research must be broadly defined.
 The database community must apply its experience and
expertise to new areas, and new solution packages must be
found.
 The vision is an integration that supports the application of
database functionality in small modules that give just the
right capability.
 These modules should also represent a unified theory of
information that allows information of all types to be
queried without having to switch languages or paradigms.
References
 E. F. Codd, “A Relational Model of Data for Large Shared Data Banks”,
Communications of the ACM, 13:6 (June 1970), pp. 377-387.
 J. Gray, http://www.cs.washington.edu/homes/lazowska/cra/database.html
 A. Silberschatz, M. Stonebraker, and J. Ullman, “Database Systems:
Achievements and Opportunities”, SIGMOD Record, 19:4, pp. 6-22.
 A. Silberschatz, M. Stonebraker, and J. Ullman, “Database Systems:
Achievements and Opportunities Into the 21st Century”,
http://www.cs.stanford.edu/pub/papers/lagii.ps
 J. Toole and P. Young, http://www.hpcc.gov/cic/forum/CIC_Cover.html
Thanks!
Any Questions?