Download 1 File Processing Systems

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Oracle Database wikipedia , lookup

IMDb wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Concurrency control wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Database wikipedia , lookup

Relational model wikipedia , lookup

Clusterpoint wikipedia , lookup

Database model wikipedia , lookup

Transcript
COMP 378
Database Systems
Notes for Chapter 1 of Database System Concepts
Introduction
A database management system (DBMS) is a collection of data and an integrated set of programs
that access that data. The collection of data is often referred to as the database.
Example: Dickinson keeps information about each student: name, Banner ID number, standing, address
...
• this data makes up the database
• programs (or parts of programs) are used to add new students, to change a student’s address or
standing, to retrieve information about a student ...
Goals of a DBMS:
• manage large bodies of information
• provide convenient and efficient ways to store and access information
• secure information against system failure or tampering
• permit data to be shared among multiple users
1
File Processing Systems
Why are DBMSs needed? Alternative: store information in files, and write programs as needed to modify
these files and retrieve information.
A file processing system is a collection of files and programs that access/modify these files. Typically,
new files and programs are added over time (by different programmers) as new information needs to be
stored and new ways to access information are needed.
Problems with file processing systems:
• data redundancy and inconsistency
• difficulty of accessing data
• problems with concurrent access
Example: assume I’m paying for groceries with my MAC card at the same time my pay check is being
deposited (and my bank uses a file processing system):
deposit program
withdrawal program
1. read balance from checking account
file as $51
2. read balance from checking account file as $51
3. subtract $50 (for groceries)
4. update checking account file
(new balance: $1)
5. add $100 (my salary)
6. update checking account file (new balance: $151)
It is difficult to prevent such problems unless programs (example: withdrawal and deposit) are coordinated or integrated.
• atomicity problems - ensuring that a system failure during a database update does not leave the
database in an inconsistent state
1
• security problems
– not all users should have access to all data
– example: bank payroll personnel shouldn’t know my checking account balance
– difficult to enforce security in an ad hoc system
• integrity problems
– data may need to satisfy certain conditions, called consistency constraints
– example: account balances should never fall below $0
– difficult to enforce/add/change consistency constraints in a file processing system
DBMSs were developed to remedy these problems.
2
Data Abstraction
abstraction:
• (Random House College Dictionary) the act of taking away or separating
• hiding details that aren’t needed for a particular purpose
• data abstraction - hiding details of how data is stored and maintained
Levels of data abstraction (from low to high):
• physical level
– describes how data is actually stored on physical media
– example: files of records or objects
– needed by implementors and maintainers of DBMSs
• logical level
– describes what data is stored and how data is interrelated
– example: data is seen as tables or a graph or tree structure
– needed by database administrators (and “power users”) who must see all of the data
• view level
– describes some part or subset of the database
– simplifies the database for users who only need part
– useful for controlling access to data
– example: payroll personnel have a view that lets them see employee data, but not customer data
– can have as many views as needed. example: one for payroll personnel, one for tellers, one for
officers ..
– views can overlap
– the logical level is like a view that encompasses the entire database
2
view 1
...
view 2
view N
logical
level
physical
level
3
Instances and Schemas
The data stored in the database at any given time is an instance of the database. The overall design of the
database is the database schema.
Example:
name
Linda Null
Tim Wahls
account number
102
497
balance
8725.38
0.50
is an instance of a database with schema (name, account number, balance)
Database systems have schemas at each level of abstraction:
• the physical schema is the design at the physical level, i.e. as a file of records of a particular type
• the logical schema is the design at the logical level. Example: (name, account number, balance) is a
logical schema.
• a subschema is a design at the view level. For example, (name, account number) is a subschema of
(name, account number, balance)
As with levels of abstraction, there is:
• one physical schema
• one logical schema
• as many subschemas as needed
4
Data Models
A data model is a collection of conceptual tools for describing:
• data
• data relationships
• data semantics
• consistency constraints
Data models:
3
• provide a way of thinking about data that isn’t linked to the implementation of the database
• are used at the logical and view levels (roughly)
Data models of current or historical interest include:
• the network and hierarchical data models (obsolete)
• the relational model
• the entity-relationship model
• the object-oriented and object-relational models
• the semistructured data model (XML)
4.1
The Entity-Relationship and Object-Oriented Models
In the entity-relationship model:
• data is viewed as sets of entities that represent things in the real world, and sets of relationships among
entities
• entities in the system are distinct and uniquely identifiable
• example: customer entities can be uniquely identified by social security number
• data constraints can be specified explicitly
The object-oriented model is similar, except that:
• objects play the role of entities
• relationships among objects are not emphasized
• constructing new kinds of objects from old kinds (via inheritance) is emphasized
4.2
The Relational Model
In the relational model:
• sets of entities and sets of relationships between entities are represented as tables
• each table has multiple (named) columns
• each data item is represented as a fixed format record stored in a row of such a table
For example, a table storing customer information could have columns for the customer name, address and
social security number. Each row in such a table is then a record representing one customer.
Databases are often designed using the entity-relationship model, and then the design is translated to
the relational model for implementation.
The object-relational model is a combination of the object and relational models. Objects of a class can
be stored in a table with one column for each field (data member) of the class.
4
5
Database Languages
5.1
Data Definition Language
A data definition language (DDL) is the language used to define and modify the logical schema of the
database.
• often used to define/modify subschemas (views) and to specify consistency constraints
• the definition of the logical schema (written in the DDL) is compiled into a file or set of tables called
the data dictionary
• the data dictionary is consulted for schema information whenever data is read or modified
• a separate language called the data storage and definition language is used to access the physical
schema of the database
5.2
Data Manipulation Language
A data manipulation language (DML) is the language used at the logical and view levels to retrieve,
insert, delete and modify information stored in the database. A query is a request for information retrieval.
For example (in English): list all customers who have an account balance over $500. A query language
is the part of the DML used for retrieving information. The terms query language and data manipulation
language are often used interchangeably. Example: SQL is a DML (and also a DDL), but is usually called
a query language.
DMLs are classified by the kind of query language they include:
• procedural DMLs require the user to specify what data is needed and to provide an algorithm for
getting the data. The algorithm is written in the query language.
• nonprocedural DMLs require only that the user specify what data is needed.
DDLs manipulate schemas, while DMLs manipulate instances
6
Data Storage and Querying
The implementation of a DBMS is typically divided into two functional components: the storage manager
and the query processor.
6.1
The Storage Manager
The storage manager is the component that provides the interface between the physical level and the
logical level.
Components of the storage manager:
• the file manager
• the buffer manager, which attempts to minimize file reads and writes. Note that disk access is much
slower than main memory access.
• the authorization and integrity manager, which:
– checks that consistency constraints are satisfied, and takes some action when they aren’t
– prevents unauthorized access to data, typically through a password and security classification
system
• the transaction manager, which:
– logs database updates during normal operation
5
– detects when information in the database or data dictionary is lost or corrupted due to disk crash,
power failure, software errors ...
– restores the database to a previous consistent state using the log
The database manager for a small system typically does not implement all of these functions.
6.2
The Query Processor
Components of the query processor:
• the DDL interpreter, which handles DDL statements and records the effect in the data dictionary
• the DML compiler, which translates DML statements into low-level commands to the query evaluation engine. The DML compiler also does query optimization - evaluating alternative query
evaluation plans, and choosing the plan with the lowest cost (typically in terms of disk accesses).
• the query evaluation engine, which executes the query evaluation plan produced by the DML
compiler.
7
Transaction Management
A transaction is a sequence of DML commands that forms a logical unit of work. Example: transferring
money from one bank account to another.
Critical properties of transactions:
• atomicity - a transaction must execute completely or not at all (in terms of the final effect on the
database state).
• consistency - once a transaction completes successfully, the database must be in a consistent state.
The database may be in an inconsistent state while a transaction is executing. For example, during a
transaction that transfers money from account A to account B, the total funds held by the bank may
be inconsistent (after account A has been debited, but before account B has been credited). Note that
such consistency is largely the responsibility of the application programmer.
• isolation - a transaction must not be affected by other transactions that are executing concurrently.
• durability - once a transaction completes successfully, its effect must persist even in the presence of
system failures.
These properties are known as the ACID properties.
8
Data Mining
Data mining is the process of analyzing large volumes of data to find useful patterns (for example, that
young males who purchase video game systems are also likely to purchase HD televisions). Roughly speaking,
this process has three steps:
1. preprocessing data (often from multiple databases) to put it into the format expected by the data
mining algorithm
2. running the data mining algorithm, which typically uses statistical techniques to find patterns in the
data
3. manually inspecting the patterns found to determine which are useful
The SQL:1999 standard specifies constructs to support data mining, and many commercial tools for data
analysis and visualization are available.
6
9
Database Architecture
Modern database systems typically employ a client-server architecture:
• the database system itself runs on a server machine
• users interact with client machines, which connect to the server via a network
Application programs typically employ either a two-tier or a three-tier architecture:
• two-tier architecture
– the application program resides partially or entirely on the client machine
– the program communicates with the server using standards such as ODBC and JDBC
• three-tier architecture
– the application program communicates with an application server (Oracle WebLogic, IBM Websphere, JBoss), but contains no direct database calls
– the application server communicates with the database server
As the application server contains reusable business logic and Web interface components, this is an
effective strategy for developing large applications.
10
10.1
Database Users and Administrators
Database Users
• application programmers
– application programmers write application programs in some programming language (called
the host language) which permits the use of DML commands
– application programs are used to ease common or complicated interaction with the DBMS
– examples: generating paychecks, doing account transfers, ...
– techniques for executing DML commands from within host languages:
∗ using an application programming interface (API) that provides procedures/methods for sending DML commands to the database. Examples:
· Open Database Connectivity (ODBC) - an API defined by Microsoft for the C programming language
· Java Database Connectivity (JDBC)
∗ extending the host language syntax to embed DML calls within the program.
· special syntax is used to separate DML commands from host language commands
· a DML precompiler handles embedded DML commands before the program is compiled
∗ specialized languages that combine imperative control structures (loops, if statements, procedures) with DML statements
· often called rapid application development (RAD) tools or fourth-generation languages
· usually provided with the DBMS
· usually allow rapid GUI development with forms and reports
· examples: Visual Basic for Applications (from Microsoft) and PL/SQL (from Oracle)
• sophisticated users: interact with the DBMS using the DML directly.
• specialized users: interact with the DBMS in specialized ways:
7
– CAD
– expert systems
– graphical or audio data
– temporal data
– ...
• naive users: interact with the database through application programs (including Web interfaces).
10.2
Database Administrator
The database administrator (DBA) is the person(s) in control of the DBMS. One goal of DBMSs is to
permit centralized control.
Typical responsibilities of the database administrator:
• schema definition
• storage structure and access method definition
• schema and physical organization modification
• granting authorization for data access
• integrity constraint specification
• maintenance and backups
8