LECTURE THREE
Database System Environment
TYPES OF DATABASES
• Hierarchical databases: The earliest
DBMSs were based on a hierarchical method
of storing data. The earlier systems were
an extension of the COBOL file structure.
This method begins by claiming that
business data exhibits a hierarchical
relationship. For example, a small office
without computers might store data in
filing cabinets.
• The cabinets would be organized by
customer. Each customer section would
contain folders for individual orders, and
each order would list the items being
purchased. To store or retrieve data, the
database must start at the top, with a
customer in this example. When the
database stores the customer data, it
stores the rest of the hierarchical data
with it.
• A hierarchical database is relatively fast as
long as you only want to access the data
from the top. The serious problem arises when
one is searching for items from the bottom
or middle. E.g. to find all customers who
ordered a specific item, the database
would have to inspect each customer,
every order and each item. Many of these
earlier database approaches still survive,
partly because it is difficult to throw away
applications that still work.
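The difference between top-down and bottom-up access can be sketched in a few lines of Python. This is only an illustration: the nested dictionary, the customer names and the two helper functions are hypothetical stand-ins for a hierarchical store, not part of any real DBMS.

```python
# Hypothetical data: customer -> orders -> items, as in the filing-cabinet example.
hierarchy = {
    "Alice": {"order-1": ["pen", "paper"], "order-2": ["stapler"]},
    "Bob": {"order-3": ["paper"]},
    "Carol": {"order-4": ["ink"]},
}

def items_for_customer(db, customer):
    """Top-down access: one lookup at the root, then walk that subtree only."""
    return [item for items in db[customer].values() for item in items]

def customers_who_ordered(db, wanted):
    """Bottom-up question: every customer, order and item must be inspected."""
    found = []
    for customer, orders in db.items():
        if any(wanted in items for items in orders.values()):
            found.append(customer)
    return found

print(items_for_customer(hierarchy, "Alice"))
print(customers_who_ordered(hierarchy, "paper"))
```

The first function touches one branch only, while the second has to visit the whole tree, which is the search problem described above.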
• Network databases: This type of database has
nothing to do with physical networks (e.g.
Local Area Networks (LANs)). The network
model is named for the network of
connections between the data elements.
The primary goal of the network model
was to solve the hierarchical problem of
searching for data from different
perspectives. The following figure illustrates
this.
• First notice that the items are now
physically separated and they are
connected by arrows. There are also entry
points which are predefined items that can
be searched. The purpose of the arrows is
to show that once you enter the database,
the DBMS can follow the arrows to find
and display the matching data.
• Although this approach solves the search
problem, it is very complex and costly. The
developer must anticipate every question
a user might ask about the data because
the arrows (indexes) have to be built
physically before the question is asked.
Building and maintaining indexes requires
huge amounts of time and storage space.
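A rough way to picture the prebuilt arrows is as one lookup table per anticipated entry point. The sketch below assumes a hypothetical list of (customer, item) pairs and a hypothetical helper `build_index`; it is not how a real network DBMS stores its links.

```python
# Hypothetical orders: (customer, item) pairs.
orders = [("Alice", "pen"), ("Alice", "paper"), ("Bob", "paper"), ("Carol", "ink")]

def build_index(rows, key_position):
    """Build one set of 'arrows' ahead of time: key -> list of matching rows."""
    index = {}
    for row in rows:
        index.setdefault(row[key_position], []).append(row)
    return index

# One index per anticipated entry point; each costs build time and storage.
by_customer = build_index(orders, 0)
by_item = build_index(orders, 1)

print(by_item["paper"])
print(by_customer["Alice"])
```

A question that no index was built for (for example, searching by order date) cannot be answered this way until another index is created.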
• Relational databases: The relational
database originated in the 1970s. The
key is that the tables (called “relations”)
are sets of data. Each table stores
attributes in columns that describe
specific entities. These tables are not
physically connected to each other. The
connections exist through the matching
data stored in each table. The
illustration is on the next slide.
• The strength of the relational approach is
that the designer/developer does not need
to know which questions might be
asked of the data. If the data is carefully
defined, the database can answer virtually
any question efficiently. This flexibility and
efficiency are the primary reasons for the
dominance of the relational model. The
focus of the course will be on building
applications for relational databases.
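As a minimal sketch of tables connected only by matching data, the example below uses Python's built-in sqlite3 module. The table names, columns and rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_item (order_id INTEGER, item TEXT);
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 2);
    INSERT INTO order_item VALUES (10, 'pen'), (10, 'paper'), (11, 'paper');
""")

def customers_who_ordered(item):
    """Answer a 'bottom-up' question by joining on matching column values."""
    rows = conn.execute(
        """SELECT DISTINCT c.name
           FROM customer c
           JOIN orders o ON o.customer_id = c.customer_id
           JOIN order_item i ON i.order_id = o.order_id
           WHERE i.item = ?
           ORDER BY c.name""",
        (item,),
    ).fetchall()
    return [name for (name,) in rows]

print(customers_who_ordered("paper"))
```

No link between the tables was declared in advance; the query itself states which columns must match.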
• Object-Oriented databases: This is a
new and evolving method of organizing
data. It began as a new method of
developing programs. The goal is to
create objects that can be reused in
many programs, thus saving time
and reducing errors. An object has
three major components: a name, a set of
properties (attributes) and methods
(functions).
• The properties describe the object just
as attributes describe an entity in a
relational database.
• The methods are the true innovation of
the OO approach. They are short
programs that define the actions that
each object can take. For example,
code to add a customer could be stored
in an object “Customer”.
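The three components of an object (name, properties, methods) can be sketched with an ordinary Python class. The class `CustomerStore` is a hypothetical in-memory container standing in for an object-oriented DBMS, and all attribute names are assumptions.

```python
class Customer:
    """Name of the object: Customer."""

    def __init__(self, customer_id, name, address):
        # Properties (attributes) describe the object.
        self.customer_id = customer_id
        self.name = name
        self.address = address

    def change_address(self, new_address):
        # A method: a short program defining an action the object can take.
        self.address = new_address


class CustomerStore:
    """Hypothetical in-memory container standing in for an OO DBMS."""

    def __init__(self):
        self._customers = {}

    def add_customer(self, customer):
        self._customers[customer.customer_id] = customer

    def drop_customer(self, customer_id):
        del self._customers[customer_id]

    def count(self):
        return len(self._customers)


store = CustomerStore()
alice = Customer(1, "Alice", "12 High Street")
store.add_customer(alice)
alice.change_address("7 Mill Lane")
print(store.count(), alice.address)
```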
Object-Oriented DBMS (figure: classes with their properties and methods)
• Order – properties: OrderID, CustomerID, …; methods: NewOrder, DeleteOrder, …
• Customer – properties: CustomerID, Name, …; methods: Add Customer, Drop Customer, Change Address
• Commercial Customer – properties: ContactName, ContactPhone, …; methods: NewContact
• OrderItem – properties: OrderID, ItemID, …; methods: OrderItem, DropOrderItem, …
• Item – properties: ItemID, Description, …; methods: New Item, Sell Item, Buy Item …
• There are two approaches to handling true
object-oriented data:
1. Extend the relational model so that it can
handle OO features.
2. Create a new object-oriented DBMS.
Most commercially successful database
systems follow the first approach by
adding object features to the relational
model.
Examples of Commercial Systems
• Oracle
• Informix (Unix)
• DB2, SQL/DS (IBM)
• Access (Microsoft)
• SQL Server (Microsoft +)
• Many older (Focus, IMS, ...)
• MySQL
• PostgreSQL
Database System Environment
Stored Data Manager
• The database and the database catalogue
are stored on disk
• Access to the disk is handled by the
Operating System.
• A higher-level stored data manager
controls access to DBMS information that
is stored on disk, whether part of the
database or the catalogue.
• The stored data manager may use
basic OS services for carrying out
low-level data transfer, such as
handling buffers.
• Once data is in buffers, the other
DBMS modules, as well as other
application programs can process it.
Data Definition Language (DDL) Compiler
• Processes the schema definitions and
stores the descriptions (meta-data) in the
catalogue.
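As a small illustration of schema descriptions being stored as meta-data, the sketch below uses Python's built-in sqlite3 module, where the catalogue is exposed as the `sqlite_master` table. SQLite is only a stand-in here; other DBMSs organize their catalogues differently, and the `customer` table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A DDL statement: the schema definition is processed by the DBMS...
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")

# ...and its description (meta-data) is recorded in the catalogue,
# which SQLite exposes as the sqlite_master table.
catalogue = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE name = 'customer'"
).fetchall()
print(catalogue)
```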
Runtime Database Processor
• Handles database access at runtime.
• Receives retrieval or update operations
and carries them out on the database.
• Access to the disk goes through the
stored data manager.
Query Compiler
• Handles high-level queries entered
interactively.
• Parses, analyzes and interprets a
query, then generates calls to the
runtime processor for execution.
Precompiler
• Extracts Data Manipulation Language
(DML) commands from an application
program written in a host language.
• These commands are sent to the DML compiler
for compilation into code for database
access. The rest of the program is sent to the host
language compiler.
Client Program
• Runs on a computer separate from the
one on which the database resides and
accesses the DBMS from there. The computer
running the client program is called the
client computer, and the other is the
database server. In some cases there is a
middle level, called the application server.
Database System Utilities
DBMSs have database utilities that help
the DBA manage the system. Functions
include:
• Loading – used to load existing
text/sequential files into the database.
The source format and the desired target file are
specified to this utility, and the utility
reformats the data to load into a table.
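A loading utility can be sketched as a function that reads a text file, converts each row to the target column types and inserts it. The sketch assumes a hypothetical two-column CSV source and uses Python's csv and sqlite3 modules as stand-ins for a vendor's loader.

```python
import csv
import io
import sqlite3

# Hypothetical source file contents; a real utility would read from disk.
source = io.StringIO("customer_id,name\n1,Alice\n2,Bob\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")

def load_csv(connection, text_file):
    """Reformat each text row into the target table's column types and insert it."""
    reader = csv.DictReader(text_file)
    rows = [(int(r["customer_id"]), r["name"]) for r in reader]
    with connection:  # commit all rows together, or none on error
        connection.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    return len(rows)

loaded = load_csv(conn, source)
print(loaded)
```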
• Backup – creates a backup copy of the
database, usually by dumping the database
onto tape. The copy can be used to restore the
database in case of failure. An incremental
backup, which records only
the changes since the last backup, can also be used.
• File Reorganization – reorganizes database files into different file
organizations to improve performance.
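A full (not incremental) backup can be sketched with the `backup` method of Python's sqlite3 connections. The in-memory databases and the `customer` table are assumptions for illustration; a production backup would target tape or disk as described above.

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
live.execute("INSERT INTO customer VALUES (1, 'Alice')")
live.commit()

# Full backup: copy every page of the live database into a second database.
backup_copy = sqlite3.connect(":memory:")
live.backup(backup_copy)

# Simulate a failure on the live side, then read from the backup copy.
live.execute("DELETE FROM customer")
live.commit()
restored = backup_copy.execute("SELECT name FROM customer").fetchall()
print(restored)
```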
• Performance Monitoring – monitors
database usage and provides
statistics to the DBA. The DBA uses the
statistics for decision-making.
Data Dictionary
• Data dictionary system – stores catalog
information about schemas and
constraints, as well as design decisions,
usage standards, application program
descriptions and user information. Also called
an information repository. Can be
accessed directly by the DBA or users when
needed.
Application development
• Application development environments
– (e.g. JBuilder) provide an environment for
developing database applications, and
include facilities to help in database
design, GUI development, querying and
updating, and application development.
• CASE Tools – used in the design phase
to help speed up the development
process.
Communication Facilities
• Communication software – allows
users at remote locations to access the
database through computer terminals,
workstations or personal computers.
These are connected to the database through
data communications hardware such as
phone lines, local area networks, etc.
Centralized DBMS Architecture
• Used mainframes to provide the main
processing for user application
programs, user interface programs
and DBMS functionality.
• Users accessed the systems via ‘dumb’
computer terminals that provided only
display capabilities, with no
processing capabilities.
• All processing was performed remotely
on the computer system, and only
display information was sent to the
terminals, connected via a network.
• Dumb terminals were replaced with
workstations, which led to the
client/server architecture.
Centralized DBMS Architecture (figure: terminals with display monitors connect over a network to a mainframe, which holds the software (application programs, DBMS, text editors, compilers, etc.) and the hardware (CPU, controller, memory, disk, I/O devices))
Client-Server (figure: clients running the front-end user interface connect to servers that hold the shared database)
Client Server Architecture
• Define specialized servers with
specific functionalities (file servers,
print servers, web servers, database
servers).
• Many client machines can access
resources provided by specialized
servers.
• Some machines are client sites, with
client software installed, and other
machines are dedicated servers.
• Client machines provide the user with the
appropriate interfaces to utilize
servers, as well as with local
processing power to run local
applications.
• Client – a user machine that provides
user interface capabilities and local
processing.
• Server – machine that provides
services to client machines such as
file access, printing, and database
access.
Two Tier Client/Server
Architecture for DBMSs
• In relational DBMSs, user interfaces
and application programs were first
moved to the client side.
• SQL provided a standard language,
which was a logical dividing point
between client and server.
• Query and transaction functionality
remained on server side. In this
architecture, the server is called a
query server, or transaction server.
• In relational DBMSs, the server is
called an SQL server, because most
RDBMSs use SQL.
• In such systems, the user interface
and application programs run on the
client. When DBMS access is needed,
the program establishes a connection
to the DBMS on the server side.
Once the connection is created, the
client can communicate with the
DBMS.
• ODBC (Open Database Connectivity)
is a standard that provides an
application programming interface (API) which
allows client-side programs to call the
DBMS as long as both sides have the
required software. Most database
vendors provide ODBC drivers for
their systems.
• Client programs can connect to several
RDBMSs and send query and transaction
requests using the ODBC API.
• Query requests are sent from the client
to the server, and the server processes
the request and sends the result to the
client.
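The connect, request and result cycle can be sketched as follows. This is not ODBC itself: Python's built-in sqlite3 module is used as a stand-in for a driver, and `connect_demo` and `run_query` are hypothetical helpers. With a real ODBC driver the connection would go over the network to a database server, but the shape of the client code would be similar.

```python
import sqlite3

def run_query(connect, sql, params=()):
    """Client side: open a connection, send the request, return the result rows."""
    connection = connect()
    try:
        cursor = connection.cursor()
        cursor.execute(sql, params)   # the request goes to the DBMS
        return cursor.fetchall()      # the result comes back to the client
    finally:
        connection.close()

def connect_demo():
    # Stand-in for a driver's connect call; builds a tiny hypothetical database.
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE customer (name TEXT)")
    connection.execute("INSERT INTO customer VALUES ('Alice')")
    return connection

print(run_query(connect_demo, "SELECT name FROM customer WHERE name = ?", ("Alice",)))
```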
• A related Java standard is JDBC,
which allows Java programs to
access the DBMS through a standard
interface.
• These systems are called two tier
architectures because the software
components are distributed over two
systems, the client and server.
Three-Tier Client Server Architecture
for Web Applications
• Many web applications use a three-tier
architecture, which adds an intermediate
layer between the client and the database
server.
• The middle tier is called the application
server, or the web server. It plays an
intermediate role by storing the business
rules (procedures/constraints) used to
access data from the database.
• The middle tier can improve database security by
checking the client's credentials
before forwarding a request to the
database server.
• Clients contain GUI interfaces and
application-specific rules.
• The intermediate server accepts
requests from the client, processes
each request and sends the database
commands to the database server, then
passes the data from the database
server to the client, where it may be
processed further and filtered.
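The flow through the three tiers can be sketched with three small functions in one process. Everything here is an assumption made for illustration: the `account` table, the token store `API_TOKENS` and the business rule that clients may read only their own account. In a real deployment each tier would run on its own machine.

```python
import sqlite3

# Data tier: hypothetical accounts table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (owner TEXT, balance REAL)")
db.execute("INSERT INTO account VALUES ('alice', 120.0), ('bob', 40.0)")

# Hypothetical credential store held by the middle tier.
API_TOKENS = {"token-alice": "alice"}

def application_server(token, requested_owner):
    """Middle tier: check credentials, apply a business rule, then query the DBMS."""
    user = API_TOKENS.get(token)
    if user is None:
        raise PermissionError("unknown client")
    if user != requested_owner:  # business rule: clients see only their own account
        raise PermissionError("not allowed to read this account")
    row = db.execute(
        "SELECT balance FROM account WHERE owner = ?", (requested_owner,)
    ).fetchone()
    return {"owner": requested_owner, "balance": row[0]}

def client(token, owner):
    """Client tier: format the result for display to the user."""
    result = application_server(token, owner)
    return f"{result['owner']}: {result['balance']:.2f}"

print(client("token-alice", "alice"))
```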
• The three tiers are: user interface (GUI, web interface),
application rules (application programs, web pages),
and data access (DBMS).
Three-Tier Client-Server
• Server (database servers): databases, transactions, legacy applications.
• Middle (middleware): locate databases (database links), business rules, program code.
• Client: front-end application, user interface.
Distributed Databases
• A distributed database consists of
multiple independent databases that
operate on two or more connected
computers. The databases
are usually in different physical
locations. Each database is
controlled by an independent DBMS,
which is responsible for maintaining
the integrity of its own database.
• In extreme situations, the databases
might be installed on different hardware,
use different operating systems, and
even use DBMSs from different vendors.
This is a complex environment. Most
current distributed databases function
better if all of the sites are
running DBMS software from the same
vendor.
Distributed Database Definition
• Multiple independent databases
– Each DBMS is a complete DBMS (engine, queries, locking, transactions, etc.)
– Usually on different machines.
– Usually in different locations.
• Connected by a network.
• Might be different environments
– Hardware
– Operating System
– DBMS Software
(Figure: three networked databases: Zeus in England, Apollo in France and Athena in the United States.)
• In the above example, a company could
have offices in England, France and the USA.
Workers in the USA would rarely need to see
the daily operations of workers in France.
On the other hand, workers in France and
England could be working on a large
international project. The network and
distributed databases would enable them
to share data and treat the project as if all
the information were in one place.
• Distributed databases can have different
configurations. The most popular method
is the client/server approach. The server
computer is more powerful and provides
data for client computers, which could be
PCs with a GUI. The role of the client
would be to provide an interface to the user,
collect and display data, and return data to
the appropriate server.
• An important rule of distributed
databases is that the user should not
know or care that the database is
distributed. A user should be able to
create and run queries as if the
database were on one computer.
Behind the scenes, the DBMS might
connect to several computers, collect
data, and format the results. The user does
not need to know about these steps.
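The idea of hiding the distribution from the user can be sketched as a single `query` function that fans out to several independent databases. The site names, the `project_task` table and its rows are hypothetical, and in-memory SQLite databases stand in for DBMSs running on separate computers.

```python
import sqlite3

def make_site(rows):
    """Create one independent database (a stand-in for a remote site)."""
    site = sqlite3.connect(":memory:")
    site.execute("CREATE TABLE project_task (task TEXT, hours INTEGER)")
    site.executemany("INSERT INTO project_task VALUES (?, ?)", rows)
    return site

# Hypothetical sites and data.
SITES = {
    "England": make_site([("design", 30)]),
    "France": make_site([("testing", 12), ("design", 8)]),
}

def query(sql, params=()):
    """What the user calls: no site names appear in the call."""
    results = []
    for site in SITES.values():                               # connect to several computers
        results.extend(site.execute(sql, params).fetchall())  # collect data
    return sorted(results)                                    # format the results

print(query("SELECT task, hours FROM project_task WHERE task = ?", ("design",)))
```

This sketch only covers read queries; distributed updates would also need the distributed transaction management mentioned in the rules.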
Distributed Database Rules
• The user should not know or care that
the database is distributed.
–Local autonomy.
–No reliance on a central site.
–Continuous operation.
–Location independence.
–Fragmentation independence (physical
storage).
–Replication independence.
–Distributed query processing.
–Distributed transaction management.
–Hardware independence.
–Operating system independence.
–Network independence.
–DBMS independence.
• The main advantage of the distributed
database approach is that it matches
the organization's functions. Business
operations are distributed across
different locations. Most updates and
queries are performed locally. Each
office retains local control of and
responsibility for its data. Yet the system
enables anyone with proper authority to
retrieve data from any portion of the
company when the need arises.
The advantages of distributed databases
are as follows:
• Each database can continue to run
even if a portion fails.
• Data and hardware can be moved
without affecting operations or users.
–Expanding operations.
–Performance issues.
• System expansion and upgrades.
–Add new section without affecting
others.
–Upgrade hardware, network and
DBMS.
Advantages of Distributed Databases
• Business operations are often distributed
– Work and data are segmented by department.
– Work and data are segmented by geographical location.
• Improved performance
– Most updates and queries are performed locally.
– Maintain local control and responsibility over data.
• Can still combine data across the system.
• Scalability and expansion
– Add on, not replacement.
(Figure labels: local transactions, future expansion.)
Data Warehousing
• A data warehouse is where information
is organized for quick retrieval. Data is
obtained from different sources (usually
databases) set up for different
purposes.
Differences from a Traditional Database
• Data is organized around major subjects
rather than individual transactions.
• Summarized data is used rather than
detailed data.
• Data is framed for long-term decision
making.
• Data is organized for quick queries, not
so much for efficient storage.
• Optimized for complex queries known
as OLAP (online analytical processing).
OLAP allows managers to look at a database
along different dimensions.
• Allows easy access via data mining
(software) that searches for patterns
and is able to identify relationships.
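Looking at the same data along different dimensions can be sketched with grouped totals. The `sales` table, its columns and its rows are hypothetical, and plain SQL GROUP BY in SQLite is used here as a simplified stand-in for a dedicated OLAP engine.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL)"
)
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", "pen", 2023, 100.0),
        ("North", "paper", 2023, 50.0),
        ("South", "pen", 2023, 70.0),
        ("South", "pen", 2024, 30.0),
    ],
)

def total_by(dimension):
    """Summarize the same facts along one chosen dimension."""
    if dimension not in ("region", "product", "year"):
        raise ValueError("unknown dimension")
    sql = (
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return warehouse.execute(sql).fetchall()

print(total_by("region"))
print(total_by("product"))
```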
• Includes multiple databases that have been
processed so that the data is uniform (clean
data).
• Includes data from outside sources as well as
data generated internally.
• Building a warehouse is complex. An
analyst gathers information from a variety of
sources and translates it into a common form.
E.g. one database could record gender as “male” and
“female”, another could use “M” and
“F”, while a third could use “0” and “1”.
• Once the data is clean, the analyst has to decide how
to summarize it and predict the type of
queries that might be asked (details are
usually lost during summarization). The
warehouse is then designed both logically
and physically.
• Note: the analyst must know a lot about
the business.
• Because of its size, a warehouse is
expensive.
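The cleaning and summarizing steps can be sketched for the gender example. The mapping below is an assumption: the slides do not say which of “0” and “1” stands for which gender, so the numeric entries are purely hypothetical, as are the function names.

```python
# Assumed mapping: the source does not say which of "0"/"1" means which gender,
# so the numeric entries below are purely hypothetical.
COMMON_FORM = {
    "male": "M", "female": "F",
    "m": "M", "f": "F",
    "0": "M", "1": "F",
}

def clean_gender(raw):
    """Translate one source value into the warehouse's common form."""
    key = str(raw).strip().lower()
    if key not in COMMON_FORM:
        raise ValueError(f"unrecognized gender code: {raw!r}")
    return COMMON_FORM[key]

def summarize(values):
    """Summarize cleaned values as counts; the individual rows are not kept."""
    counts = {}
    for value in values:
        code = clean_gender(value)
        counts[code] = counts.get(code, 0) + 1
    return counts

print(summarize(["male", "F", "0", "Female", 1]))
```

Only the counts survive the summary step, which illustrates why details are usually lost during summarization.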
Data Mining
• Data mining can identify patterns that
humans are unable to detect. Data
mining algorithms search data
warehouses for patterns. It is also known
as Knowledge Discovery in Databases
(KDD).
• It is the process of discovering patterns and
trends which are meaningful by sifting through
large amounts of data stored in repositories,
using pattern recognition technologies as well
as statistical and mathematical techniques.
• These patterns and trends, extracted as
information, can then be applied to prediction
or classification models by identifying relations
within the data records or between databases.
• They can then guide decision making and
forecast the effects of those decisions. E.g.
predicting buying habits of customers based
on past patterns.
Software for Data Mining
Known decision aids include:
• Statistical analysis software
• Neural networks
• Fuzzy networks
• Intelligent agents
• Logic and data visualization
• Patterns that decision makers try to
identify include:
• Associations: Patterns that occur
together at the same time. For example, a
person who buys milk usually buys bread.
• Sequences: Actions that take place over
a period of time, e.g. if a family buys a
house this year, they will most likely buy a
fridge and cooker next year.
• Clustering: A pattern that develops
among a group of people, e.g.
customers who live in a particular area
tend to buy a particular product.
• Trends: Patterns that are noticed over
a period of time, e.g. customers may
move from buying processed food to
natural foods (herbal products) or
African attire.
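An association such as "a person who buys milk usually buys bread" can be measured with two simple ratios, commonly called support and confidence. The baskets below are hypothetical data, and the sketch checks one candidate rule rather than searching for rules automatically.

```python
# Hypothetical shopping baskets.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread", "jam"},
]

def support(items, transactions):
    """Fraction of transactions that contain every item in `items`."""
    hits = sum(1 for basket in transactions if items <= basket)
    return hits / len(transactions)

def confidence(if_item, then_item, transactions):
    """Of the baskets with `if_item`, the fraction that also hold `then_item`."""
    return support({if_item, then_item}, transactions) / support({if_item}, transactions)

print(support({"milk", "bread"}, baskets))
print(confidence("milk", "bread", baskets))
```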
Classification
• Classification: Given a set of items that belong to several
classes, and given the past instances (training
instances) with their associated classes, classification is
the process of predicting the class of a new item.
• The aim, therefore, is to classify the new item, that is, to identify
the class to which it belongs.
• Example: A bank wants to classify its Home Loan
Customers into groups according to their response to
bank advertisements. The bank might use the
classifications “Responds Rarely, Responds
Sometimes, Responds Frequently”.
• The bank will then attempt to find rules about the
customers that respond Frequently and Sometimes.
• The rules could be used to predict needs of potential
customers.
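The step from training instances to a prediction can be sketched with a deliberately simple classifier: for each job, predict the response class seen most often among past customers with that job, and fall back to the overall most common class for an unseen job. The training data and the choice of job as the only attribute are assumptions; this is not the bank's method nor a full decision-tree learner.

```python
from collections import Counter

# Hypothetical training instances: (job, response class).
training = [
    ("engineer", "Responds Frequently"),
    ("engineer", "Responds Frequently"),
    ("engineer", "Responds Rarely"),
    ("carpenter", "Responds Sometimes"),
    ("carpenter", "Responds Sometimes"),
    ("doctor", "Responds Rarely"),
    ("doctor", "Responds Rarely"),
]

def learn_rules(instances):
    """For each job, keep the class seen most often among past customers."""
    by_job = {}
    for job, label in instances:
        by_job.setdefault(job, Counter())[label] += 1
    rules = {job: counts.most_common(1)[0][0] for job, counts in by_job.items()}
    default = Counter(label for _, label in instances).most_common(1)[0][0]
    return rules, default

def classify(job, rules, default):
    """Predict the class of a new item; unseen jobs get the default class."""
    return rules.get(job, default)

rules, default = learn_rules(training)
print(classify("engineer", rules, default))
print(classify("teacher", rules, default))
```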
Technique for Classification
• Decision-Tree Classifiers
(Figure: a decision tree for predicting the credit risk, Good or Bad, of a person. The root splits on Job (Engineer, Carpenter, Doctor) and each branch then splits on Income; the thresholds shown are <30K, >50K, <40K, >90K, <50K and >100K.)
• Data mining is also used to target customers,
on the assumption that past behavior is a good
predictor of future behavior. A large amount
of data is captured about each
person, and companies share this
information. Credit companies have
taken advantage of this to
target customers.
Uses of Data Mining
• Sales/Marketing
– Diversify the target market
– Identify clients' needs to increase response rates
• Risk Assessment
– Identify customers that pose a high credit risk
• Fraud Detection
– Identify people misusing the system, e.g. people who have two Social Security Numbers
• Customer Care
– Identify customers likely to change providers
– Identify customer needs
Problems with Data Mining
• Cost could be too high to justify data mining.
• Coordination of several customers or
departments could be problematic.
• Customers could resent their privacy being
invaded and reject the offers that are
coming their way.
• Erroneous profiles could be made of
people, stored, and not deleted. The police
could act on these profiles without meeting
the people.
Ethical Issues
• Analysts should take responsibility
for considering the ethical aspects of any
data mining projects that are proposed.
• Length of time the material is kept.
• Privacy safeguards should be installed.
• Confidentiality of the material.
• The uses to which inferences are put
should be raised and considered with the
client.
• The opportunities for abuse are
apparent and must be guarded
against. For consumers, data mining
is a push technology, and if
consumers do not want to be pushed,
data mining efforts could backfire.
Data Warehousing (figure)
• Data sources (internal and external): operational, accounting, customer, manufacturing, historical and external databases.
• Data extraction and transformation: extract, filter, transform, classify, aggregate, summarize.
• Data warehouses: customer data, product data and sales data; the data is integrated, subject-oriented, time-variant and nonvolatile.
• Data access and analysis (business intelligence): OLAP, data mining, querying, reporting.