History of Computing - Database
CSE
3002
Prof. Steven A. Demurjian
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Way, Box U-255
Storrs, CT 06269-3255
http://www.engr.uconn.edu/~steve
(860) 486–4818 (Office)
(860) 486-3719 (CSE Office)
HoCDB.1
Overview

Review the History of Databases

CSE 
3002

History of Databases
 Prof. Ying Ding, Indiana University
 info.slis.indiana.edu/~dingying/Teaching/S511/new/lectures/DatabaseOverview.ppt
Introduction to Databases





Steven Demurjian
CSE4701 Class Notes
Historical Perspective
40 Years of VLDB (Very Large Database)

Major database conference

http://vldb.org/2015/wp-content/uploads/2015/09/40years.pdf
Ethics in Databases



https://en.wikipedia.org/wiki/Database
Professional, Legal, and Ethical Issues in Data Management
http://www.cs.utexas.edu/~mitra/csSpring2011/cs327/lectures/New_Slides/ch13.ppt
Databases Steve’s Done
HoCDB.2
Database Overview
Prof Ying Ding
School of Informatics and Computing
Indiana University
info.slis.indiana.edu/~dingying/Teaching/S511/new/lectures/DatabaseOverview.ppt
S511 Session 2, IU-SLIS
3
Database Management System
- manages interaction between end users and database
Database Systems: Design, Implementation, & Management: Rob & Coronel
Database System Environment
 Hardware
 Software
- OS
- DBMS
- Applications
 People
 Procedures
 Data
Database Systems: Design, Implementation, & Management: Rob & Coronel
Evolution of Data Models
• Timeline (1960s, 1970s, 1980s, 1990s, 2000+)
– File-based
– Hierarchical
– Network
– Relational
– Entity-Relationship
– Object-oriented
– Web-based
Database: Historical Roots
• Manual File System
– to keep track of data
– used tagged file folders in a filing cabinet
– organized according to expected use
• e.g. file per customer
– easy to create, but hard to
• locate data
• aggregate/summarize data
• Computerized File System
– to accommodate the data growth and information need
– manual file system structures were duplicated in the computer
– Data Processing (DP) specialists wrote customized programs to
• write, delete, update data (i.e. management)
• extract and present data in various formats (i.e. report)
File System: Example
Database Systems: Design, Implementation, & Management: Rob & Coronel
File System: Weakness
• Weakness
– “Islands of data” in scattered file systems.
• Problems
– Duplication
• same data may be stored in multiple files
– Inconsistency
• same data may be stored by different names in different format
– Rigidity
• requires customized programming to implement any changes
• cannot do ad-hoc queries
• Implications
– Waste of space
– Data inaccuracies
– High overhead of data manipulation and maintenance
File System: Problem Case
CUSTOMER file
AGENT file
A_Name (15 char)
A_Name (20 char)
Carol Johnson
Carol T. Johnson
SALES file
AGENT (20 char)
Carol J. Smith
- inconsistent field name, field size
- inconsistent data values
- data duplication
Database System vs. File System
Database Systems: Design, Implementation, & Management: Rob & Coronel
Hierarchical Database
• Background
– Developed to manage large amounts of data for complex manufacturing projects
– e.g., Information Management System (IMS)
• IBM-Rockwell joint venture
• clustered related data together
• hierarchically associated data clusters using pointers
• Hierarchical Database Model
– Assumes data relationships are hierarchical
• One-to-Many (1:M) relationships
– Each parent can have many children
– Each child has only one parent
– Logically represented by an upside down tree
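The one-parent rule can be sketched in a few lines of Python. This is only an illustration, not IMS code: the `Segment` class and its methods are hypothetical, and the segment names are borrowed from the University example used elsewhere in these slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """A node in a hierarchical database: one parent, many children."""
    name: str
    parent: Optional["Segment"] = field(default=None, repr=False)
    children: List["Segment"] = field(default_factory=list, repr=False)

    def add_child(self, child: "Segment") -> "Segment":
        # Enforce the 1:M rule: a child may belong to only one parent.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has parent {child.parent.name}")
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> str:
        # Data access follows the path from the root down to the segment.
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


courses = Segment("Courses")
formats = courses.add_child(Segment("Formats"))
student = formats.add_child(Segment("Student"))
hierarchy_path = student.path()
```

Attaching `student` to a second parent raises an error, which is the restriction that rules out M:N relationships in this model.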
Hierarchical Database: Example
Database Systems: Design, Implementation, & Management: Rob & Coronel
Hierarchical Database Definition
CSE
4701
DBD   @NAME = University
SEGM  @NAME = Courses
FIELD @NAME = (Course#, SEQ), TYPE = CHAR, BYTES = 6
FIELD @NAME = Title, TYPE = CHAR, BYTES = 20
FIELD @NAME = Descrip, TYPE = CHAR, BYTES = 100
SEGM  @NAME = Prereq, PARENT = Courses
FIELD @NAME = (PCourse#, SEQ), TYPE = CHAR, BYTES = 6
FIELD @NAME = Title, TYPE = CHAR, BYTES = 20
SEGM  @NAME = Formats, PARENT = Courses
FIELD @NAME = (Section#, SEQ, M), TYPE = INT, BYTES = 2
FIELD @NAME = Quarter, TYPE = CHAR, BYTES = 10
FIELD @NAME = Campus, TYPE = CHAR, BYTES = 15
SEGM  @NAME = Faculty, PARENT = Formats
FIELD @NAME = (SSN, SEQ), TYPE = CHAR, BYTES = 9
FIELD @NAME = Name, TYPE = CHAR, BYTES = 30
FIELD @NAME = Ophone, TYPE = CHAR, BYTES = 7
SEGM  @NAME = Student, PARENT = Formats
FIELD @NAME = (SSN, SEQ), TYPE = CHAR, BYTES = 9
FIELD @NAME = Name, TYPE = CHAR, BYTES = 30
FIELD @NAME = Gpa, TYPE = FLOAT, BYTES = 4
Chaps1&2-14
Hierarchical Graphical Representation
[Figure: hierarchy diagram. Courses (Course#*, Title, Descrip) is the parent (1:n) of Prereq (PCourse#*, Title) and of Formats (Section#*, Quarter, Campus); Formats is the parent (1:n) of Student (SSN#*, Name, GPA) and of Faculty (SSN#*, Name, Phone).]
Hierarchical Database: Pros & Cons
• Advantages
– Conceptual simplicity
• groups of data could be related to each other
• related data could be viewed together
– Centralization of data
• reduced redundancy and promoted consistency
• Disadvantages
– Limited representation of data relationships
• did not allow Many-to-Many (M:N) relations
– Complex implementation
• required in-depth knowledge of physical data storage
– Structural Dependence
• data access requires physical storage path
– Lack of Standards
• limited portability
Network Database
• Objectives
– Represent more complex data relationships
– Improve database performance
– Impose a database standard
• Network Database Model
– Similar to Hierarchical Model
• Records linked by pointers
– Composed of sets
• Each set consists of owner (parent) and member (child)
– Many-to-Many (M:N) relationships representation
• Each owner can have multiple members (1:M)
• A member may have several owners
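A rough sketch of the owner/member idea, following the description above rather than any particular CODASYL product. The `NetworkSet` class and the section and student names are hypothetical; the set name `Takes` is taken from the University schema used later in these slides.

```python
from collections import defaultdict


class NetworkSet:
    """One named set type linking owner records to member records."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.members_of = defaultdict(list)  # owner -> its members (1:M)
        self.owners_of = defaultdict(list)   # member -> its owners

    def connect(self, owner: str, member: str) -> None:
        # Both directions are stored as links, mimicking pointer traversal.
        self.members_of[owner].append(member)
        self.owners_of[member].append(owner)


takes = NetworkSet("Takes")
takes.connect("Section 1", "Student A")
takes.connect("Section 1", "Student B")
takes.connect("Section 2", "Student A")

section1_members = takes.members_of["Section 1"]
student_a_owners = takes.owners_of["Student A"]
```

Because Student A is reachable from two sections, the structure is a network rather than a tree.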
Network Database: Example
Database Systems: Design, Implementation, & Management: Rob & Coronel
Network Database Definition
SCHEMA NAME IS University.
RECORD NAME IS Student;
  DUPLICATES ARE NOT ALLOWED FOR SSN.
  Name     ; CHARACTER 30.
  SSN      ; CHARACTER 9.
  Gpa      ; FLOAT.
RECORD NAME IS Faculty;
  DUPLICATES ARE NOT ALLOWED FOR SSN.
  Name     ; CHARACTER 30.
  SSN      ; CHARACTER 9.
  Ophone   ; CHARACTER 7.
RECORD NAME IS Courses;
  DUPLICATES ARE NOT ALLOWED FOR Course#.
  Course#  ; CHARACTER 6.
  Title    ; CHARACTER 20.
  Descrip  ; CHARACTER 100.
RECORD NAME IS Formats;
  DUPLICATES ARE NOT ALLOWED FOR Section#.
  Section# ; FIXED 3.
  Quarter  ; CHARACTER 10.
  Campus   ; CHARACTER 15.
RECORD NAME IS Prereq;
  PCourse# ; CHARACTER 6.
  Title    ; CHARACTER 20.
SET NAME IS Requirements;
  OWNER IS Courses;
  MEMBER IS Prereq;
SET NAME IS COfferings;
  OWNER IS Courses;
  MEMBER IS Formats;
SET NAME IS QtrOfferings;
  OWNER IS Formats;
  MEMBER IS Courses;
SET NAME IS Takes;
  OWNER IS Formats;
  MEMBER IS Student;
SET NAME IS Teaches;
  OWNER IS Formats;
  MEMBER IS Faculty;
Network Graphical Representation
[Figure: network diagram. Record types Courses (Course#*, Title, Descrip), Prereq (PCourse#*, Title), Formats (Section#*, Quarter, Campus), Student (SSN#*, Name, GPA), and Faculty (SSN#*, Name, Phone), linked by the sets Requirements, COfferings, QtrOfferings, Takes, and Teaches.]
Network Database: Pros & Cons
• Advantages
– More data relationship types
– More efficient and flexible data access
• “network” vs. “tree” path traversal
– Conformance to standards
• enhanced database administration and portability
• Disadvantages
– System complexity
• require familiarity with the internal structure for data access
– Lack of structural independence
• small structural changes require significant program changes
Relational Database
• Problems with legacy database systems
– Required excessive effort to maintain
• Data manipulation (programs) too dependent on physical file structure
– Hard to manipulate by end-users
• No capacity for ad-hoc query (must rely on DB programmers).
• Evolution in Data Organization
– E. F. Codd’s Relational Model proposal
• Separated the notion of physical representation (machine-view)
from logical representation (human-view)
• Considered ingenious but computationally impractical in 1970
– Relational Database Model
• Dominant database model of today
• Eliminated pointers and used tables to represent data
• Tables
– flexible logical structure for data representation
– a series of row/column intersections
– related by sharing common entity characteristic(s)
Relational Database: Example

Provides a logical “human-level” view of the data and associations among groups of data (i.e., tables)

Customer_ID  Customer_Account  Agent_ID
1224         4556              23
1225         4558              25

Customer_ID  Last_Name  First_Name  Phone     Account_Balance
1224         Vira       Dyne        678-9987  1223.95
1225         Davies     Tricia      556-3342  234.25

Agent_ID  Last_Name  First_Name  Phone
23        Sturm      David       334-5678
25        Long       Kyle        556-3421
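To show how tables are related through a shared column rather than pointers, here is a small sketch using Python's built-in `sqlite3` module. The table layout is trimmed to a few columns, and the pairing of customers with agents (1224 with agent 23, 1225 with agent 25) is an assumption read off the example values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Agent (
    Agent_ID  INTEGER PRIMARY KEY,
    Last_Name TEXT
);
CREATE TABLE Customer (
    Customer_ID INTEGER PRIMARY KEY,
    Last_Name   TEXT,
    Agent_ID    INTEGER REFERENCES Agent(Agent_ID)
);
INSERT INTO Agent VALUES (23, 'Sturm'), (25, 'Long');
INSERT INTO Customer VALUES (1224, 'Vira', 23), (1225, 'Davies', 25);
""")

# The two tables are associated only through the common Agent_ID value.
pairs = conn.execute("""
    SELECT c.Last_Name, a.Last_Name
    FROM Customer AS c
    JOIN Agent AS a ON c.Agent_ID = a.Agent_ID
    ORDER BY c.Customer_ID
""").fetchall()
```

The query names no storage path; the DBMS decides how to match the rows.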
Relational Tables - Rows/Columns/Tuples
CSE
4701
Chaps1&2-24
Relational Database Definition
CSE
4701
CREATE TABLE Student:
Name(CHAR(30)), SSN(CHAR(9)), Gpa(FLOAT(2))
CREATE TABLE Faculty:
Name(CHAR(30)), SSN(CHAR(9)), Ophone(CHAR(7))
CREATE TABLE Courses:
Course#(CHAR(6)), Title(CHAR(20)), Descrip(CHAR(100)), PCourse#(CHAR(6))
CREATE TABLE Formats:
Section#(INTEGER(3)), Quarter(CHAR(10)), Campus(CHAR(15))
CREATE TABLE TakeorTeach:
SSN(CHAR(9)), Course#(CHAR(6)), Section#(INTEGER(3))
CREATE TABLE COfferings:
Course#(CHAR(6)), Section#(INTEGER(3))
Student(Name*, SSN, Gpa)
Faculty(Name*, SSN, Ophone)
Courses(Course#*, Title, Descrip, PCourse#*)
Formats(Section#*, Quarter, Campus)
TakeorTeach(SSN, Course#, Section#)
COfferings(Course#, Section#)
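The definitions above use slide shorthand rather than executable SQL. A minimal runnable approximation, assuming SQLite through Python's `sqlite3` module, might look like the following. Column names containing `#` are quoted, key constraints are left out, and the sample row is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student     (Name CHAR(30), SSN CHAR(9), Gpa FLOAT);
CREATE TABLE Faculty     (Name CHAR(30), SSN CHAR(9), Ophone CHAR(7));
CREATE TABLE Courses     ("Course#" CHAR(6), Title CHAR(20),
                          Descrip CHAR(100), "PCourse#" CHAR(6));
CREATE TABLE Formats     ("Section#" INTEGER, Quarter CHAR(10), Campus CHAR(15));
CREATE TABLE TakeorTeach (SSN CHAR(9), "Course#" CHAR(6), "Section#" INTEGER);
CREATE TABLE COfferings  ("Course#" CHAR(6), "Section#" INTEGER);
""")

# Hypothetical sample data: one student enrolled in one course section.
conn.execute("INSERT INTO Student VALUES (?, ?, ?)", ("Pat Example", "123456789", 3.5))
conn.execute("INSERT INTO TakeorTeach VALUES (?, ?, ?)", ("123456789", "CSE101", 1))

enrollments = conn.execute("""
    SELECT s.Name, t."Course#"
    FROM Student AS s
    JOIN TakeorTeach AS t ON s.SSN = t.SSN
""").fetchall()

table_names = [row[0] for row in
               conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
```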
Chaps1&2-25
Relational Views

CSE
4701
Two Views Derived From Prior Tables
 Student Transcript View
 Course Prerequisite View
Chaps1&2-26
Relational Database: Pros & Cons
• Advantages
– Structural independence
• Separation of database design and physical data storage/access
• Easier database design, implementation, management, and use
– Ad hoc query capability with Structured Query Language (SQL)
• SQL translates user queries to codes
• Disadvantages
– Substantial hardware and system software overhead
• more complex system
– Poor design and implementation is made easy
• ease-of-use allows careless use of RDBMS
Entity Relationship Model
• Peter Chen’s Landmark Paper in 1976
– “The Entity-Relationship Model: Toward a Unified View of Data”
– Graphical representation of entities and their relationships
• Entity Relationship (ER) Model
– Based on Entity, Attributes & Relationships
• Entity is a thing about which data are to be collected and stored
– e.g. EMPLOYEE
• Attributes are characteristics of the entity
– e.g. SSN, last name, first name
• Relationships describe associations between entities
– i.e. 1:M, M:N, 1:1
– Complements the relational data model concepts
• Helps to visualize structure and content of data groups
– entity is mapped to a relational table
• Tool for conceptual data modeling (higher level representation)
– Represented in an Entity Relationship Diagram (ERD)
• Formalizes a way to describe relationships between groups of data
E-R Diagram: Chen Model
• Entity
– represented by a rectangle with its name in capital letters.
• Relationships
– represented by an active or passive verb inside the diamond that connects the related entities.
• Connectivities
– i.e., types of relationship
– written next to each entity box.
Database Systems: Design, Implementation, & Management: Rob & Coronel
E-R Diagram: Crow’s Foot Model
• Entity
– represented by a rectangle with its
name in capital letters.
• Relationships
– represented by an active or passive
verb that connects the related
entities.
• Connectivities
– indicated by symbols next to
entities.
• 2 vertical lines for 1
• “crow’s foot” for M
Database Systems: Design, Implementation, & Management: Rob & Coronel
E-R Model: Pros & Cons
• Advantages
– Exceptional conceptual simplicity
• easily viewed and understood representation of database
• facilitates database design and management
– Integration with the relational database model
• enables better database design via conceptual modeling
• Disadvantages
– Incomplete model on its own
• Limited representational power
– cannot model data constraints not tied to entity relationships
» e.g. attribute constraints
– cannot represent relationships between attributes within entities
• No data manipulation language (e.g. SQL)
– Loss of information content
• Hard to include attributes in ERD
Object-Oriented Database
• Semantic Data Model (SDM)
– Developed by Hammer & McLeod in 1981
– Modeled both data and their relationships in a single structure (object)
• Object-oriented concepts became popular in 1990s
– Modularity facilitated program reuse and construction of complex structures
– Ability to handle complex data types (e.g. multimedia data)
• Object-Oriented Database Model (OODBM)
– Maintains the advantages of the ER model but adds more features
– Object = entity + relationships (between & within entity)
• consists of attributes & methods
– attributes describe properties of an object
– methods are all relevant operations that can be performed on an object
• self-contained abstraction of real-world entity
– Class = collection of similar objects with shared attributes and methods
• e.g. EMPLOYEE class = (employ1 object, employ2 object, …)
• organized in a class hierarchy
– e.g. PERSON > EMPLOYEE, CUSTOMER
– Incorporates the notion of inheritance
• attributes and methods of a class are inherited by its descendent classes
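The PERSON > EMPLOYEE, CUSTOMER hierarchy can be illustrated with ordinary Python classes. This is a language-level analogy only, not an object database; the attribute and method names (`salary`, `give_raise`, `describe`) are made up for the example.

```python
class Person:
    """Base class: attributes and methods shared by all descendants."""

    def __init__(self, name: str) -> None:
        self.name = name

    def describe(self) -> str:
        return f"{type(self).__name__}: {self.name}"


class Employee(Person):
    """Inherits name and describe() from Person; adds its own state and method."""

    def __init__(self, name: str, salary: int) -> None:
        super().__init__(name)
        self.salary = salary

    def give_raise(self, amount: int) -> None:
        self.salary += amount


class Customer(Person):
    """Inherits everything from Person unchanged."""


employ1 = Employee("Ada", 50000)
employ1.give_raise(1000)
descriptions = [p.describe() for p in (employ1, Customer("Bo"))]
```

Each object bundles its attributes with the operations allowed on them, and `describe()` is written once and inherited by both subclasses.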
Object-Oriented Database Declarations

CSE
4701
Specifying the Object Types Employee, Date, and
Department Using Type Constructors
Chaps1&2-33
Object-Oriented Database Declarations

CSE
4701
Adding Operations to Definitions of Employee and
Department:
Chaps1&2-34
Object Oriented DB Vendors/Products

CSE
4701








Cache (http://www.intersystems.com)
CommonSQL / UncommonSQL
db4o (DeeBeeFourOh) http://www.db4o.com (open source)
GOODS (http://www.garret.ru/~knizhnik/goods.html)
Objectivity/DB (http://www.objectivity.com/objectdatabase.shtml)
ObjectDesignInc
OzoneDb (http://ozone-db.org)
PLOB! (acronym for Persistent Lisp OBjects;
see http://plob.sourceforge.net/ )
XL2 (http://www.xl2.net)
Chaps1&2-35
OO Database Model vs. E-R Model
OODBM:
- can accommodate relationships within an object
- objects to be used as building blocks for autonomous structures
Database Systems: Design, Implementation, & Management: Rob & Coronel
Object-Oriented Database: Pros & Cons
• Advantages
– Semantic representation of data
• fuller and more meaningful description of data via object
– Modularity, reusability, inheritance
– Ability to handle
• complex data
• sophisticated information requirements
• Disadvantages
– Lack of standards
• no standard data access method
– Complex navigational data access
• class hierarchy traversal
– Steep learning curve
• difficult to design and implement properly
– More system-oriented than user-centered
– High system overhead
• slow transactions
Web Database
• Internet is emerging as a prime business tool
– Shift away from models (e.g. relational vs. O-O)
– Emphasis on interfacing with the Internet
• Characteristics of “Internet age” databases
– Flexible, efficient, and secure Internet access
– Support for complex data types & relationships
– Seamless interfaces with multiple data sources and structures
– Ease of use for end-user, database architect, and database administrator
• Simplicity of conceptual database model
• Many database design, implementation, and application development tools
• Powerful DBMS GUI
NoSQL
• NoSQL does not literally mean “no SQL”; the term refers to non-relational data stores.
• Next-generation databases that are non-relational, distributed, open-source, and horizontally scalable have become a favorite back-end store for the cloud community. High performance is the driving force.
NoSQL
• Pros
– Open source (Cassandra, CouchDB, HBase, MongoDB, Redis)
– Elastic scaling
– Key-value pairs, easy to use
– Useful for statistical and real-time analysis of growing lists of elements (tweets, posts, comments)
• Cons
– Security (no ACID: Atomicity, Consistency, Isolation, Durability)
– No indexing support
– Immature
– Absence of standardization
Introduction to Databases
CSE
4701
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
191 Auditorium Road, Box U-155
Storrs, CT 06269-3155
http://www.engr.uconn.edu/~steve
(860) 486 - 4818


The majority of these slides are being used with the permission of Dr. Ling Liu, Associate Professor, College of Computing, Georgia Tech.
Some slides have been adapted from the AWL web site for the textbook
Chaps1&2-41
The Role of DBMS in Computing
CSE
4701
Chaps1&2-42
What is a Database System?
Web or
PC app
Mobile
app
REST API or
Web Services
CSE
4701
Chaps1&2-43
What is the Role of Database System?

CSE
4701





Pervasive in Almost All Applications and Every
Application Domain
Norm rather than Exception
Difficult to Imagine Application without Persistent
Store
Remember – Database is a Repository at Minimum
Database Management for Mobile Computing
Myriad of Architectures and Approaches:
From: http://java.sun.com/javaone/javaone98/sessions/T400/index.html
Chaps1&2-44
Database Concepts - Summary

CSE
4701
Schema vs. Data
 Database: Structured Collection of Data Describing Objects of the Universe of Discourse being Modeled.
 A Database Consists of Schema and Data
 Schema: Describes the Intension (Type) of Objects
 Entity/Table/Relation: A portion of a Schema
 Data: Describes the Extension (Tuples) of Objects


Data Definition vs. Data Manipulation Languages
What is Metadata?
[Figure: the DDL defines the schema (metadata) of a table; the DML operates on the table's data according to the schema.]
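The split between data definition, data manipulation, and metadata can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: SQLite stands in for a generic DBMS, and the `Patient` table with its columns and values is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the schema (the intension, or type, of the table).
conn.execute("CREATE TABLE Patient (pid INTEGER PRIMARY KEY, name TEXT, tel TEXT)")

# DML: operates on the data (the extension, or tuples) according to the schema.
conn.execute("INSERT INTO Patient (pid, name, tel) VALUES (1, 'John', '555-0100')")
conn.execute("UPDATE Patient SET tel = '555-0199' WHERE pid = 1")
data = conn.execute("SELECT pid, name, tel FROM Patient").fetchall()

# Metadata: data about the schema itself, queried like any other data.
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Patient)")]
```

Here `data` holds the tuples while `columns` holds the description of their structure, which is the schema-versus-data distinction in miniature.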
What are Programming Analogs?

CSE
4701





Schema is Equivalent to a Class Library
 All of Different Types of Information
Entity/Table/Relation
 Data Attributes and Types
 Akin to a Class
Tuples Akin to Creating an Instance from Class
Key Difference - Entity/Table is Two Abstractions
 Structure like a Class
 Also Represents a Set of all Tuples
Meta-Data
 Akin to Java Reflection and Introspection
 Access to the Runtime Features of Objects
Let’s See Example
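As a quick sketch of these analogies in Python rather than Java: a class plays the role of the table structure, instances play the role of tuples, a set of instances plays the role of the table contents, and `dataclasses.fields` stands in for reflection over metadata. The `Patient` class and its fields are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Patient:
    """Structure of one tuple: attribute names and types, akin to a class."""
    pid: int
    name: str


# A table is two abstractions at once: the structure above
# and the set of all tuples that follow it.
patient_table = {Patient(1, "John"), Patient(2, "Jane")}

# Metadata: inspect the structure at run time (akin to reflection).
schema = [(f.name, f.type.__name__) for f in fields(Patient)]
names = sorted(p.name for p in patient_table)
```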
Chaps1&2-46
Classes for a Medical Application

CSE
4701

Data Types, Methods
Patient Inherits from Person and Creates a Single
Instance “John”
[Figure: class diagram]
Substance: Id: Integer, name: String, statusCode: String, effectiveTime: Date, repeatNumber: Int
Observation: Id: Integer, statusCode: String, name: String, value: String
Person: Id: Integer, name: Name, address: Address, bday: String, tel: String
Name: family-name: String, given-name: String, prefix: String, suffix: String
Address: street: String, locality: String, region: String, country: String
Patient: Ethnicity: String, prefLang: String, race: String, Email: String, gender: String; getAllergies(), get_clinical_notes(), get_demographics(), get_medications(), get_immunizations()
Provider: deaNumber: String, npiNumber: String, Ethnicity: String, race: String, Email: String, gender: String
Associations: takesPrescribedMedication, hasMedicalObservations
Chaps1&2-47
Database Entity Relationship Diagram

CSE
4701

Patient Entity represents Attributes of a set of Patients
Defines Type and the Collection
 Patient Entity is a Database Table with Structure
Like a Class
 However, Contains many Instances, e.g., Patients
“John”, “George”, “Jane”, etc.
[Figure: ER diagram relating the Patient, Observation, and Substance entities; attributes shown include id, name, address, tel, bday, Ethnicity, prefLang, race, value, statusCode, effectiveTime, and repeatNumber.]
Chaps1&2-48
Database Tables
CSE
4701
Patient(pid, name, address, tel, bday, etc.)
Substance(sid, name, statusCode, etc.)
Observation(oid, value, statusCode, etc.)
PatientObservations(pid, oid)
PatientMedications(pid, sid)
Chaps1&2-49
An Example Database System

An Integrated Telephone Customer Information
System (Circa early 1980s)

What are Examples Today? Has Scale Increased?
CSE
4701
Chaps1&2-50
The OpenMRS Sample Database Schema

CSE
4701
99 Tables, Sample Database with 5000 patients and
500,000 observations
Chaps1&2-51
What are World Largest DBs? (2010)*
CSE
4701
*http://www.comparebusinessproducts.com/fyi/10-largest-databases-in-the-world
Chaps1&2-52
Available Database Systems/Platforms

CSE
4701







Ranging from Relational to Object-Oriented to Real-Time to Embedded to Mobile
Long History of Database Systems
First Database Journal – 1976
 ACM Transactions on Database Systems
 Founded by David K. Hsiao (my doctoral advisor)
1st Issue – P. Chen on the Entity Relationship Model
2nd Issue
 System R – IBM’s First Relational DBMS Prototype
 Abstraction by S. Navathe (our textbook author)
3rd Issue – The INGRES DBMS – DEC (Berkeley)
4th Issue – Functional Dependencies/Normal Forms
6th Issue – Abstraction and Generalization
Chaps1&2-53
Available Database Systems

CSE
4701




Microsoft SQL Server
IBM DB2
Oracle
MySQL
Emerging Mobile Platforms
 Berkeley DB
 Couchbase Lite
 LevelDB
 SQLite
 UnQLite
Chaps1&2-54
Databases for Mobile Platforms

CSE
4701


A wide Range of Emerging Products
 SQL Anywhere (Sybase)
 DB2 Everyplace (IBM)
 SQL Server Compact/Express (Microsoft)
 Oracle Lite
 MySQLMobile, Android PHP/MySQL Mobile
Features
 Embedded in the Mobile Device
 Offers DB Query Capabilities
 Synchronizes with Server Side
 Allows Local Storage on Mobile Device
Potential Topic for Project this Semester!
Chaps1&2-55
Databases for Mobile Platforms

CSE
4701




Oracle Berkeley DB
 Via SQL, Java Objects, or XML Documents
Couchbase Lite
 NoSQL – storing/retrieving data in format that is
not relational/SQL-based
LevelDB (written at Google)
 Open Source Library for Key/Value Pair Storage
and Retrieval
SQLite
 Manage in Memory and on Disk
UnQLite
 NoSQL Counterpart of SQLite
Chaps1&2-56
Database Market Share 1995
CSE
4701

Today’s market Share – the Top 3:
 Oracle: 44.4%
 IBM: 21.2%
 Microsoft: 18.6%

http://datadoghouse.typepad.com/data_doghouse/2007/05/database_market.html

What will be the Role of Open Source?
 MySQL (MS) and Innobase (Oracle on top of MySQL)
 Evans Data Corporation (http://www.evansdata.com/)

http://news.taume.com/Technology/Tech-Deals/Report-MySQL-Gains-25-percent-Market-Share-729
Chaps1&2-57
Market: Prerelational vs. Relational 1999

CSE
4701


Prerelational Revenue Shrinking about 9% Per Year - Currently 1.8 Billion/year
Relational Revenue Growing about 30% Per Year - Currently 11.5 Billion/year
Object-oriented Revenue about 150 Million/year
Chaps1&2-58
Database Market Share 2007
CSE
4701
Chaps1&2-59
Database Market Share in 2013
CSE
4701
Chaps1&2-60
Market Share 2014
CSE
4701
Chaps1&2-61
Professional, Legal, and Ethical Issues in Data
Management
Transparencies
http://www.cs.utexas.edu/~mitra/csSpring2011/cs327/lectures/New_Slides/ch13.ppt
83
Objectives
 How to define ethical and legal issues in information
technology.
 How to distinguish between legal and ethical issues
and situations data/database administrators face.
 How new regulations are placing additional
requirements and responsibilities on data/database
administrators.
©Pearson Education 2009
84
Objectives
 How legislation such as the Sarbanes-Oxley Act and
the BASEL II accords impact data/database
administration functions.
 Best practices for preparing for and supporting
auditing and compliance functions.
 Intellectual property (IP) issues related to IT and
data/database administration.
Legal and ethical issues and
database systems
 Organizations increasingly find themselves having to
answer tough questions about the conduct and
character of their employees and the manner in which
their activities are carried out.
 At the same time, we need to develop knowledge of
what constitutes professional and non-professional
behavior.
Ethics in the context of information
technology
 Ethics - A set of principles of right conduct or a
theory or a system of moral values.
 Can consider ethical behavior as “doing what is
right” according to the standards of society. This,
of course, begs the question “of whose society” as
what might be considered ethical behavior in one
culture (country, religion, and ethnicity) might
not be so in another.
Difference between ethical and
legal behavior
 Laws can be considered as simply enforcing certain
ethical behaviors. This leads to two familiar ideas:
what is ethical is legal and what is unethical is illegal.
 Consider –
 Is all unethical behavior illegal?
 Is all ethical behavior legal?
 Ethical codes of practice help determine whether
specific laws should be introduced. Ethics fills the gap
between the time when technology creates new
problems and the time when laws are introduced.
Ethical behavior in information
technology
 A survey conducted by TechRepublic, an IT
oriented web portal maintained by CNET
Networks (techrepublic.com), reported that 57% of
the IT workers polled indicated they had been
asked to do something ‘unethical’ by their
supervisors (Thornberry, 2002).
 Examples include installing unlicensed software,
accessing personal information, and divulging
trade secrets.
Legislation and its impact on the IT
function
 Securities and Exchange Commission (SEC)
Regulation National Market System (NMS)
 The Sarbanes-Oxley Act, COBIT, and COSO
 The Health Insurance Portability and Accountability
Act
 The European Union (EU) Directive on Data
Protection of 1995
 The United Kingdom’s Data Protection Act of 1998
 International banking – BASEL II Accords
Securities and Exchange Commission (SEC)
Regulation National Market System (NMS)
 Concerns activities that appear ethical but are in fact
illegal.
 Presents an ‘order protection rule’ under which an
activity that is acceptable to one facet of the
investment community was deemed illegal under the
new regulation.
 Result of this regulation is that financial services firms
are now required to collect market data so that they
can demonstrate that a better price was indeed not
available at the time the trade was executed.
The Sarbanes-Oxley Act, COBIT, and COSO
 Result of major financial frauds allegedly carried out
within companies such as Enron, WorldCom,
Parmalat, and others.
 US and European governments presented legislation
to tighten requirements on how companies form their
board of directors, interact with auditors, and report
their financial statements.
The Sarbanes-Oxley Act, COBIT, and COSO
 Requires security and auditing of financial data and
has implications on data collection, processing,
security and reporting both internally and externally
to the organization.
 Concerns establishment of internal controls - A set of
rules an organization adopts to ensure policies and
procedures are not violated, data is properly secured
and reliable, and operations can be carried out
efficiently.
The Health Insurance Portability and
Accountability Act
 Administered by Health and Human Services in US and
affects providers of healthcare and health insurance.
 The five main provisions of the Act include:
– Privacy of patient information
– Standardizing electronic health/medical records and transactions between health care organizations
– Establishing a nationally recognized identifier for employees to be used by all employee health plans
– Standards for the security of patient data and transactions involving this data
– Need for a nationally recognized identifier for healthcare organizations and individual providers
The European Union (EU) Directive on Data
Protection of 1995
 The official title of the EU’s data protection
directive is: ‘Directive 95/46/EC of the European
Parliament and of the Council of 24 October 1995
on the protection of individuals with regard to the
processing of personal data and on the free
movement of such data’ (OJEC 1995).
The United Kingdom’s Data Protection Act
of 1998
 Presents eight data protection principles -
International banking – BASEL II Accords
 Presents policies and framework that must be
enacted into law in each country and monitored by
national regulators.
 Framework presents three main ‘pillars’:
– Minimum capital requirements
– Supervisory review process
– Market discipline
Establishing a culture of legal and
ethical data stewardship
 Senior managers such as board members, presidents,
Chief Information Officers (CIOs), and data
administrators are increasingly finding themselves
liable for any violations of these laws.
 Steps to consider include:
– Develop an organization-wide policy for legal and ethical behavior.
– Professional organizations and codes of ethics.
Intellectual Property (IP)
 Covers inventions, inventive ideas, designs, patents
and patent applications, discoveries, improvements,
trademarks, designs and design rights (registered
and unregistered), written work (including
computer software) and know-how devised,
developed, or written by an individual or set of
individuals.
 Two types of IP:
– Background IP – IP that exists before an activity takes place.
– Foreground IP – IP that is generated during an activity.
Intellectual Property (IP)
 Patent - provides an exclusive (legal) right for a set period of time to make, use, sell or import an invention.
 Patents are granted by a government when an individual
or organization can demonstrate:
 the invention is new;
 the invention is in some way useful;
 the invention involves an inventive step.
Intellectual Property (IP)
 Copyright - provides an exclusive (legal) right for a set
period of time to reproduce and distribute a literary,
musical, audiovisual, or other ‘work’ of authorship.
 Trademark - provides an exclusive (legal) right to use a word, symbol, image, sound, or some other distinctive element that identifies the source of origin in connection with certain goods or services.
Databases Steve’s Done

CSE
3002

Naval Postgraduate School, 1983-1987
 The Implementation of a Multibackend Database
System (MDBS): An Exercise in Database
Software Engineering
 Multiple Process Parallel Database System
Ohio State/UConn, 1982-1988
 The Multilingual Database System
 Supports Multiple Data Models
– Attribute-Based
– Relational
– Network
– Hierarchical
– Functional
HoCDB.102
What is MBDS?

CSE
3002 

MBDS is Multi-Process, Multi-Computer, Parallel
Database System
MBDS Composed of …
 Host for Issuing User Requests
 Controller to Interact with Host (and User)
 One or More Backend Database Processors
Goals of MBDS
 Suppose Request Takes 4 Minutes with One
Backend
 Improve Response Time by Increasing Backends
 Two Backends - Request 2+ Minutes
 Four Backends - Request 1+ Minutes
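The arithmetic behind this goal is simple division. A minimal sketch, assuming perfectly even work splitting; the function name and the optional `overhead_minutes` parameter are hypothetical, with the overhead standing for the "+" in "2+" and "1+" (coordination cost that the slide does not quantify).

```python
def ideal_response_minutes(one_backend_minutes: float,
                           backends: int,
                           overhead_minutes: float = 0.0) -> float:
    """Response time if the work divides evenly across all backends."""
    if backends < 1:
        raise ValueError("need at least one backend")
    return one_backend_minutes / backends + overhead_minutes


# A request that takes 4 minutes on one backend:
ideal = {n: ideal_response_minutes(4.0, n) for n in (1, 2, 4)}
```

With zero overhead this gives 4, 2, and 1 minutes; any real controller and communication cost pushes the two-backend and four-backend figures slightly above 2 and 1.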
HoCDB.103
What is MBDS Architecture?
[Figure: User → Host → Database Controller → multiple Backend Database Processors]
Database Blocks are Distributed Across All Backends
Backend (BE) DB Processors are Replicated
Database Controller Sends Same Query in Parallel to all BEs
BEs work in Parallel on Each Query and Communicate for Join
Results are Sent to and Collected by the DB Controller - then to the User
HoCDB.104
Approach Distributes Data Across
Backends

CSE 
3002


Suppose System has 10 Backends
Consider a Number of Tables
 Inventory
 Customers
 Employees
 …
What Happens if you Place
 One Table/Backend?
What Happens if you Distribute
 Table Across 10 Backends?
[Figure: Backend Database Processors 1, 2, …, 10]
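The two placement questions can be compared with a small sketch. Both policies below are illustrative assumptions (whole-table placement and round-robin row distribution), not the actual MBDS placement algorithm, and the 1000-row table is invented.

```python
from typing import Dict, List

N_BACKENDS = 10


def one_table_per_backend(tables: List[str]) -> Dict[str, int]:
    """Place each whole table on a single backend."""
    return {table: i % N_BACKENDS for i, table in enumerate(tables)}


def spread_across_backends(rows: List[int]) -> List[List[int]]:
    """Deal the rows of one table round-robin over every backend."""
    partitions: List[List[int]] = [[] for _ in range(N_BACKENDS)]
    for i, row in enumerate(rows):
        partitions[i % N_BACKENDS].append(row)
    return partitions


placement = one_table_per_backend(["Inventory", "Customers", "Employees"])

inventory_rows = list(range(1000))
partitions = spread_across_backends(inventory_rows)
rows_on_busiest_backend = max(len(p) for p in partitions)
```

With one table per backend, a scan of Inventory keeps a single backend busy with all 1000 rows while the others idle; with the table spread out, each backend scans about a tenth of the rows in parallel.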
HoCDB.105
What are MBDS Processes?
CSE
3002
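[Figure: the Database Controller runs the Request Preparation and Post Processing processes; each Backend Database Processor runs the Directory Management, Record Processing, and Concurrency Control processes, with Disk I/O. The processes exchange messages through Put Msg. and Get Msg.]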
HoCDB.106
What are MBDS Messages?
CSE
3002
No.  Type                                 SRC   DST
1    New Request                          Host  ReqP
2    Results of Request                   PoPr  Host
3    Number of Reqs in Transaction        ReqP  PoPr
4    Aggregate Operators (Sum, etc.)      ReqP  PoPr
6    Parsed Request to Backends           ReqP  DM
12   Backend Aggregate Operator Results   RecP  PoPr
15   Ids for Accessing Database Indexes   DM    DMs
16   Request and Disk Addresses           DM    RecP
21   Ids for Accessing Database Records   DM    CC
22   Locks Obtained: Okay to Execute      CC    RecP
23   Request ID of Finished Request       RecP  CC
HoCDB.107
Sample Processing of Retrieve Request
CSE
3002
[Figure: message flow for a retrieve request through the Request Preparation, Post Processing, Directory Management, Record Processing, and Concurrency Control processes (via Get Msg./Put Msg. and Disk I/O). Arrow labels: A1, B3, C4, D6, E15 (to backend(s)), F15 (from other backend), G21, H22, I16, J23, K12.]
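One way to read the figure labels is as a step letter followed by a message number from the MBDS message table (A1 = step A, message 1). That reading is an assumption, as is the expansion of the abbreviations (ReqP = Request Preparation, PoPr = Post Processing, DM = Directory Management, RecP = Record Processing, CC = Concurrency Control). Under those assumptions, the retrieve request can be traced like this:

```python
# Message number -> (type, source, destination), copied from the message table.
MESSAGES = {
    1: ("New Request", "Host", "ReqP"),
    2: ("Results of Request", "PoPr", "Host"),
    3: ("Number of Reqs in Transaction", "ReqP", "PoPr"),
    4: ("Aggregate Operators", "ReqP", "PoPr"),
    6: ("Parsed Request to Backends", "ReqP", "DM"),
    12: ("Backend Aggregate Operator Results", "RecP", "PoPr"),
    15: ("Ids for Accessing Database Indexes", "DM", "DMs"),
    16: ("Request and Disk Addresses", "DM", "RecP"),
    21: ("Ids for Accessing Database Records", "DM", "CC"),
    22: ("Locks Obtained: Okay to Execute", "CC", "RecP"),
    23: ("Request ID of Finished Request", "RecP", "CC"),
}

# Arrow labels from the figure, in letter order.
FIGURE_LABELS = ["A1", "B3", "C4", "D6", "E15", "F15",
                 "G21", "H22", "I16", "J23", "K12"]


def trace(labels):
    """Expand each label into a readable 'step: message source -> destination' line."""
    steps = []
    for label in labels:
        step, number = label[0], int(label[1:])
        kind, src, dst = MESSAGES[number]
        steps.append(f"{step}: msg {number} {src} -> {dst} ({kind})")
    return steps


retrieve_trace = trace(FIGURE_LABELS)
```

Printing `retrieve_trace` line by line gives a textual walk through the request, from the host's new request to the aggregate results arriving at Post Processing.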
HoCDB.108
What are Synchronization Issues in MBDS?

CSE
3002
Coordination of Synchronous Behavior …
 Within Controller and Backend to Allow
Multiple Active Requests within
 Each Process
 Requests at Different Stages in Different Processes

Between Controller and Backends to Allow
 A Request to be Processed by All Backends
 A Request to be Processed by One Backend

Among Multiple Backends to Allow a Backend
 to Synchronize its Work on one Request with Other
Backends
 to Forward Results to Another Backend
HoCDB.109
Multi-Lingual Database System
CSE
3002
HoCDB.110
Different Data Models
CSE
3002
network
hierarchical
relational
Attribute-based
HoCDB.111
Attribute-based
CSE
3002
HoCDB.112
Relational
CSE
3002
HoCDB.113
Hierarchical
CSE
3002
HoCDB.114
Network
CSE
3002
HoCDB.115
Network
CSE
3002
HoCDB.116
Functional
CSE
3002
HoCDB.117